Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just can't stay away
@balaton
Quote:
You are basing your analysis on unproven assumptions.
Wrong, I was one of the main AmigaOS4 developers and know how it works.
Although I haven't been involved in it for about 15 years, some things are impossible to implement in a completely different way.
Unless you remove compatibility with ancient AmigaOS 1.x-3.x/m68k software, which I'd have preferred, but nearly all other OS4 developers preferred compatibility with AmigaOS 1.x-3.x software over better implementations only usable in new AmigaOS 4.x/PPC software.
I very much doubt anything has changed in that regard, and there are next to no competent developers left still working with Hyperion.

Quote:
Without knowledge on how AmigaOS accesses the card I'm not sure yet the problem is really because of fine grained access to VRAM so this is still something to verify.
Maybe you don't know, but I do.

Quote:
I'm also not sure the PPC440 DMA engine on sam460ex is used for this.
I am. If you don't trust me simply ask @m3x from ACube, he implemented several Sam440/460 AmigaOS 4.x parts, incl. the DMA improved CopyMem(Quick)()/bcopy()/memcpy() (used for graphics.library (Read|Write)PixelArray() and several other AmigaOS 4.x parts as well), etc.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Home away from home
@joerg

Pretty sure Hans added GART support, but I bet it's only for PCIe, not PCI.

Random Google search:

http://www.amiga-news.de/en/news/AN-2024-04-00014-EN.html


Edited by LiveForIt on 2024/6/17 22:25:38
(NutsAboutAmiga)

Basilisk II for AmigaOS4
AmigaInputAnywhere
Excalibur
and other tools and apps.
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@joerg
Do you also know how the Radeon RX driver works? I thought only @Hans knows that, as he keeps the sources so nobody else knows. If the issue is that other parts of AmigaOS access VRAM directly, which is slow on these cards, then this would also be a problem on real hardware. How is that solved? Maybe it does something that works on a real machine to avoid this problem but does not work well with QEMU, but since I don't know what it does I also can't fix it. Maybe @Hans could make a test case that does VRAM access the way the OS or driver does, as a small C program that can be compiled and tested, so we can check what host code QEMU compiles it to and do some tests with that. Otherwise it would be difficult to fix. It's not reasonable for QEMU to emulate a card on top of vfio to add DMA to VRAM access that the OS or driver should do in the first place.

If this is the problem then maybe looking for other cards to pass through which are faster for this could be a solution. Some cards support unified memory or are integrated graphics cards where accessing the VRAM should not be a problem so maybe those could work better? It should be possible to pass through even an integrated GPU but then you either need another card for the host or live with the host graphics shutting down when you run QEMU (or only have output from host via serial). I've found this guide vfio-single-amdgpu-passthrough that shows something like that. It also mentions at the end that AMD cards have some reset issues that may need an external Linux kernel module to fix which may be useful for some trying to pass these cards through.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton @Hans

A minor update. When you disable compositing effects and downgrade the screen mode to 16 bits (64k colors), then "Enable Interrupts" in "Screen Mode" works (as slow as without it).

While the Workbench windows are struggling to draw, Amistore (written in Hollywood, as I read) is very fast. I log in and navigate through the menu with no visible drawing delays. Here, using the RadeonRX has the "same" speed as the emulated sm501 card. (I did not measure anything, so I can't say precisely; I mention it only as a user experience.)

I don't know if this has to do with anything, just reporting it.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just can't stay away
@LiveForIt
GART only helps if the GPU does all rendering and the copies between VRAM and DRAM. That's the case on all other OSes, except for AROS and MorphOS maybe.
On AmigaOS for example Warp3D may benefit from GART, but not the 2D graphics.library where the CPU does a lot of the rendering and all copies between VRAM and DRAM.

@balaton
Quote:
If the issue is that other parts of AmigaOS access VRAM directly, which is slow on these cards, then this would also be a problem on real hardware. How is that solved?
On the old systems (G2 classic Amiga, G3+G4 AmigaOne/Pegasos2) not at all, it's not possible. VRAM copies are tiny, slow 64 bit (G2+G3) or 128 bit (G4) CPU accesses as well.
PPC CPUs seem to be faster accessing VRAM over ZorroIII/PCI/PCIe than x64 CPUs are, but not very much.
On the newer NG systems, Sam4x0, X1000, X5000 and A1222, the CPU DMA engines are used for the copies between VRAM and DRAM which is much faster because of the much larger parts copied at a time (= much less PCI(e) overhead).

Quote:
Some cards support unified memory or are integrated graphics cards where accessing the VRAM should not be a problem so maybe those could work better?
There are AMD x64 CPUs with integrated Radeon gfx, mostly used for laptops, but I don't know if there is any on which the gfx part is compatible with the AmigaOS Radeon HD or RX drivers.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
With a bit of luck, you can buy ATI cards suitable for testing for a few euros in China.
Since Vladimir Vladimirovič is practically selling off his homeland to China.
I foresee Chinese restaurant chains in Vladimir Vladimirovič country.
Even oil is sold at controlled prices to Chinese residents in their homeland.
This is simple finance.
On the other hand, how can you blame China? This is the biggest deal they've had in many years to the detriment of their neighbors led by Vladimir Vladimirovič.

hey, a little humor doesn't hurt every now and then.

I hope a suitable card for pass through will be found.

Thanks for your tests.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@white

I am currently conducting a Special Tracing Operation in order to invade QEMU PPC translation blocks and see what is running differently with and without VFIO. Sadly, I think it will take the same or even longer than the "Special Military Operation".

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas

Thanks for your work.
As soon as I understand what I can buy, I will also buy an ATI card for my tests.

Thanks again.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Amigans Defender
i've just bought an old HP laptop with RX550.. i could do some tests soon

i'm really tired...
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@balaton

Would using QEMU tracing on specific trace events be useful in our case, in order to see which TBs are running with and without VFIO? If yes, which events need to be traced?

I'm trying to study some QEMU tracing backends but I'm still missing the knowledge to understand what exactly I have to do. Anyway, I'm eager to learn QEMU internals as much as I am capable of, of course.

Also, currently, I passed through a real HD drive for my QEMU PegasosII AOS4 VM. I don't know if this can have any positive/negative impact on performance.

Regarding the RadeonRX metrics needed by @Hans, I'm consciously avoiding it, because it is a little boring to de-VFIO the GPU, remove the amdgpu blacklist and anything else that might be needed, and then VFIO the GPU again, etc.
But if it is going to help, I'll have to do it.

You also insisted on using pci.1 instead of pci.0. Indeed, I told you that with VFIO, the pci.1 GPU did not work for me... Pegasos loads the Kickstart and then gets stuck forever.

Has anybody managed to pass through the RX GPU via pci.1 (PegasosII machine)?


Edited by nikitas on 2024/6/18 21:41:31
Edited by nikitas on 2024/6/19 4:12:28
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Home away from home
@balaton
Quote:
@joerg
Do you also know how the Radeon RX driver works? I thought only @Hans knows that as he keeps the sources so nobody else knows.

A-EON owns the driver, so the source code is on their repository. Multiple people have access to it, although it's rare for anyone other than me to touch the code.

The driver will use GART for sending command buffers to the GPU, and also for transfers to/from data buffers used by 3D drivers. Sadly, the Picasso96 API doesn't allow the driver to do the same for the graphics.library.

This is only for motherboards/CPUs that are confirmed to have working memory coherency: X1000, X50x0, A1222. The driver falls back to CPU-based copies to/from VRAM for all other motherboards.

Quote:
If the issue is that other parts of AmigaOS access VRAM directly, which is slow on these cards, then this would also be a problem on real hardware. How is that solved? Maybe it does something that works on a real machine to avoid this problem but does not work well with QEMU, but since I don't know what it does I also can't fix it.

The graphics.library will use the CPU's DMA engines for copies on platforms where it has those routines (e.g., Sam 4x0, X1000, X50x0, A1222). If that's not available, then it has copy routines that are optimized on a per-CPU basis. So, it'll use altivec based copy routines on CPUs with altivec, and 64-bit float based routines elsewhere. It'll even use the A1222's SPE.

One trick that these copy routines do is to ensure that the VRAM side is always correctly aligned. Misaligned accesses are expensive, and don't even work on some platforms.

Just in case anyone missed it: yes, the copy routines use 64-bit and even 128-bit (altivec) transfers to maximize the copy speed.
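
To illustrate the alignment trick described above, here is a minimal C sketch. It is not the graphics.library or RadeonHD/RX code (those sources are not shown in this thread); the function name copy_dst_aligned64 is made up, and the destination pointer merely stands in for the VRAM side. The idea is to write single bytes until the destination is 8-byte aligned, then do only aligned 64-bit stores, and finish with a byte tail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copy n bytes so that every wide (64-bit) store
 * lands on an 8-byte-aligned destination address. The destination is
 * assumed to be memory that may legally be written as uint64_t
 * (e.g. a mapped framebuffer or malloc'd memory). */
void copy_dst_aligned64(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: byte stores until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 7u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: one aligned 64-bit store per iteration. The source may be
     * misaligned, so it is loaded through memcpy into a temporary. */
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        *(uint64_t *)(void *)d = w;
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail: remaining bytes, fewer than 8. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

Only the destination is forced onto aligned addresses here, which mirrors the statement that the VRAM side is the one kept aligned; a real routine would use wider registers (64-bit float or AltiVec) as described in the post.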


Quote:
Maybe @Hans could make a test case that is doing VRAM access the way the OS or driver does in a small C program that can be compiled and tested so we can check what host code it is compiled to by QEMU and do some tests with that.

I already have: it's called GfxBench2d. It includes some of the same copy routines used by the graphics.library and RadeonHD/RX. It benchmarks multiple copy algorithms, although they're NOT all uploaded to the HDRLab website. Check the text output.

Quote:
Otherwise it would be difficult to fix. It's not reasonable for QEMU to emulate a card on top of vfio to add DMA to VRAM access that the OS or driver should do in the first place.

Yeah, that would be ridiculous. My top questions are:
- What's the VFIO performance when running Linux (PowerPC) inside QEMU? Do they also have a big performance hit?
- What does the generated host code look like? If it isn't a tight loop, then the CPU/PCIe-controller won't be able to batch up transfers into larger packets
- How does VFIO even work? Can the generated host code access the hardware directly? Or is there some kind of abstraction layer in-between. If there is a layer in-between, what's its overhead? For example, do accesses run through an exception handler?

Hans

http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more.
https://keasigmadelta.com/ - more of my work
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Home away from home
@nikitas

I just compared your GfxBench2D results with Geennaam's and something strange is going on.

Your VRAM copy results aren't too different from his, but the hardware accelerated performance certainly is. Have a look at the FillRect raw data:

Geennaam's FillRect 16x16 result: 37,549.90 operations/s
Your FillRect 16x16 result: 101.81 operations/s

Huh? What on earth is going wrong there? Why is your card's FillRect ops/s maxing out at a level roughly 368x slower than his? That is a huge drop in performance.

I don't think that the memory copy results can explain that. Your RAM=>VRAM copy speed is actually ~49% faster, and your VRAM=>RAM speed is ~43% slower.

Hans

http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more.
https://keasigmadelta.com/ - more of my work
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just popping in
@Hans

Hm.. Wait a minute. I have another VM with a Windows 11 guest, in which I have VFIO'd exactly the same GPU card, and I downloaded your Windows version of GfxBench2D there. Your tool is a bit outdated compared to the Amiga version, but it produced the following results:

The Amiga version (on QEMU AOS4.1) has 54 tests and takes 3 hours.
The Windows version (on QEMU/KVM Windows 11) has 38 tests but takes... seconds to finish.

I don't know if this is a checkmate, but the question remains for me. The results are showing:
Better QEMU VFIO handling on x64?
Or better handling of the Windows AMD Driver?

------------------------------------------------------------
GfxBench2D 2.9
A benchmark tool for graphics cards.
Written by Hans de Ruiter.

Copyright (C) 2011 by Hans de Ruiter, all rights reserved
------------------------------------------------------------

System Information:
OS: Windows 11 Home (build 22631)
Motherboard/Device: Standard PC (Q35 + ICH9, 2009), manufactured by: QEMU
CPU: 13th Gen Intel(R) Core(TM) i5-13400, @ 2.496 GHz
L1 Cache Size: 65536, L2 Cache Size: 16777216, L3 Cache Size: 16777216
Total RAM: 15.9808 GiB
External Bus (FSB) Speed: 0 Hz

Clock granularity is: 1030us
Tests will take at least 1.030 seconds each.

Initialising Direct2D test.

Board name: Radeon RX550/550 Series
Product ID: 0x699f, Vendor ID: 0x1002, SubProduct ID: 0xe468, SubVendor ID: 0x1da2
Card driver: C:\Windows\System32\DriverStore\FileRepository\u0402263.inf_amd64_1366da2d694c570c\B400781\aticfx64.dll,C:\Windows\System32\DriverStore\FileRepository\u0402263.inf_amd64_1366da2d694c570c\B400781\aticfx64.dll,C:\Windows\System32\DriverStore\FileRepository\u0402263.inf_amd64_1366da2d694c570c\B400781\aticfx64.dll,C:\Windows\System32\DriverStore\FileRepository\u0402263.inf_amd64_1366da2d694c570c\B400781\amdxc64.dll (31.0.21912.14)
VRAM: 4 GiB
Display mode: 1920x1080@59 (32 bpp)
WritePixelArray: 2664.938 MiB/s (took 1.142000 seconds).
ReadPixelArray: 3189.887 MiB/s (took 1.122000 seconds).

FillRect:
Size            Time (s)           Ops/s        MPixel/s
(16x16)            1.109     2896694.319         707.201
(32x32)            1.126     2819097.691        2753.025
(64x64)            1.261     2393024.584        9347.752
(128x128)          1.050     1079047.619       16860.119
(256x256)          1.158      376311.744       23519.484
(512x512)          1.106      301297.468       75324.367
(1024x1024)        1.142      161941.331      161941.331

BltBitMap:
Size            Time (s)           Ops/s        MPixel/s
(16x16)            1.138     2039821.617         498.003
(32x32)            1.140     2047721.053        1999.728
(64x64)            1.139     1953935.909        7632.562
(128x128)          1.066     1062851.782       16607.059
(256x256)          1.086      267506.446       16719.153
(512x512)          1.120       67892.857       16973.214
(1024x1024)        1.133       15455.428       15455.428

OverlappedBltBitMap:
Size            Time (s)           Ops/s        MPixel/s
(16x16)            1.125       79613.333          19.437
(32x32)            1.124       78139.680          76.308
(64x64)            1.100       76865.455         300.256
(128x128)          1.086       73211.786        1143.934
(256x256)          1.139       47367.867        2960.492
(512x512)          1.133       19743.160        4935.790
(1024x1024)        3.742        5344.735        5344.735

Composite:
Size            Time (s)           Ops/s        MPixel/s
(16x16)            1.224     2314133.987         564.974
(32x32)            1.135     2312889.868        2258.682
(64x64)            1.130     2223106.195        8684.009
(128x128)          1.200      944166.667       14752.604
(256x256)          1.179      246405.428       15400.339
(512x512)          1.176       63383.503       15845.876
(1024x1024)        1.137       14174.142       14174.142
NOTE: Compositing (or alpha blending) used premultiplied alpha mode.

CompositeSrcMask:
Size            Time (s)           Ops/s        MPixel/s
(16x16)            1.135     1742954.185         425.526
(32x32)            1.130     1724620.354        1684.200
(64x64)            1.153     1638477.884        6400.304
(128x128)          1.073      959925.443       14998.835
(256x256)          1.118      247174.419       15448.401
(512x512)          1.164       54682.990       13670.747
(1024x1024)        1.148       11861.498       11861.498
NOTE: The source mask's alpha channel was multiplied by the source bitmap's alpha channel.
NOTE: The source bitmap's alpha channel was premultiplied.

Random:
Time (s)           Ops/s        MPixel/s
       4.078       19617.460        6568.958
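
For readers comparing the columns: the MPixel/s figures in the output above appear to be the Ops/s figures multiplied by the rectangle area and divided by 2^20. That relation is inferred from the numbers only (for example, the 16-by-16 FillRect row: 2896694.319 ops/s and 707.201 MPixel/s), not taken from the tool's documentation, and the helper name below is made up. A small C sketch of the conversion:

```c
/* Convert an operations-per-second figure into MPixel/s, assuming one
 * "MPixel" means 2^20 pixels. That divisor is an inference from the
 * benchmark output above, not a documented definition. */
double mpixels_per_second(double ops_per_second, unsigned width, unsigned height)
{
    const double pixels_per_op = (double)width * (double)height;
    return ops_per_second * pixels_per_op / (1024.0 * 1024.0);
}
```

With this definition a 1024-by-1024 operation moves exactly one MPixel, which would explain why Ops/s and MPixel/s are identical in the largest-size rows.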



Also, Geennaam's guest CPU is reported as:
Motorola MPC 7447/7457 Apollo, 1.2 @ 1.53 GHz

While my guest CPU is:
Motorola MPC 7447/7457 Apollo, 1.2 @ 1000 MHz

Is my host's Intel chipset to blame here?


Edited by nikitas on 2024/6/19 7:23:19
Edited by nikitas on 2024/6/19 7:25:23
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Home away from home
@nikitas
Quote:
@Hans

Hm.. Wait a minute. I have another VM with a Windows 11 guest, in which I have VFIO'd exactly the same GPU card, and I downloaded your Windows version of GfxBench2D there. Your tool is a bit outdated compared to the Amiga version, but it produced the following results:

The Amiga version (on QEMU AOS4.1) has 54 tests and takes 3 hours.
The Windows version (on QEMU/KVM Windows 11) has 38 tests but takes... seconds to finish.

I don't know if this is a checkmate, but the question remains for me. The results are showing:
Better QEMU VFIO handling on x64?
Or better handling of the Windows AMD Driver?

The drivers and graphics APIs are definitely a lot better. I don't know if QEMU VFIO handling is better with x64 client OSes, though.

GfxBench2D can't test CPU RAM<=>VRAM transfers on Windows, because the API doesn't allow it. The WritePixelArray/ReadPixelArray results are DMA accelerated.

The driver is managing 2896694.319 ops/s with FillRect 16x16, which is over 28,400x faster than what you're getting with AmigaOS (and about 77x Geennaam's results).

The Windows drivers have multiple advantages. The modern Windows graphics APIs have the concept of command queues. So, it can send draw operations in batches for maximum performance. AmigaOS' graphics.library API draws everything immediately (so, no queue). This means that every draw operation is sent to the GPU individually, which is the worst thing to do performance-wise. We're continually hitting the bottleneck of how many command-buffers/s (or batches/s) can be handled (because we're submitting batches of one).

So, I'd expect Windows & Linux drivers to outperform our drivers just based on the batching.

I've always wanted to batch the draw ops, but see no safe way of doing so with the graphics.library, because it has no Flush() or Finish() function to flush the queue (i.e., apps/games expect draw ops to happen immediately, and never signal a flush/finish).

There's still something going wrong on your system vs Geennaam's. His system has a bottleneck of about 37500 command-buffers/s, whereas yours is maxing out at just 100 command-buffers/s. Being able to draw max 100 things per second is painfully slow.
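
A rough way to see why batches of one hurt so much is a toy model in which command-buffer submission is the only limit. The C sketch below is only that model: the function names are hypothetical, it is not driver code, and it ignores GPU execution time entirely.

```c
#include <stddef.h>

/* Command buffers needed to submit `ops` draw operations when each buffer
 * holds up to `batch` operations (batch == 1 models immediate drawing). */
size_t command_buffers_needed(size_t ops, size_t batch)
{
    if (batch == 0)
        batch = 1;                       /* treat 0 as "no batching" */
    return ops / batch + (ops % batch != 0);
}

/* Upper bound on draw ops/s if the number of command buffers handled per
 * second is the only bottleneck. */
double max_ops_per_second(double command_buffers_per_second, size_t batch)
{
    if (batch == 0)
        batch = 1;
    return command_buffers_per_second * (double)batch;
}
```

In this model, a system limited to 100 command buffers per second tops out at 100 draw operations per second with batches of one, while a hypothetical batch size of 64 would raise the ceiling to 6400.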

Hans

http://hdrlab.org.nz/ - Amiga OS 4 projects, programming articles and more.
https://keasigmadelta.com/ - more of my work
Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@afxgroup
Even if it's in a laptop, it could be a discrete GPU that doesn't use the same memory as the CPU, so it might not be different from a GPU on a graphics card, but it would be interesting to see the results anyway. Setting up pass through on a laptop might be more difficult, though, because you would lose graphics output on the host while the guest is running, so setting up vfio on guest start and tearing it down on guest stop should be scripted for it to work. You will probably need serial or ssh access to the host while experimenting with the setup, as the screen may not work and will only display the guest.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas
Quote:
Would using QEMU tracing on specific trace events be useful in our case, in order to see which TBs are running with and without VFIO? If yes, which events need to be traced?

It would be difficult to identify the interesting code section among all the code the OS runs otherwise. You can enable some logs (see -d help), but I don't see how to identify the part that's relevant for this unless we can cut the code down. See my reply to @Hans on this below.
Quote:
I'm trying to study some QEMU tracing backends but I'm still missing the knowledge to understand what exactly I have to do. Anyway, I'm eager to learn QEMU internals as much as I am capable of, of course.

The tracing backends only determine how the logs are saved; the 'log' backend just prints them to stdout/stderr, which can easily be redirected, so for simple debugging it is usually enough. Other backends can save to binary files or send to syslog. What is logged is determined by the -d options, independent of the log backend.
Quote:
Also, currently, I passed through a real HD drive for my QEMU PegasosII AOS4 VM. I don't know if this can have any positive/negative impact on performance.

You can measure it to find out. There were some file system benchmarks for AmigaOS I think. My guess is that for small files (like normal OS usage) it may not be a big difference as the IDE overhead would be bigger. The host filesystem overhead is probably not too big compared to the guest side overhead.
Quote:
Regarding the RadeonRX metrics needed by @Hans, I'm consciously avoiding it, because it is a little boring to de-VFIO the GPU, remove the amdgpu blacklist and anything else that might be needed, and then VFIO the GPU again, etc.
But if it is going to help, I'll have to do it.

I had the idea that you could also try running a Linux guest, but I see you've done that with a Windows guest instead.
Quote:
You also insisted on using pci.1 instead of pci.0. Indeed, I told you that with VFIO, the pci.1 GPU did not work for me... Pegasos loads the Kickstart and then gets stuck forever.

I wonder why. Are there any logs from kernel/driver using the debug kernel and setting -append 'serial debuglevel=12' or similar? (I never remember which is a good debuglevel. Setting it too high will crash and won't boot, too low will not log enough but can't remember what is the still acceptable highest level.)

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@Hans
Quote:
Sadly, the Picasso96 API doesn't allow the driver to do the same for the graphics.library. ...
The driver falls back to CPU-based copies to/from VRAM for all other motherboards.

So it looks like we should try to optimise this case as it seems to be used frequently.

Quote:
it has copy routines that are optimized on a per-CPU basis. So, it'll use altivec based copy routines on CPUs with altivec, and 64-bit float based routines elsewhere. It'll even use the A1222's SPE.

Do you know what AltiVec ops are used? I've seen vperm in one profile, which I think can also swap bytes around, so it may be used for endian conversion. I don't know how well it's emulated, but this sounds like it could fall back to byte access in some cases. Whether this has any effect could be tested by using -cpu 750cxe, where FPU regs would be used instead (which might also be slow, but as long as no FPU ops are done maybe not). For that we would need benchmark results both with -cpu 750cxe and without the -cpu option, which defaults to 7457, but I haven't seen those yet. There are some patches on the list that change these AltiVec instructions (although only the way they are implemented, not the results), so I'll wait until they are merged and then try to find where it's implemented and what it does. (But maybe the instructions used by AmigaOS are already converted, I don't know.) If you want to look at it, it's somewhere in qemu/target/ppc, probably in a vsx.impl file. QEMU has some support for 128-bit quantities that could be translated to host code, but it's possible the VSX/AltiVec instructions don't use them and could be optimised.

Quote:
I already have: it's called GfxBench2d. It includes some of the same copy routines used by the graphics.library and RadeonHD/RX. It benchmarks multiple copy algorithms, although they're NOT all uploaded to the HDRLab website. Check the text output.

OK, then can you make a small C program that just measures these copy routines and can be compiled on Linux? QEMU has two modes: one is full system emulation with qemu-system-ppc, but it also has user mode with qemu-ppc, which can run Linux or BSD executables compiled for a different architecture. With that we could easily check the compiled code, because we wouldn't have to find it among all the other OS code. So if you have a test of just these copy routines that can be compiled for PPC Linux, you could use the -d options with qemu-ppc running on x86_64 Linux to get the guest and host asm and compare them to see how these are transformed. Finding it in the qemu-system-ppc output is probably impossible, so we'd need a separate small test case for that.
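
As a starting point for such a test case, here is a minimal C sketch of a timing harness. Everything in it is an assumption for illustration: the function names are made up, and the two copy loops are simple stand-ins (byte-wise and 64-bit word-wise), not the routines from graphics.library, RadeonHD/RX or GfxBench2D. It also copies ordinary RAM, so it only helps with inspecting how QEMU translates the loops, not with measuring real VRAM access.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Byte-wise copy. The volatile destination keeps the compiler from
 * replacing the loop with a call to memcpy(). */
void copy_bytes(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* 64-bit word copy. Assumes dst is 8-byte aligned and may be written as
 * uint64_t; any tail shorter than 8 bytes is copied byte-wise. */
void copy_words64(void *dst, const void *src, size_t n)
{
    volatile uint64_t *d = dst;
    const unsigned char *s = src;
    size_t words = n / 8;
    for (size_t i = 0; i < words; i++) {
        uint64_t w;
        memcpy(&w, s + i * 8, sizeof w);   /* source may be unaligned */
        d[i] = w;
    }
    volatile unsigned char *tail = (volatile unsigned char *)dst;
    for (size_t i = words * 8; i < n; i++)
        tail[i] = s[i];
}

/* Run fn `reps` times and return throughput in MiB/s of CPU time, or 0.0
 * if the run was too short for clock() to resolve. */
double measure_mib_per_s(copy_fn fn, void *dst, const void *src,
                         size_t n, unsigned reps)
{
    clock_t t0 = clock();
    for (unsigned i = 0; i < reps; i++)
        fn(dst, src, n);
    clock_t t1 = clock();
    double secs = (double)(t1 - t0) / (double)CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return ((double)n * (double)reps) / (1024.0 * 1024.0) / secs;
}
```

A small main() that allocates two buffers and prints measure_mib_per_s() for each routine would complete the program. Assuming a PPC cross compiler is available (for example one named powerpc-linux-gnu-gcc) and the binary is linked statically, something like `qemu-ppc -d in_asm,out_asm -D copytest.log ./copytest` should dump the guest code and the generated host code for comparison; the file names here are placeholders.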

Quote:
- What's the VFIO performance when running Linux (PowerPC) inside QEMU? Do they also have a big performance hit?

I've asked to test this, haven't seen an answer yet but we got Windows results now. Getting the same tests that @Georg did or getting AmigaOS results from @Georg so we have both host and guest results from same hardware might help though so we don't have to compare numbers from different machines.
Quote:
- What does the generated host code look like? If it isn't a tight loop, then the CPU/PCIe-controller won't be able to batch up transfers into larger packets

See above, I think we would need a small Linux test executable for that then you can get generated code from qemu-ppc. Or you could try to find the implementation of the VSX/AltiVec ops and work out how they are translated but that may be harder than making a test case.
Quote:
- How does VFIO even work? Can the generated host code access the hardware directly? Or is there some kind of abstraction layer in-between. If there is a layer in-between, what's its overhead? For example, do accesses run through an exception handler?

We haven't seen guest exceptions (I asked for info irq results, which we got, but they showed no excessive guest exceptions); I don't know about host exceptions. I don't know how vfio works; you could ask on the QEMU list or find some docs, but I think what it does is set up the IOMMU on the host to map the card into the guest's address space, so guest code would access the card directly. This may depend on the host IOMMU and may fall back to some slower method when that's not available, or not work at all. So theoretically it could also depend on the host and host firmware (BIOS/UEFI).

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Quite a regular
@nikitas
When testing with x86 guests it might use KVM, so guest code runs directly on the CPU, which might give different results. You can add -accel tcg to force it to translate x86 code the same way as it does PPC code, which will be slower but might be closer to the PPC guest case. PPC and x86 ops are different, so the translation is also different, but at least it goes through more or less the same process as with a PPC guest.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just can't stay away
@Hans
Quote:
I've always wanted to batch the draw ops, but see no safe way of doing so with the graphics.library, because it has no Flush() or Finish() function to flush the queue (i.e., apps/games expect draw ops to happen immediately, and never signal a flush/finish).
struct BoardInfo *bi;
bi->WaitBlitter(bi);
?

This P96 *.chip or *.card function should be used by IGraphics->WaitBlit().

It's not just flush/finish but additionally waiting for completion, but since it has to be used between any GPU rendering and CPU rendering, without using it the CPU based rendering functions may work on wrong data, it may help if several GPU functions are called in a row.
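
The idea of treating the wait call as the flush point can be sketched with a toy model in C. This is not the P96 API: GpuModel, gpu_draw, wait_blit and cpu_access are hypothetical names, and the model only counts how many command buffers would be submitted if GPU operations were queued until the next synchronisation point.

```c
#include <stddef.h>

/* Hypothetical model, not the P96 API: counts how often work is handed to
 * the GPU when draw ops are queued and only flushed at a sync point. */
typedef struct {
    size_t queued;       /* GPU draw ops recorded but not yet submitted */
    size_t submissions;  /* command buffers actually sent */
} GpuModel;

/* Record a GPU draw op without submitting it. */
void gpu_draw(GpuModel *g)
{
    g->queued++;
}

/* Stand-in for a WaitBlit-style call: submit everything queued so far as
 * one command buffer (a real driver would also wait for completion). */
void wait_blit(GpuModel *g)
{
    if (g->queued > 0) {
        g->submissions++;
        g->queued = 0;
    }
}

/* CPU rendering must see finished GPU results, so it synchronises first. */
void cpu_access(GpuModel *g)
{
    wait_blit(g);
    /* the CPU would read or write VRAM here */
}
```

In this model, five GPU operations followed by a CPU access, then three more GPU operations and a final wait, produce two submissions instead of eight. Whether that is safe in a real driver depends on every code path that needs the results actually reaching such a sync point, which is the concern raised in the quoted post.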

Quote:
A-EON owns the driver, so the source code is on their repository. Multiple people have access to it, although it's rare for anyone other than me to touch the code.
I may have access to it, not sure, but I never checked your driver sources.
Knowing how the P96 API, incl. the OS4 extensions, works is enough to know what a gfx card driver can do and what it can't.

Re: Qemu + VFIO GPU RadeonRX 550 + AmigaOS4 extremely slow
Just can't stay away
@balaton
Quote:
Do you know what AltiVec ops are used?
IIRC only the Vector Load and Store (Indexed) instructions.

Additionally the Vector Data Stream Touch instructions for pre-fetching data (read) and allocating cache lines (write) (similar to DCBA on CPUs without AltiVec, or on CPUs without DCBA support DCBZ as replacement), but those may only help on cached DRAM, not on cache-inhibited VRAM.

Even without using the streaming instructions using 2 vector store instructions in a row will skip the usual "read 32 byte cache line"/"modify cache line"/"copy back cache line" used on PPC and just writes the 32 bytes.
For anything smaller, for example four 64-bit double or eight 32-bit integer stores in a row in a memcpy() loop, that's not the case and it uses the slower read/modify/copy back cache line method.
This, too, may only help on cached DRAM, not on cache-inhibited VRAM.
