You mentioned MicroDelay(), and it could indeed be caused by that, as it is probably some kind of busy loop polling a PowerPC timer register. If QEMU's emulation of that register is not very precise (which may depend on the host, or even on the host kernel configuration, so there may be a difference between running Linux distribution A vs distribution B), then this will slow things down, because the delay will last (possibly much) longer than expected.
Could be tested with a little AOS4 program which for example calls MicroDelay(10) 100000 times in a loop. Should complete in 1 second. If it takes (much) longer -> problem.
Yes, that would be worth testing. Libauto automatically opens the timer.device, so linking with -lauto will set up ITimer. Then something like this should work (NOTE: untested):
#include <proto/timer.h>
#include <stdio.h>

int main(int argc, const char **argv) {
    unsigned usDelay = 1;
    unsigned count = 1000000;
    printf("Calling MicroDelay(%u) %u times\n", usDelay, count);
    for (unsigned i = 0; i < count; ++i) {
        ITimer->MicroDelay(usDelay);
    }
    printf("Done! This should have taken %.2f seconds. How long did it actually take?\n", ((double)usDelay * count) / 1000000.0);
}
@Hans Nikitas did not have a USB vfat device in his commands, but he may try removing the network (though I think he also tried that and it didn't help). On pegasos2 all of these may share the same interrupt, so maybe there's still an issue with them even after all the patches and the level-sensitive setting in BBoot? Anyway, trying to reproduce this with Linux would avoid all of these and show whether the problem is only in how AmigaOS does things or is independent of that.
#include <proto/timer.h>
#include <stdio.h>

int main(int argc, const char **argv) {
    uint32 usDelay = 1;
    uint32 count = 1000000;
    printf("Calling MicroDelay(%lu) %lu times\n", usDelay, count);
    for (uint32 i = 0; i < count; ++i) {
        ITimer->MicroDelay(usDelay);
    }
    printf("Done! This should have taken %f seconds. How long did it actually take?\n", ((double)usDelay * count) / 1000000.0);
}
But it may be better to use for example usDelay = 10 and count = 100000, or usDelay = 100 and count = 10000.
I just got a reminder of some of geennaam's older discoveries. According to him, his Radeon R9 270x worked well provided that he didn't share part of his hard-drive with AmigaOS as a USB drive, and he also had to shut down ethernet. With either of those enabled, he got massive slowdown.
Okay, so do I have to remove the ethernet from the QEMU command only, or should I also shut down the ethernet on the host? Regarding the USB drive, I have a secondary real SSD drive on "/dev/sdb" that I use in the QEMU command. Is this OK? For the mouse/keyboard, I use bochs-display. Is this OK, too?
@balaton Indeed, virsh makes it more complex. So, I will create a new QEMU setup for this when I have time.
@nikitas Only remove it from the QEMU command. It was said that if the guest has a USB disk, as with the vfat shared folder, then it ran slower with vfio for some reason. I don't know if @geennaam ever talked about a network card. It does not (or should not) matter what you have on the host; the theory is that having other PCI devices in the guest, like a USB or network card, may interfere with interrupts from the graphics card. They are now on a different bus and we had several patches to fix this, but who knows. This was with a RadeonHD card and used pci.1, so it's different from what you've tried, but we have no better idea at the moment. So remove all -device usb-* and -device rtl8139 entries from the QEMU command line and see if that changes anything.
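For illustration only (an untested sketch; the host IDs and option values here are placeholders, not your actual command line), the idea is to strip the guest down to little more than the passed-through GPU:

# before: GPU passthrough plus a USB disk and a network card
qemu-system-ppc -machine pegasos2 ... \
    -device vfio-pci,host=01:00.0 \
    -device usb-storage,drive=usbdisk \
    -device rtl8139,netdev=net0 ...

# for the test: drop every -device usb-* and the -device rtl8139 entry
qemu-system-ppc -machine pegasos2 ... \
    -device vfio-pci,host=01:00.0 ...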
I also noticed that everyone using VFIO is using QEmu in KVM mode, which means that the guest OS can execute code on the host CPU directly instead of via emulation. I found nothing about VFIO usage with the TCG-based emulator. Looks like we're in uncharted territory.
Most of the people who do this want to play games in a VM running Windows while they run Linux as the host. So they want the most performance and use KVM and vfio. This may mean that using it with TCG is not tested that much, and with PPC not at all, but that does not mean it should not work. Of course we're in uncharted territory: not many people run AmigaOS on QEMU, and even fewer have tried vfio GPU passthrough, so it's not something that was tested and known to work. Some people tried it before for MacOS but gave up, because there the firmware is needed to run the FCode ROM of the Mac graphics card (or a suitable ROM for a PC card) for MacOS to even recognise the card, but QEMU's OpenBIOS can't run FCode ROMs and real Mac ROMs don't run with QEMU. (I had patches to fix both of these, but they aren't upstream, so one can only experiment by applying patches from different places, which is why only a few people even tried. Somebody once managed to get a Rage128Pro working, but I don't know if it was usable.)
nikitas wrote:Okay, so do I have to remove the ethernet from the QEMU command only, or should I also shut down the ethernet on the host? Regarding the USB drive, I have a secondary real SSD drive on "/dev/sdb" that I use in the QEMU command. Is this OK? For the mouse/keyboard, I use bochs-display. Is this OK, too?
Remove it from the QEmu command line, and use your Radeon R7 240 for testing (geennaam said that it had no effect on his RX 5x0 cards).
I have no idea about using the secondary real SSD drive or the bochs-display. If you can boot to AmigaOS without them, then try removing both from the QEmu args.
@balaton VFIO is obviously working with TCG. I was hoping to get some idea of what the overhead was when used with TCG instead of KVM, and maybe some tips on what to try.
balaton wrote:@Georg To help testing, could you please share your Linux kernel options and xorg.config to show how to set up vesafb and the x11perf command again so others can reproduce that test without having to find out the right config?
Could be wrong, but I don't think the x11 "vesa" driver needs any special Linux kernel options. There's another X11 driver "fbdev" which does use that Linux kernel framebuffer stuff.
In theory, to use the "vesa" driver it's just a matter of editing xorg.conf (in /etc/X11) (or saving a modified version wherever you want), looking for the "Device" section in there and editing it to say:
Driver "vesa" Option "ShadowFB" "0"
Many years ago that was enough. But nowadays if you try to start X11 (startx -- -xf86config myxorg.conf) it may fail, and the log (/var/log/Xorg.0.log) says "vesa: Ignoring device with a bound kernel driver". That seems to be because the kernel modules of the normal gfx card (in my case "nvidia") are still in memory.
So here what I do is first log out of the desktop, use CTRL ALT F1 to switch to a virtual console, run "init 3" to get rid of the X11 (KDE) display manager, then "lsmod | grep nvidia", then "rmmod" the modules (you need to find the right order, i.e. which ones to remove first, otherwise it says "module is in use by ..."), and then "startx -- -xf86config myxorg.conf". For some reason here the screen first appears somewhat broken (don't know if it's just the monitor), ~zoomed, ~like_wrong_modulo, so I also have to do some CTRL ALT F1 -> CTRL ALT F7 switching back and forth, and then it displays fine.
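Roughly, that sequence as a console session (the module names and their removal order here are from my NVIDIA setup and only an example; check lsmod yourself):

# switch to a text console with CTRL ALT F1, log in, then as root:
init 3                                  # stop the X11/KDE display manager
lsmod | grep nvidia                     # see which nvidia modules are loaded
rmmod nvidia_drm nvidia_modeset nvidia  # remove them in dependency order
startx -- -xf86config myxorg.conf       # start X11 with the "vesa" config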
If the thing is slow and you see a flickering mouse sprite (because of the disabled shadow framebuffer) in front of gfx updates (like the "glxgears" window), it worked.
Google how to disable "compositing" on your desktop. There may be some shortcut key for it. To verify that it's disabled, run "xcalc" or "xclock" from a terminal. Press CTRL+Z to freeze the program. Then drag its window out of the screen and back in. If this creates gfx trash or makes gfx disappear (like text/numbers), then it worked. (This happens because the program is frozen and cannot update/refresh areas of the window which became hidden and then visible again. With an enabled compositor this does not happen, because the windows' contents are backed up in their own pixmaps=bitmaps and the contents don't get lost when dragged out of view or behind things.)
x11perf -shmput500
x11perf -shmget500
It's unlikely that it is not running in a 4-bytes-per-pixel screenmode (so that you can interpret x11perf results/sec as million_bytes/sec), but if you want to check, then look if "xdpyinfo" says "32" for "bitmap unit". Though I'm not 100% sure that really reflects the "bytes per pixel". (Don't know or remember why, but the AROS hosted X11 driver even creates a dummy test XImage and then picks the bytes per pixel from it.)
Hans wrote:Remove it from the QEmu command line, and use your Radeon R7 240 for testing
No, I tried all the possible combinations, and it didn't go any faster. The only thing that maybe helped a little was removing bochs-display and using Evdev for USB devices.
Also, the command:
cpufreq-set -g performance
made a visible difference, but just a bit.
I also tried a funny thing using a real VGA-to-VGA connection to a small old monitor I found. I got the error "Couldn't create screen mode." I think this monitor supports 640x480.
Running this script with:
- R7 240 attached: took about 24 seconds.
- RadeonRX 550 attached: took about 1.5 or 2 seconds.
- RadeonRX 550 attached, with Screenmode --> Enable Interrupts = Checked: took about 1.0 or 1.5 seconds.
- RadeonRX 550 attached, with Screenmode --> Enable Interrupts = Checked, Ethernet attached, and using bochs-display: took about 1.0 or 1.5 seconds (same as the test above).
When I enable interrupts in Screenmode, the system seems to run slower overall, though this test appears to execute faster.
(Nobody can switch hardware on the fly and run the test faster than me!)
nikitas wrote:Running this script with:
- R7 240 attached: took about 24 seconds.
- RadeonRX 550 attached: took about 1.5 or 2 seconds.
- RadeonRX 550 attached, with Screenmode --> Enable Interrupts = Checked: took about 1.0 or 1.5 seconds.
- RadeonRX 550 attached, with Screenmode --> Enable Interrupts = Checked, Ethernet attached, and using bochs-display: took about 1.0 or 1.5 seconds (same as the test above).
When I enable interrupts in Screenmode, the system seems to run slower overall, though this test appears to execute faster.
Your RX 550 results aren't too far off, but the R7 240 result is 24x slower than it should be. I didn't expect there to be a dramatic difference depending on which graphics card is plugged in. That doesn't make sense. It does confirm that MicroDelay() can indeed be a problem, although it's not necessarily the cause of the massive graphics slow-down.
I'm the poor canary flying into the mineral mine tunnel to see if toxic gas exists further inside. Let's see if I die...
I even cleaned the connectors and the PCIe slot with 90% isopropyl alcohol. I found the (hidden) NVMe SSD, removed it, and placed it in another slot away from the CPU (in case it was using PCIe lanes, as I read). What else can somebody do, I wonder.
Could it be a problem that I don't do single-GPU passthrough? I use the integrated GPU for my host and pass the RX550 through QEMU/VFIO, with a second monitor plugged in for the guest OS.
@nikitas Here's some more motivation on what we might be able to achieve:
While QEMU uses one thread for the vcpu, it is multithreaded and has another thread for other tasks, and maybe also an IO thread. So confining this to a single CPU core may not be a good idea. What if you drop all the tweaks of isolating CPU cores using taskset and setting IRQ affinity, and just run QEMU normally? The host OS should be able to schedule the threads on its own.
Using other cards on the host should not interfere as long as they are in different vfio groups. Did you check the vfio groups and establish that the graphics card you're passing through is in its own group (together with its sound function), and that you pass all devices in that group? Also, re-reading @geennaam's experiment in the long thread, he used multifunction=on for the graphics function to create it as a multifunction device, as the sound part is at the same ID and is another function of the card. I don't think that matters, but I have no better idea now.
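A common way to check this on the host (a generic sketch, not specific to this setup) is to list the IOMMU groups and confirm the GPU and its audio function are alone in theirs:

for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    ls "$g/devices"    # everything listed here has to be passed through (or unused)
done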
@Hans Maybe the HD card could work if MicroDelay didn't have a problem with that card. This might mean these cards are slow for different reasons. It's also possible that the problem with the HD card is not actually in MicroDelay; could it be that something is disabling multitasking, so the test runs slower even though the delay would return in time? What could do that with only the HD card but not the RX card, and how could that be confirmed? Maybe Snoopy can log calls to Disable/Forbid and see if there are more of these with the HD card than with the RX card?
If MicroDelay shows more or less the expected results with one gfx card but not with another gfx card (with an otherwise identical config), then it's more likely that the problem is not MicroDelay, but something else. Like maybe tons of interrupts happening with one gfx card, but not the other?
I would try repeating the test with the slow gfx card, but with the test loop changed to be surrounded by Disable()/Enable() (if that makes it fast, try Forbid()/Permit()). If MicroDelay is just a busy loop - which is likely - it should still work in the disabled state. You might have to use a watch and check the time it takes yourself, as the AOS timer.device may behave wrong (long disabled state, timer register overflows, whatever).
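Something like this could be used for that (untested sketch based on the earlier test program; IExec should already be available from the startup code, like ITimer is via -lauto):

#include <proto/exec.h>
#include <proto/timer.h>
#include <stdio.h>

int main(int argc, const char **argv) {
    uint32 usDelay = 10;
    uint32 count = 100000;
    printf("Calling MicroDelay(%lu) %lu times with interrupts disabled\n", usDelay, count);
    /* If Disable()/Enable() makes it fast, retry with IExec->Forbid()/IExec->Permit() */
    IExec->Disable();
    for (uint32 i = 0; i < count; ++i) {
        ITimer->MicroDelay(usDelay);
    }
    IExec->Enable();
    /* Time this with a real watch; timer.device may misbehave after a long Disable() */
    printf("This should have taken about %.2f seconds. How long did it actually take?\n", ((double)usDelay * count) / 1000000.0);
}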