My serial output goes crazy, repeating the messages below forever. The two messages are printed so fast that it feels like almost every clock cycle! (Okay, that's an arbitrary assumption, but they are printed extremely fast.) Is this thing probing my GPU that hard? Is that why I see a 100% load on all Radeontop metrics? It stays at 100% without dropping until I stop the process. Then all the Radeontop metrics go back to 0%.
Of course, I don't understand what these messages mean. Is there an incompatibility in the guest PCI mapping, maybe?
The driver is getting into an infinite loop while going up the PCI bridge chain. It only does that because some motherboard firmware doesn't set up the PCI bridges properly. The logs don't tell me why it gets stuck, though.
Quote:
Of course, as we already know, when I explicitly define: bus=pci.0
Then AmigaOS4.1 FE boots normally, but as the thread title says, it's extremely slow.
These are the logs of the successful boot when using pci.0: ...
Have you tried setting the RadeonRX power management to high? One of the last things I see in the log is that the GPU's clocks are all set to the lowest that they can go. The GPU's sclk never goes above 551MHz.
It's normal for the GPU to go to minimal clocks when it's idle, but I'm wondering whether forcing it to go to maximum speed will improve the performance.
I changed the GPU Power Setting from Dynamic to High (on the AOS4.1 guest).
In the host UEFI, ASPM is disabled. On the host OS (Ubuntu 24.04), I added directives to the GRUB command line that disable any PCIe power-saving mode.
Unfortunately, I don't see any improvement.
Despite using the pci.0 slot in QEMU, the AOS4.1 Ranger shows that the passed-through PCI devices are attached to the 0x01 bus. But I guess you already know this.
At least reading the following gives me a small idea of what I'm seeing in the logs. That's a plus...
@nikitas I think I now understand the issues. With pci.1 it might work, but there's the issue of the driver getting stuck trying to find a bridge. With pci.0, the interrupts from that bus are not connected, so the driver will not get interrupts and only works with interrupts disabled. I don't know if that is what makes it slow, but it could be. I'll need to find a way to connect the interrupts of pci.0 in QEMU. On a real machine, INTA and INTB are connected from the AGP port; C and D are ignored. In QEMU, only the pci.1 bus is connected.
With pci.1, are there any logs before the repeating lines? You can redirect the output to a file with -serial stdio >output.txt 2>&1 and then see the beginning that otherwise scrolls away under the repeating lines. Maybe that has more info for @Hans to see why this happens.
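A minimal sketch of how the captured log could be inspected, assuming the serial output was redirected to a file as suggested above. The helper name first_occurrences and the file name output.txt are only placeholders, and the QEMU command line itself is left out because it depends on your setup.

```sh
# Print each distinct line of a log only once, in order of first appearance.
# This keeps the start of the boot log readable even when two messages
# keep repeating afterwards. (The helper name is illustrative only.)
first_occurrences() {
    awk '!seen[$0]++' "$@"
}

# Example, assuming the log was captured with "-serial stdio >output.txt 2>&1":
#   first_occurrences output.txt | head -n 100
```

Plain uniq would not help here, because the two messages alternate rather than repeat back to back.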
Quote:
The driver is getting into an infinite loop while going up the PCI bridge chain. It only does that because some motherboard firmware doesn't set up the PCI bridges properly.
Is this because you get a PCIe card on a PCI bus without a bridge? Or what is the proper setup of the bridge? BBoot does not do anything with the bridge, and pegasos2.rom might not be able to either, as the bridge/host device 0 on the PCI bus is just a dummy device with most functions not implemented, so if the driver looks for some setting it might not be stored. So what does the driver expect in order to avoid this loop, and why does this not happen on a real PegasosII? Maybe in this case we really don't have a bridge, as the PCIe card is just connected as PCI and there is no PCI to PCIe bridge in the guest. The host has a CPU to PCIe bridge, so that can't be passed through either. Do we need to emulate some dummy bridge device (but bridges did not work on the real machine, so I'm not sure that would work), or can the driver be changed to avoid this problem?
Also, why does this not happen with pci.0? Is that because AmigaOS thinks that's a PCIe bus, or because there are no other devices on that bus?
@balaton But radeonhd/radeonrx do not work on a real Pegasos for now. It is not possible directly (because it has only AGP and plain PCI), and over a bridge it currently doesn't work either: the RTAS way (which the Peg2 OS4 kernel uses to read PCI registers, etc.) does not work, so Hans tried to deal with it by reading the registers directly. A test showed that this way works, and once Hans finds time to update the kernel with the new code, I will be able to test it. Only then can we see whether or not we have the same issue you are talking about on a real Pegasos.
It starts getting very funny. I attached the Radeon R7 240:
- It doesn't work at all with bboot v.07. It works with pegasos.rom.
- The RadeonRX 550 refused to be attached to pci.1 and works only on pci.0. But in AOS4, Ranger reports that it is attached on bus 0x01.
- Now, the Radeon R7 240 refuses to work on the pci.0 slot and works only on pci.1. But in AOS4, Ranger reports that it is attached on bus 0x00(!)
Despite being inferior to the RadeonRX 550, the Radeon R7 240 performs much better(!), but it is still very slow.
Quote:
It starts getting very funny. I attached the Radeon R7 240:
- It doesn't work at all with bboot v.07. It works with pegasos.rom.
That suggests that, unlike the RX driver, the HD driver does not init the card from AtomBIOS, or this card does not have a suitable BIOS, so it needs its ROM to be run by the BIOS emulator in the firmware. BBoot does not have a BIOS emulator, so it won't run the card BIOS. AmigaOS has a BIOS emulator resource, but only the Classic kernel invokes it, so it won't work even if added to pegasos2. So the only way is to use pegasos2.rom to get the card BIOS executed.
However, pegasos2.rom and the pegasos2 AmigaOS kernel don't properly init interrupts on pegasos2, so you also need to use BBoot from the firmware to fix that up. That is, you need to use both pegasos2.rom and BBoot. To do that, copy bboot, bboot.fth and Kickstart.zip to your boot volume where amigaboot.of is, and from the pegasos2.rom ok prompt do 'boot hd:0 bboot.fth' (assuming hd:0 is your boot volume; otherwise change this in bboot.fth too). This should then fix the interrupt settings after the firmware has run the card's BIOS.
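Before booting, it may be worth checking that all the files mentioned above are really on the boot volume. A rough sketch, assuming the boot volume's contents are reachable from the host under some directory; the helper name and the directory are hypothetical, and the check is case-sensitive on the host even though the Amiga file system is not:

```sh
# List which of the files needed for the pegasos2.rom + BBoot boot are
# missing from a directory. The directory is a hypothetical host path
# where the guest's boot volume is accessible.
missing_boot_files() {
    dir=$1
    for f in amigaboot.of bboot bboot.fth Kickstart.zip; do
        [ -e "$dir/$f" ] || echo "$f"
    done
}

# Example (the path is a placeholder):
#   missing_boot_files /mnt/os4boot
# Then, at the pegasos2.rom ok prompt:
#   boot hd:0 bboot.fth
```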
Quote:
- The RadeonRX 550 refused to be attached to pci.1 and works only on pci.0. But in AOS4, Ranger reports that it is attached on bus 0x01.
Don't worry about those numbers. QEMU numbers the buses as the chip has them, but on pegasos2 they are used in the opposite order (because on the real chip pci.0 is 66MHz and used for the AGP port, and pci.1 is 33MHz and used for PCI), and AmigaOS numbers them that way. This does not matter: it's still the same bus, just numbered differently.
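To make the swapped numbering concrete, here is a tiny sketch that maps the QEMU bus name to the bus number Ranger showed in the reports above (pci.0 appeared as 0x01, pci.1 as 0x00). The function name is made up for illustration.

```sh
# Map a QEMU pegasos2 bus name to the bus number AmigaOS (Ranger) reports.
# The two buses are simply numbered in the opposite order.
amigaos_bus_number() {
    case $1 in
        pci.0) echo 0x01 ;;
        pci.1) echo 0x00 ;;
        *) echo "unknown bus: $1" >&2; return 1 ;;
    esac
}
```

So a card started with bus=pci.0 showing up on bus 0x01 in Ranger is expected and not a sign of a wrong mapping.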
Quote:
- Now, the Radeon R7 240 refuses to work on the pci.0 slot and works only on pci.1. But in AOS4, Ranger reports that it is attached on bus 0x00(!)
I can't tell why unless you say what "refuses to work" means. Any errors? Maybe it needs the missing interrupt. The firmware should init a GPU in the AGP port too, so running the BIOS should not be a problem.
Quote:
Despite being inferior to the RadeonRX 550, the Radeon R7 240 performs much better(!), but it is still very slow.
Any numbers on that so we can compare to the RX benchmark results?
Quote:
That is, you need to use both pegasos2.rom and BBoot. To do that, copy bboot, bboot.fth and Kickstart.zip to your boot volume where amigaboot.of is, and from the pegasos2.rom ok prompt do 'boot hd:0 bboot.fth'
Yes, you had told me about this before, so that's the way I boot the system using pegasos.rom and then blindly typing hd:0 bboot.fth.
Quote:
Don't worry about those numbers. QEMU numbers the buses as the chip has them, but on pegasos2 they are used in the opposite order (because on the real chip pci.0 is 66MHz and used for the AGP port, and pci.1 is 33MHz and used for PCI), and AmigaOS numbers them that way. This does not matter: it's still the same bus, just numbered differently.
Understood.
Quote:
I can't tell why unless you say what "refuses to work" means. Any errors? Maybe it needs the missing interrupt. The firmware should init a GPU in the AGP port too, so running the BIOS should not be a problem.
It does not actually refuse to work. It freezes everything, even the host system, and I have to hard reset the host. When I first tried with BBoot, the host froze, and in addition, after the hard reset the "kickstart.zip" was "fried". It got corrupted: 0 kilobytes.
Quote:
Any numbers on that so we can compare to the RX benchmark results?
Yes, this is the first thing I tried, but the overall score is misleading because during the GfxBench2D test, the drawing was interrupted very frequently. When it is drawing, it is faster than the RX550. But again, it gets interrupted by something, maybe some other process. So, it took 3 hours, with an overall score of 60. But, for example (as @Hans will notice, I'm sure), FillRect is a lot faster. This means it is not just my system or QEMU; it seems RadeonHD.chip performs better than RadeonRX.chip, doesn't it? https://hdrlab.org.nz/benchmark/gfxbench2d/OS/AmigaOS/Result/2812
Quote:
Does the Invalid write at addr 0xFE000080 / 0x80 show up with bus=pci.0 as well or only with pci.1?
Yes, this message appears regardless of the PCI bus used. I see it very often, and I tend to believe that it is unrelated.
Quote:
Yes, this is the first thing I tried, but the overall score is misleading because during the GfxBench2D test, the drawing was interrupted very frequently. When it is drawing, it is faster than the RX550. But again, it gets interrupted by something, maybe some other process. So, it took 3 hours, with an overall score of 60. But, for example (as @Hans will notice, I'm sure), FillRect is a lot faster. This means it is not just my system or QEMU; it seems RadeonHD.chip performs better than RadeonRX.chip, doesn't it? https://hdrlab.org.nz/benchmark/gfxbench2d/OS/AmigaOS/Result/2812
Your FillRect and other hardware acceleration results are better, but the memory copy speeds have collapsed. They're below 1 MiB/s! Copying to/from VRAM does vary between graphics card chipset series, even with DMA. I noticed this as different transfer rates when using the same motherboard with different graphics cards.
The "interruptions" in the Random test are actually RAM <=> VRAM copy operations that are part of the test. Your memory copy speeds are so slow that it looks like the benchmark tool stops.
Something is definitely going wrong on your machine. The native performance of both graphics cards is multiple orders of magnitude higher. Heck, the same card on an X5000 is also orders of magnitude faster, and that's without DMA assisted RAM <=> VRAM copies:
Quote:
Is this because you get a PCIe card on a PCI bus without a bridge?
I don't know yet. There was a bug in older versions of the driver that could cause an infinite loop in some situations. I'm not sure if the version that nikitas is using has that fixed or not.
And I can't test it on Debian PPC. It recognizes the Radeon R7 GPU when running lspci -nnk (I have installed the firmware-linux-nonfree package that contains the radeon driver), but during boot/reboot it throws an exception (sometimes it continues the boot process and I get a terminal, but startx doesn't work):
qemu-system-ppc: -device vfio-pci,host=3b:00.0,bus=pci.1,x-vga=on: vfio 0000:3b:00.0: failed getting region info for VGA region index 8: Invalid argument
device does not support requested feature x-vga
It seems to be an error you get when no external monitor is attached. But I do have one attached.
@all Can you test this QEMU patch? It should fix interrupts on pci.0 while making no difference for pci.1, so it could only improve things when using pci.0 and should not break things when using pci.1 (or with no bus=pci.0 option, as the default for pegasos2 is pci.1). You should be able to apply it to QEMU git master with the git am command.
"A workaround is available in qemu, by adding the parameter "x-igd-gms=1" to the according IGD device line."
So it could be?
x-vga=on,x-igd-gms=1
I don't know what that is, but IGD is Intel integrated graphics, so this would do nothing for a Radeon device. With a laptop it might be difficult to configure passthrough, as the Radeon device has to be removed from the host and configured to load no host drivers, only vfio, so I don't know if that was done correctly. Getting some more logs, or trying with a Linux guest that can give more diagnostics, could help to debug it.
Quote:
You also use:
-display sdl,gl=on
I don't think you need this at all.
It should not hurt either, as this just sets the graphics backend QEMU uses. But since you have no window to display (the guest should use the passed-through card), it also does not matter.
Quote:
For more debugging output you can add:
-d guest_errors,unimp
For getting logs from the guest, I think you can add:
-append "os4_commandline serial debuglevel=3"
You don't need os4_commandline in -append. That's the variable name when you set it from pegasos2.rom, but -append only needs the values, as the kernel command line is the only variable -append can set. So it's just -append 'serial debuglevel=7'. You can experiment with different debug levels, but too high a level won't boot, so maybe up to 12-13 still works, though less may be enough. Running with -d unimp,guest_errors is useful if it crashes, as otherwise it might not print an error.
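Put together, the debugging options from this post might look like the sketch below. It only prints the extra options to paste onto an existing qemu-system-ppc command line, which is not shown here; the helper name is hypothetical and the default level of 7 is just the value used above. Note that QEMU only accepts -append together with -kernel.

```sh
# Print the extra QEMU options for guest debugging discussed above.
# The argument is the AmigaOS debug level (defaults to 7); higher levels
# print more, but too high a level may not boot.
amigaos_debug_options() {
    level=${1:-7}
    printf '%s\n' "-d unimp,guest_errors -append 'serial debuglevel=$level'"
}

# Example: show the options for debug level 3
#   amigaos_debug_options 3
```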
I don't know what your Linux kernel Oops means, but at least it has a stack trace pointing to code that could show what operations it tried to do. That could give some idea about the problem, but one would have to check the kernel code to find out. This is why testing with Linux might help: with AmigaOS we don't know what it does and we can't get much info from the errors, so those might only help if @Hans can find out what's wrong based on them. With a Linux guest we have more insight into what it does.
@nikitas In any case, si_dpm_init and radeon_pm_init in the Linux backtrace, as well as the line before the Oops talking about fan control, suggest it's some issue with power management of the card. Hans also suspected the card may be running at lower clocks, which could also be because of missing power management, so maybe that's where we should look. Is there a way to disable power management in Linux and set the card to run at some fixed value? The Linux guest may also need some options to make it skip power management for this card, but I don't know what those options are or whether they exist.
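One possible direction, as an untested sketch: as far as I know, the Linux radeon driver has a dpm module parameter (radeon.dpm=0 on the guest kernel command line turns DPM off, so si_dpm_init should not be reached), and when DPM is active the performance level can be forced through the power_dpm_force_performance_level sysfs node. The helper name and the card0 index below are assumptions, and whether any of this helps under passthrough is unknown.

```sh
# Force the radeon DPM performance level through sysfs (run as root in the
# Linux guest). The default path assumes the GPU is card0, which may differ.
# The second argument overrides the sysfs node path.
set_radeon_perf_level() {
    level=$1
    node=${2:-/sys/class/drm/card0/device/power_dpm_force_performance_level}
    case $level in
        auto|low|high) printf '%s\n' "$level" > "$node" ;;
        *) echo "level must be auto, low or high" >&2; return 1 ;;
    esac
}

# Example: set_radeon_perf_level high
# Alternative: add radeon.dpm=0 to the guest kernel command line to skip DPM.
```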
Quote:
Can you test this QEMU patch? It should fix interrupts on pci.0 while making no difference for pci.1, so it could only improve things when using pci.0 and should not break things when using pci.1 (or with no bus=pci.0 option, as the default for pegasos2 is pci.1). You should be able to apply it to QEMU git master with the git am command.
Once more, thank you for your work and your insistence on improving QEMU PPC. I will try it and I will inform you.
Quote:
In any case si_dpm_init and radeon_pm_init in the Linux backtrace as well as the line before the Oops talking about fan control suggests it's some issue with power management of the card.
What in the world could be happening with my machine... Every power management option is disabled in UEFI. And I followed instructions to disable all PCIe power management on Ubuntu 24.04. I'll investigate further, though.
In the worst case, I will keep the cards and I'll throw the PC into the sea. I might get one with a Ryzen CPU.
I would appreciate your advice if you know which motherboard + CPU combinations work best for QEMU/KVM, QEMU PPC, and virtualization in general.