On some real AmigaNG systems, like the Sam4x0 and X5000 (maybe the X1000 and A1222 as well, not sure), the embedded CPU DMA engine can be used for copies between VRAM and DRAM, which is faster than CPU copies, though not as fast as GPU-based DMA transfers. AFAIK on QEMU that's not used, and maybe even can't be.
The DMA engine is emulated on sam460ex but it's not used that frequently by AmigaOS. At least with sm501 it boots without it, but some activity does use it; we've seen this when we first got AmigaOS working on sam460ex and needed the DMA engine to avoid some crashes. The DMA in common cases will end up in a memmove (see qemu/hw/ppc/ppc440_uc.c), which is probably optimised by the host libc, but I don't know if it's used for VRAM access, and we can't easily test that as PCIe does not work in sam460ex emulation and the sam460ex firmware thinks that only Radeon cards can appear on PCI, so it won't init a newer card on PCI.
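To make the memmove point concrete, here is a minimal, self-contained model of the idea. The dma_channel struct, the flat guest_ram[] array and map_guest() are my own simplifications standing in for QEMU's real structures and cpu_physical_memory_map(), so treat this as a sketch, not the actual qemu/hw/ppc/ppc440_uc.c code:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

static uint8_t guest_ram[64 * 1024]; /* stand-in for directly mapped guest RAM */

typedef struct {
    uint32_t sa;    /* source address as programmed by the guest */
    uint32_t da;    /* destination address */
    uint32_t count; /* transfer length in bytes */
} dma_channel;      /* illustrative, not the real QEMU struct */

/* Stand-in for QEMU's cpu_physical_memory_map(): returns a direct host
 * pointer when the range is ordinary RAM, NULL otherwise. */
static void *map_guest(uint32_t addr, uint32_t len)
{
    if (addr + len <= sizeof(guest_ram)) {
        return guest_ram + addr;
    }
    return NULL; /* e.g. MMIO/VRAM that has no direct host mapping */
}

static void dma_transfer(const dma_channel *ch)
{
    void *src = map_guest(ch->sa, ch->count);
    void *dst = map_guest(ch->da, ch->count);
    if (src && dst) {
        /* Both ends are plain host memory: the whole "DMA" becomes one
         * memmove(), optimised by the host libc. */
        memmove(dst, src, ch->count);
    }
    /* If one end were not directly mappable, the device model would have
     * to fall back to much slower per-access dispatch instead. */
}

int main(void)
{
    dma_channel ch = { 0x0000, 0x8000, 16 };
    memcpy(guest_ram, "hello DMA engine", 16);
    dma_transfer(&ch);
    printf("%.16s\n", (const char *)(guest_ram + 0x8000));
    return 0;
}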
On amigaone/pegasos2 there's no DMA engine, so the kernel may use tricky copy methods with the FPU, AltiVec or, worst of all, dcbz, which we measured before. This was improved a bit but may still need more optimisation, as only parts of my patch were merged. The RageMem tests show that not doing any tricks is fastest, so if this can be disabled that may help, unless VRAM access really needs some wider load/store. This could be tested and optimised independently of GPUs, but it would need somebody to do it as I don't have time for everything myself.
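For reference, this is roughly what the difference between a plain loop and the dcbz trick looks like. A simplified sketch assuming a 32-byte cache line and a cache-line-aligned destination, not the actual kernel code; the dcbz path only compiles on PPC:

#include <string.h>
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 32 /* assumed line size; real code must query the CPU */

/* Plain copy: on QEMU this translates to a simple host load/store loop. */
static void copy_plain(uint8_t *dst, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[i];
    }
}

#ifdef __PPC__
/* "Tricky" copy: zero each destination cache line with dcbz first so a
 * real CPU skips reading the old contents before overwriting them. A win
 * for DRAM on real hardware, but dcbz is expensive to emulate and does
 * not help for VRAM/MMIO, which fits the RageMem results. Assumes dst is
 * cache-line aligned; dcbz zeroes the whole line containing the address. */
static void copy_dcbz(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t i;
    for (i = 0; i + CACHE_LINE <= len; i += CACHE_LINE) {
        __asm__ volatile("dcbz 0,%0" : : "r"(dst + i) : "memory");
        memcpy(dst + i, src + i, CACHE_LINE);
    }
    memcpy(dst + i, src + i, len - i); /* copy the unaligned tail */
}
#endif

int main(void)
{
    static uint8_t src[4096];
    static uint8_t dst[4096] __attribute__((aligned(CACHE_LINE)));
    copy_plain(dst, src, sizeof(dst));
#ifdef __PPC__
    copy_dcbz(dst, src, sizeof(dst));
#endif
    return 0;
}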
Quote:
QEMU probably doesn't even emulate 128-bit AltiVec accesses using 128-bit host CPU accesses, but uses slower 64-bit or even only 32-bit ones instead.
QEMU does have the ability to compile AltiVec to host SIMD instructions, but it could be that not all of them are optimal, so again this could be tested and improved. But I don't even know what to test, as I don't know if this is really needed or used by the RadeonHD/RX drivers, so all this is just shooting in the dark. I think we would need a better understanding of what causes the slowness first, before trying to improve QEMU to solve it. It may not even be just slow VRAM access, as some results were faster, so there's at least one other factor somewhere.
Second, some results with vfio-pci seem to be fast (geennaam, smarkusg) while others are slower (white, nikitas and others). What does that depend on? Motherboard, GPU, BIOS, what else?
That's basically it: the speed of the motherboard and gfx card PCIe interfaces, which may be changed by BIOS or host OS settings. VRAM access speed through the PCIe interface may also differ between gfx cards, even ones using exactly the same GPU chip but with the rest of the card from different manufacturers.
If you compare completely different hardware you can get completely different and surprising results, for example:
- X5000: 533 MB/s write, 40 MB/s read using the CPU; 1,071 MB/s write, 995 MB/s read with RPA/WPA*
- X1000: 448 MB/s write, 26 MB/s read using the CPU; 1,407 MB/s write, 920 MB/s read with RPA/WPA*
- Sam460EX: 266 MB/s write, 52 MB/s read using the CPU; 572 MB/s write, 52 MB/s read with RPA/WPA*
Theoretically the X5000 should be much faster than the X1000, since it has faster PCIe slots, and the Sam460EX should be the slowest, but the results don't show that: for CPU reads (Copy From VRAM) the Sam460EX is even the fastest of these 3.
*) (Read|Write)PixelArray use CPU/SoC specific features like DMA engines, and therefore the results aren't comparable between different systems. Copy To/From VRAM only uses standard CPU read/write instructions instead, like a simple memcpy() for example, and is more comparable, but not 100% either, as it uses AltiVec on systems supporting it.
Quote:
Or does virtio-gpu actually map VRAM?
It would be quite useless if it didn't.
Quote:
If not then maybe 3D drivers do something that's very inefficient and create a bottleneck where SM502 does not have that.
Real SM502 is very slow, and it doesn't support anything which requires 3D features of a GPU, for example the Composite and CompositeSrcMask tests of GfxBench2D.
Quote:
The DMA engine is emulated on sam460ex but it's not used that frequently by AmigaOS.
There is probably much more, but the 3 parts where I know it's used are:
- DMA transfers of the SoC SATA controller.
- CopyMemQuick()/memcpy() for very large copies.
- Read/WritePixelArray() (maybe it's simply using CopyMemQuick(), no idea if it includes different/special code instead).
Quote:
and we can't easily test that as PCIe does not work in sam460ex emulation and the sam460ex firmware thinks that only Radeon cards can appear on PCI, so it won't init a newer card on PCI.
There was a recent U-Boot update for Sam4x0 which should add support for Radeon HD and RX.
Quote:
The DMA in common cases will end up in a memmove (see qemu/hw/ppc/ppc440_uc.c), which is probably optimised by the host libc, but I don't know if it's used for VRAM access [...] The RageMem tests show that not doing any tricks is fastest, so if this can be disabled that may help, unless VRAM access really needs some wider load/store.
Not optimizing it may be faster for DRAM accesses, but it's the opposite for VRAM. The main reason PCIe VRAM access is slow is the PCIe protocol overhead: an 8-bit byte read/write takes the same time as a 64-bit FPU read/write or a 128-bit AltiVec read/write, but the latter transfer more data in that time, and the larger a single transfer is, the faster it gets. On CPUs/SoCs with a DMA engine a single transfer can be very much larger than the max. 128 or 64 bits of CPUs without one, which can make VRAM accesses over PCIe much faster than CPU-based ones.
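A rough way to see the width effect yourself: the sketch below compares 8-bit and 64-bit store loops over one buffer. Run as-is it only measures ordinary RAM, where caches hide most of the difference; pointing the buffer at a mapped PCIe BAR instead (as in the sysfs sketch further down) is where the per-transaction overhead shows up:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (16u * 1024 * 1024)

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void report(const char *name, double sec)
{
    printf("%-7s %8.1f MB/s\n", name, BUF_SIZE / sec / 1e6);
}

int main(void)
{
    void *buf = malloc(BUF_SIZE);
    double t;
    if (!buf) {
        return 1;
    }

    t = now_sec(); /* one transaction per byte */
    for (volatile uint8_t *p = buf, *e = p + BUF_SIZE; p < e; p++) {
        *p = 0;
    }
    report("8-bit", now_sec() - t);

    t = now_sec(); /* 8 bytes per transaction */
    for (volatile uint64_t *p = buf, *e = p + BUF_SIZE / 8; p < e; p++) {
        *p = 0;
    }
    report("64-bit", now_sec() - t);

    t = now_sec(); /* libc may use even wider (e.g. 128-bit) stores */
    memset(buf, 0, BUF_SIZE);
    report("memset", now_sec() - t);

    free(buf);
    return 0;
}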
joerg wrote:
@kas1e Other gfx systems, for example X11, avoid most VRAM accesses using a shadow frame buffer in DRAM.
I think that's almost never the case. It should only happen if the X11 driver is not accelerated (like vesa), or if a driver option is added to the xorg config file to disable acceleration. Also, the special "modesetting" driver seems to default to using "glamor" for acceleration (implementing X11 functions using GL), so no shadow buffer by default.
Having 3D accelerated gfx (even GUI libs use GL nowadays) with a shadow frame buffer in RAM: how would you do that (fast)? Quote:
AmigaOS doesn't support anything like that.
Thomas Richter has done some P96 gfx drivers for AOS 3.x which do it, I think sometimes with MMU tricks, but it would be better if there wasn't this P96 gfx system limitation that the gfx system itself insists on having direct access to VRAM. There should be an option for drivers to handle everything themselves, with the gfx system then interacting with VRAM only through driver calls (like driver->readpixels, driver->writepixels for fallback gfx functions that the driver does not "accelerate").
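A tiny sketch of what such a driver-mediated design could look like. Every name here (gfx_driver, read_pixels, write_pixels, etc.) is hypothetical and only illustrates the idea; nothing like this exists in P96/CGFX:

#include <stdint.h>
#include <stdlib.h>

struct gfx_driver {
    /* Accelerated entry point; NULL means "not accelerated". */
    int (*fill_rect)(void *ctx, int x, int y, int w, int h, uint32_t argb);

    /* Mandatory fallbacks: the ONLY way the gfx system touches VRAM, so
     * the driver is free to use DMA or a shadow copy internally. */
    void (*read_pixels)(void *ctx, int x, int y, int w, int h,
                        uint32_t *dst_dram);
    void (*write_pixels)(void *ctx, int x, int y, int w, int h,
                         const uint32_t *src_dram);
};

/* Gfx system side: use the accelerated path if present, otherwise render
 * into DRAM and push the result through the driver. */
static int gfx_fill_rect(struct gfx_driver *drv, void *ctx,
                         int x, int y, int w, int h, uint32_t argb)
{
    if (drv->fill_rect) {
        return drv->fill_rect(ctx, x, y, w, h, argb);
    }
    uint32_t *tmp = malloc((size_t)w * h * sizeof(*tmp));
    if (!tmp) {
        return -1;
    }
    for (int i = 0; i < w * h; i++) {
        tmp[i] = argb; /* render in DRAM, never touching VRAM directly */
    }
    drv->write_pixels(ctx, x, y, w, h, tmp);
    free(tmp);
    return 0;
}

/* Minimal usage example with a do-nothing driver. */
static void dummy_write_pixels(void *ctx, int x, int y, int w, int h,
                               const uint32_t *src)
{
    (void)ctx; (void)x; (void)y; (void)w; (void)h; (void)src;
}

int main(void)
{
    struct gfx_driver drv = { 0 };
    drv.write_pixels = dummy_write_pixels;
    return gfx_fill_rect(&drv, NULL, 0, 0, 4, 4, 0xff000000u);
}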
Quote:
I think we would need a better understanding of what causes the slowness first, before trying to improve QEMU to solve it. It may not even be just slow VRAM access, as some results were faster, so there's at least one other factor somewhere.
If the emulated AOS4 has access to the passed-through gfx card's VRAM, so does QEMU. I would try to hack a little VRAM benchmark into QEMU itself, if you know how to find out the real address to use for this (it's not going to be the same VRAM address as seen in the emulated AOS4, is it?)
That would show what the theoretical max speed is. Maybe the pass-through magic itself slows things down.
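Rather than patching QEMU, one possible shortcut on the Linux host is to mmap() one of the card's BARs through sysfs and time accesses to it directly. A sketch with an example PCI address and BAR, assuming a memory BAR that Linux allows mapping; it needs root, and it should not be run while a guest owns the card, since poking live VRAM can confuse the driver:

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>

int main(void)
{
    /* Example path: BAR0 of the card at 0000:01:00.0. */
    const char *bar = "/sys/bus/pci/devices/0000:01:00.0/resource0";
    int fd = open(bar, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    size_t len = (size_t)st.st_size;       /* BAR size */
    if (len > (16u << 20)) {
        len = 16u << 20;                   /* cap the test at 16 MB */
    }

    volatile uint32_t *vram = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
    if (vram == MAP_FAILED) { perror("mmap"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < len / 4; i++) {
        vram[i] = 0;                       /* 32-bit writes into VRAM */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("write: %.1f MB/s\n", len / sec / 1e6);

    munmap((void *)vram, len);
    close(fd);
    return 0;
}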
Quote:
I think that's almost never the case. It should only happen if the X11 driver is not accelerated (like vesa), or if a driver option is added to the xorg config file to disable acceleration.
Of course no sane OS less than 30 years old does anything like that. But the AmigaOS graphics.library was created more than 40 years ago for the "Amiga" computer, later relabelled to "A1000", with only 256 KB RAM ("Chip RAM", but the only available RAM on that system; "Fast RAM" extensions may have existed for the A1000 as well, but only got common with the A500 and newer systems). The A1000 was more or less comparable to ancient PCs with UMA (unified memory architecture), i.e. the same RAM is used by both the CPU and GPU instead of separate DRAM and VRAM, plus the blitter, Copper, Agnus, Paula, etc. co-processors, which were faster than the 68000 CPU.

Most AmigaOS(-like) RTG systems, at least all which are still in use, P96 and CGFX, just added some little support for gfx cards, but didn't reimplement the graphics.library core from scratch, like for example EGS and pOS did.
Quote:
Having 3D accelerated gfx (even GUI libs use GL nowadays) with a shadow frame buffer in RAM: how would you do that (fast)?
On AmigaOS you can't. Most 2D gfx isn't done by the GPU but by the CPU, including sub-pixel text rendering for example. Nearly all BOOPSI classes (gadgets, images, datatypes, etc.), no matter if ReAction/ClassAct or MUI/Zune, use CPU-based rendering as well, not the GPU.
@joerg You're still bringing up numbers from real SM502 and real machines. I don't care about those. What I care about is why it's slow in QEMU with vfio when it's fast with SM502 on the same machine, and whether it's really just the overhead of sending data through PCIe with 32-bit ops, or there's additional overhead somewhere because of emulation or because of using things like dcbz that slow things down. Even with just a simple 32-bit loop PCIe should be faster, but to confirm that we would need tests on the same machine from both host and guest, which we could only get once so far.

So I don't care if it works better on a real machine, or whether it uses DMA or not on a real machine; I just want to understand what happens with QEMU and how it is possible that the same setup is faster for some people than for others. What are the factors that make it usable for some while slow for others?

We got a test case for copy routines before, which now should run fast at least with user emulation (qemu-ppc), but system emulation (qemu-system-ppc) may still have issues. This would need further testing but I had no time for that. It's hard to follow all the results and missing details; maybe it would help if we had a table of all the tests so far showing what host motherboard, GPU, BIOS settings, Linux distro and QEMU version were used, but getting all these details seems quite hopeless.
Quote:
There is probably much more, but the 3 parts where I know it's used are:
- DMA transfers of the SoC SATA controller.
Not emulated on QEMU so this won't happen. Quote:
- CopyMemQuick()/memcpy() for very large copies.
- Read/WritePixelArray() (maybe it's simply using CopyMemQuick(), no idea if it includes different/special code instead).
These may be the only sources of DMA use on sam460ex, and as I said it does not happen often, as far as I remember. One could add logging to the DMA controller emulation to see, but I think it booted without needing any of it and only needed it for some apps; it could be that with a Radeon driver it would be used more.
Quote:
There was a recent U-Boot update for Sam4x0 which should add support for Radeon HD and RX.
I've looked at that, but it still seems to check which bus the device is connected to to decide if it's a RadeonHD or an older card, so it would think a card on PCI cannot be a RadeonHD and take the wrong path. I think it should check device IDs instead, but maybe this was simpler, as there are so many IDs and they may not be sorted.
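For illustration, an ID-table check could look like the sketch below. The ranges in it are made-up placeholders, NOT real Radeon device IDs, and this is not the actual U-Boot code:

#include <stdint.h>
#include <stdbool.h>

struct id_range {
    uint16_t first, last;
};

/* Hypothetical: ranges of device IDs known to be RadeonHD or newer.
 * Placeholder values only; a real table would come from the driver docs. */
static const struct id_range radeonhd_ids[] = {
    { 0x9400, 0x94ff },
    { 0x6600, 0x67ff },
};

static bool is_radeonhd(uint16_t device_id)
{
    for (unsigned i = 0;
         i < sizeof(radeonhd_ids) / sizeof(radeonhd_ids[0]); i++) {
        if (device_id >= radeonhd_ids[i].first &&
            device_id <= radeonhd_ids[i].last) {
            return true;
        }
    }
    return false; /* treat unknown IDs as legacy Radeon */
}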
Quote:
Not optimizing it may be faster for DRAM accesses, but it's the opposite for VRAM. The main reason PCIe VRAM access is slow is the PCIe protocol overhead: an 8-bit byte read/write takes the same time as a 64-bit FPU read/write or a 128-bit AltiVec read/write, but the latter transfer more data in that time, and the larger a single transfer is, the faster it gets. On CPUs/SoCs with a DMA engine a single transfer can be very much larger than the max. 128 or 64 bits of CPUs without one, which can make VRAM accesses over PCIe much faster than CPU-based ones.
This is again true for real hardware but may not be true for QEMU. A simple loop is translated to host code and can run fast, while something using special registers could end up calling into emulation a lot and run slower. Without knowing exactly what happens I can't tell which path to trace to check for that, but it's possible this also plays a role. Even with DRAM, the Tricky test of RageMem using dcbz is the worst on QEMU, so if the same is used for VRAM it may not help.
But all that does not explain why it is faster on some machines and slower on others, so there might be something else too, which I'd like to identify. Maybe some GPUs are better for this than others (newer ones especially seem worse than older ones), so we need more tests with more GPUs, and we need to document the circumstances so we can compare them.
Quote:
If the emulated AOS4 has access to the passed-through gfx card's VRAM, so does QEMU. I would try to hack a little VRAM benchmark into QEMU itself, if you know how to find out the real address to use for this (it's not going to be the same VRAM address as seen in the emulated AOS4, is it?)
That would show what the theoretical max speed is. Maybe the pass-through magic itself slows things down.
QEMU also knows the guest addresses, but all the "pass-through magic" is really just calling Linux to pass through the BAR addresses and set up the IOMMU to map the card's resources into the guest's address space, so there's not much QEMU does with it. QEMU does not map the card itself, so it can't really do a benchmark without breaking the guest, but it would be possible to set up a Linux guest and run a benchmark from there, to see if this is something specific to AmigaOS or happens with all guests, in which case it could be in QEMU.
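(For reference, with vfio-pci such a Linux guest would get the card with something like -device vfio-pci,host=0000:01:00.0, the PCI address being an example, and the sysfs mmap benchmark sketched earlier could then be run both on the host and inside that guest for a direct comparison on identical hardware.)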