@joerg Yes, I know; maybe my wording was not clear. What the GPL prevents is one party turning GPL software into closed-source software, which is what I really meant by "commercial code", so maybe that's what caused the misunderstanding. The BSD license has no such restriction, so somebody can take the code, use it in part or change it, and make it closed source; they only cannot claim they wrote it, and they have to keep the copyright notices to acknowledge the original developers. The GPL also requires keeping the sources free and available to everybody, and not using them in code that does not allow the same. So if you substitute "closed source code" for "commercial code", that's what I meant but failed to describe properly.
By the way, it's not the GPL code itself that's sold commercially, because the code is freely available (and you cannot charge more than a fair amount for transferring it), but additional services built on that code, like making distros or providing services based on it, and so on. But GPL software can certainly be developed commercially.
I think you should only execute config-l@ once, not twice. When you execute it the second time it takes the value that the first execution left on the stack (the result you're trying to see) as the address to report on, which is not what you want (and may be why you're getting strange results). You should just do:
16 config-l@ .
Quote:
(Also realised where you got 16 for the BAR address. Forth is hex by default, so decimal 16 is written as 10 in Forth.)
Good point. Forth itself is decimal by default, but OpenFirmware may not be. The value '646011AB' is clearly in hex, so the number '16' is probably taken as hex as well. You could try something like:
20 1 - .
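If you get 19 then the numbers are being interpreted as decimal. If you get 1F then they're being interpreted as hex. (The current number base applies to both input and output, so the result is printed in the same base the numbers were read in.)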
Quote:
I think you should only execute config-l@ once, not twice. When you execute it the second time it takes the value that the first execution left on the stack (the result you're trying to see) as the address to report on, which is not what you want (and may be why you're getting strange results). You should just do:
16 config-l@ .
Strangely, it returns 0, no matter where I am: in /pci, or in /pci/pci@7 (the bridge). See:
ok cd /pci
ok pwd
/pci@80000000
ok 16 config-l@ .
0
ok cd pci@7
ok pwd
/pci@80000000/pci@7
ok 16 config-l@ .
0
ok
At first I thought the issue might be specific to the bridge, but the same happens if I go to any PCI-based directory: always zero.
Quote:
If you get 1F then they're being interpreted as hex.
@Joerg Same 0 as expected :( (there should be some value anyway, even with the wrong 16). The question is: which BAR did we try to read with this "16"? I mean, the first BAR of what, if the result is the same no matter which directory (the root /pci, /pci/pci@7, or any other PCI node) I happen to be in? It looks like a general offset of 16 into the whole PCI thing, but we need the first BAR of the card behind the bridge: how do we specify or calculate that?
@Sailor It took a while, but I have at last received the AGP-to-PCI(e) adapter we talked about on the first page, from here: https://www.ebay.com/itm/125723908233
I took two just in case, but I will probably test them only once Hans has dealt with the regular bridge, as there is a chance that with this kind of adapter and the tests around it my pegasos2 will burn and die. So first I want to finish testing what Hans is working on, and then I will test this adapter.
@All Hans is progressing pretty well with replacing the RTAS way of PCI access that the os4 kernel uses on pegasos2 (which does not work with bridges) with direct PCI reads and writes. There are currently a few bits to fix before anything comes of it, but at least we surely have the correct video memory addresses now. What remains is to fix some register reads, and then there is a high chance it will work! Rise of Frankenstein!
Tried it with both bridges, the Pericom and the PEX one. In both cases, when I go to pci@C0000000 (the AGP area) in the firmware, all I have is pci@8, whose properties show that it is a bridge (so both bridges are detected correctly via the adapter), but neither saw the graphics card plugged into it.
One time I was lucky (probably the adapter was badly seated or something), and instead of just pci@8 in pci@C0000000 I had about 20 different pci nodes (pci@1, pci@2, pci@3, etc.), in which the card was detected (both the audio and the video parts)! But that was just one time, and no matter how hard I tried, I could not reproduce it. Meanwhile, when the bridge is in a plain PCI slot without adapters, everything is fine and the firmware detects it.
So the conclusion is probably that the missing "lock" signal is what keeps it from working. The one-time detection was probably some glitch in the handling of this lock signal or something.
Did I understand correctly that on pegasos2 we have PCI slots which are 32-bit at 33 MHz, and an AGP slot which is in reality the same 32-bit PCI, just at 66 MHz instead of 33 MHz, and that is the whole difference?
If so, did I get it right that the maximum of the PCI bus is 133.33 MB/s, while AGP (in our case the 66 MHz PCI one) is 266 MB/s? I.e. with a PCI-to-PCIe bridge we can only reach the limit of the PCI bus, which is 133 MB/s?
What I mean is that, for now, I tested my Radeon9250 in the AGP slot (so the 32-bit 66 MHz PCI one) via gfxbench, and got these results:
As far as I can see there, only WritePixelArray almost hits the limit of our AGP (216 MiB = ~226 MB, while the limit is 266). But copy32, copy64 and the rest are 2 times slower than the limit.
Does it mean the Radeon9250 just can't reach the AGP bus maximum in some cases?
This one does not hit the limits of AGP at all, as all the values are 5 times lower than the AGP limit.
Is it, again, because the Radeon9250 can't reach the AGP limits, or is it AmigaOS itself and its kernel/driver/graphics.library causing issues there?
Basically, if I got it right, with the PCI bridge in a PCI (33 MHz) slot we can reach, no matter which graphics card we use, a WritePixelArray of ~130 MiB/s at most. But copies from VRAM to RAM should then be the same at worst, or could even be faster, up to 130 MB/s, in all tests, since even with the Radeon9250 they did not hit the limits.
Reads are slower because they involve sending a request to the card, and then receiving the response (i.e., the returned value) from the card. This is inherently slower than shoveling data to the card.
DMA transfers can reduce the overhead by sending data in large blocks, so you need much fewer requests.
Quote:
As far as I can see there, only WritePixelArray almost hits the limit of our AGP (216 MiB = ~226 MB, while the limit is 266). But copy32, copy64 and the rest are 2 times slower than the limit.
If you have a G4 CPU, WritePixelArray and useExecCopyMem use AltiVec, transferring 128 bits at a time; copy64f uses the FPU with 64 bits at a time, copy64 probably uses 2 * 32 bit integer accesses, and copy32 uses 32 bit integer accesses. AFAIK copyToVRAM and copyFromVRAM use AltiVec as well. Each access to VRAM has PCI overhead, so more bits transferred per access results in faster speeds. The useMemcpy and useExecCopyMem results are much slower than they should be, but I don't know what the problem is.
@Hans Quote:
DMA transfers can reduce the overhead by sending data in large blocks, so you need much fewer requests.
On Classic Amigas, AmigaOne and Pegasos2 there is no OS DMA copy (graphics (Read|Write)PixelArray(), exec CopyMemQuick(), etc.), only Sam4x0, X1000, X5000 and maybe A1222 have DMA based copy functions. AFAIK GART is disabled in your drivers on AmigaOne and Pegasos2 as well, therefore no DMA at all.
Depending on the CPU the OS copy functions may use AltiVec on AmigaOne and Pegasos2, but for gfx card VRAM accesses that can only be about twice as fast as FPU based copy functions, if at all. The DRAM read part of an AltiVec based copy between DRAM and VRAM should be more than twice as fast as a FPU one, but DRAM writes shouldn't make a difference (using DCBA or DCBZ with FPU writes is about the same speed as AltiVec writes using the streaming instructions).
@All First, the very good news: Hans did it! After replacing the RTAS way of working with PCI registers in the pegasos2 kernel with the direct way, and fixing some issues in the process, we were able to get both RadeonHD and RadeonRX to work via a PCI-to-PCIe bridge!
So, good news first: everything works. Hardware video acceleration via the VA library, GL4ES, Warp3DNova, ogles2.library, etc. While I am making a longer video about it, see the short one as a little teaser:
Then the bad news: while copies from RAM to VRAM are slow (2-3 times slower than the Radeon9250 in the AGP slot), copies from VRAM to RAM are abnormally bad: 25(!) times slower than the Radeon9250 on AGP.
Yes, what you see in the video is the VA library and the Spencer game in use, which seem to be programmed to work in "large enough" blocks (or so), and they do not suffer much from those small operations. But when you use Workbench you can see that some operations (like scrolling the icons in a directory) are pretty slow (while moving a window with transparency is very fast).
The Pericom PI7C9X111SL, while it also suffers from those speed issues, is still a few times faster than the PEX8112-based one. I don't know the reasons, but that's how it is. The Pericom feels almost "OK", though not enough to be called fluid, but the PLX one is a real pain.
I made a small graph of the copies to and from RAM and VRAM, to show visually how it all looks in one table, but you can also take the 3 gfxbench files directly to see them all:
(graph)
So now the question: wtf, and how can we improve the situation? It does not necessarily need to be a gazillion times faster, but at least it should not suffer from visual pauses. Does anyone know how the bridges need to be programmed? Maybe they have some features we can enable or disable to speed things up?
Looks like the driver's PEX bridge code needs updating. The driver enabled blind prefetching. But, if your little code snippet can increase the read transfer rates, then the driver's bridge code is sub-optimal.
BTW, what exactly does this code do:
pci_write_config_byte(dev, 0x48, 0x11);
pci_write_config_byte(dev, 0x84, 0x0c); // index = 0x100c
pci_write_config_dword(dev, 0x88, 0xcf008020); // data
Did I understand right that somewhere in OS4 we have a driver for the PLX bridge? (That is a surprise to me.) Or a driver for the Pericom bridge? I was under the impression that the "bridge driver" in os4 is just some kernel code that handles bridges in a "generic" way: the same code for any PCI-to-PCIe bridge, living in the kernel, so that it is only a matter of how the firmware (?) configures the bridge?
What is stranger to me is how much slower the PLX bridge is compared with the Pericom one. It's as if we have no bridge-specific configuration code at all, the kernel just does the "usual" stuff, and the Pericom simply has better default values while the PLX one does not.
Quote:
Did I understand right that somewhere in OS4 we have a driver for the PLX bridge? (That is a surprise to me.) Or a driver for the Pericom bridge?
The Sam440ep and Sam440ep-flex include a Pericom 8150B PCI bridge. I don't know if any code for it is required in the AmigaOS kernel (expansion.library) PCI functions, or if the initialization in U-Boot is enough, but in case special code for it is required in AmigaOS, it's of course, as usual, only included in the kernel versions for hardware that includes it, i.e. the Sam4x0 kernels, not in any other kernel versions (Pegasos2, classic Amiga, X1000, X5000, etc.) for hardware without it. The 8150B is a PCI-to-PCI, not a PCI-to-PCIe, bridge, but if you already have code for something similar, adding support for the 8112 PCIe one probably wasn't much additional work. The AmigaOne SE/XE/µA1 has PCI bridges as well; the only hardware without any bridge is probably the Pegasos2.
I guess what Hans meant with "Looks like the driver's PEX bridge code needs updating." is bridge code in his Radeon HD/RX drivers, not something in the AmigaOS kernel.