I am not sure what you are trying to say here but my driver uses DMA all over the place. And manual cache flushing/invalidating is required for the SAM460. It doesn't work without it.
The Sam460 SATA driver for the internal SATA interface is PIO only; UDMA is not implemented. The PCI SATA drivers also have to flush the cache manually for DMA buffers that are in main memory, or mark the DMA buffers as non-cacheable.
Nobody is claiming that DMA doesn't work on a SAM4x0. On the contrary. It does work.
As long as the driver flushes the cache after a CPU write to a DMA buffer and invalidates the cache before reading from a DMA buffer. On a cache-coherent system like the X5000, you don't have to worry about this. The bus snooping mechanism makes sure that the caches are up to date.
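To illustrate the flush-after-write / invalidate-before-read rule, here is a minimal sketch in plain C. It is a toy model, not driver code: `ToyMem`, `cache_flush()`, `cache_invalidate()` and the `demo_*` functions are hypothetical names invented for this example, and the whole buffer is treated as a single cache line. On real hardware the flush and invalidate would be done by the OS cache/DMA functions or by the CPU's cache instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16

/* Toy model of a non-coherent system (all names are hypothetical):
 * 'ram' is what a DMA device sees, 'cache' is what the CPU sees. */
typedef struct {
    uint8_t ram[BUF_SIZE];
    uint8_t cache[BUF_SIZE];
    int     cached; /* 1 if the cache holds a copy of the buffer */
    int     dirty;  /* 1 if the cached copy is newer than RAM */
} ToyMem;

/* Fill the cache from RAM on a miss, as a copy-back cache would. */
static void load_if_needed(ToyMem *m) {
    if (!m->cached) {
        memcpy(m->cache, m->ram, BUF_SIZE);
        m->cached = 1;
        m->dirty = 0;
    }
}

/* CPU accesses always go through the cache. */
static void cpu_write(ToyMem *m, const uint8_t *src) {
    load_if_needed(m);
    memcpy(m->cache, src, BUF_SIZE);
    m->dirty = 1;
}

static void cpu_read(ToyMem *m, uint8_t *dst) {
    load_if_needed(m);
    memcpy(dst, m->cache, BUF_SIZE);
}

/* Flush: write dirty cached data back to RAM. */
static void cache_flush(ToyMem *m) {
    assert(m != NULL);
    if (m->cached && m->dirty) {
        memcpy(m->ram, m->cache, BUF_SIZE);
        m->dirty = 0;
    }
}

/* Invalidate: drop the cached copy so the next read refetches from RAM. */
static void cache_invalidate(ToyMem *m) {
    assert(m != NULL);
    m->cached = 0;
    m->dirty = 0;
}

/* Device (DMA) accesses bypass the cache; there is no bus snooping. */
static void dev_read(const ToyMem *m, uint8_t *dst) { memcpy(dst, m->ram, BUF_SIZE); }
static void dev_write(ToyMem *m, const uint8_t *src) { memcpy(m->ram, src, BUF_SIZE); }

/* CPU fills a buffer, device reads it. Returns 1 if the device saw the data. */
int demo_cpu_to_device(int do_flush) {
    ToyMem m;
    uint8_t cmd[BUF_SIZE], seen[BUF_SIZE];
    memset(&m, 0, sizeof m);
    memset(cmd, 0xAB, BUF_SIZE);
    cpu_write(&m, cmd);
    if (do_flush)
        cache_flush(&m);
    dev_read(&m, seen);
    return memcmp(seen, cmd, BUF_SIZE) == 0;
}

/* Device fills a buffer, CPU reads it. Returns 1 if the CPU saw the data. */
int demo_device_to_cpu(int do_invalidate) {
    ToyMem m;
    uint8_t data[BUF_SIZE], seen[BUF_SIZE];
    memset(&m, 0, sizeof m);
    cpu_read(&m, seen);          /* CPU touches the buffer, caching the old contents */
    memset(data, 0xCD, BUF_SIZE);
    dev_write(&m, data);         /* DMA lands in RAM only */
    if (do_invalidate)
        cache_invalidate(&m);
    cpu_read(&m, seen);
    return memcmp(seen, data, BUF_SIZE) == 0;
}
```

In this model, `demo_cpu_to_device(0)` and `demo_device_to_cpu(0)` both return 0 because one side is looking at stale data, while passing 1 makes both transfers succeed. On a cache-coherent system the snooping hardware effectively does these two steps for you.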
The DDR controller utilizes 34 bits of the 36-bit PLB address. The most significant 2 bits of the 36-bit PLB address (bits 28 and 29 of the upper 32-bit address) allow the 34-bit address to have an alias on the low latency and high bandwidth PLB slave segments. Accesses with the upper two bits set to 0b00 are made over the low latency (LL) slave interface and accesses with the upper two bits set to 0b10 are made over the high bandwidth (HB) slave interface.
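As a rough sketch of the aliasing described above, the following C helpers build and decode a 36-bit PLB address from a 34-bit DDR address and the two select bits. The names (`plb_alias`, `plb_segment_of`, `plb_ddr_offset`, `PLB_SEG_LL`, `PLB_SEG_HB`) are made up for this example; only the bit layout (34 address bits, 0b00 for LL, 0b10 for HB) comes from the text.

```c
#include <assert.h>
#include <stdint.h>

#define DDR_ADDR_BITS 34
#define DDR_ADDR_MASK ((UINT64_C(1) << DDR_ADDR_BITS) - 1)

/* Values of the upper two bits of the 36-bit PLB address (from the text above). */
enum plb_segment {
    PLB_SEG_LL = 0x0, /* 0b00: low latency slave interface */
    PLB_SEG_HB = 0x2  /* 0b10: high bandwidth slave interface */
};

/* Build the 36-bit PLB alias of a 34-bit DDR address. */
static uint64_t plb_alias(uint64_t ddr_addr, enum plb_segment seg) {
    assert((ddr_addr & ~DDR_ADDR_MASK) == 0); /* must fit in 34 bits */
    return ((uint64_t)seg << DDR_ADDR_BITS) | ddr_addr;
}

/* Extract the two segment-select bits from a 36-bit PLB address. */
static enum plb_segment plb_segment_of(uint64_t plb_addr) {
    return (enum plb_segment)((plb_addr >> DDR_ADDR_BITS) & 0x3);
}

/* Strip the select bits, leaving the 34-bit DDR address. */
static uint64_t plb_ddr_offset(uint64_t plb_addr) {
    return plb_addr & DDR_ADDR_MASK;
}
```

For example, `plb_alias(0x1000, PLB_SEG_HB)` yields `0x800001000`, which refers to the same DDR location as `0x1000` but is routed over the HB interface.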
The internal SATA controller under AOS4.1 works in PIO mode because we didn't find any developer willing to write a DMA driver for it (it requires gather/scatter lists). Under Linux it works in DMA mode.
@geennaam Maybe there is a difference between 440 and 460, but on the SAM440ep the PCI SATA drivers (SII3112, SII3114 and SII3152) do work with DMA, without manual cache flushes/invalidates in the drivers.
The internal SATA controller under AOS4.1 works in PIO mode because we didn't find any developer willing to write a DMA driver for it (it requires gather/scatter lists). Under Linux it works in DMA mode.
Sorry about that, I got a board from you and tried to implement it, but couldn't even get PIO working.
The gather/scatter lists weren't a problem at all, I helped Ignatios to get that working with the CyberStormPPC AmigaOS 4.x SCSI driver.
Also, GART seems to be enabled, but the ring test for hardware acceleration failed.
[ 3.121797] [drm] radeon: dpm initialized
[ 3.128222] [drm] Found VCE firmware/feedback version 50.0.1 / 17!
[ 3.134469] [drm] GART: num cpu pages 524288, num gpu pages 524288
[ 3.151139] [drm] probing gen 2 caps for device aaa1:bed1 = 18cc41/0
[ 3.210767] [drm] PCIE GART of 2048M enabled (table at 0x00000000002E8000).
[ 3.218077] radeon 0000:81:00.0: WB enabled
[ 3.222320] radeon 0000:81:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xee962c00
[ 3.232441] radeon 0000:81:00.0: fence driver on ring 1 use gpu addr 0x0000000040000c04 and cpu addr 0xee962c04
[ 3.242562] radeon 0000:81:00.0: fence driver on ring 2 use gpu addr 0x0000000040000c08 and cpu addr 0xee962c08
[ 3.252683] radeon 0000:81:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xee962c0c
[ 3.262805] radeon 0000:81:00.0: fence driver on ring 4 use gpu addr 0x0000000040000c10 and cpu addr 0xee962c10
[ 3.304324] radeon 0000:81:00.0: fence driver on ring 5 use gpu addr 0x0000000000075a18 and cpu addr 0xf5135a18
[ 3.341458] radeon 0000:81:00.0: fence driver on ring 6 use gpu addr 0x0000000040000c18 and cpu addr 0xee962c18
[ 3.351618] radeon 0000:81:00.0: fence driver on ring 7 use gpu addr 0x0000000040000c1c and cpu addr 0xee962c1c
[ 3.361759] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 3.368404] [drm] Driver supports precise vblank timestamp query.
[ 3.374523] radeon 0000:81:00.0: radeon: MSI limited to 32-bit
[ 3.380446] genirq: Setting trigger mode 3 for irq 45 failed (uic_set_irq_type+0x0/0x160)
[ 3.388716] radeon 0000:81:00.0: radeon: using MSI.
[ 3.393673] [drm] radeon: irq initialized.
[ 4.152902] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
[ 4.161744] radeon 0000:81:00.0: disabling GPU acceleration
Also, GART seems to be enabled, but the ring test for hardware acceleration failed.
Depending on which GPU series it is, that may be due to a lack of endianness conversion. IIRC, the SI series is the last one where the command processor can be programmed to handle big-endian. AMD had already started removing bi-endianness at that stage, and some things already needed to be manually byte-swapped.
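For anyone wondering what "manually byte-swapped" means in practice, here is a small sketch in C. It assumes a GPU that expects little-endian 32-bit command words and a host that may be big-endian; `swap32`, `cpu_to_le32_sketch` and `ring_emit` are illustrative names only and are not taken from any actual driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Runtime check of the host byte order. */
static int host_is_big_endian(void) {
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 0;
}

/* Convert a CPU-order word to little-endian (hypothetical helper). */
static uint32_t cpu_to_le32_sketch(uint32_t v) {
    return host_is_big_endian() ? swap32(v) : v;
}

/* Copy command words into a ring buffer in the byte order that a
 * little-endian-only command processor would expect (assumption). */
static void ring_emit(uint32_t *ring, const uint32_t *words, size_t count) {
    for (size_t i = 0; i < count; i++)
        ring[i] = cpu_to_le32_sketch(words[i]);
}
```

With this, a word such as 0xDEADBEEF ends up in memory as the bytes EF BE AD DE whether the host is big- or little-endian. If the swap is missing on a big-endian host, the GPU reads garbage commands, which is one way a ring test can fail.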
@geennaam Quote:
As long as the driver flushes the cache after a CPU write to a DMA buffer and invalidates the cache before reading from a DMA buffer. On a cache-coherent system like the X5000, you don't have to worry about this. The bus snooping mechanism makes sure that the caches are up to date.
That's the correct procedure. Amiga computers only recently started getting cache coherency.
I don't understand why manual cache flushing isn't working for GART. I flush/invalidate the cache everywhere I think it's necessary, followed by a sync instruction. Yet it still locks up. The sync instruction should wait until the flush/invalidate is done. Maybe there are other problems (e.g., a cache in the PCIe controller). Or maybe I'm doing something incorrectly in the driver.
The sync instruction should wait until the flush/invalidate is done.
IIRC, the various Power(PC) CPUs have different sync instructions, and not all of them wait for a cache flush/invalidate to complete; on some CPUs they may not even be implemented at all (no-op). However, the HAL parts used by the IExec->Cache*() and IExec->*DMA() functions should work correctly on all CPUs, including the required sync.
Are you using the exec functions, or dcbf/dcbi and msync+isync manually?
Are you using the exec functions, or dcbf/dcbi and msync+isync manually?
I'm using the IExec->Cache*() and IExec->*DMA() functions.
It's also possible that the PCIe controller may have a cache that goes stale. PCIe appears to have some form of snooping capability (there's a "no-snoop" configuration bit).
This reminds me a bit of a problem I had with putting the command buffers in VRAM. There was a chance that the data wouldn't arrive in VRAM in time before the GPU started reading it. AMD's engineers said that was why they don't put the command buffers in VRAM; there's no mechanism to delay the GPU read until the data is in. I managed to get it working by waiting for the GPU to be idle before committing the next buffer.
The ring buffer is likely the first place where you'd hit problems with lack of cache coherency. However, if you're using an SI card or newer, then the problem could also be that their drivers can't handle big-endian.
In this case it was a 9200SE, so certainly old enough to support big-endian.
Quote:
IIRC, ACube did have a Linux driver that worked with GART enabled. I don't know for which cards, or how it worked.
That's interesting. Linux drivers are rarely superior to OS4 drivers in this kind of way.
Quote:
I can think of one way to get GART working, and that's to mark all memory used for GART as non-cacheable (or disable the data cache entirely). The L2 cache might need to be disabled too. Needless to say, doing so would come with a serious performance penalty.
I can imagine so. It would be impractical to keep the cache off permanently, and also to disable it for each transfer.
@Hans Maybe, instead of disabling the caches completely, using write-through instead of copy-back mode for the caches could help?
I don't know if that's possible at all with the IMMU functions, nor if the 440/460 CPUs support it. I made that change in beta versions of the A1200/A4000 603/604 CPU kernel for my classic Amiga beta testers. Not because of DMA (there isn't any DMA on classic Amigas in AmigaOS 4.x anyway, except for the CyberStormPPC SCSI driver), but because it was faster.
I just tested the new Radeon HD v.5 on the AmigaOne X1000 and it is significantly faster than v.3:
Could you do me a favor and try playing videos with MPlayer? When I use the v5 driver on my X1000 with a 7550 graphics card, MPlayer will eventually freeze while it's playing a video. I haven't been able to get anyone else to try it and see if that happens for them as well. I've gone back to the older version because of this problem. Please....
@ktadd I tried RadeonHD v.5 with the MickJT, LiveforitHD and LiveforitNG MPlayers. All work, no freezes.
However, these players have no UVD support yet, so CPU usage is up to 80-100% with HD (and some Full HD) videos (AltiVec versions). With Emotion or DVPlayer, CPU usage is 15-20%. Unfortunately, although DVPlayer and Emotion save CPU, they are still not capable of playing higher-resolution videos on the X1000 with RadeonHD.
Do you have a specific video that doesn't work for you? I can test it.