My problem with the CW MK4 driver was internal buffering; the issue came from the different floppy disk sizes, and the problem had to do with handling the buffers. Getting it working was easy; getting it to not corrupt data and mess up the system was the hard part. (That was what broke me.)
With the MK4 driver I also had a challenge because the firmware had to be uploaded when the driver was initialized; multitasking got in the way, but I found a workaround.
I also tried to port an Adaptec SCSI driver. The problem there was the amount of code: the driver supports a lot of different cards and has a lot of integration with the Linux kernel. Taking it apart and putting it back together was tricky because it had many kernel dependencies and lots of macro code.
Amiga devices and libraries can't have C++ code; they need to be pure C code. That's another challenge.
If you port a game, a lot of the glue libraries might already be ported; we don't have the same glue code for the Linux kernel in AmigaOS.
When porting things the build system is often a problem too, though I guess that depends on whether you're cross-compiling or not.
From my point of view, I don't think you can say one thing is harder than the other, because it depends on the driver or the game you're porting.
(NutsAboutAmiga)
Basilisk II for AmigaOS4 AmigaInputAnywhere Excalibur and other tools and apps.
Quote:
Amiga devices and libraries can't have C++ code; they need to be pure C code. That's another challenge.
Why can't you use C++ code? Of course there is no _start() function in libraries and devices, and you have to set everything up yourself, like calling constructor functions, but it's the same with C libraries and devices: for example, other AmigaOS libraries have to be opened manually and -lauto can't be used. You can't use any C++ I/O functions, but you can't use C or dos.library I/O functions either. C++ exceptions may be a problem as well. But apart from that, using C++ for a library or device should work!?
So it's a good thing that I have no knowledge about Linux or C++
I think you mean the Catweasel MK4? I don't own one, but it looks like it's offered as a legacy PCI device only. And your driver has to upload the FPGA loadfile.
Loading firmware for a modern PCIe card is pretty easy these days. Just create a DMA buffer and dump the firmware in it. Point the PCIe device to the DMA buffer and ring the doorbell. The PCIe card takes care of the upload.
This is done for the Sound Core3D based Sound Blasters to program the onboard DSP, but also for firmware upgrades of NVMe cards. Unfortunately those NVMe manufacturers offer update utilities instead of raw update images; otherwise I would have created a firmware update tool for AmigaOS4.
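A rough sketch of that "DMA buffer + doorbell" upload flow, in plain C. The register offsets (FW_ADDR_LO/HI, FW_LEN, FW_DOORBELL) and the helpers alloc_dma_buffer()/phys_addr() are made-up placeholders, not a real card register layout or AmigaOS API; every card defines its own registers, so treat this only as an illustration of the mechanism.

Code:
#include <stdint.h>
#include <string.h>

#define FW_ADDR_LO   0x40   /* placeholder BAR0 register offsets */
#define FW_ADDR_HI   0x44
#define FW_LEN       0x48
#define FW_DOORBELL  0x4C

extern volatile uint32_t *bar0;                 /* mapped MMIO window         */
extern void *alloc_dma_buffer(size_t size);     /* contiguous, DMA-capable    */
extern uint64_t phys_addr(void *cpu_addr);      /* CPU address -> bus address */

void upload_firmware(const uint8_t *image, size_t len)
{
    void *buf = alloc_dma_buffer(len);
    memcpy(buf, image, len);                    /* dump the firmware in it    */

    uint64_t pa = phys_addr(buf);
    bar0[FW_ADDR_LO / 4] = (uint32_t)pa;        /* point the card at the      */
    bar0[FW_ADDR_HI / 4] = (uint32_t)(pa >> 32);/* DMA buffer                 */
    bar0[FW_LEN / 4]     = (uint32_t)len;

    bar0[FW_DOORBELL / 4] = 1;                  /* ring the doorbell; the card
                                                   now DMAs the image itself  */
}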
I don't understand why it wouldn't be possible to use C++ internally in some library. I thought Warp3D Nova was written in C++, using even boost smart pointers etc.
I guess the symbols “libInit”, “init_table”... It's a long time ago I tried, and I have not tested with the latest G++; maybe there is something I need to do to preserve the symbols correctly.
I have often noticed that junk is added to the symbols I see in stack traces, like: namespace__function__34dAd
(NutsAboutAmiga)
Basilisk II for AmigaOS4 AmigaInputAnywhere Excalibur and other tools and apps.
Quote:
I don't understand why it wouldn't be possible to use C++ internally in some library. I thought Warp3D Nova was written in C++, using even boost smart pointers etc.
Yeah, the same as a bunch of other libraries and devices written in C++ for OS4. The games I ported in the past are also about 50% C++ and use third-party dependency libraries written in C++ such as boost and stuff.
@LiveForIt For functions referenced by C code you have to use extern "C"; check for example https://isocpp.org/wiki/faq/mixing-c-and-cpp. The "junk" you get in the function names is the usual C++ name mangling and includes the arguments of the C++ functions. That's required because in C++ you can have different functions with the same name but with different arguments.
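To illustrate, here is a minimal, hypothetical sketch of that pattern from the FAQ: the exported entry points get C linkage, so their symbols stay unmangled and callable from C (and usable in a library's function table), while the code behind them is ordinary C++. DoSomething() and helper() are made-up names, not any real library interface.

Code:
/* --- header shared with C callers (e.g. mylib_api.h) --- */
#ifdef __cplusplus
extern "C" {
#endif
int DoSomething(int value);      /* exported with an unmangled C symbol */
#ifdef __cplusplus
}
#endif

/* --- implementation, compiled as C++ (e.g. mylib_impl.cpp) --- */
#include <vector>

static int helper(int v)         /* internal C++, mangled name is harmless */
{
    std::vector<int> tmp(4, v);
    return (int)tmp.size() * v;
}

extern "C" int DoSomething(int value)
{
    return helper(value);        /* C ABI on the outside, C++ inside */
}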
This was a bit surprising to me, because until now I could read on several forums that the X5000 has slow DRAM and PCIe performance.
Quote:
PCI DMA transfers could exceed the speed of CPU RAM reads/writes. Maybe it's the same on the X1000 and X5000?
It's not surprising, as the conclusion about X5000 RAM performance was built upon the RageMem benchmark, which:
- measures CPU-driven RAM accesses only,
- predates the X5000 by a few years and, AFAIK, was never updated for the X5000.
My guess is that it benchmarks the P5020 as a legacy PPC603e.
I had some fights about this, but the "conclusion" was only that "even if you're right, no applications are using DMA so it's irrelevant". Hopefully this driver and other advancements are making it relevant, so let's not use RageMem results to score X5000 RAM performance 😉 Even Criscot mentioned in the readme not to compare different architectures with this tool...
I'm afraid that it's not a RageMem issue. My own RAM benchmark tool shows similar results to RageMem, and Linux memory benchmark tools show similar results for each core. Two threads show double the performance on the P5020, so apparently it is a CPU limitation. It looks like a cache miss is very expensive. It's also possible that 64-bit mode will result in higher performance. There are specific full-cacheline control instructions which might improve performance, but these are not supported by our compilers.
But the NVMe benchmark clearly shows that RAM or PCIe itself is not the bottleneck.
That's the case on all PowerPC/POWER CPUs, at least all supported by AmigaOS, especially for writes to RAM. With integer and float/double accesses it's impossible to "just" write something to RAM; it's always read the cache line, modify the cache line contents and copy the cache line back instead.
The only exceptions:
- On the G4 (and probably the X1000 CPU), using 2 contiguous AltiVec stores (on G4 CPUs with a 32 byte cache line; 4 on the X1000 with its 64 byte cache line), or using the AltiVec streaming instructions, skips the reads and only does the writes to RAM.
- Using DCBA/DCBZ together with integer or float/double stores skips the reads and only writes to RAM as well. But as you wrote, the 64 bit DCBA/DCBZ instructions aren't supported by the compilers/assemblers.
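For what it's worth, here is a rough sketch of the 32-bit DCBZ trick in C with inline assembly, assuming a 32 byte cache line (G4-style), a cache-line-aligned destination and a length that is a multiple of the line size. DCBZ traps on cache-inhibited memory, so this is only an illustration, not drop-in code; fill_cachelines() is a made-up helper.

Code:
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE 32   /* assumed line size; 64 on the X1000/X5000 CPUs */

/* Fill a buffer without the read-for-ownership: dcbz establishes each
   cache line as zero without reading it from RAM first, so the stores
   that follow only touch the already-allocated line. */
void fill_cachelines(uint32_t *dst, uint32_t value, size_t bytes)
{
    for (size_t off = 0; off < bytes; off += CACHE_LINE) {
        uint8_t *p = (uint8_t *)dst + off;

        __asm__ volatile ("dcbz 0,%0" : : "r" (p) : "memory");

        for (size_t i = 0; i < CACHE_LINE / sizeof(uint32_t); i++)
            ((uint32_t *)(void *)p)[i] = value;
    }
}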
So finally I've measured the time between consecutive IO commands which are sent by the filesystem to my driver. It turns out that this time between IO commands is huge and scales linearly with the transfer size, so the overhead percentage is more or less equal for each transfer size. As a result, the theoretical maximum transfer speed with SFS/02 is limited to ~425MB/s (when you calculate with zero overhead for processing and executing the actual IO command).
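Not the actual driver code, but as an illustration of how such a gap can be measured: the time between consecutive IORequests could be timestamped in the driver's BeginIO path with timer.device's ReadEClock(). note_io_gap() and the globals here are made-up names, and ITimer is assumed to have been obtained when the driver was initialised.

Code:
#include <exec/types.h>
#include <devices/timer.h>
#include <proto/exec.h>
#include <proto/timer.h>      /* ITimer assumed opened elsewhere */

static struct EClockVal last_ev;

/* Call this at the start of BeginIO() to log the time since the
   previous IO command arrived from the filesystem. */
void note_io_gap(void)
{
    struct EClockVal now;
    uint32 freq = ITimer->ReadEClock(&now);   /* EClock ticks per second */

    uint64 prev = ((uint64)last_ev.ev_hi << 32) | last_ev.ev_lo;
    uint64 curr = ((uint64)now.ev_hi << 32) | now.ev_lo;
    uint32 gap_us = (uint32)((curr - prev) * 1000000ULL / freq);

    IExec->DebugPrintF("gap since previous IO command: %lu us\n", gap_us);
    last_ev = now;
}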
The biggest problem in usual implementations of copy tools like C:Copy is that they simply create a new file and write to it, which is 1. slow and 2. causes fragmentation. That way for each single IDOS->Write() the file system first has to increase the file size, incl. searching for free space on the partition, etc.
What should be done instead is: create a new (empty) file using file=IDOS->Open("filename", MODE_NEWFILE), use IDOS->ChangeFileSize(file, final_size_of_the_file, OFFSET_BEGINNING) and then do the IDOS->Write()s. That way the file system has to increase the file size only once and only has to search once for free space on the partition, ideally a single, non-fragmented space. The IDOS->Write()s then are just writes, without the file-growing overhead.
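A minimal sketch of that sequence, assuming AmigaOS 4 dos.library calls and that the data is already in memory (a real copy tool would of course read and write in chunks and do proper error handling); write_preallocated() is a made-up helper name.

Code:
#include <proto/dos.h>

BOOL write_preallocated(CONST_STRPTR name, const uint8 *data, int64 size)
{
    BPTR file = IDOS->Open(name, MODE_NEWFILE);
    if (file == ZERO) return FALSE;

    /* Grow the file to its final size once: the filesystem only has to
       search for free space a single time, ideally one contiguous extent. */
    if (IDOS->ChangeFileSize(file, size, OFFSET_BEGINNING) == -1) {
        IDOS->Close(file);
        return FALSE;
    }

    /* The writes that follow no longer extend the file. */
    IDOS->ChangeFilePosition(file, 0, OFFSET_BEGINNING);
    int32 written = IDOS->Write(file, (APTR)data, (uint32)size);

    IDOS->Close(file);
    return written == (int32)size;
}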
Quote:
And last but not least: don't buy a Solidigm P44 Pro NVMe SSD. [...] On top of that, the maximum transfer size of the drive for each NVMe IO command is just 256kBytes (compared to 2MB for the Samsung 970). While this is not an issue for NGfilesystem with its 128kB limit, it means more overhead for SFS2.
If the limit is at most 2 MB, can't you split the reads/writes in the AmigaOS device part and send several commands at once to the NVMe IO handler? For example, if you get a 16 MB read or write, send 8 2 MB reads/writes to the NVMe IO handler queue and wait until all 8 are completed.
Quote:
If the limit is at most 2 MB, can't you split the reads/writes in the AmigaOS device part and send several commands at once to the NVMe IO handler? For example, if you get a 16 MB read or write, send 8 2 MB reads/writes to the NVMe IO handler queue and wait until all 8 are completed.
Of course. That is how my driver works. But when the queue is full you'll have to split it into multiple passes, and this is additional overhead. SFS will start preparing the next transfer only after the current one has finished, so more time spent in the handler means more time for each transfer.
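Roughly what that splitting looks like, as a hedged sketch: submit_read() and wait_for_completions() stand in for the driver's real submission-queue handling and are not an actual nvme.device API, and MAX_XFER would be whatever per-command limit the drive reports (256 kB or 2 MB in the examples above).

Code:
#include <stdint.h>
#include <stddef.h>

#define MAX_XFER   (2 * 1024 * 1024)      /* per-command transfer limit */

extern int  submit_read(uint64_t byte_offset, void *dst, size_t len);
extern void wait_for_completions(int count);

int read_large(uint64_t offset, void *dst, size_t total)
{
    size_t done = 0;
    int queued = 0;

    /* Queue one sub-command per MAX_XFER chunk. If the submission queue
       fills up, this has to be done in several passes, which is exactly
       the extra overhead mentioned above. */
    while (done < total) {
        size_t chunk = total - done;
        if (chunk > MAX_XFER) chunk = MAX_XFER;

        if (submit_read(offset + done, (uint8_t *)dst + done, chunk) != 0)
            return -1;

        done += chunk;
        queued++;
    }

    wait_for_completions(queued);         /* wait until all chunks finished */
    return 0;
}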