I've seen some reports of drives that fail to work with my driver.
I've tried to find out whether other platforms have similar issues, and it looks like the problem might be related to legacy interrupts (emulated pin interrupts). I've come across multiple websites which recommend using only MSI/MSI-X interrupts with NVMe. Windows even offers a tool (LOGO) which checks which types of interrupt work for an attached NVMe drive and which don't. The publicly available OS4 kernels support legacy interrupts only, but the latest SDK contains traces of MSI support. So if a new kernel is ever released, this might solve the issue.
The driver on os4depot is strictly interrupt based. If an interrupt is not received within the timeout window, it generates an error. In your case this error likely occurs during initialisation and therefore shows the same symptoms as if no NVMe drive had been found at all (a bug in the cleanup routine). My current beta driver checks for an active interrupt inside the NVMe drive itself. But since the NVMe completion is much faster than the interrupt response, I might as well simply poll the completion queues. So stay tuned.
Edit1: The good news is that ignoring the interrupt and checking the completion queue directly works. The bad news is that it has a negative impact on performance, because I need to flush caches during polling.
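For illustration only, here is a minimal sketch (not the actual driver code) of what polling a completion queue instead of waiting for an interrupt can look like. The 16-byte CQE layout follows the NVMe spec; the invalidate callback and the simple spin-count timeout are placeholders for whatever the platform provides.

#include <stdint.h>

/* One 16-byte NVMe completion queue entry (little-endian as defined by the
 * NVMe spec; byte swapping for big-endian PPC is omitted in this sketch). */
struct nvme_cqe {
    uint32_t result;    /* DW0: command specific result                   */
    uint32_t reserved;  /* DW1                                            */
    uint16_t sq_head;   /* DW2: current submission queue head pointer     */
    uint16_t sq_id;     /* DW2: submission queue identifier               */
    uint16_t cid;       /* DW3: command identifier                        */
    uint16_t status;    /* DW3: bit 0 = phase tag, bits 1..15 = status    */
};

/* Poll a completion queue slot until its phase tag matches the expected
 * phase, or give up after 'spins' iterations. The phase tag flips each
 * time the controller wraps the queue, which is how fresh entries are told
 * apart from stale ones without an interrupt. 'invalidate' stands in for
 * the platform's cache invalidation call. */
static int poll_cqe(volatile struct nvme_cqe *cqe, unsigned expected_phase,
                    unsigned long spins,
                    void (*invalidate)(volatile void *addr, uint32_t len))
{
    while (spins--) {
        /* The controller writes the CQE via DMA, so the CPU's cached copy
         * of that line must be discarded before every check. */
        invalidate(cqe, sizeof(*cqe));

        if (((unsigned)cqe->status & 1u) == expected_phase)
            return 0;   /* completion has been posted */
    }
    return -1;          /* nothing arrived within the polling window */
}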
I have it on the best authority that the kernel does not, and never has, supported MSI interrupts. Nothing has changed in that regard with newer kernels.
    BOOL   Is64Bit;        /* True if the device is capable of 64 bit MSI addresses */
    uint64 MessageAddress; /* The message target address. Note that the interrupt
                            * controller code has to set this up accordingly.
                            * 0 means MSI is disabled for this device */
};
I thought that this was the groundwork for MSI support in new kernel versions. But apparently not.
Quote:
I've tried to find out whether other platforms have similar issues, and it looks like the problem might be related to legacy interrupts (emulated pin interrupts). I've come across multiple websites which recommend using only MSI/MSI-X interrupts with NVMe. Windows even offers a tool (LOGO) which checks which types of interrupt work for an attached NVMe drive and which don't. The publicly available OS4 kernels support legacy interrupts only, but the latest SDK contains traces of MSI support. So if a new kernel is ever released, this might solve the issue.
Ah, my usual luck to pick a model which has unexpected issues... Unfortunately it's too late to return it. Is there a Linux tool similar to 'LOGO'?
Quote:
The driver on os4depot is strictly interrupt based. If an interrupt is not received within the timeout window, it generates an error. In your case this error likely occurs during initialisation and therefore shows the same symptoms as if no NVMe drive had been found at all (a bug in the cleanup routine). My current beta driver checks for an active interrupt inside the NVMe drive itself. But since the NVMe completion is much faster than the interrupt response, I might as well simply poll the completion queues. So stay tuned.
Edit1: The good news is that ignoring the interrupt and checking the completion queue directly works. The bad news is that it has a negative impact on performance, because I need to flush caches during polling.
Ok, let me know when there is a new version available... And if you need beta testers for the prerelease versions, just drop me a PM.
I have updated DiskSpeed (not SCSISpeed) with the changes suggested by Joerg (to fix the counter overflow problem) and added a 1 MB buffer setting for testing.
I have submitted DiskSpeed V4.5 to OS4Depot for upload, should be available soon.
Meanwhile, here are results of Geennaam's driver with a 512 GB Kingston "device" on my X5000-20:
Thanks for the DiskSpeed updates, Tony! I am a little surprised there isn't a bigger delta in performance between your NVMe drive and my SSD. The numbers posted below are from an X5000/20 with a Samsung EVO SSD attached to the on-board SATA interface. The volume under test is a 400 GB NGFS\01 partition on that disk:
DiskSpeed is a tool to compare different filesystems using the same driver and hardware, not for comparing different drivers/hardware; that's what SCSISpeed is for.

In tonyw's results the most important details are missing: Which filesystem is used? Which BlockSize is used on the test partition? In the case of NGFS: is it a beta version with the strange 128 KB transfer size limit already fixed, or an old version with this limit, which makes fast reads and writes impossible with any driver and hardware?

Adding a 1 MB default buffer size is better than the old versions (512 bytes to 256 KB only), but still way too small to get fast read/write speeds on any current hardware. For example, the C:Copy tests geennaam did were using a 16 MB buffer. For any usable test, no matter if DiskSpeed, SCSISpeed or C:Copy, the buffer size has to be larger than the disk cache used, or all you'll get is the performance of the IExec->CopyMemQuick() implementation on your system instead of anything related to the disk speed.
It looks like there's some DDR3 memory cache benchmarking going on.
I can assure you that the X5k is SATA2, hence a 300 MByte/s theoretical limit. However, the raw read speed is about 250 MB/s with my SATA SSD. With larger transfer sizes, the P50x20sata.device starts chopping up the transfer into smaller chunks (the driver reports this with debug output on the terminal). As a result, the read speed drops a little. I will post the benchmarks from my SCSISpeed alternative later today.
The transfer size limit is set by the size of the disk cache, the read-ahead cache and the number of available "buffers". Since NGFS has a write-through cache, all Reads and Writes go through the cache. Also, since it is a journalling file system, all Writes to disk (of meta data) take three Write operations, not just one.
The cache "buffers" are permanently allocated from the system and controlled by internal allocation code. Allocating and de-allocating cache buffers from the Exec imparts a heavy speed penalty. For a partition of 100 GB+, 4096-byte blocks are used, which requires 16 MB of cache for each such partition. I have 23 such partitions on my X-5000, so the cache is no bigger than necessary.
Many years ago, when I spent a lot of time optimising performance, I played with cluster sizes, number of cache buffers, etc. The FS was optimised (at the time) for overall speed *of my test suite*, not for the speed of individual transfers.
I have a test suite that runs all sorts of different tests and takes about 12 minutes to complete. The optimisation work was performed on a Sam 460 with a mechanical hard drive (the mid-range machine at the time). The 32-block cluster that limits read/write transfer sizes gave the best *overall* performance at the time.
Now that I have Geennaam's driver working, I can revisit the speed optimisations and check to see if there is anything to be gained by changing the settings. I doubt that any great increase can be achieved.
PS. Naturally, the test results I published were taken using the current version of NGFS. It would be unfair to publish the results of tests performed on other file systems. The partition size in this case was about 120 GiB.
Small, single-command transfers are the Achilles' heel of NVMe. Small sizes are fine as long as you overload the drive with them (the more I/Os, the better). Alternatively, large transfers are fine because they are broken down into multiple smaller transfers (the size depends on the NVMe controller) and fed to the submission queue.
Currently, my driver is optimised for large transfers. A future release will include independent submission and retire queues in order to create a truly pipelined flow. But this will also require a filesystem which is capable of sending multiple I/Os.
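As a rough illustration of that idea, not the actual driver code: a large request is split into controller-sized chunks that are all queued before any completion is retired, so submission and retirement can overlap. The chunk limit and the nvme_submit_read()/nvme_wait_all() helpers below are made up for the sketch.

#include <stdint.h>

#define MAX_CHUNK_BYTES (128 * 1024)   /* assumed per-command limit (MDTS-derived) */

/* Hypothetical helpers: queue one read command without waiting, and wait
 * until 'count' completions have been retired from the completion queue. */
extern int nvme_submit_read(uint64_t lba, void *buf, uint32_t bytes);
extern int nvme_wait_all(uint32_t count);

/* Split one large read into multiple submission queue entries and only
 * then wait for the completions, instead of one submit/wait per chunk. */
static int nvme_read_large(uint64_t lba, void *buf, uint64_t bytes,
                           uint32_t block_size)
{
    uint32_t outstanding = 0;

    while (bytes > 0) {
        uint32_t chunk = (bytes > MAX_CHUNK_BYTES) ? MAX_CHUNK_BYTES
                                                   : (uint32_t)bytes;

        if (nvme_submit_read(lba, buf, chunk) != 0)
            break;                      /* queue full or error: stop early */

        outstanding++;
        lba   += chunk / block_size;
        buf    = (uint8_t *)buf + chunk;
        bytes -= chunk;
    }

    /* With independent submission and retirement, this wait could overlap
     * with queuing the next request, giving a pipelined flow. */
    return nvme_wait_all(outstanding);
}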
Quote:
The transfer size limit is set by the size of the disk cache, the read-ahead cache and the number of available "buffers". Since NGFS has a write-through cache, all Reads and Writes go through the cache.
Read-ahead and copy-back caches only help for small transfers, not for large ones (which become slower than without a cache), and caching everything doesn't make sense either. For metadata blocks SFS has "buffers", which are something completely different from the diskcache.library caches (or the SFS-internal ones if diskcache.library isn't used).

For transfers larger than the cache line size, IIRC 64 KB in diskcache.library, I just invalidate the caches for the transfer range, in case some of the sectors were in the cache and the contents change, and do a single device read or write of the size the file system got from the application, provided its start address and size are multiples of the block size. If it's not block aligned, only the first and/or last part(s) smaller than a block are done through the read-ahead/copy-back cache; the largest part bypasses the cache.

The disk cache used in the AmigaOS port of NTFS, and probably all FUSE/FileSysBox file systems, does the same as I do in diskcache.library: only small transfers use the cache, large ones don't.
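A condensed sketch of that split, with made-up names (cache_read(), cache_invalidate_range(), device_read() and both size constants) purely for illustration:

#include <stdint.h>

#define BLOCK_SIZE       512u           /* assumed device block size      */
#define CACHE_LINE_SIZE  (64u * 1024u)  /* assumed cache-bypass threshold */

/* Hypothetical primitives: read via the read-ahead/copy-back cache, drop
 * cached copies of a range, and read straight from the device. */
extern void cache_read(uint64_t offset, uint8_t *buf, uint32_t len);
extern void cache_invalidate_range(uint64_t offset, uint64_t len);
extern void device_read(uint64_t offset, uint8_t *buf, uint64_t len);

static void fs_read(uint64_t offset, uint8_t *buf, uint64_t len)
{
    /* Unaligned head: the partial first block goes through the cache. */
    uint64_t head = (BLOCK_SIZE - (offset % BLOCK_SIZE)) % BLOCK_SIZE;
    if (head > len)
        head = len;
    if (head) {
        cache_read(offset, buf, (uint32_t)head);
        offset += head; buf += head; len -= head;
    }

    /* Block-aligned bulk above the threshold: invalidate any cached
     * sectors in the range, then do one direct device transfer into the
     * caller's buffer. */
    uint64_t bulk = len - (len % BLOCK_SIZE);
    if (bulk >= CACHE_LINE_SIZE) {
        cache_invalidate_range(offset, bulk);
        device_read(offset, buf, bulk);
        offset += bulk; buf += bulk; len -= bulk;
    }

    /* Whatever is left (small remainder or unaligned tail) is cached. */
    if (len)
        cache_read(offset, buf, (uint32_t)len);
}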
Quote:
Also, since it is a journalling file system, all Writes to disk (of meta data) take three Write operations, not just one.
It's the same in SFS, (at least) 3 writes and a CMD_UPDATE, but delayed by the flush timeout.
Quote:
For a partition of 100 GB+, 4096-byte blocks are used, which requires 16 MB of cache for each such partition. I have 23 such partitions on my X-5000, so the cache is no bigger than necessary.
The diskcache.library cache is much larger, some percent of the installed RAM, but it's a single cache shared by all partitions using diskcache.library.
Thanks for that discussion. I think I tried (years ago) bypassing the cache for large transfers but it did not benefit the overall speed of the test suite, so I removed the extra code (don't like special cases). Of course, the test suite does not use a lot of huge transfer sizes such as we are testing with Geennaam's driver.
I'll try re-enabling the bypass-cache code and see if it improves DiskSpeed's performance.
I keep asking myself: "Why are we striving for maximum benchmark performance if it won't make much difference to real-world operation? What sort of application will benefit from an increase of transfer speed for buffer sizes > 1 MiB?"
I can't help thinking that this whole investigation is a solution looking for a problem.
Quote:
I keep asking myself: "Why are we striving for maximum benchmark performance if it won't make much difference to real-world operation? What sort of application will benefit from an increase of transfer speed for buffer sizes > 1 MiB?"
Some examples:
- Compiling software: 16 MB is too small to keep all the executables (make, gcc, gas, ld, etc.) in the cache, and loading the large executables while bypassing the cache should be faster. Small files like the includes will stay in the cache, and if the large executables aren't cached, many more of them will.
- Playing large audio or video files.
- Editing or converting audio or video files.
- Copying files.
Usual benchmarks are faster if you put everything into the cache (but only if the benchmark uses files <= the cache size), while real-world software is usually faster if you bypass the cache for large transfers. Most software using large transfers uses that data only once, and putting it into the cache evicts a lot of other cached data which is accessed more often.
Quote:
Currently, my driver is optimised for large transfers. A future release will include independent submission and retire queues in order to create a truly pipelined flow. But this will also require a filesystem which is capable of sending multiple I/Os.
The only AmigaOS file system which might still be able to do that, if it hasn't been ported to the new AmigaOS 4.1 FS API yet, is FFS2, using the ACTION_(READ|WRITE)_RETURN packets for device I/O. In file systems using the new AmigaOS 4.1 FS API that's no usable option, and in my AmigaOS 4.x SFS/JXFS implementations, which use neither the old TRIPOS/AmigaOS 0.x-3.9 packet API nor the new AmigaOS 4.1 FS API but a custom one, it's not possible either.
So I modified NGFS' ReadData() function so that for a Read request larger than MAX_CACHE_READ, it bypasses the cache and reads from the device directly into the caller's buffer. I haven't made any changes to Write yet.
The result is surprising: read speeds fall by a factor of 4 or 5. I then tried breaking up the long Read into several shorter Reads, but the overall speed doesn't change much with different sub-read sizes.
I think what is happening is this:
In the current version, everything goes through the cache. So the first read is slow, then all later reads are much faster, leading to an average that is pretty good. But when you ignore the cache and read directly from the disk each time, it's going to be much slower than reading from the memory-resident cache.
The code in DiskSpeed measures the overall time to Read() and Seek() to the beginning again (repeated many times). The actual times of the first disk Read() and the subsequent cache Reads are all averaged, so the difference between them is not visible. Bypass the cache and you see only slow transfers.
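To put some made-up numbers on it: if the first Read() really hits the disk and takes, say, 50 ms while the next nine are served from the cache in 5 ms each, the reported average is 9.5 ms per Read, which makes the storage look about five times faster than it actually is.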
In the case of Writes, they all write into the memory-resident cache, which is written to disk some time later, so short Write() operations appear fast. They only slow down when the Write() length exceeds the cache size. A 1 MiB test size operates at full Write speed, although the reported speed is going to be slower than the maximum because of the included Seek() times.
I will add some longer test transfers to DiskSpeed and see what happens.
@tonyw What you got might be true for SATA, maybe even for the X5000's SATA2, but did you test it with NVMe as well? Please ignore any results you may get from benchmarks, your own benchmark tool as well as foreign ones like DiskSpeed, and only use real-world software tests instead. I guess over the nearly 20 years I worked on SFS I did about as many tests with it as you are doing with NGFS, including an always-cache-everything version of diskcache.library (i.e. the same as you are doing), but in my results nearly all real-world software was much faster with SFS if large transfers aren't cached.
The DiskSpeed results you get with a file/buffer size smaller than the cache size aren't disk-related speed results at all, but just IExec->CopyMemQuick() benchmarks copying data from/to the disk cache memory.