When I ran the 'whole disk' test on the Linux side with the 'Disks' tool, it gave average read speeds of 990 MB/s for the Kingston, 180 MB/s for the OWC and 50 MB/s for the Hitachi (the sample sizes were the same as the buffers above).
It surprises me how well the old mechanical Hitachi performed with SCSISpeed compared to the SSDs. In real life it is dog-slow, e.g. when loading Linux programs!
I don't understand why Hitachi SATA is faster than OWC SATA on AmigaOS, but it's the other way round on Linux. The absolute results will of course always be different between AmigaOS and Linux, but the relative differences shouldn't be.
Quote:
I don't understand why Hitachi SATA is faster than OWC SATA on AmigaOS, but it's the other way round on Linux. The absolute results will of course always be different between AmigaOS and Linux, but the relative differences shouldn't be.
Can the basic formatting somehow affect SCSISpeed? OWC and Hitachi both have RDB (I think the Hitachi was formatted with MorphOS), but Kingston has GPT (I wish I remembered the right combination ;) as it is used primarily for Linux.
@Rolar C:Copy with 16 MB buffer is faster than SCSISpeed with 10 and 100 MB ones!? That's even more strange than your SCSISpeed results (SATA HD faster than SATA SSD). SCSISpeed only uses reads, while C:Copy does reads + writes, i.e. has to transfer twice as much data, and with the slow CPU memory speed I wouldn't be surprised at all if RAM: is slower than a partition on a NVMe.
Edit: Or did you calculate the C:Copy speeds with 2 GB (1 GB read + 1 GB write)?
Quote:
C:Copy with 16 MB buffer is faster than SCSISpeed with 10 and 100 MB ones!? That's even more strange than your SCSISpeed results (SATA HD faster than SATA SSD). SCSISpeed only uses reads, while C:Copy does reads + writes, i.e. has to transfer twice as much data, and with the slow CPU memory speed I wouldn't be surprised at all if RAM: is slower than a partition on a NVMe.
Edit: Or did you calculate the C:Copy speeds with 2 GB (1 GB read + 1 GB write)?
No, it was only a one-way copy. And I didn't make a calculation error, as the 'Speed' option of Enhancer Copy gives practically the same result ;). It is also strange that I get a somewhat higher speed with the default 128K buffer:
3.Linux-NTFS:> copy-en Test.mp4 ram: speed
SPEED option - being quiet during copy...
Copied 963.61 MB in 11.411 seconds at 84.44 MB/s
Copy buffer size: 128 KB
Version: 54.5
I begin to suspect this is somehow related to the memory management of the X5040... I have to wait a few weeks before I have an opportunity to test this with an X5020. But I hope geennaam can soon tell what kind of results he gets with an NTFS partition.
Quote:
But I hope geennaam can soon tell what kind of results he gets with an NTFS partition.
I actually just did the performance analysis
My result is ~85 MB/s with NTFS. This is way slower than SFS2. The net write speed isn't all that bad at ~500 MB/s, but still only a third of SFS2's. The 1024 MB file is split into 20 ~50 MB transfers when I use BUF=65536. But for some reason, there's a small read (64 KB each) before and after each such ~50 MB write, with a total command delay of ~500 ms. So in total 20 * ~500 ms = ~10 seconds. That's why the write actions don't take 2 but ~12 seconds to complete. Hence the very slow performance.
As I understand it, AmigaOS4 includes a port of NTFS-3G which has to go through an additional library called FUSE/FileSysBox. The sources are available in the download section of Hyperion (login required). So if anyone has some spare time and motivation to find out where the 100 ms + 400 ms command delay for those two reads comes from, then feel free to do so.
EDIT: I suppose that Linux on your X5040 will use all 4 cores when performing that benchmark. Unlike SFS2, NTFS-3G gives a 100% CPU load. So this is also a factor to consider when comparing NTFS results.
Quote:
My result is ~85 MB/s with NTFS. This is way slower than SFS2. The net write speed isn't all that bad at ~500 MB/s, but still only a third of SFS2's.
Thank you for your analysis :)! Your results are anyway in line with those I got. My Kingston NV2 is not the fastest model and does not have DDR4 cache, so I think it's normal if your Samsung shows somewhat better values.
Quote:
As I understand it, AmigaOS4 includes a port of NTFS-3G which has to go through an additional library called FUSE/FileSysBox. The sources are available in the download section of Hyperion (login required). So if anyone has some spare time and motivation to find out where the 100ms+400ms command delay for those two reads comes from then feel free to do so
I wish I had the skills for that...;).
Quote:
EDIT: I suppose that Linux on your X5040 will use all 4 cores when performing that benchmark. Unlike SFS2, NTFS-3G gives a 100% CPU load. So this is also a factor to consider when comparing NTFS results.
That's true... By default Linux uses the in-kernel NTFS driver, but there is also a separate NTFS-3G driver. The advantage of the latter is that it allows using 'fstrim' on NTFS partitions, but you have to mount the partition via fstab for it to work. There is also the newer NTFS3 driver (in-kernel), but that is broken in PPC kernels.
The CPU load was very low (on average < 10% per core) when I tested an NTFS partition with the 'Disks' tool. It seems that the AmigaOS NTFS-3G driver indeed needs some further work...
EDIT: Have you tried the SCSITools on your X5000? So far the only reference results I have seen are from Sailor, and she has an X1000. It is said to have much better memory management, so the results cannot be directly compared with the X5000.
Quote:
So far the only reference results I have seen are from Sailor, and she has an X1000. It is said to have much better memory management, so the results cannot be directly compared with the X5000.
The CPU memory interface may be faster on the X1000, but the PCIe speed is only half that of the X5000, which should be the limiting factor for NVMe (and any other PCIe DMA).
@geennaam
Quote:
As I understand it, AmigaOS4 includes a port of NTFS-3G which has to go through an additional library called FUSE/FileSysBox. The sources are available in the download section of Hyperion (login required). So if anyone has some spare time and motivation to find out where the 100ms+400ms command delay for those two reads comes from then feel free to do so
I don't have a Hyperion account, but I checked the AROS + AmigaOS 3.x/m68k sources instead. The disk cache it uses employs checksums to make sure the memory used for the caches wasn't trashed by some other software, and it uses a very slow method for that. That can explain some slowdowns and high CPU usage, but just like in my diskcache.library caches it's skipped for "large" (>= 64 KB) transfers. (diskcache.library as used by SFS doesn't use checksums for the cache data but uses IMMU functions to write-protect its caches instead, which should be faster.) However, I only checked the device I/O and disk cache system and didn't find anything there which could explain the 64 KB reads; I didn't check the actual file system (NTFS-3G) code. The 64 KB reads are probably something in the NTFS file system itself; for example, any file system has to search for free space on the partition before it can append data to a file. In SFS the required data for that should nearly always be inside a block in its "buffers", i.e. it doesn't have to (re-)read it from the disk.
Edited by joerg on 2023/5/2 17:29:15
I looked around a bit in the filesysbox sources, and it's a bit suspicious that it has a 100 ms timer (FBX_TIMER_MICROS) and that in FbxHandleTimerEvent() it then may possibly call FbxFlushAll(). Suspicious because 500 and 400 are multiples of 100.
Quote:
So far the only reference results I have seen are from Sailor, and she has an X1000. It is said to have much better memory management, so the results cannot be directly compared with the X5000.
The CPU memory interface may be faster on the X1000, but the PCIe speed is only half that of the X5000, which should be the limiting factor for NVMe (and any other PCIe DMA).
It would be great to define a unified benchmark platform for these tests. For example: SCSISpeed + xyz buffers, Copy (AOS/Enhancer) + xyz buffers, or whatever. Can anybody (who understands the internals) define it, please?
AmigaOS3: Amiga 1200 AmigaOS4: Micro A1-C, AmigaOne XE, Pegasos II, Sam440ep, Sam440ep-flex, AmigaOne X1000 MorphOS: Efika 5200b, Pegasos I, Pegasos II, Powerbook, Mac Mini, iMac, Powermac Quad
Quote:
It would be great to define a unified benchmark platform for these tests.
Depends on what you want to test...
- Only nvme.device driver (and PCIe hardware) speed: SCSISpeed
- File system speed, only really usable to compare different file systems on the same hardware and driver: DiskSpeed
- Speed of the different C:Copy implementations, speed of RAM-Handler (more or less a speed test of the different IExec->CopyMem[Quick]() implementations for the different CPUs), speed of the file system (incl. the speed of any disk cache the file system may be using) + speed of the nvme driver and hardware combined: C:Copy
I.e. C:Copy is the most useless test for comparing only one of the 4-5 involved parts, for example comparing SATA with NVMe when everything else (hardware, file system, RAM-Handler, C:Copy version and number of C:Copy BUFFERS) is the same. But OTOH the DiskSpeed and C:Copy tests are closer to what you can get in real-life usage than a test of one specific part only, like SCSISpeed.
The default buffer/transfer sizes of C:Copy and especially SCSISpeed and DiskSpeed (only 512 bytes to 256 KB) are way too small. Using 16 MB for C:Copy is OK; for SCSISpeed and DiskSpeed you have to use 4 buffer sizes (I don't remember if you can use less, for example only 2 by using something like BUF3=0 BUF4=0), and for those I'd suggest sizes between 128 KB and 128 MB. Although Rolar got different results, except for the HD, using more than 16 MB as transfer/buffer size shouldn't make any noticeable difference with a tool like SCSISpeed, no matter on which hardware and device driver (SCSI, PATA, SATA, NVMe, ...).
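For example, using SCSISpeed's BUFn syntax (the DRIVE value here is just an illustration; adjust the device name and unit to your setup), a spread from 128 KB to 128 MB could look like:

```
scsispeed DRIVE=nvme.device:0 FAST BUF1=131072 BUF2=1048576 BUF3=16777216 BUF4=134217728
```

(131072 = 128 KB, 1048576 = 1 MB, 16777216 = 16 MB, 134217728 = 128 MB.)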
@Georg That could explain additional writes, for example updating the pointers to the 50 MB of data blocks appended to the file on each transfer and making sure they are on the disk before the 50 MB file data writes are done (CMD_UPDATE), but geennaam is getting 64 KB reads before and after each 50 MB write.
scsispeed DRIVE=nvme.device:0 FAST BUF1=16777216 BUF2=16777216 BUF3=16777216 BUF4=16777216
(Yes, I know, 4 times 16MB. But it was for diagnostics purposes)
I've tested the same command on my latest driver both with and without debug printouts to the debug shell.
The debug driver prints a string with the following information for each read action:
- The timing of the IO command
- The timing of the NVMe read handler
- The time between issued IO commands
- The length of the read size in bytes
So that's a debug terminal full of data
Then scsispeed reports 220 MB/s for all four tests. This low speed is understandable due to the additional time the debug strings take.
The debug printouts show that the reads are actually happening at close to maximum PCIe speed. This was expected, given the small overhead at this kind of transfer size. Also, the time between commands is about 8 us.
But when I run the same benchmark with the debug printouts disabled, the reported speed drops to just 50 MB/s.
So something is clearly off here. A quick scan through the scsispeed source shows that there's some timer trickery happening, according to the source because of a low timer resolution. But microsecond resolution is sufficient to time these kinds of transfers, so apparently this trickery is not working properly or some overflow occurs.
Anyway, the benchmark is just a low-level timed CMD_READ loop. I will create one myself without the trickery, using up-to-date Exec calls against the latest SDK.
A simple ITimer->GetSysTime() before and after the DoIO() and then an ITimer->SubTime() should do the trick.
Quote:
A simple ITimer->GetSysTime() before and after the DoIO() and then an ITimer->SubTime() should do the trick.
Maybe it's no longer the case, but IIRC ITimer->GetUpTime() had a higher resolution than ITimer->GetSysTime(): the same as ITimer->ReadEClock(), but without having to convert the results to microseconds.
I use UNIT_MICROHZ to request the current system time, but I could also use the EClock. I don't know if this makes much difference for timing in the milliseconds range.
I could also do two calls to ReadEClock(), subtract the results and divide by the count rate. Maybe this is even faster and more accurate.
Yeah, but thinking about it again, GetSysTime()/TR_GETSYSTIME probably always returns the same result no matter what unit. Especially GetSysTime(), as it doesn't take the unit as a parameter, and it's unlikely that TR_GETSYSTIME behaves differently.