Storage Benchmarking
This document covers persistent storage benchmarking tools and guides for both filesystems and physical drives.
dd
Disk/Data Duplicator (dd) is a command-line tool that copies raw data from one source to another and reports the throughput of the transfer, which makes it a quick sequential I/O benchmark.
Example: Copy 50GiB of zeroes, written in 4KiB blocks, to the networked Lustre filesystem at /mnt/lustre/testfile:
[root@mawenzi-06 fio]# dd if=/dev/zero of=/mnt/lustre/testfile bs=4k iflag=fullblock,count_bytes count=50G
13107200+0 records in
13107200+0 records out
53687091200 bytes (54 GB, 50 GiB) copied, 48.2738 s, 1.1 GB/s
Depending on your disk and filesystem speed, this may take a minute or so to complete.
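dd can measure read speed the same way. A minimal sketch, assuming the testfile written above still exists: of=/dev/null discards the data, and iflag=direct bypasses the page cache so you measure the storage rather than RAM (a larger block size suits sequential reads better than 4KiB):
dd if=/mnt/lustre/testfile of=/dev/null bs=1M iflag=direct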
Example: Run the same benchmark as before, this time sending the output to the local ext4 filesystem under /root/:
[root@mawenzi-06 fio]# dd if=/dev/zero of=$HOME/testfile bs=4k iflag=fullblock,count_bytes count=50G
13107200+0 records in
13107200+0 records out
53687091200 bytes (54 GB, 50 GiB) copied, 36.7742 s, 1.5 GB/s
See how the speed increased now that we're writing to a local disk instead of over the network.
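Also keep in mind that by default dd reports the speed of writing into the page cache, not necessarily all the way to disk, so local numbers can look optimistic. Two standard GNU dd flags account for this; a minimal sketch of both variants:
# Include the time to flush data to disk in the reported speed
dd if=/dev/zero of=$HOME/testfile bs=4k iflag=fullblock,count_bytes count=50G conv=fdatasync
# Or bypass the page cache entirely with O_DIRECT
dd if=/dev/zero of=$HOME/testfile bs=4k iflag=fullblock,count_bytes count=50G oflag=direct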
fio
Flexible I/O (fio) is a benchmarking tool that simulates a given I/O workload and measures its performance.
fio Installation
Clone the Git repo:
git clone https://github.com/axboe/fio.git
Then build and install from source:
./configure
make
make install
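Alternatively, most distributions package fio, so building from source is optional; for example (exact package availability depends on your distribution's repositories):
dnf install fio    # RHEL/Fedora family
apt install fio    # Debian/Ubuntu family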
Verify the installation:
[root@mawenzi-06 fio]# fio --version
fio-3.35-116-gb311
fio Usage
In our first example we'll test the read speed of our 8GiB testfile (see the sketch after the option list if you still need to create it), with the following options:
- --name=benchmark1: Call this benchmark benchmark1
- --rw=read: We're doing a read benchmark, not write
- --size=8g: Read 8GiB of the file
- --blocksize=4096: Read the file in 4096-byte blocks
- --direct=1: Use non-buffered I/O, usually O_DIRECT. Instead of reading cached data from RAM, this forces direct disk access.
- --numjobs=16: Spawn 16 independent threads or processes.
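If the testfile doesn't exist yet, you can pre-create it before benchmarking. A minimal sketch using dd; the $HOME/benchmarks path is just the location assumed by the command below:
mkdir -p $HOME/benchmarks
dd if=/dev/urandom of=$HOME/benchmarks/testfile bs=1M iflag=fullblock count=8192    # 8GiB of random data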
fio --name=benchmark1 --filename=$HOME/benchmarks/testfile --rw=read --size=8g --blocksize=4096 --direct=1 --numjobs=16
Example:
[root@mawenzi-06 benchmarks]# fio --name benchmark1 --filename=$HOME/benchmarks/testfile --rw=read --size=8g --blocksize=4096 --direct=1 --numjobs=16
benchmark1: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
fio-3.35-116-gb311
Starting 16 processes
Jobs: 12 (f=12): [_(2),R(5),_(1),R(4),_(1),R(3)][100.0%][r=600MiB/s][r=154k IOPS][eta 00m:00s]
benchmark1: (groupid=0, jobs=1): err= 0: pid=86161: Mon Aug 21 15:55:15 2023
read: IOPS=12.7k, BW=49.6MiB/s (52.0MB/s)(8192MiB/165226msec)
clat (usec): min=31, max=536, avg=78.31, stdev=18.74
lat (usec): min=31, max=536, avg=78.37, stdev=18.74
clat percentiles (usec):
| 1.00th=[ 64], 5.00th=[ 65], 10.00th=[ 65], 20.00th=[ 67],
| 30.00th=[ 67], 40.00th=[ 68], 50.00th=[ 71], 60.00th=[ 81],
| 70.00th=[ 82], 80.00th=[ 84], 90.00th=[ 101], 95.00th=[ 116],
| 99.00th=[ 153], 99.50th=[ 167], 99.90th=[ 208], 99.95th=[ 229],
| 99.99th=[ 265]
bw ( KiB/s): min=49760, max=52976, per=6.33%, avg=50800.87, stdev=471.92, samples=330
iops : min=12440, max=13244, avg=12700.21, stdev=117.97, samples=330
lat (usec) : 50=0.01%, 100=89.61%, 250=10.37%, 500=0.02%, 750=0.01%
cpu : usr=0.92%, sys=4.80%, ctx=2097155, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=2097152,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
...
...
...
...
...
...
benchmark1: (groupid=0, jobs=1): err= 0: pid=86162: Mon Aug 21 15:55:15 2023
read: IOPS=12.7k, BW=49.5MiB/s (51.9MB/s)(8192MiB/165644msec)
clat (usec): min=29, max=509, avg=78.52, stdev=18.82
lat (usec): min=29, max=509, avg=78.58, stdev=18.82
clat percentiles (usec):
| 1.00th=[ 64], 5.00th=[ 65], 10.00th=[ 66], 20.00th=[ 67],
| 30.00th=[ 68], 40.00th=[ 68], 50.00th=[ 71], 60.00th=[ 81],
| 70.00th=[ 82], 80.00th=[ 84], 90.00th=[ 101], 95.00th=[ 116],
| 99.00th=[ 155], 99.50th=[ 169], 99.90th=[ 210], 99.95th=[ 229],
| 99.99th=[ 262]
bw ( KiB/s): min=49400, max=52760, per=6.31%, avg=50673.21, stdev=515.47, samples=331
iops : min=12350, max=13190, avg=12668.30, stdev=128.87, samples=331
lat (usec) : 50=0.01%, 100=89.48%, 250=10.51%, 500=0.01%, 750=0.01%
cpu : usr=0.90%, sys=4.67%, ctx=2097156, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=2097152,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=784MiB/s (822MB/s), 49.0MiB/s-49.6MiB/s (51.4MB/s-52.0MB/s), io=128GiB (137GB), run=165226-167263msec
Disk stats (read/write):
nvme0n1: ios=33540245/3, sectors=268321960/19, merge=0/0, ticks=2523457/1, in_queue=2523457, util=100.00%
Here we can see that we got 822 MB/s in aggregate, roughly 51 MB/s per process.
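Sequential reads are only one workload. As a sketch of a different profile (the engine and queue-depth values here are illustrative, not tuned recommendations), a 4KiB random-read test using the asynchronous libaio engine with a deeper queue could look like:
fio --name=randread1 --filename=$HOME/benchmarks/testfile --rw=randread --size=8g --blocksize=4096 --direct=1 --ioengine=libaio --iodepth=32 --runtime=60 --time_based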