Yesterday someone asked me a question on WeChat: how do you tell whether a workload is doing more random writes or more sequential writes?
This is a classic I/O analysis question.
During performance analysis, many people look at how much is written per second. But what does writing more or less actually tell us? That is what we need to focus on.
Judging a disk's capability also comes down to this: on one disk, 5 MB/s of random writes may already be hitting its limit, while on another disk, 50 MB/s of random writes may be perfectly normal.
Analyzing a disk's write capability tells you how different disks will affect application performance, which matters especially for I/O-intensive applications.
Let me work through an example to illustrate this.
First, install fio, a tool for testing I/O:
[root@7DGroup ~]# yum install fio
[root@7DGroup ~]# fdisk -l
Disk /dev/vda: 53.7 GB, 53687091200 bytes, 104857600 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000d64b4
Device Boot Start End Blocks Id System
/dev/vda1 * 2048 104857599 52427776 83 Linux
[root@7DGroup ~]#
The sector size is shown above: 512 bytes.
Next, check the block size:
[root@7DGroup ~]# tune2fs -l /dev/vda1 | grep "Block size"
Block size: 4096
[root@7DGroup ~]#
One block consists of 8 sectors, so the block size is 512 × 8 = 4096 bytes.
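The sector/block arithmetic above can be double-checked in a couple of lines (a minimal sketch; the variable names are mine):

```python
# Sector size reported by fdisk -l; a filesystem block spans 8 sectors.
sector_size = 512        # bytes, from fdisk -l
sectors_per_block = 8
block_size = sector_size * sectors_per_block
print(block_size)        # 4096 bytes, matching the tune2fs "Block size" output
```

With the block size confirmed, the next step is the fio random write test below.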
[root@7DGroup ~]# fio -filename=/home/~/testfile -iodepth=64 -ioengine=libaio -direct=1 -rw=randwrite -bs=4k -size=2G -numjobs=64 -runtime=20 -group_reporting -name=test-rand-write
test-rand-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
...
fio-3.7
Starting 64 processes
test-rand-write: Laying out IO file (1 file / 2048MiB)
Jobs: 64 (f=64): [w(64)][100.0%][r=0KiB/s,w=8688KiB/s][r=0,w=2172 IOPS][eta 00m:00s]
test-rand-write: (groupid=0, jobs=64): err= 0: pid=23642: Wed Mar 11 09:04:54 2020
write: IOPS=2197, BW=8788KiB/s (8999kB/s)(173MiB/20176msec)
slat (usec): min=3, max=1005.2k, avg=28669.67, stdev=143372.07
clat (usec): min=1591, max=6497.8k, avg=1733090.60, stdev=1067661.27
lat (usec): min=1609, max=6999.5k, avg=1761761.26, stdev=1079655.35
clat percentiles (msec):
| 1.00th=[ 93], 5.00th=[ 600], 10.00th=[ 701], 20.00th=[ 802],
| 30.00th=[ 894], 40.00th=[ 1401], 50.00th=[ 1502], 60.00th=[ 1703],
| 70.00th=[ 2198], 80.00th=[ 2500], 90.00th=[ 3205], 95.00th=[ 3809],
| 99.00th=[ 4799], 99.50th=[ 5269], 99.90th=[ 6007], 99.95th=[ 6208],
| 99.99th=[ 6477]
bw ( KiB/s): min= 7, max= 816, per=2.26%, avg=198.23, stdev=185.14, samples=1623
iops : min= 1, max= 204, avg=49.51, stdev=46.30, samples=1623
lat (msec) : 2=0.02%, 4=0.16%, 10=0.50%, 20=0.09%, 50=0.02%
lat (msec) : 100=1.20%, 250=0.88%, 500=1.21%, 750=9.01%, 1000=23.58%
cpu : usr=0.01%, sys=0.06%, ctx=8199, majf=0, minf=1967
IO depths : 1=0.1%, 2=0.3%, 4=0.6%, 8=1.2%, 16=2.3%, 32=4.6%, >=64=90.9%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.2%, >=64=0.0%
issued rwts: total=0,44328,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=8788KiB/s (8999kB/s), 8788KiB/s-8788KiB/s (8999kB/s-8999kB/s), io=173MiB (182MB), run=20176-20176msec
Disk stats (read/write):
vda: ios=192/44315, merge=0/1026, ticks=2885/5039515, in_queue=5086634, util=100.00%
[root@7DGroup ~]#
Random write monitoring results
[~@7DGroup ~]$ iostat -x -d 1
Linux 3.10.0-862.el7.x86_64 (7DGroup.testing-studio.com) 03/11/2020 _x86_64_ (2 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 64.00 24.00 1905.00 96.00 7876.00 8.27 199.60 91.33 0.79 92.47 0.43 83.00
vda 0.00 1.00 24.00 2224.00 396.00 8948.00 8.31 253.80 116.21 21.42 117.23 0.44 100.00
vda 0.00 3.00 5.00 2161.00 52.00 8656.00 8.04 254.43 115.53 93.40 115.58 0.46 100.00
vda 0.00 1.00 0.00 2217.00 0.00 8872.00 8.00 254.13 115.41 0.00 115.41 0.45 100.00
vda 0.00 1.00 0.00 2180.00 0.00 8724.00 8.00 284.31 116.22 0.00 116.22 0.51 112.10
vda 0.00 206.00 3.00 2176.00 24.00 9528.00 8.77 231.58 106.90 6.33 107.04 0.46 100.00
vda 0.00 34.00 0.00 2247.00 0.00 9128.00 8.12 250.35 111.31 0.00 111.31 0.45 100.00
vda 0.00 1.00 35.00 2177.00 1352.00 8712.00 9.10 254.86 114.53 17.29 116.10 0.45 100.00
vda 0.00 0.00 7.00 2171.00 80.00 8684.00 8.05 254.91 117.12 59.57 117.31 0.46 100.00
vda 0.00 2.00 0.00 2204.00 0.00 8824.00 8.01 253.74 115.22 0.00 115.22 0.45 100.00
vda 0.00 303.00 0.00 2198.00 0.00 10016.00 9.11 241.84 108.99 0.00 108.99 0.45 100.00
vda 0.00 1.00 0.00 2199.00 0.00 8796.00 8.00 254.11 116.99 0.00 116.99 0.45 100.00
vda 0.00 2.00 0.00 2199.00 0.00 8808.00 8.01 255.14 116.00 0.00 116.00 0.45 100.00
vda 0.00 0.00 1.00 2213.00 4.00 8852.00 8.00 253.19 116.23 9.00 116.28 0.45 100.00
vda 0.00 0.00 116.00 2211.00 2296.00 8844.00 9.57 261.34 108.23 5.62 113.61 0.44 102.90
vda 0.00 404.00 1.00 2146.00 472.00 10200.00 9.94 245.29 113.20 202.00 113.16 0.47 99.90
vda 0.00 1.00 0.00 2178.00 0.00 8716.00 8.00 254.43 115.69 0.00 115.69 0.46 100.00
vda 0.00 0.00 0.00 2270.00 0.00 9080.00 8.00 254.27 114.14 0.00 114.14 0.44 100.00
vda 0.00 1.00 0.00 2188.00 0.00 8756.00 8.00 254.68 114.86 0.00 114.86 0.46 100.00
vda 0.00 0.00 0.00 2169.00 0.00 8676.00 8.00 254.06 115.80 0.00 115.80 0.46 100.00
vda 0.00 489.00 29.00 779.00 820.00 5072.00 14.58 74.50 118.95 10.00 123.01 0.51 41.60
Random write results analysis
Let's take one line from the output above and walk through it:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 116.00 2211.00 2296.00 8844.00 9.57 261.34 108.23 5.62 113.61 0.44 102.90
In this line, wkB/s is 8844. Using the block size we found earlier, the number of writes works out to:
(8844 × 1024) / 4096 = 2211 (writes)
which is exactly the w/s value. In other words, every write here is a random write of exactly 4 KB; no write ever spans more than a single block.
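The calculation above can be sketched as a quick check (values are taken from the iostat line; the variable names are mine):

```python
# Values from the iostat sample above.
block_size_kb = 4096 / 1024      # 4 KiB filesystem block, from tune2fs
wkb_per_s = 8844.0               # wkB/s column
w_per_s = 2211.0                 # w/s column

# If every write were exactly one block, how many writes per second would we see?
expected_w = wkb_per_s / block_size_kb
print(expected_w)                # 2211.0 -- matches w/s exactly

# Average size of each write request:
print(wkb_per_s / w_per_s)       # 4.0 KiB -> one block per write: random writes
```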
Next, run the same fio test with sequential writes for comparison:
[root@7DGroup ~]# fio -filename=/home/~/test -iodepth=64 -ioengine=libaio -direct=1 -rw=write -bs=4k -size=2g -numjobs=64 -runtime=20 -group_reporting -name=test-seq-write
test-seq-write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
...
fio-3.7
Starting 64 processes
Jobs: 64 (f=64): [W(64)][100.0%][r=0KiB/s,w=148MiB/s][r=0,w=37.9k IOPS][eta 00m:00s]
test-seq-write: (groupid=0, jobs=64): err= 0: pid=29351: Wed Mar 11 09:32:45 2020
write: IOPS=43.0k, BW=168MiB/s (176MB/s)(3379MiB/20096msec)
slat (nsec): min=387, max=105121k, avg=57281.16, stdev=1018413.58
clat (usec): min=432, max=305566, avg=94871.28, stdev=50558.84
lat (usec): min=1011, max=310779, avg=94929.52, stdev=50583.10
clat percentiles (msec):
| 1.00th=[ 8], 5.00th=[ 14], 10.00th=[ 20], 20.00th=[ 53],
| 30.00th=[ 88], 40.00th=[ 93], 50.00th=[ 96], 60.00th=[ 101],
| 70.00th=[ 104], 80.00th=[ 110], 90.00th=[ 184], 95.00th=[ 192],
| 99.00th=[ 207], 99.50th=[ 215], 99.90th=[ 292], 99.95th=[ 296],
| 99.99th=[ 300]
bw ( KiB/s): min= 1021, max= 6480, per=1.56%, avg=2689.53, stdev=806.09, samples=2525
iops : min= 255, max= 1620, avg=672.25, stdev=201.55, samples=2525
lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.15%, 10=2.06%, 20=8.43%, 50=8.96%
lat (msec) : 100=41.57%, 250=38.45%, 500=0.36%
cpu : usr=0.14%, sys=0.50%, ctx=46445, majf=0, minf=1963
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.5%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,864915,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=168MiB/s (176MB/s), 168MiB/s-168MiB/s (176MB/s-176MB/s), io=3379MiB (3543MB), run=20096-20096msec
Disk stats (read/write):
vda: ios=0/44205, merge=0/802076, ticks=0/3035521, in_queue=3052433, util=99.39%
[root@7DGroup ~]#
Sequential write monitoring results
[root@7DGroup ~]# iostat -x -d 1
Linux 3.10.0-862.el7.x86_64 (7DGroup.testing-studio.com) 03/11/2020 _x86_64_ (2 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 37312.00 0.00 2101.00 0.00 159368.00 151.71 157.45 74.39 0.00 74.39 0.48 100.10
vda 0.00 39715.00 0.00 2530.00 0.00 170552.00 134.82 154.20 61.61 0.00 61.61 0.40 100.00
vda 0.00 46224.00 0.00 2411.00 0.00 196772.00 163.23 162.48 66.83 0.00 66.83 0.41 100.00
vda 0.00 46567.00 0.00 2523.00 0.00 198820.00 157.61 163.02 64.85 0.00 64.85 0.40 100.00
vda 0.00 40533.00 0.00 2070.00 0.00 172260.00 166.43 151.07 72.95 0.00 72.95 0.48 100.00
vda 0.00 36154.00 0.00 1996.00 0.00 159424.00 159.74 153.97 77.90 0.00 77.90 0.50 100.10
vda 0.00 32457.00 0.00 2017.00 0.00 135488.00 134.35 153.49 74.38 0.00 74.38 0.50 100.00
vda 0.00 34367.33 0.00 1804.95 0.00 146257.43 162.06 155.13 86.99 0.00 86.99 0.55 99.11
vda 0.00 32396.00 0.00 1876.00 0.00 137956.00 147.07 154.61 81.67 0.00 81.67 0.53 100.00
vda 0.00 31925.00 0.00 1744.00 0.00 136140.00 156.12 155.76 88.44 0.00 88.44 0.57 100.10
vda 0.00 30974.00 0.00 1614.00 0.00 132836.00 164.60 153.09 97.19 0.00 97.19 0.62 100.20
vda 0.00 30620.00 0.00 1917.00 0.00 130372.00 136.02 151.35 77.84 0.00 77.84 0.52 100.00
vda 0.00 33711.00 0.00 2211.00 0.00 145920.00 131.99 152.76 69.49 0.00 69.49 0.45 100.00
vda 0.00 34815.00 0.00 2161.00 0.00 160936.00 148.95 142.48 70.59 0.00 70.59 0.46 100.00
vda 0.00 25413.00 0.00 1640.00 0.00 110504.00 134.76 104.69 63.86 0.00 63.86 0.43 70.20
Sequential write results analysis
Again, let's take one line from the output and walk through it:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 39715.00 0.00 2530.00 0.00 170552.00 134.82 154.20 61.61 0.00 61.61 0.40 100.00
In this line, wkB/s is 170552. First, compute how many writes this would require if every write were a random, single-block write:
(170552 × 1024) / 4096 = 42,638 (writes)
But w/s is actually only 2530. That means during sequential writing, each write averages:
170552 / 2530 ≈ 67.4 kB
or about 16.9 blocks per write.
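The same check as before, applied to the sequential write sample (values from the iostat line; variable names are mine):

```python
# Values from the iostat sample above.
wkb_per_s = 170552.0     # wkB/s column
w_per_s = 2530.0         # w/s column
block_kb = 4.0           # 4 KiB filesystem block

avg_write_kb = wkb_per_s / w_per_s
print(round(avg_write_kb, 1))             # ~67.4 KiB per write
print(round(avg_write_kb / block_kb, 1))  # ~16.9 blocks per write: sequential
```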
Comparing the two, you can see why, when we analyze I/O, an application that generates a high volume of random writes hurts performance so badly.
In this system's example, when I/O utilization is high: if each write is still around 4 KB, the workload is mostly random writes; if each write is around 67 KB, it is mostly sequential writes.
On your own system, run the same baseline tests first, then compare them against what you observe while your application is running to judge whether it is doing more random or sequential writes.
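The rule of thumb above can be written as a small helper. Note that the 4-block threshold is my own illustrative cutoff for this system, not a universal constant; calibrate it with your own baseline tests:

```python
def classify_write_pattern(wkb_per_s, w_per_s, block_kb=4.0, threshold_blocks=4):
    """Rough heuristic: if the average write request is close to one
    filesystem block, the workload is mostly random writes; if it spans
    many blocks, it is mostly sequential. Threshold is illustrative."""
    avg_kb = wkb_per_s / w_per_s
    blocks_per_write = avg_kb / block_kb
    return "random" if blocks_per_write < threshold_blocks else "sequential"

print(classify_write_pattern(8844, 2211))      # random (4 KiB per write)
print(classify_write_pattern(170552, 2530))    # sequential (~67 KiB per write)
```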