After a lot of benchmarking with sysbench, I come to this conclusion:
To survive (performance-wise) a situation where
- an evil copying process floods dirty pages
- a hardware write cache is present (possibly also without one)
- and synchronous reads or writes per second (IOPS) are crucial
just dump all elevators, queues and dirty page caches. The correct place for dirty pages is in the RAM of that hardware write cache.
Make dirty_ratio (or the new dirty_bytes) as low as possible, but keep an eye on sequential throughput. In my particular case, 15 MB was the optimum (echo 15000000 > dirty_bytes).
This is more a hack than a solution, since gigabytes of RAM are now used for read caching only, instead of dirty caching. For the dirty cache to work out well in this situation, the Linux kernel background flusher would need to average the speed at which the underlying device accepts requests and adjust background flushing accordingly. Not easy.
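Roughly, "dump all elevators, queues and dirty page caches" translates to a sketch like the following (the device name sda, the noop/nr_requests lines and the background value are illustrative assumptions; only the 15 MB dirty_bytes value comes from my tests):
# keep the kernel's dirty page cache tiny so writeback lands in the controller's write cache quickly
echo 15000000 > /proc/sys/vm/dirty_bytes
# assumed example: start background flushing even earlier
echo 5000000 > /proc/sys/vm/dirty_background_bytes
# assumed example: minimize kernel-side elevating/queueing, let the RAID controller reorder
echo noop > /sys/block/sda/queue/scheduler
echo 32 > /sys/block/sda/queue/nr_requests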
Specifications and benchmarks for comparison:
Tested while dd'ing zeros onto the disk, sysbench showed a huge success, boosting 16 kB fsync writes with 10 threads from 33 IOPS to 700 IOPS (idle limit: 1500 IOPS) and the single-thread case from 8 IOPS to 400 IOPS.
Without load, IOPS were unaffected (~1500) and throughput was only slightly reduced (from 251 MB/s to 216 MB/s).
dd call:
dd if=/dev/zero of=dumpfile bs=1024 count=20485672
For sysbench, test_file.0 was prepared to be non-sparse with:
dd if=/dev/zero of=test_file.0 bs=1024 count=10485672
sysbench call for 10 threads:
sysbench --test=fileio --file-num=1 --num-threads=10 --file-total-size=10G --file-fsync-all=on --file-test-mode=rndwr --max-time=30 --file-block-size=16384 --max-requests=0 run
sysbench call for one thread:
sysbench --test=fileio --file-num=1 --num-threads=1 --file-total-size=10G --file-fsync-all=on --file-test-mode=rndwr --max-time=30 --file-block-size=16384 --max-requests=0 run
Smaller block sizes show even more drastic numbers.
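The 4 kB single-thread runs below presumably used the same call as above with only the block size changed:
sysbench --test=fileio --file-num=1 --num-threads=1 --file-total-size=10G --file-fsync-all=on --file-test-mode=rndwr --max-time=30 --file-block-size=4096 --max-requests=0 run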
--file-block-size=4096 with 1 GB dirty_bytes:
sysbench 0.4.12:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Extra file open flags: 0
1 files, 10Gb each
10Gb total file size
Block size 4Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() after each write operation.
Using synchronous I/O mode
Doing random write test
Threads started!
Time limit exceeded, exiting...
Done.
Operations performed:  0 Read, 30 Write, 30 Other = 60 Total
Read 0b  Written 120Kb  Total transferred 120Kb  (3.939Kb/sec)
      0.98 Requests/sec executed
Test execution summary:
      total time:                          30.4642s
      total number of events:              30
      total time taken by event execution: 30.4639
      per-request statistics:
           min:                                 94.36ms
           avg:                               1015.46ms
           max:                               1591.95ms
           approx.  95 percentile:            1591.30ms
Threads fairness:
      events (avg/stddev):           30.0000/0.00
      execution time (avg/stddev):   30.4639/0.00
--file-block-size=4096 with 15 MB dirty_bytes:
sysbench 0.4.12:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Extra file open flags: 0
1 files, 10Gb each
10Gb total file size
Block size 4Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() after each write operation.
Using synchronous I/O mode
Doing random write test
Threads started!
Time limit exceeded, exiting...
Done.
Operations performed:  0 Read, 13524 Write, 13524 Other = 27048 Total
Read 0b  Written 52.828Mb  Total transferred 52.828Mb  (1.7608Mb/sec)
    450.75 Requests/sec executed
Test execution summary:
      total time:                          30.0032s
      total number of events:              13524
      total time taken by event execution: 29.9921
      per-request statistics:
           min:                                  0.10ms
           avg:                                  2.22ms
           max:                                145.75ms
           approx.  95 percentile:              12.35ms
Threads fairness:
      events (avg/stddev):           13524.0000/0.00
      execution time (avg/stddev):   29.9921/0.00
--file-block-size=4096 with 15 MB dirty_bytes on an idle system:
sysbench 0.4.12:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Extra file open flags: 0
1 files, 10Gb each
10Gb total file size
Block size 4Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Calling fsync() after each write operation.
Using synchronous I/O mode
Doing random write test
Threads started!
Time limit exceeded, exiting...
Done.
Operations performed:  0 Read, 43801 Write, 43801 Other = 87602 Total
Read 0b  Written 171.1Mb  Total transferred 171.1Mb  (5.7032Mb/sec)
 1460.02 Requests/sec executed
Test execution summary:
      total time:                          30.0004s
      total number of events:              43801
      total time taken by event execution: 29.9662
      per-request statistics:
           min:                                  0.10ms
           avg:                                  0.68ms
           max:                                275.50ms
           approx.  95 percentile:               3.28ms
Threads fairness:
      events (avg/stddev):           43801.0000/0.00
      execution time (avg/stddev):   29.9662/0.00
Test system:
- Adaptec 5405Z (with 512 MB protected write cache)
- Intel Xeon L5520
- 6 GiB RAM @ 1066 MHz
- Motherboard Supermicro X8DTN (5520 chipset)
- 12 Seagate Barracuda 1 TB disks
- Kernel 2.6.32
- Filesystem XFS
- Debian unstable
To sum up, I am now sure this configuration will perform well in idle, high-load and even full-load situations for database traffic that would otherwise have been starved by sequential traffic. Sequential throughput is higher than two gigabit links can deliver anyway, so there is no problem with reducing it a bit.