I connected two PowerEdge 6950 servers directly to each other (with straight cables), each link on a different PCIe adapter.
I have a gigabit link on each of these lines (1000 MBit, full duplex, flow control in both directions).
Now I am trying to bond these interfaces together into bond0 with the balance-rr algorithm on both sides (I want to get 2000 MBit for a single IP session).
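For context, a rough sketch of how such a two-slave balance-rr bond can be brought up by hand (the slave names eth2/eth4 are taken from the ethtool commands further down, the IP address is only an example, and on SLES this is normally configured through the ifcfg files instead):

# load the bonding driver in round-robin mode, with link monitoring every 100 ms
modprobe bonding mode=balance-rr miimon=100
ip link set bond0 up
# enslave the two directly connected NICs (assumed names eth2 and eth4)
ifenslave bond0 eth2 eth4
# example address for the point-to-point test network
ip addr add 192.168.100.1/24 dev bond0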
When I test the throughput by piping /dev/zero into /dev/null via dd bs=1M and netcat in tcp mode, I get a throughput of 70 MB/s instead of the expected 150 MB/s or more.
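A sketch of this kind of test (the port and address are just examples, and the exact listen flags differ between netcat variants):

# on the receiving host: discard everything arriving on TCP port 5000
nc -l -p 5000 > /dev/null
# on the sending host: push 8 GB of zeros across the bond
dd if=/dev/zero bs=1M count=8192 | nc 192.168.100.2 5000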
When I use the single lines I get about 98 MB/s on each line, as long as each line carries traffic in a different direction. When the traffic on the single lines goes in the "same" direction I get 70 MB/s and 90 MB/s on them.
After reading through the bonding readme (/usr/src/linux/Documentation/networking/bonding.txt), I found the following section useful (13.1.1 MT Bonding Mode Selection for Single Switch Topology):
balance-rr: This mode is the only mode that will permit a single TCP/IP connection to stripe traffic across multiple interfaces. It is therefore the only mode that will allow a single TCP/IP stream to utilize more than one interface's worth of throughput. This comes at a cost, however: the striping generally results in peer systems receiving packets out of order, causing TCP/IP's congestion control system to kick in, often by retransmitting segments.
It is possible to adjust TCP/IP's congestion limits by altering the net.ipv4.tcp_reordering sysctl parameter. The usual default value is 3, and the maximum useful value is 127. For a four interface balance-rr bond, expect that a single TCP/IP stream will utilize no more than approximately 2.3 interface's worth of throughput, even after adjusting tcp_reordering.

Note that this out of order delivery occurs when both the sending and receiving systems are utilizing a multiple interface bond. Consider a configuration in which a balance-rr bond feeds into a single higher capacity network channel (e.g., multiple 100Mb/sec ethernets feeding a single gigabit ethernet via an etherchannel capable switch). In this configuration, traffic sent from the multiple 100Mb devices to a destination connected to the gigabit device will not see packets out of order.

However, traffic sent from the gigabit device to the multiple 100Mb devices may or may not see traffic out of order, depending upon the balance policy of the switch. Many switches do not support any modes that stripe traffic (instead choosing a port based upon IP or MAC level addresses); for those devices, traffic flowing from the gigabit device to the many 100Mb devices will only utilize one interface.
Now I changed this parameter on both connected servers, on all lines (4), from 3 to 127.
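That is, on both hosts (a runtime change; adding the line to /etc/sysctl.conf, as shown further down, makes it persistent):

sysctl -w net.ipv4.tcp_reordering=127
sysctl net.ipv4.tcp_reordering   # verify the new value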
After bonding again I get about 100 MB/s, but still not more than that.
Any ideas?
Update: hardware details from lspci -v:
24:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet Controller (rev 06)
Subsystem: Intel Corporation PRO/1000 PT Dual Port Server Adapter
Flags: bus master, fast devsel, latency 0, IRQ 24
Memory at dfe80000 (32-bit, non-prefetchable) [size=128K]
Memory at dfea0000 (32-bit, non-prefetchable) [size=128K]
I/O ports at dcc0 [size=32]
Capabilities: [c8] Power Management version 2
Capabilities: [d0] MSI: Mask- 64bit+ Count=1/1 Enable-
Capabilities: [e0] Express Endpoint, MSI 00
Kernel driver in use: e1000
Kernel modules: e1000
Update: final results:
8589934592 bytes (8.6 GB) copied, 35.8489 seconds, 240 MB/s
I changed a lot of tcp/ip and low-level driver options, including enlarging the network buffers. This is why dd now shows numbers above 200 MB/s: dd terminates while there is still output waiting to be transferred, sitting in the send buffers.
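One way around that artifact, if the raw wire number is of interest, is to take the measurement on the receiving side instead (a sketch, using the same example port as above; dd prints its statistics once the sender closes the connection):

nc -l -p 5000 | dd of=/dev/null bs=1M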
Update 2011-08-05: settings that were changed to reach the goal (/etc/sysctl.conf):
# See http://www-didc.lbl.gov/TCP-tuning/linux.html
# raise TCP max buffer size to 16 MB. default: 131071
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# raise autotuning TCP buffer limits
# min, default and max number of bytes to use
# Defaults:
#net.ipv4.tcp_rmem = 4096 87380 174760
#net.ipv4.tcp_wmem = 4096 16384 131072
# Tuning:
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Default: Backlog 300
net.core.netdev_max_backlog = 2500
#
# Oracle-DB settings:
fs.file-max = 6815744
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
kernel.shmmax = 2147659776
kernel.sem = 1250 256000 100 1024
net.core.rmem_default = 262144
net.core.wmem_default = 262144
#
# Tuning for network-bonding according to bonding.txt:
net.ipv4.tcp_reordering=127
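These settings can be loaded at runtime with:

sysctl -p /etc/sysctl.conf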
Special settings for the bond device (SLES: /etc/sysconfig/network/ifcfg-bond0):
MTU='9216'
LINK_OPTIONS='txqueuelen 10000'
Note that setting the maximum possible MTU was the key to the solution.
Tuning the rx/tx buffers of the involved NICs:
/usr/sbin/ethtool -G eth2 rx 2048 tx 2048
/usr/sbin/ethtool -G eth4 rx 2048 tx 2048
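A quick way to check that both changes took effect (device names as above; the exact ethtool output format depends on the driver version):

ip link show bond0    # the MTU should now read 9216
ethtool -g eth2       # the current RX/TX ring sizes should show 2048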
Try nuttcp. It makes it easy to test a single connection or multiple connections.
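A minimal nuttcp run looks roughly like this (the address is an example; window-size and multi-stream options are left to the man page):

# on one host: start the nuttcp server
nuttcp -S
# on the other host: transmit to it for 30 seconds and report the throughput
nuttcp -T30 192.168.100.2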
Have you checked /proc/net/bonding/bond0 to make sure you actually ended up in balance-rr mode? And did you see the note in the documentation you pasted that a four-interface bond only gives you about 2.3 interfaces' worth of throughput? Given that, it seems unlikely you will get anywhere near the 2000 MBit/s you want.
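Checking the bond mode is as simple as the following (the quoted line is what balance-rr reports):

cat /proc/net/bonding/bond0
# the output should contain a line like:
# Bonding Mode: load balancing (round-robin)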