e1000e: "Reset adapter unexpectedly" / "Detected Hardware Unit Hang"

36

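I have a Dell 1U server with an Intel® Xeon® CPU L5420 @ 2.50GHz (8 cores), running Ubuntu Server with the generic x86_64 kernel, version 3.13.0-32. It has two 1000baseT NICs. I have set it up to forward packets from eth0 to eth1.

I have noticed in my kern.log file that the adapter keeps hanging and then resetting. This happens a lot: it can occur every few seconds, then stay quiet for a few minutes, then go back to every few seconds.

Here is the log file dump: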

 [118943.768245] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
 [118943.768245]   TDH                  <45>
 [118943.768245]   TDT                  <50>
 [118943.768245]   next_to_use          <50>
 [118943.768245]   next_to_clean        <43>
 [118943.768245] buffer_info[next_to_clean]:
 [118943.768245]   time_stamp           <101c48d04>
 [118943.768245]   next_to_watch        <45>
 [118943.768245]   jiffies              <101c4970f>
 [118943.768245]   next_to_watch.status <0>
 [118943.768245] MAC Status             <80283>
 [118943.768245] PHY Status             <792d>
 [118943.768245] PHY 1000BASE-T Status  <7800>
 [118943.768245] PHY Extended Status    <3000>
 [118943.768245] PCI Status             <10>
 [118944.780015] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
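To get a feel for how often the hang occurs, the log can be tallied per interface. The following is a minimal sketch, not part of the original post: the function name `count_unit_hangs` is made up for illustration, and it assumes the message layout shown in the dump above (the interface name is the second field after `e1000e`).

```shell
#!/bin/bash
# Count e1000e "Detected Hardware Unit Hang" events per interface.
# Reads kernel log text on stdin and prints one "<iface> <count>" line
# per interface. count_unit_hangs is a hypothetical helper name.
count_unit_hangs() {
    awk '/Detected Hardware Unit Hang/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "e1000e") {
                iface = $(i + 2)        # e.g. "eth0:"
                sub(/:$/, "", iface)    # strip the trailing colon
                hangs[iface]++
            }
        }
    }
    END { for (name in hangs) print name, hangs[name] }'
}
```

It could be run as `count_unit_hangs < /var/log/kern.log`, assuming the log lives at that path on your system.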

Here is the ethtool info:

Settings:

Settings for eth0:

Supported ports: [ TP ]
Supported link modes:   10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes:  10baseT/Half 10baseT/Full 
                        100baseT/Half 100baseT/Full 
                        1000baseT/Full 
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: off (auto)
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000007 (7)
               drv probe link
Link detected: yes

Driver info:

ethtool -i eth0

driver: e1000e
version: 2.3.2-k
firmware-version: 1.4-0
bus-info: 0000:00:19.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

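What is causing this? Is it just a software bug or an actual hardware problem? I have seen many other similar reports but no real solution, which also leads me to believe this is a software problem.

Maybe someone can help me out?

— Kyle Coots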

This seems to be a known issue: bugzilla.kernel.org/show_bug.cgi?id=47331

26

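OK, so after posting this question last night I kept researching, and the only real solution I came across seems to have fixed the problem.

Disable TSO, GSO and GRO using ethtool: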

ethtool -K eth0 gso off gro off tso off

According to the post found here: http://ehc.ac/p/e1000/bugs/378/

As I understand it, this will or may reduce performance.
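To confirm that the change took effect, the current offload state can be read back with `ethtool -k` (lowercase `k` shows features, uppercase `K` sets them). The sketch below only filters that output down to the three features in question; the helper name `show_segmentation_offloads` is hypothetical, and the feature labels are the ones recent ethtool versions print, which may differ on older builds.

```shell
#!/bin/bash
# Filter `ethtool -k <iface>` output (read from stdin) down to the
# TSO, GSO and GRO lines. show_segmentation_offloads is a made-up name.
show_segmentation_offloads() {
    grep -E '^[[:space:]]*(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):'
}
```

Typical use would be `ethtool -k eth0 | show_segmentation_offloads`. Settings applied with `ethtool -K` do not survive a reboot, so they need to be reapplied from a startup or interface hook.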

Another solution I noticed is to disable Active-State Power Management (ASPM):

pcie_aspm=off

According to this Server Fault post: "Linux e1000e (Intel networking driver) problems galore, where do I start?"
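`pcie_aspm=off` is a kernel boot parameter, so it has to be on the kernel command line. On a GRUB-based Ubuntu install this usually means adding it to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, running `update-grub` and rebooting (an assumption about the setup, not something stated in the post). The sketch below checks whether the running kernel was actually booted with it; `has_aspm_off` is a hypothetical helper name.

```shell
#!/bin/bash
# Succeeds if the given kernel command line contains pcie_aspm=off
# as a separate word. has_aspm_off is a made-up helper name.
has_aspm_off() {
    case " $1 " in
        *" pcie_aspm=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the parameters the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    if has_aspm_off "$(cat /proc/cmdline)"; then
        echo "pcie_aspm=off is active on this boot"
    else
        echo "pcie_aspm=off is not set on this boot"
    fi
fi
```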

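I have not tried this solution yet. I will give it a try, see whether it helps, and post back my findings.

Edit:

OK, so I tried turning off Active-State Power Management with pcie_aspm=off, but it had no effect. I continue to see the errors in my log file.

This may still be useful to some people, since with certain kernels some Intel NICs have problems going into sleep states when power management is enabled.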

— Kyle Coots

2

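Thanks! I tried the ethtool fix and it solved my problem. (I also put it in an init script.)

— Peter

Hi, do you know whether running ethtool -K eth0 gso off gro off tso off causes a brief loss of connectivity?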

— godzillante

Indeed, disabling the options with ethtool helped, whereas disabling the power management option did not.

2

The link in "According to the post found here: ehc.ac/p/e1000/bugs/378" now goes to a domain squatter; you can find the original content here: web.archive.org/web/20160205153351/http

— Mike McCabe

6

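Disabling Enhanced C1 (C1E) in the BIOS fixed it for me.

I don't know whether the C1E low-power state confuses the driver, or whether the driver hits an oops while the processor is in that state.

Either way, problem solved.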
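Before changing BIOS settings, it can help to see which idle states the kernel itself knows about. This is only a rough sketch using the standard cpuidle sysfs directory; the helper name `list_idle_states` is made up. Whether C1E shows up as its own entry depends on the CPU and the idle driver, and on some systems it is handled entirely by firmware and will not be listed.

```shell
#!/bin/bash
# Print "<state-dir> <name>" for each idle state exposed for one CPU.
# $1: cpuidle directory (defaults to cpu0's). list_idle_states is a
# hypothetical helper name.
list_idle_states() {
    cpuidle_dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for state_dir in "$cpuidle_dir"/state*; do
        # Skip the unexpanded glob when the directory does not exist.
        [ -r "$state_dir/name" ] || continue
        printf '%s %s\n' "${state_dir##*/}" "$(cat "$state_dir/name")"
    done
}

list_idle_states
```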

— Steve G

This is exactly the solution that worked for me. Running Ubuntu 16.04 LTS on an ASRock H170M-ITX/DL motherboard. Thanks, SteveG. =)

— tail

Note that this may increase the server's power consumption considerably!

— Flatron

0

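I ran into this problem too (it triggered the same kernel error as yours, plus userspace SSH errors such as "Corrupted MAC on input").

Solution

What worked for me was disabling TCP checksum offloading: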

# ethtool -K eth0 tx off rx off

A clean, long-term integration with a Debian-style /etc/network/interfaces:

#!/bin/bash
#
# Disables TCP offloading on all ifaces
#
# Inspired by: @Michelunik https://serverfault.com/a/422554/62953

RUN=true
case "${IF_NO_TOE,,}" in
    no|off|false|disable|disabled)
        RUN=false
    ;;
esac


# Other offloading options that could be disabled (not TCP related):
#  sg tso ufo gso gro lro rxvlan txvlan rxhash
# see man ethtool

if [ "$MODE" = start ] && [ "$RUN" = true ]; then
  TOE_OPTIONS="rx tx"
  for TOE_OPTION in $TOE_OPTIONS; do
    /sbin/ethtool --offload "$IFACE" "$TOE_OPTION" off &>/dev/null || true
  done
fi
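The script above relies on ifupdown's hook environment: `IFACE` and `MODE` are set for scripts run from directories such as `/etc/network/if-up.d/`, and options from the interface stanza are exported as `IF_<OPTION>` variables. The answer does not say where the script is installed or what the stanza option looks like, so treating `IF_NO_TOE` as coming from a `no-toe` option is an inference. The sketch below isolates just the opt-out decision so it can be tried without touching a NIC; `toe_should_run` is a hypothetical name, and it lowercases with `tr` rather than bash's `${VAR,,}` purely for portability.

```shell
#!/bin/bash
# Mirror of the script's opt-out check. Prints "false" when the value
# asks to skip the offload changes and "true" otherwise (also when empty).
# toe_should_run is a made-up helper name.
toe_should_run() {
    toe_value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$toe_value" in
        no|off|false|disable|disabled) echo false ;;
        *) echo true ;;
    esac
}

toe_should_run "OFF"   # prints: false
toe_should_run ""      # prints: true
```

Note the direction of the switch: the listed values turn the script off for that interface, and anything else, including an unset variable, leaves checksum offloading to be disabled.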

Source, inspiration.

Context

Debian Jessie
Kernel 4.7.0-0.bpo.1-amd64
lspci 00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I218-V (rev 04)

— Jocelyn Delalande

-1

Try updating your driver. I don't know where to find it for Ubuntu or which version is recommended, but for CentOS or EL 6 it is:

http://mirror.symnds.com/distributions/elrepo/elrepo/el6/x86_64/RPMS/kmod-e1000e-3.1.0.2-1.el6.elrepo.x86_64.rpm
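To judge whether an update would change anything, the installed driver version can be compared with the packaged one. This is a rough sketch only: `version_lt` is a hypothetical helper that relies on GNU `sort -V`, and in-kernel version strings such as the `2.3.2-k` reported in the question may not be strictly comparable with out-of-tree releases like 3.1.0.2.

```shell
#!/bin/bash
# Succeeds when version $1 sorts strictly before version $2.
# Uses GNU sort -V; version_lt is a made-up helper name.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print the version of the e1000e module installed for the running
# kernel, if modinfo and the module are available.
if command -v modinfo >/dev/null 2>&1; then
    modinfo -F version e1000e 2>/dev/null || true
fi
```

`modinfo` reports the module on disk; `ethtool -i eth0`, as used in the question, reports the driver that is actually bound to the interface.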

— Fred Flint