Two Java processes hang for about 2.8 s



I have two processes running in the same VM on a VMware server, on CentOS 6.x.

I ran strace on both processes and saved the output:

6970  14:04:09.643295 futex(0x7f47d8027754, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f47d8027750, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
6969  14:04:09.643304 futex(0x7f47d8027754, FUTEX_WAIT_PRIVATE, 54869, NULL <unfinished ...>
6971  14:04:09.643353 futex(0x7f47d802b128, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
6970  14:04:09.643363 <... futex resumed> ) = 0 <0.000063>
6969  14:04:09.643372 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable) <0.000063>
6971  14:04:09.643411 <... futex resumed> ) = 0 <0.000052>
6969  14:04:09.643420 futex(0x7f47d8027728, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
6971  14:04:09.643459 futex(0x7f47d8030854, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f47d8030850, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
6969  14:04:09.643469 <... futex resumed> ) = 0 <0.000044>
6971  14:04:09.643501 <... futex resumed> ) = 1 <0.000037>
6974  14:04:09.650775 <... futex resumed> ) = 0 <0.511421>
17035 14:04:12.446009 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <2.902876>
17009 14:04:12.446112 <... poll resumed> ) = 1 ([{fd=271, revents=POLLIN}]) <3.742208>
17008 14:04:12.446131 <... poll resumed> ) = 1 ([{fd=193, revents=POLLIN}]) <3.747483>
24085 14:04:12.446144 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <2.822784>
24082 14:04:12.446157 <... poll resumed> ) = 1 ([{fd=300, revents=POLLIN}]) <4.404341>
6416  14:04:12.446172 <... poll resumed> ) = 1 ([{fd=187, revents=POLLIN}]) <3.557929>
18296 14:04:12.446189 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <2.822459>
18295 14:04:12.446201 <... poll resumed> ) = 1 ([{fd=290, revents=POLLIN}]) <3.518658>

The key part is:

6974  14:04:09.650775 <... futex resumed> ) = 0 <0.511421>
17035 14:04:12.446009 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <2.902876>

The other process shows:

7510  14:04:09.622601 futex(0x7f8874033728, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000024>
7510  14:04:09.622698 getrusage(0x1 /* RUSAGE_??? */, {ru_utime={72, 763938}, ru_stime={3, 512466}, ...}) = 0 <0.000023>
7510  14:04:09.622761 futex(0x7f8874033754, FUTEX_WAIT_BITSET_PRIVATE, 1, {162879, 332257449}, ffffffff <unfinished ...>
7543  14:04:09.644930 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <0.050089>
7543  14:04:09.644991 futex(0x7f88757c9128, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000039>
7543  14:04:09.645091 futex(0x7f88757c9154, FUTEX_WAIT_BITSET_PRIVATE, 1, {162879, 104576419}, ffffffff <unfinished ...>
7766  14:04:09.671201 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <0.100101>
17254 14:04:12.445858 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <2.859094>
24083 14:04:12.446056 <... poll resumed> ) = 1 ([{fd=277, revents=POLLIN}]) <4.404263>

The key part is:

7766  14:04:09.671201 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <0.100101>
17254 14:04:12.445858 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <2.859094>

At about the same time, both Java processes stall, and when they resume their pending futex waits return ETIMEDOUT.
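To put a number on the stall, here is a minimal Python sketch that finds the largest jump between consecutive timestamps in a trace. It assumes every line starts with a PID followed by an `HH:MM:SS.microseconds` timestamp, as in the excerpts above, and that the trace does not cross midnight. The function name `largest_gap` and the regular expression are my own, purely for illustration.

```python
import re
from datetime import datetime

# Matches the "<pid> <HH:MM:SS.micros>" prefix seen in the excerpts above.
LINE_RE = re.compile(r"^(\d+)\s+(\d{2}:\d{2}:\d{2}\.\d{6})\s")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the biggest jump
    between consecutive timestamps. Assumes no midnight rollover."""
    best = (0.0, None, None)
    prev_ts, prev_line = None, None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that does not look like a strace line
        ts = datetime.strptime(m.group(2), "%H:%M:%S.%f")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best


# The two "key part" excerpts, copied from the traces above.
first = [
    "6974  14:04:09.650775 <... futex resumed> ) = 0 <0.511421>",
    "17035 14:04:12.446009 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <2.902876>",
]
second = [
    "7766  14:04:09.671201 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <0.100101>",
    "17254 14:04:12.445858 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <2.859094>",
]

for name, excerpt in (("first", first), ("second", second)):
    gap, before, after = largest_gap(excerpt)
    print(f"{name} process: {gap:.6f} s with no syscall activity")
```

On the two excerpts this reports gaps of roughly 2.80 s and 2.77 s, and both processes resume within about 0.2 ms of each other (14:04:12.445858 and 14:04:12.446009). Running it over the full saved trace files instead of the excerpts would show whether this is the only such gap.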

I also have other logs that, I believe, show what was happening during that period.

How can I trace what the futex is? Or how can I find out more, and what additional data can I capture to diagnose this problem?


This could be a Linux kernel bug. See here.
xenoid

@xenoid That is from 2015; any kernel from that era would be very insecure today (pre-Spectre, etc.).
Eugen Rieck

I saw the kernel bug, but noted that it is quite old.
Keyzer Suze

I am running the latest CentOS 6.x.
Keyzer Suze