Need to increase nginx throughput to an upstream unix socket: Linux kernel tuning?

28

I am running an nginx server that acts as a proxy to an upstream unix socket, like this:

upstream app_server {
        server unix:/tmp/app.sock fail_timeout=0;
}

server {
        listen ###.###.###.###;
        server_name whatever.server;
        root /web/root;

        try_files $uri @app;
        location @app {
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header X-Forwarded-Proto $scheme;
                proxy_set_header Host $http_host;
                proxy_redirect off;
                proxy_pass http://app_server;
        }
}

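In turn, a number of app server processes pull requests off /tmp/app.sock as they become available. The particular app server in use here is Unicorn, but I don't think that is relevant to this question.

The issue is that past a certain amount of load, nginx can't seem to get requests through the socket at a fast enough rate. It doesn't matter how many app server processes I set up.

I'm getting a flood of these messages in the nginx error log: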

connect() to unix:/tmp/app.sock failed (11: Resource temporarily unavailable) while connecting to upstream

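Many requests end with status code 502, and they are not requests that take particularly long to complete. The nginx write queue stat hovers around 1000.

Anyway, I feel like I'm missing something obvious here, because this particular configuration of nginx and app server is very common, especially with Unicorn (it is in fact the recommended method). Are there any Linux kernel options that need to be set, or something in nginx? Any ideas on how to increase the throughput to the upstream socket? Is there something I'm clearly doing wrong?

Additional information on the environment: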

$ uname -a
Linux servername 2.6.35-32-server #67-Ubuntu SMP Mon Mar 5 21:13:25 UTC 2012 x86_64 GNU/Linux

$ ruby -v
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux]

$ unicorn -v
unicorn v4.3.1

$ nginx -V
nginx version: nginx/1.2.1
built by gcc 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
TLS SNI support enabled

Current kernel tweaks:

net.core.rmem_default = 65536
net.core.wmem_default = 65536
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_mem = 16777216 16777216 16777216
net.ipv4.tcp_window_scaling = 1
net.ipv4.route.flush = 1
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
net.core.somaxconn = 8192
net.netfilter.nf_conntrack_max = 524288

Ulimit settings for the nginx user:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65535
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

— Ben Lee

Have you checked the output of ulimit, specifically the number of open files?

— Khaled, 2012

@Khaled ulimit -n says 65535.

— Ben Lee

16

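It sounds like the bottleneck is the app powering the socket rather than nginx itself. We see this a lot with PHP when it is used with sockets rather than a TCP/IP connection. In our case, PHP bottlenecks much earlier than nginx ever would.

Have you checked the sysctl.conf connection tracking limit and the socket backlog limits?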

net.core.somaxconn
net.core.netdev_max_backlog
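
On Linux the backlog an application passes to listen() is silently capped at net.core.somaxconn, so both numbers matter when checking these limits. Below is a minimal Ruby sketch of that check. The helper names read_somaxconn and effective_backlog are illustrative assumptions, not part of nginx, Unicorn, or any tool mentioned here.

```ruby
# Path where Linux exposes net.core.somaxconn (absent on other systems).
SOMAXCONN_PATH = "/proc/sys/net/core/somaxconn"

# Read the kernel cap on listen() backlogs; return nil if it cannot be read.
def read_somaxconn(path = SOMAXCONN_PATH)
  Integer(File.read(path).strip)
rescue SystemCallError, ArgumentError
  nil
end

# The kernel truncates the requested backlog to somaxconn without an error.
def effective_backlog(requested, somaxconn)
  [requested, somaxconn].min
end

cap = read_somaxconn
if cap
  puts "somaxconn=#{cap}, a requested backlog of 1024 becomes #{effective_backlog(1024, cap)}"
else
  puts "could not read #{SOMAXCONN_PATH}"
end
```

With the somaxconn of 8192 shown in the question, a backlog of 1024 requested by the app server would be kept as is; with a smaller cap it would be reduced to the cap.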

— Ben Lessani - Sonassi

2

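I figured out the problem. See the answer I posted. It actually was the app bottlenecking, not the socket, just as you suggest. I had ruled this out earlier due to a misdiagnosis, but it turns out the problem was throughput to another server. I figured this out just a couple of hours ago. I'm going to award you the bounty, since you pretty much nailed the source of the problem despite the misdiagnosis in my question. However, I'm going to give the checkmark to my own answer, because it describes the exact circumstances and so might help someone with a similar issue in the future.

— Ben Lee

I got a new server moved to a location that provides adequate throughput, completely rebuilt the system, and still have the same problem. So it turns out my problem is unresolved after all... =( I still think it is app-specific, but I can't think of anything. This new server is set up exactly like another server where everything works fine. And yes, somaxconn and netdev_max_backlog are set up correctly.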

— Ben Lee

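Your issue isn't nginx, which is more than capable, but that is not to say you don't have a rogue setting. Sockets are particularly sensitive under high load when the limits aren't configured correctly. Can you try your app over TCP/IP instead?

— Ben Lessani - Sonassi, 2012

Same problem, of even worse magnitude, using TCP/IP (the write queue climbs even faster). I have nginx/Unicorn/kernel all set up exactly the same (as far as I can tell) on a different machine, and that other machine is not exhibiting this problem. (I can switch DNS between the two machines to get live load testing, and have DNS on a 60-second TTL.)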

— Ben Lee

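Throughput between each machine and the DB machine is the same now, and latency between the new machine and the DB machine is about 30% higher than between the old machine and the DB. But 30% more than a few tenths of a millisecond is not the problem.

— Ben Lee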

2

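You can try looking at unix_dgram_qlen; see the proc docs. Although raising it may just compound the problem by letting more pile up in the queue? You'll have to look (netstat -x ...).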
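
To see whether the listener's queue is actually filling, one option on Linux is `ss -xln`. For listening Unix sockets it reports the number of connections waiting to be accepted in Recv-Q and the backlog passed to listen() in Send-Q. The Ruby sketch below parses that output. The sample line and the helper name parse_ss_unix_listeners are made up for illustration, and the column layout may differ between iproute2 versions.

```ruby
# Parse `ss -xln` output and return the listening Unix stream sockets.
# For a listening socket, Recv-Q is the current accept-queue length and
# Send-Q is the backlog that was passed to listen().
def parse_ss_unix_listeners(output)
  output.each_line.map do |line|
    fields = line.split
    next unless fields[0] == "u_str" && fields[1] == "LISTEN"
    { path: fields[4], queued: Integer(fields[2]), backlog: Integer(fields[3]) }
  end.compact
end

# Illustrative sample only; real input would come from running `ss -xln`.
sample = <<~OUT
  Netid State  Recv-Q Send-Q Local Address:Port Peer Address:Port
  u_str LISTEN 65     64     /tmp/app.sock 12345 * 0
OUT

parse_ss_unix_listeners(sample).each do |s|
  status = s[:queued] >= s[:backlog] ? "FULL" : "ok"
  puts "#{s[:path]}: #{s[:queued]}/#{s[:backlog]} #{status}"
end
```

A queue that sits at or above its backlog while nginx logs errno 11 points at the app server not accepting fast enough rather than at nginx.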

— Jim

Any progress with this?

— jmw, 2012

1

Thanks for the idea, but this didn't seem to make any difference.

— Ben Lee

0

I solved it by increasing the backlog number in config/unicorn.rb... I used to have a backlog of 64.

 listen "/path/tmp/sockets/manager_rails.sock", backlog: 64

and I was getting this error:

 2014/11/11 15:24:09 [error] 12113#0: *400 connect() to unix:/path/tmp/sockets/manager_rails.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: 192.168.101.39, server: , request: "GET /welcome HTTP/1.0", upstream: "http://unix:/path/tmp/sockets/manager_rails.sock:/welcome", host: "192.168.101.93:3000"

Now I have increased it to 1024 and I no longer get the error:

 listen "/path/tmp/sockets/manager_rails.sock", backlog: 1024
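
Why the backlog matters here: on Linux, a non-blocking connect() to a Unix stream socket whose accept queue is full fails with EAGAIN (errno 11), which is the error nginx logs. The sketch below reproduces this with Ruby's standard socket library only. It assumes a Linux host, and the helper name connects_before_failure and the socket file name are hypothetical.

```ruby
require "socket"
require "tmpdir"

# Open non-blocking connections to a Unix socket that nobody accepts from.
# Returns how many connects succeeded and the error that stopped the loop.
def connects_before_failure(backlog, attempts = 200)
  Dir.mktmpdir do |dir|
    addr = Socket.pack_sockaddr_un(File.join(dir, "demo.sock"))
    server = Socket.new(:UNIX, :STREAM)
    server.bind(addr)
    server.listen(backlog) # accept is never called, like a saturated app server
    clients = []
    succeeded = 0
    error = nil
    begin
      attempts.times do
        client = Socket.new(:UNIX, :STREAM)
        clients << client
        client.connect_nonblock(addr)
        succeeded += 1
      end
    rescue SystemCallError => e
      error = e # expected to be Errno::EAGAIN on Linux once the queue is full
    ensure
      clients.each(&:close)
      server.close
    end
    [succeeded, error]
  end
end

[4, 64].each do |backlog|
  count, error = connects_before_failure(backlog)
  puts "backlog=#{backlog}: #{count} connects succeeded, then #{error.class}"
end
```

A larger backlog only lets more connections wait. If the workers never catch up, the queue still fills and the error returns.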

— Adrian

0

tl;dr

Make sure the Unicorn backlog is large (use a socket, it is faster than TCP): listen("/var/www/unicorn.sock", backlog: 1024)
Optimise the NGINX performance settings, for example worker_connections 10000;

Discussion

We had the same problem: a Rails app served by Unicorn behind an NGINX reverse proxy.

We were getting lines like this in the Nginx error log:

2019/01/29 15:54:37 [error] 3999#3999: *846 connect() to unix:/../unicorn.sock failed (11: Resource temporarily unavailable) while connecting to upstream, client: xx.xx.xx.xx, request: "GET / HTTP/1.1"

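Reading the other answers, we also figured that Unicorn might be to blame, so we increased its backlog, but this did not resolve the problem. Monitoring the server processes made it obvious that Unicorn was not receiving the requests to work on, so NGINX appeared to be the bottleneck.

Searching for NGINX settings to tweak in nginx.conf, a performance tuning article pointed out several settings that can affect how many parallel requests NGINX can process, especially: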

user www-data;
worker_processes auto;
pid /run/nginx.pid;
worker_rlimit_nofile 400000; # important

events {    
  worker_connections 10000; # important
  use epoll; # important
  multi_accept on; # important
}

http {
  sendfile on;
  tcp_nopush on;
  tcp_nodelay on;
  keepalive_timeout 65;
  types_hash_max_size 2048;
  keepalive_requests 100000; # important
  server_names_hash_bucket_size 256;
  include /etc/nginx/mime.types;
  default_type application/octet-stream;
  ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
  ssl_prefer_server_ciphers on;
  access_log /var/log/nginx/access.log;
  error_log /var/log/nginx/error.log;
  gzip on;
  gzip_disable "msie6";
  include /etc/nginx/conf.d/*.conf;
  include /etc/nginx/sites-enabled/*;
}
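
As a rough sanity check on these numbers: the nginx documentation notes that worker_connections counts every connection a worker holds, including connections to proxied servers, so a proxied request typically occupies two (client side and upstream side). The Ruby sketch below does that arithmetic. The function names and the four-worker figure are assumptions for illustration, not values from this setup.

```ruby
# Rough upper bound on simultaneously proxied clients: each proxied request
# holds one client connection and one upstream connection in the same worker.
def max_proxied_clients(worker_processes, worker_connections)
  worker_processes * worker_connections / 2
end

# Every connection needs a file descriptor, so the per-worker descriptor
# limit should be at least as large as worker_connections.
def fd_limit_sufficient?(worker_rlimit_nofile, worker_connections)
  worker_rlimit_nofile >= worker_connections
end

workers = 4 # assumption: what "worker_processes auto" resolves to on a 4-core host
puts max_proxied_clients(workers, 10_000)
puts fd_limit_sufficient?(400_000, 10_000)
```

If worker_connections is left at a low value, nginx can run out of connection slots long before the app server is saturated, which would match Unicorn sitting idle.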

— Epigene

-1

The default value of backlog in the Unicorn config is 1024.

http://unicorn.bogomips.org/Unicorn/Configurator.html

listen "/path/to/.unicorn.sock", :backlog => 1024

1024 clients is the Unix domain socket limit.

— 对马健