psql: FATAL: sorry, too many clients already


16

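I suddenly started getting this error when trying to access my website, which uses a PostgreSQL database, and even when using the psql utility or pgAdmin3.

My database is configured for a maximum of 150 connections: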

# SHOW max_connections;
 max_connections 
-----------------
 150
(1 row)

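After restarting the Ubuntu server that hosts my website (that was in fact the only way I could get a connection at all), I see that the number of connections currently in use is 140: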

# select count(*) from pg_stat_activity;
 count 
-------
   140
(1 row)
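
To see how much headroom is left, it helps to remember that PostgreSQL holds back superuser_reserved_connections slots (3 by default) for superusers, so ordinary roles are refused slightly before the count reaches max_connections. A minimal sketch that uses only built-in settings and the same view as above; the column aliases are illustrative:

```sql
-- Compare connections in use with the slots open to ordinary (non-superuser) roles.
SELECT count(*)                                                AS in_use,
       current_setting('max_connections')::int                 AS max_connections,
       current_setting('superuser_reserved_connections')::int  AS reserved_for_superusers,
       current_setting('max_connections')::int
         - current_setting('superuser_reserved_connections')::int
         - count(*)                                            AS free_for_ordinary_roles
FROM pg_stat_activity;
```

On newer PostgreSQL releases pg_stat_activity also lists background processes, so the count there can slightly overstate client connections.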

I don't understand why there are suddenly so many connections right after restarting the server. So I checked the PostgreSQL activity:

# SELECT * FROM pg_stat_activity;

And I see 100 rows with exactly the same query, shown below:

SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1

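What's more, they all have the same client address (my web server).

The web server runs Ruby on Rails with a connection pool of 50. Even with a pool of 50, the Passenger/prefork Apache configuration is single-threaded, so a single process cannot spawn 50 threads and open 50 database connections. More importantly, this happened after a system restart, when all users had been logged off my web server. Perhaps PostgreSQL on the database server does not know that the web server restarted and is still trying to execute these queries.

In response to Craig's comment: the waiting column shows the letter "f", so these backends are not waiting on a lock, and the queries appear to be still executing. As I said before, the strange thing is that more than 100 identical queries, started milliseconds apart, suddenly sit in this executing state. This is a mystery to me: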

mydb=# SELECT * FROM pg_stat_activity;

 datid  | datname  | procpid | usesysid | usename |                                                                           current_query                                                                           | waiting |          xact_start           |          query_start          |         backend_start         |  client_addr   | client_port
--------+----------+---------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-------------------------------+-------------------------------+-------------------------------+----------------+-------------
 464875 | mydb     |    4992 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:48.437081-04 | 2014-06-28 22:46:48.437081-04 | 2014-06-28 22:46:44.089764-04 | 192.111.11.111 |       37166
 464875 | mydb     |    4993 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:48.497764-04 | 2014-06-28 22:46:48.497764-04 | 2014-06-28 22:46:44.277856-04 | 192.111.11.111 |       37167
 464875 | mydb     |    4994 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:48.504425-04 | 2014-06-28 22:46:48.504425-04 | 2014-06-28 22:46:44.485269-04 | 192.111.11.111 |       37168
 464875 | mydb     |    4996 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:48.482695-04 | 2014-06-28 22:46:48.482695-04 | 2014-06-28 22:46:44.688203-04 | 192.111.11.111 |       37169
 464875 | mydb     |    4998 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:48.432836-04 | 2014-06-28 22:46:48.432836-04 | 2014-06-28 22:46:44.703883-04 | 192.111.11.111 |       37170

-- many more

 464875 | mydb     |    5052 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:59.584386-04 | 2014-06-28 22:46:59.584386-04 | 2014-06-28 22:46:51.85682-04  | 192.111.11.111 |       37360
 464875 | mydb     |    5053 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:59.506483-04 | 2014-06-28 22:46:59.506483-04 | 2014-06-28 22:46:52.083316-04 | 192.111.11.111 |       37367
 464875 | mydb     |    8958 |    16387 | myuser | <IDLE>                                                                                                                                                            | f       |                               | 2014-06-29 00:05:06.735249-04 | 2014-06-27 16:34:39.307312-04 | 192.111.11.111 |       52759
 464875 | mydb     |    5054 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:59.52573-04  | 2014-06-28 22:46:59.52573-04  | 2014-06-28 22:46:52.285867-04 | 192.111.11.111 |       37371
 464875 | mydb     |    5055 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:59.530804-04 | 2014-06-28 22:46:59.530804-04 | 2014-06-28 22:46:52.303562-04 | 192.111.11.111 |       37372
 464875 | mydb     |    5056 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:59.572198-04 | 2014-06-28 22:46:59.572198-04 | 2014-06-28 22:46:52.31447-04  | 192.111.11.111 |       37373
 464875 | mydb     |    5057 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:59.872037-04 | 2014-06-28 22:46:59.872037-04 | 2014-06-28 22:46:52.323721-04 | 192.111.11.111 |       37374
 464875 | mydb     |    5058 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:59.961803-04 | 2014-06-28 22:46:59.961803-04 | 2014-06-28 22:46:52.334238-04 | 192.111.11.111 |       37375
 464875 | mydb     |    5059 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:59.53713-04  | 2014-06-28 22:46:59.53713-04  | 2014-06-28 22:46:52.347227-04 | 192.111.11.111 |       37376
 464875 | mydb     |    5060 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:47:00.208948-04 | 2014-06-28 22:47:00.208948-04 | 2014-06-28 22:46:52.360008-04 | 192.111.11.111 |       37377
 464875 | mydb     |    5061 |    16387 | myuser | SELECT  "reports".* FROM "reports"  WHERE (("reports"."time" < '2014-06-28 13:30:42.000000' AND "reports"."unit_id" = 3192)) ORDER BY "reports"."id" DESC LIMIT 1 | f       | 2014-06-28 22:46:59.938983-04 | 2014-06-28 22:46:59.938983-04 | 2014-06-28 22:46:52.369496-04 | 192.111.11.111 |       37378
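
A grouped query makes the pattern in the output above easier to read than scrolling through every row. The sketch below assumes the pre-9.2 column names visible in that output (procpid, current_query, waiting); on PostgreSQL 9.2 and later the equivalents are pid and query, and a separate state column exists. The aliases are illustrative:

```sql
-- Count sessions per client and query, with the range of connection start times.
SELECT client_addr,
       current_query,
       waiting,
       count(*)           AS connections,
       min(backend_start) AS first_backend_start,
       max(backend_start) AS last_backend_start
FROM pg_stat_activity
GROUP BY client_addr, current_query, waiting
ORDER BY connections DESC;
```

Comparing first_backend_start and last_backend_start with the time the web server was restarted shows whether the sessions were opened before or after it.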

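Take a look at pg_stat_activity.backend_start. Were these connections created before or after the web server restart? If they are all new connections, I would guess the problem is on the web server's side.
Nick Barnes

@NickBarnes These connections all have the same query in the "current_query" column, and their backend_start times are practically identical (milliseconds apart). That is really strange. If memory serves, they were all created before the restart. But I assumed a restart would drop the connections.
JohnMerlino 2014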

1
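OK... you might want to check top on the server to see whether these processes are busy. If they are, I think the connections should go away once the queries finish (or you could just kill them now). If they are idle and the connection has definitely been dropped, then I'm not sure what is going on or how to prevent it.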
Nick Barnes

1
Check the waiting flag in pg_stat_activity to see whether they are blocked on a lock.
Craig Ringer 2014

1
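The output you pasted from SELECT * FROM pg_stat_activity; is hard to believe: it does not have enough columns. What does the state column say? That is the most important field for this problem.
eradman 2015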

Answers:


5

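This looks like a problem specific to the client-side code. You will not be able to fix it by raising the "max_connections" parameter.

I found a possibly related question: Ruby database connection pooling

Nevertheless, there is more server-side debugging you can do: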

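Enable "log_connections" and "log_disconnections". Also set "log_line_prefix" to include "%m%a%p" (timestamp with milliseconds, application name, and process ID), so every log line can be tied to a session.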
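
These settings live in postgresql.conf and do not need a full restart. A minimal sketch of applying and checking them from psql, assuming a superuser session; log_connections and log_disconnections only take effect for sessions started after the reload, so the check should be run from a fresh connection:

```sql
-- Ask the server to re-read postgresql.conf (superuser only).
SELECT pg_reload_conf();

-- Then, from a NEW session, confirm the values the server is using.
SELECT name, setting, context
FROM pg_settings
WHERE name IN ('log_connections', 'log_disconnections', 'log_line_prefix');
```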

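Very useful applications for debugging a PostgreSQL server are powa, or top-like tools such as pg_activity.

For real-time server debugging I prefer pg_activity, especially because it can show blocking sessions and terminate sessions.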
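
If pg_activity is not available, sessions can also be terminated from plain SQL without stopping the whole server. This is a sketch under several assumptions: the pre-9.2 column names from the question (procpid, current_query), a server version that has pg_terminate_backend (8.4 or later), a superuser session, and an arbitrary 10-minute threshold chosen only for illustration. The client address and query text are copied from the question's output.

```sql
-- 1. Preview which sessions match before touching anything.
SELECT procpid, backend_start, query_start, current_query
FROM pg_stat_activity
WHERE client_addr = '192.111.11.111'
  AND current_query LIKE 'SELECT  "reports".* FROM "reports"%'
  AND query_start < now() - interval '10 minutes'  -- assumed threshold
  AND procpid <> pg_backend_pid();                 -- never match this session

-- 2. Terminate the same set of sessions.
SELECT procpid, pg_terminate_backend(procpid) AS terminated
FROM pg_stat_activity
WHERE client_addr = '192.111.11.111'
  AND current_query LIKE 'SELECT  "reports".* FROM "reports"%'
  AND query_start < now() - interval '10 minutes'
  AND procpid <> pg_backend_pid();
```

This only clears the symptom; the client-side cause of the duplicate queries still has to be found.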


-4

This is the best way to solve the problem...

Log in to the server over SSH with PuTTY,

sudo /etc/init.d/postgresql stop

This kills the dead processes in the database,

sudo /etc/init.d/postgresql start


6
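And the next time it happens, you stop the production server again? Your solution obviously gets rid of the stuck processes, but it does not explain why they exist, and it is not a sustainable approach.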
dezso
Licensed under cc by-sa 3.0 with attribution required.