MongoDB config servers not in sync



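I have set up a cluster with 2 shards, 2 replica servers, 3 config servers and 2 mongos. I have the following problems:

1) The mongo config servers are not in sync: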

Aug 14 09:46:48 server mongos.27017[10143]: Sun Aug 11 09:46:48.987 [CheckConfigServers] ERROR: config servers not in sync! config servers mongocfg1.testing.com:27000 and mongocfg3.testing.com:27000 differ#012chunks: "d2c08c5f1ee6048e5f6fab30e37a70f0"#011chunks: "7e643e9402ba90567ddc9388c2abdb8a"#012databases: "6f35ec52b536eee608d5bc706a72ec1e"#011databases: "6f35ec52b536eee608d5bc706a72ec1e"

2) I used this document to sync the servers: http://docs.mongodb.org/manual/tutorial/replace-config-server/

3) After the sync, I restarted one of the mongos servers and saw this in the log:

Thu Aug 15 09:56:05.376 [mongosMain] MongoS version 2.4.4 starting: pid=1575 port=27111 64-bit host=web-inno.innologica.com (--help for usage)
Thu Aug 15 09:56:05.376 [mongosMain] git version: 4ec1fb96702c9d4c57b1e06dd34eb73a16e407d2
Thu Aug 15 09:56:05.376 [mongosMain] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Thu Aug 15 09:56:05.376 [mongosMain] options: { configdb: "mongocfg1.testing.com:27000,mongocfg2.testing.com:27000,mongocfg3.testing.com:27000", keyFile: "/mongo_database/pass.key", port: 27111 }
Thu Aug 15 09:56:05.582 [mongosMain] SyncClusterConnection connecting to [mongocfg1.testing.com:27000]
Thu Aug 15 09:56:05.583 [mongosMain] SyncClusterConnection connecting to [mongocfg2.testing.com:27000]
Thu Aug 15 09:56:05.583 [mongosMain] SyncClusterConnection connecting to [mongocfg3.testing.com:27000]
Thu Aug 15 09:56:05.585 [mongosMain] SyncClusterConnection connecting to [mongocfg1.testing.com:27000]
Thu Aug 15 09:56:05.586 [mongosMain] SyncClusterConnection connecting to [mongocfg2.testing.com:27000]
Thu Aug 15 09:56:05.586 [mongosMain] SyncClusterConnection connecting to [mongocfg3.testing.com:27000]
Thu Aug 15 09:56:07.213 [Balancer] about to contact config servers and shards
Thu Aug 15 09:56:07.213 [websvr] admin web console waiting for connections on port 28111
Thu Aug 15 09:56:07.213 [Balancer] starting new replica set monitor for replica set replica01 with seed of mongo1.testing.com:27020,mongo2.testing.com:27020,mongo3.testing.com:27017
Thu Aug 15 09:56:07.214 [Balancer] successfully connected to seed mongo1.testing.com:27020 for replica set replica01
Thu Aug 15 09:56:07.214 [Balancer] changing hosts to { 0: "mongo1.testing.com:27020", 1: "mongo2.testing.com:27020" } from replica01/
Thu Aug 15 09:56:07.214 [Balancer] trying to add new host mongo1.testing.com:27020 to replica set replica01
Thu Aug 15 09:56:07.215 [Balancer] successfully connected to new host mongo1.testing.com:27020 in replica set replica01
Thu Aug 15 09:56:07.215 [Balancer] trying to add new host mongo2.testing.com:27020 to replica set replica01
Thu Aug 15 09:56:07.215 [Balancer] successfully connected to new host mongo2.testing.com:27020 in replica set replica01
Thu Aug 15 09:56:07.215 [mongosMain] waiting for connections on port 27111
Thu Aug 15 09:56:07.427 [Balancer] Primary for replica set replica01 changed to mongo1.testing.com:27020
Thu Aug 15 09:56:07.429 [Balancer] replica set monitor for replica set replica01 started, address is replica01/mongo1.testing.com:27020,mongo2.testing.com:27020
Thu Aug 15 09:56:07.429 [ReplicaSetMonitorWatcher] starting
Thu Aug 15 09:56:07.430 [Balancer] starting new replica set monitor for replica set replica02 with seed of mongo5.testing.com:27020,mongo6.testing.com:27020
Thu Aug 15 09:56:07.431 [Balancer] successfully connected to seed mongo5.testing.com:27020 for replica set replica02
Thu Aug 15 09:56:07.432 [Balancer] changing hosts to { 0: "mongo5.testing.com:27020", 1: "mongo6.testing.com:27020" } from replica02/
Thu Aug 15 09:56:07.432 [Balancer] trying to add new host mongo5.testing.com:27020 to replica set replica02
Thu Aug 15 09:56:07.432 [Balancer] successfully connected to new host mongo5.testing.com:27020 in replica set replica02
Thu Aug 15 09:56:07.432 [Balancer] trying to add new host mongo6.testing.com:27020 to replica set replica02
Thu Aug 15 09:56:07.433 [Balancer] successfully connected to new host mongo6.testing.com:27020 in replica set replica02
Thu Aug 15 09:56:07.712 [Balancer] Primary for replica set replica02 changed to mongo5.testing.com:27020
Thu Aug 15 09:56:07.714 [Balancer] replica set monitor for replica set replica02 started, address is replica02/mongo5.testing.com:27020,mongo6.testing.com:27020
Thu Aug 15 09:56:07.715 [Balancer] config servers and shards contacted successfully
Thu Aug 15 09:56:07.715 [Balancer] balancer id: web-inno.innologica.com:27111 started at Aug 15 09:56:07
Thu Aug 15 09:56:07.715 [Balancer] SyncClusterConnection connecting to [mongocfg1.testing.com:27000]
Thu Aug 15 09:56:07.716 [Balancer] SyncClusterConnection connecting to [mongocfg2.testing.com:27000]
Thu Aug 15 09:56:24.438 [mongosMain] connection accepted from 127.0.0.1:55303 #1 (1 connection now open)
Thu Aug 15 09:56:24.443 [conn1]  authenticate db: admin { authenticate: 1, nonce: "6cc9a76b79656179", user: "admin", key: "xxxxxxxxxxxxxxxxxxx" }
Thu Aug 15 09:56:26.676 [conn1] creating WriteBackListener for: mongo1.testing.com:27020 serverID: 520c7b87e4a4c3afa569b21a
Thu Aug 15 09:56:26.676 [conn1] creating WriteBackListener for: mongo2.testing.com:27020 serverID: 520c7b87e4a4c3afa569b21a
Thu Aug 15 09:56:26.678 [conn1] creating WriteBackListener for: mongo5.testing.com:27020 serverID: 520c7b87e4a4c3afa569b21a
Thu Aug 15 09:56:26.678 [conn1] creating WriteBackListener for: mongo6.testing.com:27020 serverID: 520c7b87e4a4c3afa569b21a
Thu Aug 15 09:56:26.679 [conn1] SyncClusterConnection connecting to [mongocfg1.testing.com:27000]
Thu Aug 15 09:56:26.679 [conn1] SyncClusterConnection connecting to [mongocfg2.testing.com:27000]
Thu Aug 15 09:56:26.680 [conn1] SyncClusterConnection connecting to [mongocfg3.testing.com:27000]
Thu Aug 15 09:57:33.704 [conn1] warning: inconsistent chunks found when reloading collection.documents, previous version was 8651|7||51b5c7a96b2903a0b3fac106, this should be rare
Thu Aug 15 09:57:33.714 [conn1] warning: ChunkManager loaded an invalid config for collection.documents, trying again
Thu Aug 15 09:57:34.065 [conn1] warning: inconsistent chunks found when reloading collection.documents, previous version was 8651|7||51b5c7a96b2903a0b3fac106, this should be rare
Thu Aug 15 09:57:34.076 [conn1] warning: ChunkManager loaded an invalid config for collection.documents, trying again
Thu Aug 15 09:57:34.491 [conn1] warning: inconsistent chunks found when reloading collection.documents, previous version was 8651|7||51b5c7a96b2903a0b3fac106, this should be rare
Thu Aug 15 09:57:34.503 [conn1] warning: ChunkManager loaded an invalid config for collection.documents, trying again
Thu Aug 15 09:57:34.533 [conn1] Assertion: 13282:Couldn't load a valid config for collection.documents after 3 attempts. Please try again.
0xa82161 0xa46e8b 0xa473cc 0x8b857e 0x93cb52 0x93f329 0x93ff18 0x94311f 0x9740e0 0x991865 0x669887 0xa6e8ce 0x7f4456361851 0x7f445570790d
 /usr/bin/mongos(_ZN5mongo15printStackTraceERSo+0x21) [0xa82161]
 /usr/bin/mongos(_ZN5mongo11msgassertedEiPKc+0x9b) [0xa46e8b]
 /usr/bin/mongos() [0xa473cc]
 /usr/bin/mongos(_ZN5mongo12ChunkManager18loadExistingRangesERKSs+0x24e) [0x8b857e]
 /usr/bin/mongos(_ZN5mongo8DBConfig14CollectionInfo5shardEPNS_12ChunkManagerE+0x52) [0x93cb52]
 /usr/bin/mongos(_ZN5mongo8DBConfig14CollectionInfoC1ERKNS_7BSONObjE+0x149) [0x93f329]
 /usr/bin/mongos(_ZN5mongo8DBConfig5_loadEv+0xa48) [0x93ff18]
 /usr/bin/mongos(_ZN5mongo8DBConfig4loadEv+0x1f) [0x94311f]
 /usr/bin/mongos(_ZN5mongo4Grid11getDBConfigESsbRKSs+0x480) [0x9740e0]
 /usr/bin/mongos(_ZN5mongo7Request5resetEv+0x1d5) [0x991865]
 /usr/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x67) [0x669887]
 /usr/bin/mongos(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xa6e8ce]
 /lib64/libpthread.so.0(+0x7851) [0x7f4456361851]
 /lib64/libc.so.6(clone+0x6d) [0x7f445570790d]
Thu Aug 15 09:57:34.549 [conn1] scoped connection to mongocfg1.testing.com:27000,mongocfg2.testing.com:27000,mongocfg3.testing.com:27000 not being returned to the pool
Thu Aug 15 09:57:34.549 [conn1] warning: error loading initial database config information :: caused by :: Couldn't load a valid config for collection.documents after 3 attempts. Please try again.
Thu Aug 15 09:57:34.549 [conn1] AssertionException while processing op type : 2004 to : collection.system.namespaces :: caused by :: 13282 error loading initial database config information :: caused by :: Couldn't load a valid config for collection.documents after 3 attempts. Please try again.
Thu Aug 15 09:57:37.722 [Balancer] SyncClusterConnection connecting to [mongocfg1.testing.com:27000]
Thu Aug 15 09:57:37.723 [Balancer] SyncClusterConnection connecting to [mongocfg2.testing.com:27000]
Thu Aug 15 09:57:37.723 [Balancer] SyncClusterConnection connecting to [mongocfg3.testing.com:27000]

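The first mongos also showed this error: "warning: error loading initial database config information :: caused by :: Couldn't load a valid config for collection.documents after 3 attempts. Please try again."

But it is working now.

The second mongos does not work after the restart: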

mongos> show collections
Thu Aug 15 09:57:34.550 JavaScript execution failed: error: {
    "$err" : "error loading initial database config information :: caused by :: Couldn't load a valid config for collection.documents after 3 attempts. Please try again.",
    "code" : 13282
} at src/mongo/shell/query.js:L128
mongos>

What is the next step for recovering the config servers?

All suggestions are welcome.

Answers:



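Restoring config servers, especially after some kind of catastrophic event, is tricky but not impossible. Before going any further, though, one bold warning:

Back up everything

That means backing up all three config servers. I will give you some advice that should generally be correct, but back up every current config server instance before overwriting or replacing anything.

A quick note: config servers are not configured as a replica set. Each config server instance is supposed to be identical to the others (at least for all the collections that matter). Hence any healthy config server can be used to replace an unhealthy one, and you can then follow the tutorial you mentioned to get back to a good configuration.

The key to recovering is to identify a healthy config server and then use it to replace the others, so that you end up with 3 identical config servers.

There are several ways to do this, and they basically fall into three categories:

1) Use the error message

The error message that is printed actually tells you which config server it considers healthy, although that is not obvious from the message itself. The general way to read it is: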

ERROR: config servers not in sync! config servers <healthy-server> and <out-of-sync-server> differ

Basically, the first one in the list is the healthy one, which in your case is mongocfg1.testing.com:27000. That is our first candidate for a healthy config database.
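As a rough illustration of reading that message, the sketch below pulls the two host names and the per-collection hashes out of a syslog-style line like the one in the question. The helper names (parse_not_in_sync, differing_collections) and the regular expressions are my own assumptions about the 2.4-era message layout, with #012 and #011 being syslog's escapes for newline and tab. None of this is provided by MongoDB.

```python
import re

# Hypothetical helpers: the patterns assume the message layout shown in the
# question's log line (mongos 2.4), as written to syslog.
HOSTS_RE = re.compile(r"config servers (\S+) and (\S+) differ")
HASH_RE = re.compile(r'(\w+): "([0-9a-f]{32})"')


def parse_not_in_sync(line):
    """Return (first_host, second_host, {collection: (first_hash, second_hash)})."""
    m = HOSTS_RE.search(line)
    if m is None:
        raise ValueError("not a 'config servers not in sync' message")
    first, second = m.group(1), m.group(2)
    # Undo syslog's escaping so that each collection sits on its own row.
    body = line[m.end():].replace("#012", "\n").replace("#011", "\t")
    hashes = {}
    for row in body.splitlines():
        pairs = HASH_RE.findall(row)
        # A row is expected to name the same collection twice, once per host.
        if len(pairs) == 2 and pairs[0][0] == pairs[1][0]:
            hashes[pairs[0][0]] = (pairs[0][1], pairs[1][1])
    return first, second, hashes


def differing_collections(hashes):
    """Names of the collections whose two hashes are not equal."""
    return sorted(name for name, (a, b) in hashes.items() if a != b)
```

Fed the line from the question, this reports mongocfg1.testing.com:27000 as the first-listed host and chunks as the only collection whose hashes differ; the databases hashes match.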

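2) Use dbhash to compare all three and pick the two that agree

On each config server, switch to the config database with use config, run db.runCommand("dbhash"), and compare the hash values for the following collections:

  • chunks
  • databases
  • settings
  • shards

You are looking for two servers that agree. On that basis you can conclude that the version of the config database on those hosts is basically trustworthy and should be used as the seed.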
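The comparison itself can be sketched in a few lines once the three dbhash results have been collected by hand. This is only an illustration under stated assumptions: the function names are hypothetical, and the input is assumed to be a plain dict mapping each host to the collections sub-document of its dbhash output (collection name to md5 string). It does not connect to any server.

```python
from collections import defaultdict

# The collections the answer says to compare.
KEY_COLLECTIONS = ("chunks", "databases", "settings", "shards")


def group_by_hashes(dbhash_by_host, names=KEY_COLLECTIONS):
    """Group hosts whose hashes match for every collection in `names`.

    A collection missing from a host's result is recorded as None rather
    than guessed, so hosts only group together if they miss the same ones.
    """
    groups = defaultdict(list)
    for host, collections in dbhash_by_host.items():
        fingerprint = tuple(collections.get(name) for name in names)
        groups[fingerprint].append(host)
    # Largest group first.
    return sorted(groups.values(), key=len, reverse=True)


def pick_seed_candidates(dbhash_by_host):
    """Return the hosts that agree with each other, or [] if no two agree."""
    groups = group_by_hashes(dbhash_by_host)
    if not groups or len(groups[0]) < 2:
        return []
    return sorted(groups[0])
```

An empty result from pick_seed_candidates means all three servers disagree, in which case the manual inspection described next is the remaining option.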

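3) Manually inspect the collections in the config database

Finally, take a look at the config database itself, paying attention to the collections listed under the second option above. This is a straight judgment call based on how familiar you are with your data.

Hopefully all three methods point you to the same host. That config server should be used as the seed for the other two (after the backups, so you can go back). This is basically your best bet. If it fails, you may want to try one of the other versions (from the backups). Whichever you choose, always make sure that all three are identical when you start them up.

Finally, always make sure that all mongos processes use the same config server string, and that all 3 servers are always listed in the same order in every process. Doing otherwise across mongos processes can lead to (very) odd results.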
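One way to check that last point is to compare the configdb value each mongos prints in its startup "options:" log line, like the one in the question. The sketch below is a minimal illustration: extract_configdb and configdb_mismatches are hypothetical helper names, and the regular expression assumes the 2.4-style options line shown above.

```python
import re

# Assumes the startup line format seen in the question:
#   options: { configdb: "host1:port,host2:port,host3:port", ... }
CONFIGDB_RE = re.compile(r'configdb: "([^"]*)"')


def extract_configdb(log_line):
    """Return the configdb string from a mongos 'options:' line, or None."""
    m = CONFIGDB_RE.search(log_line)
    return m.group(1) if m else None


def configdb_mismatches(configdb_by_mongos):
    """Return {mongos: configdb} for each mongos that differs from the first entry.

    The comparison is exact on purpose: the same three hosts listed in a
    different order count as a mismatch.
    """
    items = list(configdb_by_mongos.items())
    if not items:
        return {}
    reference = items[0][1]
    return {name: value for name, value in items[1:] if value != reference}
```

The first entry is used as the reference only for convenience; a non-empty result says the strings are not all the same, not which one is correct.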


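For the second category, I don't see a "databases" collection. How important is that if the others, such as "chunks", are in sync? Can it be found somewhere else?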
抓取

I did as you said and only see "md5" : "d41d8cd98f00b204e9800998ecf8429e" on all 3 config servers. How do I correct the error?
Amit Tripathi

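Hi AdamC, I am currently facing the same problem and have to solve it as soon as possible. I have a quick question: do I have to stop all mongos and mongod processes before shutting down the configdb?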
rendybjunior

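If you have an urgent problem, I recommend seeking professional advice: contact MongoDB and ask for their support. I no longer work for MongoDB and would not want to talk you through a process like this, especially since this kind of thing has changed a lot in recent versions of the database (this answer was written 4 years ago).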
Adam C
Licensed under cc by-sa 3.0 with attribution required.