获取集合中所有键的名称

322

我想获取MongoDB集合中所有键的名称。

例如，从此：

db.things.insert( { type : ['dog', 'cat'] } );
db.things.insert( { egg : ['cat'] } );
db.things.insert( { type : [] } );
db.things.insert( { hello : []  } );

我想获得唯一的键：

type, egg, hello

mongodb mongodb-query aggregation-framework

— 史蒂夫
source

346

您可以使用MapReduce做到这一点：

mr = db.runCommand({
  "mapreduce" : "my_collection",
  "map" : function() {
    for (var key in this) { emit(key, null); }
  },
  "reduce" : function(key, stuff) { return null; }, 
  "out": "my_collection" + "_keys"
})

然后在结果集合上进行非重复运行，以便找到所有键：

db[mr.result].distinct("_id")
["foo", "bar", "baz", "_id", ...]

— 克里斯蒂娜
source

2

嗨，您好！我刚刚发布了此问题的后续版本，询问即使使用位于数据结构中更深层次的键（stackoverflow.com/questions/2997004/…），该片段也如何工作。

— Andrea Fiore 2010年

1

@kristina：怎么可能，我得到整个事情用在这个时候用钥匙列出的东西集合。它看起来与历史记录机制有关，因为我得到了过去修改过的东西

— Shawn

3

我知道这是一个旧线程，但是我似乎也有类似的需求。我正在使用nodejs mongodb本机驱动程序。产生的临时集合似乎总是空的。我为此在集合类中使用mapreduce函数。那不可能吗？

— 迪帕克

6

这可能很明显，但是如果要获取子文档中所有唯一键的列表，只需修改以下行：for (var key in this.first_level.second_level.nth_level) { emit(key, null); }

— dtbarne

3

我使用map（）而不是保存到集合上然后再运行，而不是：我使用了map（）：db.runCommand({..., out: { "inline" : 1 }}).results.map(function(i) { return i._id; });

— Ian Stanley

203

以克里斯蒂娜的答案为灵感，我创建了一个名为Variety的开源工具，该工具可以做到这一点：https : //github.com/variety/variety

— 詹姆斯·克罗普乔
source

13

恭喜，这是一个了不起的工具。它完全符合问题的要求，并且可以配置限制，深度等。任何后续建议。

— Paul Biggar 2012年

74

您可以使用聚集新$objectToArrray的3.4.4版本把所有顶尖键和值对添加到文档数组，然后$unwind与$group 用$addToSet得到跨越整个集合不同的键。

$$ROOT 用于引用顶级文档。

db.things.aggregate([
  {"$project":{"arrayofkeyvalue":{"$objectToArray":"$$ROOT"}}},
  {"$unwind":"$arrayofkeyvalue"},
  {"$group":{"_id":null,"allkeys":{"$addToSet":"$arrayofkeyvalue.k"}}}
])

您可以使用以下查询在单个文档中获取密钥。

db.things.aggregate([
  {"$match":{_id: "5e8f968639bb8c67726686bc"}}, /* Replace with the document's ID */
  {"$project":{"arrayofkeyvalue":{"$objectToArray":"$$ROOT"}}},
  {"$project":{"keys":"$arrayofkeyvalue.k"}}
])

— 萨加尔·维拉姆（Sagar Veeram）
source

20

这确实是最好的答案。解决该问题而无需涉及其他编程语言或程序包，并与支持聚合框架的所有驱动程序一起使用（甚至是Meteor！）

— Micah Henning

2

如果要返回一个数组，而不是一个包含带有“ allkeys”键的单个映射条目的光标，则可以追加.next()["allkeys"]到命令（假设集合中至少包含一个元素）。

— M. Justin

19

尝试这个：

doc=db.thinks.findOne();
for (key in doc) print(key);

— 卡洛斯·LM
source

49

错误答案，因为这仅输出集合中单个文档的字段-其他文档可能都具有完全不同的键。

— Asya Kamsky

15

对于我来说，这仍然是最有用的答案，只是一个简单的合理最小值。

— 鲍里斯·伯科夫

11

这没用吗？如果给您错误的答案，它有什么用？

— Zlatko

4

上下文显示了有用的内容：如果对数据进行了规范化（例如，来自CSV文件的原始数据），则将非常有用...对于从SQL导入的数据，将非常有用。

— 彼得·克劳斯

5

这不是一个好的答案，而是关于如何获取集合中一个元素的键而不是集合中所有键的答案！

— yonatan '16

16

如果目标集合不是太大，则可以在mongo shell客户端下尝试以下操作：

var allKeys = {};

db.YOURCOLLECTION.find().forEach(function(doc){Object.keys(doc).forEach(function(key){allKeys[key]=1})});

allKeys;

— 李春林
source

如果我想看这里，我如何给regExp特定的键？

— TB.M，2017年

@ TB.M，您可以尝试：db.configs.find（）。forEach（function（doc）{Object.keys（doc）.forEach（function（key））{if（/YOURREGEXP/.test(key））{ allKeys [key] = 1}}）}））;

— 李春林

测试在这里是什么意思？你能解释一下吗？

— TB.M

@ TB.M developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/...

— 李春林

14

使用pymongo清理并重用的解决方案：

from pymongo import MongoClient
from bson import Code

def get_keys(db, collection):
    client = MongoClient()
    db = client[db]
    map = Code("function() { for (var key in this) { emit(key, null); } }")
    reduce = Code("function(key, stuff) { return null; }")
    result = db[collection].map_reduce(map, reduce, "myresults")
    return result.distinct('_id')

用法：

get_keys('dbname', 'collection')
>> ['key1', 'key2', ... ]

— 英戈·费舍尔
source

1

效果很好。最后得到了我的问题解决了....这是堆栈溢出最简单的方案，我看到..

— 啪阿尔法

要按类型过滤，只需在if (typeof(this[key]) == 'number')之前添加例如emit(key, null)。

— Skippy le Grand Gourou

10

使用python。返回集合中所有顶级键的集合：

#Using pymongo and connection named 'db'

reduce(
    lambda all_keys, rec_keys: all_keys | set(rec_keys), 
    map(lambda d: d.keys(), db.things.find()), 
    set()
)

— 莱泽
source

1

我发现这有效，但是与原始mongod查询相比，效率如何？

— 耶稣·戈麦斯

1

与在Mongodb中直接执行此操作相比，我非常确定这是非常低效的

— Ingo Fischer

9

这是在Python中工作的示例：此示例内联返回结果。

from pymongo import MongoClient
from bson.code import Code

mapper = Code("""
    function() {
                  for (var key in this) { emit(key, null); }
               }
""")
reducer = Code("""
    function(key, stuff) { return null; }
""")

distinctThingFields = db.things.map_reduce(mapper, reducer
    , out = {'inline' : 1}
    , full_response = True)
## do something with distinctThingFields['results']

— 鲍勃·海
source

9

如果您使用的是mongodb 3.4.4及更高版本，则可以使用$objectToArray和$group聚合来使用以下聚合

db.collection.aggregate([
  { "$project": {
    "data": { "$objectToArray": "$$ROOT" }
  }},
  { "$project": { "data": "$data.k" }},
  { "$unwind": "$data" },
  { "$group": {
    "_id": null,
    "keys": { "$addToSet": "$data" }
  }}
])

这是工作示例

— 阿什
source

这是最好的答案。您还可以$match在聚合管道的开头使用来仅获取与条件匹配的文档的键。

— RonquilloAeon

5

令人惊讶的是，这里没有人使用简单javascript和Set逻辑自动过滤重复值，这在mongo shell上很简单，如下所示：

var allKeys = new Set()
db.collectionName.find().forEach( function (o) {for (key in o ) allKeys.add(key)})
for(let key of allKeys) print(key)

这将在集合名称collectionName中打印所有可能的唯一键。

— 克里希纳·普拉萨德（Krishna Prasad）
source

3

这对我来说很好：

var arrayOfFieldNames = [];

var items = db.NAMECOLLECTION.find();

while(items.hasNext()) {
  var item = items.next();
  for(var index in item) {
    arrayOfFieldNames[index] = index;
   }
}

for (var index in arrayOfFieldNames) {
  print(index);
}

— 确认者
source

3

我想提到的最好办法做到这一点这里是mongod的3.4.4+但没有使用的$unwind运营商，并使用管道只有两个阶段。相反，我们可以使用$mergeObjects和$objectToArray运算符。

在$group阶段中，我们使用$mergeObjects运算符返回单个文档，其中键/值来自集合中的所有文档。

然后是$project我们使用$map并$objectToArray返回密钥的地方。

let allTopLevelKeys =  [
    {
        "$group": {
            "_id": null,
            "array": {
                "$mergeObjects": "$$ROOT"
            }
        }
    },
    {
        "$project": {
            "keys": {
                "$map": {
                    "input": { "$objectToArray": "$array" },
                    "in": "$$this.k"
                }
            }
        }
    }
];

现在，如果我们有一个嵌套的文档，并且还想获取密钥，那么这是可行的。为了简单起见，让我们考虑一个包含简单嵌入式文档的文档，如下所示：

{field1: {field2: "abc"}, field3: "def"}
{field1: {field3: "abc"}, field4: "def"}

以下管道产生所有键（field1，field2，field3，field4）。

let allFistSecondLevelKeys = [
    {
        "$group": {
            "_id": null,
            "array": {
                "$mergeObjects": "$$ROOT"
            }
        }
    },
    {
        "$project": {
            "keys": {
                "$setUnion": [
                    {
                        "$map": {
                            "input": {
                                "$reduce": {
                                    "input": {
                                        "$map": {
                                            "input": {
                                                "$objectToArray": "$array"
                                            },
                                            "in": {
                                                "$cond": [
                                                    {
                                                        "$eq": [
                                                            {
                                                                "$type": "$$this.v"
                                                            },
                                                            "object"
                                                        ]
                                                    },
                                                    {
                                                        "$objectToArray": "$$this.v"
                                                    },
                                                    [
                                                        "$$this"
                                                    ]
                                                ]
                                            }
                                        }
                                    },
                                    "initialValue": [

                                    ],
                                    "in": {
                                        "$concatArrays": [
                                            "$$this",
                                            "$$value"
                                        ]
                                    }
                                }
                            },
                            "in": "$$this.k"
                        }
                    }
                ]
            }
        }
    }
]

稍作努力，我们就可以在元素也是对象的数组字段中获取所有子文档的键。

— 苯乙烯
source

是的，$unwind将会爆炸集合（字段数量*文档数量），我们可以通过$mergeObjects在所有版本>上使用来避免这种情况3.6。同样，在以前应该已经看过这个答案的情况下，我的生活会更轻松（ -_-）

— whoami

3

也许有些偏离主题，但是您可以递归地漂亮打印对象的所有键/字段：

function _printFields(item, level) {
    if ((typeof item) != "object") {
        return
    }
    for (var index in item) {
        print(" ".repeat(level * 4) + index)
        if ((typeof item[index]) == "object") {
            _printFields(item[index], level + 1)
        }
    }
}

function printFields(item) {
    _printFields(item, 0)
}

当集合中的所有对象具有相同的结构时很有用。

— ed
source

1

要获得所有键减的列表_id，请考虑运行以下聚合管道：

var keys = db.collection.aggregate([
    { "$project": {
       "hashmaps": { "$objectToArray": "$$ROOT" } 
    } }, 
    { "$project": {
       "fields": "$hashmaps.k"
    } },
    { "$group": {
        "_id": null,
        "fields": { "$addToSet": "$fields" }
    } },
    { "$project": {
            "keys": {
                "$setDifference": [
                    {
                        "$reduce": {
                            "input": "$fields",
                            "initialValue": [],
                            "in": { "$setUnion" : ["$$value", "$$this"] }
                        }
                    },
                    ["_id"]
                ]
            }
        }
    }
]).toArray()[0]["keys"];

— 克列丹
source

0

我试图用nodejs编写，最后想到了这个：

db.collection('collectionName').mapReduce(
function() {
    for (var key in this) {
        emit(key, null);
    }
},
function(key, stuff) {
    return null;
}, {
    "out": "allFieldNames"
},
function(err, results) {
    var fields = db.collection('allFieldNames').distinct('_id');
    fields
        .then(function(data) {
            var finalData = {
                "status": "success",
                "fields": data
            };
            res.send(finalData);
            delteCollection(db, 'allFieldNames');
        })
        .catch(function(err) {
            res.send(err);
            delteCollection(db, 'allFieldNames');
        });
 });

阅读新创建的集合“ allFieldNames”后，将其删除。

db.collection("allFieldNames").remove({}, function (err,result) {
     db.close();
     return; 
});

— 高塔姆
source

0

根据mongoldb 文档，distinct

在单个集合或视图中查找指定字段的不同值，并将结果返回到数组中。

和索引收集操作将返回给定键或索引的所有可能值：

返回一个数组，该数组包含一个文档列表，这些文档标识并描述集合中的现有索引

因此，在给定的方法中，可以使用类似于下一个的方法，以便查询集合中所有已注册索引的内容，并返回一个带有索引的对象作为键（此示例对NodeJS使用async / await，但是显然，您可以使用任何其他异步方法）：

async function GetFor(collection, index) {

    let currentIndexes;
    let indexNames = [];
    let final = {};
    let vals = [];

    try {
        currentIndexes = await collection.indexes();
        await ParseIndexes();
        //Check if a specific index was queried, otherwise, iterate for all existing indexes
        if (index && typeof index === "string") return await ParseFor(index, indexNames);
        await ParseDoc(indexNames);
        await Promise.all(vals);
        return final;
    } catch (e) {
        throw e;
    }

    function ParseIndexes() {
        return new Promise(function (result) {
            let err;
            for (let ind in currentIndexes) {
                let index = currentIndexes[ind];
                if (!index) {
                    err = "No Key For Index "+index; break;
                }
                let Name = Object.keys(index.key);
                if (Name.length === 0) {
                    err = "No Name For Index"; break;
                }
                indexNames.push(Name[0]);
            }
            return result(err ? Promise.reject(err) : Promise.resolve());
        })
    }

    async function ParseFor(index, inDoc) {
        if (inDoc.indexOf(index) === -1) throw "No Such Index In Collection";
        try {
            await DistinctFor(index);
            return final;
        } catch (e) {
            throw e
        }
    }
    function ParseDoc(doc) {
        return new Promise(function (result) {
            let err;
            for (let index in doc) {
                let key = doc[index];
                if (!key) {
                    err = "No Key For Index "+index; break;
                }
                vals.push(new Promise(function (pushed) {
                    DistinctFor(key)
                        .then(pushed)
                        .catch(function (err) {
                            return pushed(Promise.resolve());
                        })
                }))
            }
            return result(err ? Promise.reject(err) : Promise.resolve());
        })
    }

    async function DistinctFor(key) {
        if (!key) throw "Key Is Undefined";
        try {
            final[key] = await collection.distinct(key);
        } catch (e) {
            final[key] = 'failed';
            throw e;
        }
    }
}

因此，使用基本_id索引查询集合将返回以下内容（测试集合在测试时只有一个文档）：

Mongo.MongoClient.connect(url, function (err, client) {
    assert.equal(null, err);

    let collection = client.db('my db').collection('the targeted collection');

    GetFor(collection, '_id')
        .then(function () {
            //returns
            // { _id: [ 5ae901e77e322342de1fb701 ] }
        })
        .catch(function (err) {
            //manage your error..
        })
});

请注意，这使用NodeJS驱动程序固有的方法。正如其他答案所暗示的那样，还有其他方法，例如聚合框架。我个人认为此方法更灵活，因为您可以轻松创建和微调返回结果的方式。显然，这仅处理顶级属性，而不处理嵌套属性。另外，为了确保所有文档都应具有辅助索引（主_id除外），这些索引应设置为required。

— 吉尔默夫
source

0

我们可以通过使用mongo js文件来实现。在您的getCollectionName.js文件中添加以下代码，并在Linux的控制台中运行js文件，如下所示：

mongo --host 192.168.1.135 getCollectionName.js

db_set = connect("192.168.1.135:27017/database_set_name"); // for Local testing
// db_set.auth("username_of_db", "password_of_db"); // if required

db_set.getMongo().setSlaveOk();

var collectionArray = db_set.getCollectionNames();

collectionArray.forEach(function(collectionName){

    if ( collectionName == 'system.indexes' || collectionName == 'system.profile' || collectionName == 'system.users' ) {
        return;
    }

    print("\nCollection Name = "+collectionName);
    print("All Fields :\n");

    var arrayOfFieldNames = []; 
    var items = db_set[collectionName].find();
    // var items = db_set[collectionName].find().sort({'_id':-1}).limit(100); // if you want fast & scan only last 100 records of each collection
    while(items.hasNext()) {
        var item = items.next(); 
        for(var index in item) {
            arrayOfFieldNames[index] = index;
        }
    }
    for (var index in arrayOfFieldNames) {
        print(index);
    }

});

quit();

谢谢@ackuser

— 伊尔沙德·汗（Irshad Khan）
source

0

遵循@James Cropcho的回答，我发现以下内容非常易于使用。这是一个二进制工具，这正是我在寻找的东西： mongoeye。

使用此工具需要大约2分钟的时间才能从命令行导出我的架构。

— paneer_tikka
source

0

我知道这个问题已有10年历史了，但是没有C＃解决方案，这花了我几个小时才弄清楚。我正在使用.NET驱动程序并System.Linq返回键列表。

var map = new BsonJavaScript("function() { for (var key in this) { emit(key, null); } }");
var reduce = new BsonJavaScript("function(key, stuff) { return null; }");
var options = new MapReduceOptions<BsonDocument, BsonDocument>();
var result = await collection.MapReduceAsync(map, reduce, options);
var list = result.ToEnumerable().Select(item => item["_id"].ToString());

— 安德鲁·萨莫尔
source

-1

我扩展了Carlos LM的解决方案，因此更加详细。

模式示例：

var schema = {
    _id: 123,
    id: 12,
    t: 'title',
    p: 4.5,
    ls: [{
            l: 'lemma',
            p: {
                pp: 8.9
            }
        },
         {
            l: 'lemma2',
            p: {
               pp: 8.3
           }
        }
    ]
};

输入控制台：

var schemafy = function(schema, i, limit) {
    var i = (typeof i !== 'undefined') ? i : 1;
    var limit = (typeof limit !== 'undefined') ? limit : false;
    var type = '';
    var array = false;

    for (key in schema) {
        type = typeof schema[key];
        array = (schema[key] instanceof Array) ? true : false;

        if (type === 'object') {
            print(Array(i).join('    ') + key+' <'+((array) ? 'array' : type)+'>:');
            schemafy(schema[key], i+1, array);
        } else {
            print(Array(i).join('    ') + key+' <'+type+'>');
        }

        if (limit) {
            break;
        }
    }
}

跑：

schemafy(db.collection.findOne());

输出量

_id <number>
id <number>
t <string>
p <number>
ls <object>:
    0 <object>:
    l <string>
    p <object>:
        pp <number>

— va5ja
source

3

他的回答是错误的，而您却以此为基础。重点是输出所有文档的所有字段，而不是第一个文档的字段可能与下一个文档不同。

— Asya Kamsky

-3

我有1个更简单的解决方法...

您可以做的是在将数据/文档插入主集合“事物”时，必须在1个单独的集合中插入属性，例如“ things_attributes”。

因此，每次插入“事物”时，您都会从“ things_attributes”中获得该文档的值与新文档密钥的比较，如果存在新密钥，则将该文档附加到该文档中，然后再次将其重新插入。

因此Things_attributes只有1个唯一键文档，您可以通过使用findOne（）在需要时轻松获得

— Paresh Behede
source

对于具有许多条目的数据库，其中所有键的查询都很频繁，而插入却很少，因此缓存“获取所有键”查询的结果将是有意义的。这是做到这一点的一种方法。

— 斯科特，