I need to check whether a find statement returns a non-empty result.
What I was doing was the following:
query = collection.find({"string": field})
if not query:
    pass  # do something
Then I realized that the body of my if statement was never executed, because find returns a cursor whether or not the query matches anything.
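The behaviour described above can be reproduced without a database. This is a minimal sketch, assuming that a pymongo cursor defines neither `__bool__` nor `__len__`; `CursorLike` is a hypothetical stand-in, not a pymongo class.

```python
class CursorLike:
    """Hypothetical stand-in for a cursor: iterable, but with no __bool__ or __len__."""

    def __init__(self, items):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)


empty = CursorLike([])

# An object without __bool__ and __len__ is truthy by default,
# so "if not empty:" never runs even though there are no results.
print(bool(empty))        # True

# Materializing the results gives a list, and an empty list is falsy.
print(bool(list(empty)))  # False
```

This is why `if not query:` tells you nothing about whether any documents matched.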
Therefore I checked the documentation and found two methods that can help me:
- count(with_limit_and_skip=False), which (from the description):
Returns the number of documents in the results set for this query.
It seems a good way to check, but this means that I need to count all the results in the cursor to know whether it is zero or not, right? Isn't that a little expensive?
- retrieved, which (from the description):
The number of documents retrieved so far.
I tested it on an empty query set and it returns zero, but it's not clear what it does and I don't know if it's right for me.
So, what is the best way (best practice) to check whether a find() query returns an empty set or not? Is one of the methods described above right for this purpose? And what about performance? Are there other ways to do it?
Just to be clear: I need to know whether the query is empty, and I'd like to find the best way to do it with the cursor, with respect to performance and being pythonic.
4 Solutions
#1
29
EDIT: While this was true in 2014, modern versions of pymongo and MongoDB have changed this behaviour. Buyer beware:
.count() is the correct way to find the number of results that are returned in the query. The count() method does not exhaust the iterator for your cursor, so you can safely do a .count() check before iterating over the items in the result set.
Performance of the count method was greatly improved in MongoDB 2.4. The only thing that could slow down your count is the lack of an index for the query. To find out whether the query uses an index, you can do something like
query = collection.find({"string": field})
print(query.explain())
If you see BasicCursor in the result, you need an index on your string field for this query. (MongoDB 3.0 and later report a COLLSCAN stage instead of BasicCursor.)
EDIT: as @alvapan pointed out, pymongo deprecated this method in pymongo 3.7+ and now prefers that you use count_documents in a separate query.
item_count = collection.count_documents({"string": field})
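For a pure "is there anything?" check, the count does not need to go past the first match. The following is a minimal sketch, assuming `collection` is a pymongo Collection (pymongo 3.7+) and relying on the `limit` option that `count_documents` accepts; the helper name `has_match` is made up for illustration.

```python
def has_match(collection, query):
    """Return True if at least one document in `collection` matches `query`.

    `collection` is assumed to be a pymongo Collection (pymongo 3.7+).
    Passing limit=1 caps the count at 1, so the server can stop
    after the first matching document.
    """
    return collection.count_documents(query, limit=1) > 0
```

With a real collection this would be called as `has_match(collection, {"string": field})`.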
The right way to count the number of items returned by a query is to check the .retrieved counter on the cursor after you iterate over it, or to enumerate the query in the first place:
# Using .retrieved
query = collection.find({"string": field})
for item in query:
    print(item)
print('Located {0:,} item(s)'.format(query.retrieved))
Or, another way:
# Using the built-in enumerate
query = collection.find({"string": field})
count = 0  # stays 0 when the query matches nothing
for count, item in enumerate(query, start=1):
    print(item)
print('Located {0:,} item(s)'.format(count))
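If the goal is only to learn whether the cursor is empty before looping over it, one option is to pull the first item and then put it back in front of the rest. This is a minimal sketch that works on any Python iterator; it assumes a pymongo cursor can be passed in like any other iterable, and the helper name `peek` is made up for illustration.

```python
import itertools


def peek(iterable):
    """Return (is_empty, iterator); the iterator still yields every item."""
    iterator = iter(iterable)
    sentinel = object()  # distinguishes "no items" from an item that is None
    first = next(iterator, sentinel)
    if first is sentinel:
        return True, iter(())
    # Re-attach the item we consumed so the caller sees the full sequence.
    return False, itertools.chain([first], iterator)


is_empty, items = peek(iter(["a", "b", "c"]))
print(is_empty)     # False
print(list(items))  # ['a', 'b', 'c']
```

With a cursor this would look like `is_empty, docs = peek(collection.find({"string": field}))`. Fetching the first item is what actually sends the query to the server.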
#2
7
How about just using find_one instead of find? Then you can just check whether you got a result or None. And if "string" is indexed, you can pass fields = {"string": 1, "_id": 0} (in pymongo 3.0 and later this parameter is named projection), and thus make it an index-only query, which is even faster.
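The find_one approach can be sketched as follows, assuming `collection` is a pymongo Collection and pymongo 3.0+ (where the keyword is `projection`); the helper name `exists` is made up for illustration.

```python
def exists(collection, value):
    """Return True if any document has its "string" field equal to `value`."""
    # Asking only for the indexed field and excluding _id allows an
    # index-only (covered) query when "string" is indexed.
    doc = collection.find_one(
        {"string": value},
        projection={"string": 1, "_id": 0},
    )
    # Compare against None explicitly rather than relying on truthiness,
    # since a returned document could in principle be an empty dict.
    return doc is not None
```

With a real collection this would be called as `exists(collection, field)`.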
#3
3
Another solution is to convert the cursor to a list: if the cursor has no data the list is empty; otherwise the list contains all the data.
doc_list = list(collection.find({}))  # find all data and load it into memory
have_list = bool(doc_list)  # False when the list is empty
#4
2
From my tests (with a MongoEngine query set), the quickest way is
if query.first():
    pass  # do something
In [51]: %timeit query = MyMongoDoc.objects(); query.first()
100 loops, best of 3: 2.12 ms per loop
In [52]: %timeit query = MyMongoDoc.objects(); query.count()
100 loops, best of 3: 4.28 ms per loop
(Using MongoDB 2.6.7, 2015-03-26)