The following is how I query data from my MongoDB database using pymongo:
def is_philippine_facebook(self, facebook_user):
    is_philippine = False
    db_server = self.ConfigSectionMap('db_server')
    database_name = db_server['database']
    db = self.client[database_name]
    cursor = db[collection_name].find({
        'isPhilippine': True,
        'facebook_user': re.compile('@'+facebook_user, re.IGNORECASE)
    })
    for document in cursor:
        if document is not None:
            is_philippine = True
            break
    return is_philippine
What I want is to query records that have a certain facebook_user, case-insensitively. However, the query returns many incorrect results. For example, if facebook_user is WWF, records with WWF_XYZ are returned as well.
How can I fix this? Thanks.
2 solutions
#1
0
Sounds like you want a word boundary \b
'facebook_user': re.compile('@'+ facebook_user +'\\b', re.IGNORECASE)
So if you supply WWF or wwf then it only matches up to the end of the "word" and not beyond it.
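As a quick local check of the difference, here is a minimal sketch that runs the original pattern and the word-boundary pattern through Python's re module, without touching MongoDB. The helper name handle_pattern and the sample strings are made up for illustration.

```python
import re

def handle_pattern(facebook_user, bounded=True):
    """Build the case-insensitive pattern for an '@name' handle.

    Hypothetical helper: bounded=False reproduces the pattern from the
    question, bounded=True appends the word boundary suggested above.
    """
    suffix = r'\b' if bounded else ''
    return re.compile('@' + facebook_user + suffix, re.IGNORECASE)

# Made-up sample values standing in for the facebook_user field.
samples = ['@WWF', '@wwf', '@WWF_XYZ', 'page of @WWF, official']

# The original pattern also accepts '@WWF_XYZ'.
unbounded = [s for s in samples if handle_pattern('WWF', bounded=False).search(s)]

# With \b the match must end where the "word" ends, so '@WWF_XYZ' is rejected.
bounded = [s for s in samples if handle_pattern('WWF').search(s)]
```

The underscore counts as a word character, which is why WWF_XYZ no longer matches once the boundary is required.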
As a note, case-insensitive searches and searches not anchored with the caret ^ to the beginning of the string require a full collection scan and are not very efficient.
If matching to the beginning of a string, you should use the caret, and you should probably normalize case as a document property for searching so you do not need the "case insensitive" option either. These two things are required for an index to be used on a search. See $regex in the documentation.
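A rough sketch of that idea, under stated assumptions: each document carries an extra property with the lowercased value (called facebook_user_lower here, a made-up name), and the stored value starts with the '@' handle. The helper only builds the filter dict, so it can be tried without a running server.

```python
import re

def build_prefix_query(facebook_user, field='facebook_user_lower'):
    """Return a filter for a case-sensitive, ^-anchored $regex match.

    `field` is a hypothetical property that holds the lowercased value;
    it is not part of the original schema.
    """
    handle = '@' + facebook_user.lower()
    # Anchor at the start, escape the handle, and refuse a trailing word char.
    pattern = '^' + re.escape(handle) + r'(?!\w)'
    return {'isPhilippine': True, field: {'$regex': pattern}}

query = build_prefix_query('WWF')
# The dict could then be passed to pymongo, e.g. db[collection_name].find_one(query)
```

Because the value is lowercased at write time, the re.IGNORECASE flag is no longer needed at query time.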
#2
2
Use the following fix:
re.compile(r'@{0}\b'.format(facebook_user), re.IGNORECASE)
See the regex demo.
Pattern details:
- @WWF - a literal @WWF
- \b - a word boundary (requires a char other than a letter, digit or _, or the end of string, after @WWF)
If a facebook_user may contain special chars, you need to use
re.compile(r'(?<!\w)@{0}(?!\w)'.format(re.escape(facebook_user)), re.IGNORECASE)
However, the facebook_user seems to only contain word chars, so a word boundary should really suffice in this case.
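To see where the two variants diverge, here is a small sketch comparing them on names that contain a dot. The function names and the sample strings are invented for the comparison.

```python
import re

def naive_regex(facebook_user):
    # Word-boundary variant, user name inserted without escaping.
    return re.compile(r'@{0}\b'.format(facebook_user), re.IGNORECASE)

def handle_regex(facebook_user):
    # Lookaround variant with the user name escaped.
    return re.compile(r'(?<!\w)@{0}(?!\w)'.format(re.escape(facebook_user)),
                      re.IGNORECASE)

# An unescaped '.' matches any character, so 'w.f' also hits '@wxf'.
naive_wrong_name = bool(naive_regex('w.f').search('@wxf'))
escaped_wrong_name = bool(handle_regex('w.f').search('@wxf'))

# A name ending in a non-word char: \b fails between '.' and ' ',
# while the (?!\w) lookahead only requires that no word char follows.
boundary_after_dot = bool(
    re.compile(r'@{0}\b'.format(re.escape('wwf.')), re.IGNORECASE)
    .search('see @wwf. now'))
lookaround_after_dot = bool(handle_regex('wwf.').search('see @wwf. now'))

# The lookbehind also rejects an '@' glued to a preceding word, as in an e-mail.
email_like = bool(handle_regex('wwf').search('mail@wwf'))
```

For plain word-character names such as WWF, both patterns behave the same on these inputs.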