I need a scheduler job that runs every 5 minutes and processes the next 100 records from a MongoDB collection, starting from the document that was inserted first. In the first run, I can sort the data in ascending order and take the first 100 documents. For subsequent runs, how can I retrieve the next 100 records, given the last processed document's ObjectId? (I'm not sure how to use the ObjectId here, since it is a generated string built from several components, and I don't have any other id defined.)
If this is not a good way to retrieve records from MongoDB for a large data set, please suggest a better one.
Each document looks like this:
{
  "_id": { "$oid": "51ff17c8e4b02969f18e72bb" },
  "source_of_info": "somesource",
  "entityinfo": [
    {
      "user": "Alfredo Vela Zancada",
      "social_network_entity_id": 364221775325822977,
      "text": "blah blah blah",
      "created_at": { "$date": "2013-08-05T03:10:12.000Z" }
    }
  ],
  "relatedURLs": [
    { "url": "http://t.co/swqP3FYQt5", "expanded_url": "http://ow.ly/nCkIS" }
  ]
}
Thanks.
1 solution
#1
3
If you keep track of which iteration you're on, you could use something like:
db.users.find().limit(100).skip(1200)
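As a rough sketch of how the skip value could be derived from the run number: `skipForRun` below is a hypothetical helper, not part of MongoDB, and it assumes runs are counted from zero and the batch size stays fixed. Adding `sort({ _id: 1 })` to the query is also an assumption on my part; it keeps the order stable between runs, and since an ObjectId begins with a creation timestamp, that order is roughly insertion order.

```javascript
// Hypothetical helper (not part of MongoDB): offset for a zero-based run number.
function skipForRun(runIndex, batchSize) {
  if (!Number.isInteger(runIndex) || runIndex < 0) {
    throw new RangeError("runIndex must be a non-negative integer");
  }
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  return runIndex * batchSize;
}

// Run index 12 with batches of 100 gives the 1200 used in the query above.
// In the mongo shell this might be used as:
//   db.users.find().sort({ _id: 1 }).skip(skipForRun(12, 100)).limit(100)
```

Keep in mind that the server still has to walk past the skipped documents, so this tends to get slower as the offset grows.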
Another solution might be to add a 'processed' flag to each entry, defaulting to false. Then, when you fetch the next 100 entries where processed is false, do a findAndModify to set the flag to true.
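A minimal sketch of the flag approach, under a few assumptions: the field is named `processed`, the collection is `db.users` as in the query above, and the helper names are made up for illustration. Since findAndModify changes a single document per call, this sketch instead reads a batch with `find` and marks it afterwards with one `updateMany`; that is a substitution on my part, not what the answer literally describes.

```javascript
// Filter for documents not yet processed. Using $ne: true also matches
// older documents that have no `processed` field at all.
function unprocessedFilter() {
  return { processed: { $ne: true } };
}

// Build the filter and update that mark exactly the documents in `batch`.
function markProcessedUpdate(batch) {
  return {
    filter: { _id: { $in: batch.map((doc) => doc._id) } },
    update: { $set: { processed: true } },
  };
}

// In the mongo shell this might be used as:
//   var batch = db.users.find(unprocessedFilter()).sort({ _id: 1 }).limit(100).toArray();
//   // ... process the batch ...
//   var m = markProcessedUpdate(batch);
//   db.users.updateMany(m.filter, m.update);
```

Marking the batch only after it has been processed means a failed run is simply retried on the next schedule. If several workers could run at the same time, claiming documents one by one with findAndModify would be the safer variant.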