I am using the MongoTool runner to import data from MongoDB into Hadoop MapReduce jobs. Because of the size of the data I am getting an OutOfMemoryError, so I want to limit the number of records I fetch, in a batch fashion.
MongoConfigUtil.setQuery() can set only the query; there is no way to set a size to limit the number of records fetched. What I am looking for is something like
MongoConfigUtil.setBatchSize() and then MongoConfigUtil.getNextBatch()
or something along those lines.
Kindly suggest.
1 Solution
#1
You can use the setLimit method of the MongoInputSplit class, passing the number of documents you want to fetch.
MongoInputSplit myMongoInputSplitObj = new MongoInputSplit(/* param */);
myMongoInputSplitObj.setLimit(100); // fetch at most 100 documents from this split
See also MongoConfigUtil setLimit: "Allow users to set the limit on MongoInputSplits (HADOOP-267)."
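For job-level configuration, here is a minimal sketch of what this could look like in a driver class. It assumes the MongoConfigUtil.setLimit(Configuration, int) overload added by the HADOOP-267 change, and the input URI is a placeholder:

import org.apache.hadoop.conf.Configuration;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class LimitedMongoImport {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Placeholder URI; point the connector at your source collection.
        MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/mydb.mycollection");
        // Cap the number of documents read per input split
        // (setLimit signature assumed from the HADOOP-267 change).
        MongoConfigUtil.setLimit(conf, 100);
        // ... set mapper/reducer classes and submit the job through MongoTool as usual.
    }
}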