I am using the MongoTool runner to import data from MongoDB into Hadoop MapReduce jobs. Because of the size of the data I am getting an OutOfMemoryError, so I want to limit the number of records I fetch, in a batch fashion.
MongoConfigUtil.setQuery() can set only the query; there is no way to set a size to limit the number of records fetched. What I am looking for is something like
MongoConfigUtil.setBatchSize() and then MongoConfigUtil.getNextBatch()
or something along those lines.
Kindly suggest.
1 Solution
#1
You can use the setLimit method of the MongoInputSplit class, passing the number of documents you want to fetch.
MongoInputSplit myMongoInputSplitObj = new MongoInputSplit(/* param */);
myMongoInputSplitObj.setLimit(100); // fetch at most 100 documents from this split
See also MongoConfigUtil setLimit: "Allow users to set the limit on MongoInputSplits (HADOOP-267)."
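For job-level configuration, here is a minimal sketch of what this could look like in a driver class. It assumes the MongoConfigUtil.setLimit(Configuration, int) overload added by the HADOOP-267 change, and the input URI is a placeholder:

import org.apache.hadoop.conf.Configuration;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class LimitedMongoImport {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Placeholder URI; point the connector at your source collection.
        MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/mydb.mycollection");
        // Cap the number of documents read per input split
        // (setLimit signature assumed from the HADOOP-267 change).
        MongoConfigUtil.setLimit(conf, 100);
        // ... set mapper/reducer classes and submit the job through MongoTool as usual.
    }
}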