A First MongoDB Map&Reduce Exercise

Time: 2022-04-29 01:58:13
  • Requirements

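Use Map&Reduce to count, for each of several classes, the students whose age lies between 10 and 20, exclusive (that is, 11 to 19, matching the query filter {age:{$gt:10,$lt:20}} used in the job):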

  • Requirements analysis

  • Fields of the students collection:

db.students.insert({classid:1, age:14, name:'Tom'})

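classid is randomly 1 or 2. age is a random integer from 8 to 24, because the loader computes 8 + (int) (Math.random() * 17). name is a random string of 3 to 6 lowercase letters, because its length is 3 + (int) (Math.random() * 4).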
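To make the value ranges concrete, here is a minimal sketch of one random student document in plain JavaScript. The helper names randomInt, randomName and randomStudent are hypothetical and do not appear in the original; the ranges mirror the expressions in the Java loader (8 + (int) (Math.random() * 17) gives 8 to 24, and 3 + (int) (Math.random() * 4) gives 3 to 6).

```javascript
// Hypothetical helpers for illustration only; the loader in this post is written in Java.
function randomInt(min, maxInclusive) {
  // Uniform integer in [min, maxInclusive].
  return min + Math.floor(Math.random() * (maxInclusive - min + 1));
}

function randomName() {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  const length = randomInt(3, 6); // same range as 3 + (int) (Math.random() * 4)
  let name = "";
  for (let i = 0; i < length; i++) {
    name += letters[randomInt(0, 25)];
  }
  return name;
}

function randomStudent() {
  return {
    classid: randomInt(1, 2), // same range as 1 + (int) (Math.random() * 2)
    age: randomInt(8, 24),    // same range as 8 + (int) (Math.random() * 17)
    name: randomName(),
  };
}

console.log(randomStudent());
```

Each call returns a document with the same shape as the sample insert above, so the output can be pasted into db.students.insert(...) for a quick manual test.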

  • Loading the data

  • Java program for loading the data

Write 10 million documents into the students collection of the mrtask database:


package org.test;

import java.util.ArrayList;
import java.util.List;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;

public class TestMongoDBReplSet {

    public static void main(String[] args) {
        MongoClient client = null;
        try {
            // Connect through a seed list that contains a single server.
            List<ServerAddress> addresses = new ArrayList<ServerAddress>();
            addresses.add(new ServerAddress("172.16.16.89", 27017));
            client = new MongoClient(addresses);
            DB db = client.getDB("mrtask");
            DBCollection coll = db.getCollection("students");

            // Write the data: one insert per document, reusing a single object.
            BasicDBObject object = new BasicDBObject();
            for (int i = 1; i <= 10000000; i++) {
                object.append("classid", 1 + (int) (Math.random() * 2)); // 1 or 2
                object.append("age", 8 + (int) (Math.random() * 17));    // 8 to 24
                object.append("name", getName());
                coll.insert(object);
                // Empty the object (including any _id the driver set) before reuse.
                object.clear();
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (client != null) {
                client.close();
            }
        }
    }

    // Builds a random name of 3 to 6 lowercase letters.
    public static String getName() {
        int length = 3 + (int) (Math.random() * 4);
        StringBuilder name = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            name.append((char) ('a' + (int) (Math.random() * 26)));
        }
        return name.toString();
    }
}

  

  • Verifying the load

A check in the shell confirms that the students collection in the mrtask database holds 10 million documents:

[root@localhost bin]# ./mongo
MongoDB shell version: 2.6.11
connecting to: test
> show dbs
admin   (empty)
local   0.078GB
mrtask  3.952GB
test    0.453GB
> use mrtask
switched to db mrtask
> db.students.find().count()
10000000

  • Map&Reduce computation

  • Map step

> mapfun = function(){emit(this.classid,1)}

  • Reduce step and finalize function

> reducefun=function (key, values) { var count = 0; values.forEach(function (v) {count += v;}); return count; }

> ff = function (key, value) { return {classid:key, count:value}; }
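How the three functions fit together can be sketched outside the database. The snippet below is a simplified in-memory stand-in, not how MongoDB is implemented: emit, emitted and simulateMapReduce are hypothetical names introduced for illustration, the sample documents are made up, and the filter callback plays the role of the query {age:{$gt:10,$lt:20}}. The map, reduce and finalize bodies are the ones from the shell session.

```javascript
// Functions from the shell session, unchanged apart from formatting.
const mapfun = function () { emit(this.classid, 1); };
const reducefun = function (key, values) {
  var count = 0;
  values.forEach(function (v) { count += v; });
  return count;
};
const ff = function (key, value) { return { classid: key, count: value }; };

// Hypothetical stand-in for the server-side emit(): collects key/value pairs.
let emitted = [];
function emit(key, value) { emitted.push([key, value]); }

function simulateMapReduce(docs, query) {
  emitted = [];
  // Map phase: run mapfun with `this` bound to each document that passes the query.
  docs.filter(query).forEach((doc) => mapfun.call(doc));
  // Group the emitted values by key.
  const groups = new Map();
  for (const [key, value] of emitted) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  // Reduce each group, then finalize. As in MongoDB, a key with a single
  // value skips reduce and goes straight to finalize.
  const out = [];
  for (const [key, values] of groups) {
    const reduced = values.length > 1 ? reducefun(key, values) : values[0];
    out.push({ _id: key, value: ff(key, reduced) });
  }
  return out.sort((a, b) => a._id - b._id);
}

// Made-up sample data for the sketch.
const students = [
  { classid: 1, age: 14, name: "Tom" },
  { classid: 1, age: 10, name: "abc" },   // excluded: 10 is not > 10
  { classid: 2, age: 19, name: "defg" },
  { classid: 1, age: 11, name: "hij" },
  { classid: 2, age: 20, name: "klmno" }, // excluded: 20 is not < 20
];
const result = simulateMapReduce(students, (d) => d.age > 10 && d.age < 20);
console.log(JSON.stringify(result));
// [{"_id":1,"value":{"classid":1,"count":2}},{"_id":2,"value":{"classid":2,"count":1}}]

// Reducing partial results gives the same answer as reducing everything at once.
console.log(reducefun(1, [reducefun(1, [1, 1]), 1]) === reducefun(1, [1, 1, 1])); // true
```

The last line matters because MongoDB may call the reduce function more than once for the same key, feeding earlier outputs back in as inputs. A plain sum is safe under that behavior, which is why the wrapping into {classid, count} is done in the finalize function rather than in reduce.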

  • Running the job and writing the output

> classid_res = db.runCommand({
    mapreduce:"students",
    map:mapfun,
    reduce:reducefun,
    out:"students_classid_res",
    finalize:ff,
    query:{age:{$gt:10,$lt:20}}
  });

  • Results

> db.students_classid_res.find()
{ "_id" : 1, "value" : { "classid" : 1, "count" : 2643128 } }
{ "_id" : 2, "value" : { "classid" : 2, "count" : 2650870 } }
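These counts can be sanity-checked with a little arithmetic. The sketch below assumes ages are uniformly distributed over the 17 integers from 8 to 24, which is what the loader expression 8 + (int) (Math.random() * 17) produces, and that 9 of those ages (11 to 19) pass the filter {age:{$gt:10,$lt:20}}. The variable names are illustrative only.

```javascript
const total = 10000000;              // documents written by the loader
const ageValues = 24 - 8 + 1;        // 17 possible ages: 8 to 24
const matchingAges = 19 - 11 + 1;    // 9 ages satisfy 10 < age < 20
const expected = total * matchingAges / ageValues;

const observed = 2643128 + 2650870;  // counts reported for classid 1 and 2

console.log(Math.round(expected));           // 5294118
console.log(observed);                       // 5293998
console.log((observed / total).toFixed(4));  // 0.5294
```

The observed total differs from the expected value by 120 documents out of 10 million, and the two classes are split almost evenly, which is consistent with random generation.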