Riding the current NoSQL wave, I have been learning MongoDB. These are my notes; I hope to study it further with anyone who is interested!
MongoDB Overview
MongoDB is written in C++; its name comes from the middle of the word "humongous". It is developed and maintained by 10gen, and the most concise description of it is: scalable, high-performance, open source, schema-free, document-oriented database. MongoDB's main goal is to bridge the gap between key/value stores (which offer high performance and scalability) and traditional RDBMSs (which offer rich functionality), combining the advantages of both.
MongoDB features:
- Document-oriented storage
- Full index support, extending into embedded objects and arrays
- Replication and high availability
- Auto-sharding for cloud-scale horizontal scaling
- Query profiling
- Dynamic queries
- Fast, in-place updates
- Map/Reduce support
- GridFS file storage
- Commercial support, training, and consulting
Configuration
Master-Slave Mode
Machine | IP | Role |
test001 | 192.168.1.1 | master |
test002 | 192.168.1.2 | slave |
test003 | 192.168.1.3 | slave |
test004 | 192.168.1.4 | slave |
test005 | 192.168.1.5 | slave |
test006 | 192.168.1.6 | slave |
Start the master:

./mongod --master

Add the repl user (slaves authenticate against the master's local database):

./mongo
> use local
> db.addUser('repl', 'pwd')

Start the slaves:

./mongod --slave --source test001:27017 --fork --logpath mongod.log

Add the repl user on each slave, too:

./mongo
> use local
> db.addUser('repl', 'pwd')
The --autoresync option makes a slave start a resync automatically when an unexpected event leaves its data out of sync with the master (such a resync runs at most once in any ten-minute window). In addition, --slavedelay sets how far (in seconds) the slave deliberately lags behind the master.
We usually use the master-slave setup for read/write splitting, but reading from a slave requires setting slaveOk.
slaveOk
When querying a replica pair or replica set, drivers route their requests to the master mongod by default; to perform a query against an (arbitrarily-selected) slave, the query can be run with the slaveOk option. Here’s how to do so in the shell:
db.getMongo().setSlaveOk(); // enable querying a slave
db.users.find(...)
Note: some language drivers permit specifying the slaveOk option on each find(), others make this a connection-wide setting. See your language’s driver for details.
Replica Set Mode
Replica Sets use n mongod nodes to build a highly available deployment with automatic failover (auto-failover) and automatic recovery (auto-recovery).
Machine | IP | Role |
test001 | 192.168.1.1 | secondary |
test002 | 192.168.1.2 | secondary |
test003 | 192.168.1.3 | primary |
test004 | 192.168.1.4 | secondary |
test005 | 192.168.1.5 | secondary |
test006 | 192.168.1.6 | secondary |
test007 | 192.168.1.7 | secondary |
Start every node:

./mongod --replSet set1 --fork --logpath mongod.log

Add the repl user:

./mongo
> use local
> db.addUser('repl', 'pwd')

Configure the set:

config = {_id: 'set1', members: [
    {_id: 0, host: 'test001:27017'},
    {_id: 1, host: 'test002:27017'},
    {_id: 2, host: 'test003:27017'},
    {_id: 3, host: 'test004:27017'},
    {_id: 4, host: 'test005:27017'},
    {_id: 5, host: 'test006:27017'},
    {_id: 6, host: 'test007:27017'}]
}
rs.initiate(config);
Check the status:
Visit http://test001:28017/_replSet
or query it from the shell (values elided; the entries for the remaining members look the same):

./mongo
> rs.status()
{
    "set" : "set1",
    "date" : ...,
    "myState" : ...,
    "members" : [
        {
            "_id" : 0,
            "name" : "test001:27017",
            "health" : 1,
            "state" : ...,
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "test002:27017",
            "health" : 1,
            "state" : ...,
            "uptime" : ...,
            "lastHeartbeat" : ...
        },
        ...
    ],
    "ok" : 1
}
After writing to a Replica Set, call getlasterror so that the call returns only once the write has been replicated to at least 3 machines:
db.runCommand( { getlasterror : 1 , w : 3 } )
Note: this mode does not support auth; if you need auth, choose the master-slave mode.
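The semantics of the w option can be pictured with a tiny sketch (plain Python with invented names and integer "optimes" — an illustration, not MongoDB code): the command blocks until at least w members, the primary included, have replicated up to the write's optime.

```python
def write_acknowledged(write_optime, member_optimes, w):
    """True once at least w members have replicated up to write_optime.
    Illustrates getlasterror's w semantics only; names are made up."""
    caught_up = sum(1 for t in member_optimes if t >= write_optime)
    return caught_up >= w

# A write stamped with optime 105: the primary plus two secondaries
# have applied it, while the other members still lag behind.
optimes = [105, 105, 105, 98, 97, 96]
print(write_acknowledged(105, optimes, 3))  # True: w=3 is satisfied
print(write_acknowledged(105, optimes, 5))  # False: only 3 have caught up
```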
Sharding Mode
Building a MongoDB Sharding Cluster takes three kinds of roles:
- Shard Server: a mongod instance that stores the actual data chunks.
- Config Server: a mongod instance that stores the metadata for the whole cluster, including the chunk information.
- Route Server: a mongos instance acting as the front-end router; clients connect here, and it makes the whole cluster look like a single-process database.
Machine | IP | Role |
test002 | 192.168.1.2 | mongod shard11:27017 |
test003 | 192.168.1.3 | mongod shard21:27017 |
test004 | 192.168.1.4 | mongod shard31:27017 |
test005 | 192.168.1.5 | mongod config1:20000, mongos1:30000 |
test006 | 192.168.1.6 | mongod config2:20000, mongos2:30000 |
test007 | 192.168.1.7 | mongod config3:20000, mongos3:30000 |
test008 | 192.168.1.8 | mongod shard12:27017 |
test009 | 192.168.1.9 | mongod shard22:27017 |
test010 | 192.168.1.10 | mongod shard32:27017 |
Shard Configuration
Shard1
[test002; test008]
test002:

./mongod --shardsvr --replSet shard1 --fork --logpath mongod.log

test008:

./mongod --shardsvr --replSet shard1 --fork --logpath mongod.log

Initialize shard1:

config = {_id: 'shard1', members: [
    {_id: 0, host: 'test002:27017'},
    {_id: 1, host: 'test008:27017'}]
}
rs.initiate(config);
Shard2
[test003; test009]
test003:

./mongod --shardsvr --replSet shard2 --fork --logpath mongod.log

test009:

./mongod --shardsvr --replSet shard2 --fork --logpath mongod.log

Initialize shard2:

config = {_id: 'shard2', members: [
    {_id: 0, host: 'test003:27017'},
    {_id: 1, host: 'test009:27017'}]
}
rs.initiate(config);
Shard3
[test004; test010]
test004:

./mongod --shardsvr --replSet shard3 --fork --logpath mongod.log

test010:

./mongod --shardsvr --replSet shard3 --fork --logpath mongod.log

Initialize shard3:

config = {_id: 'shard3', members: [
    {_id: 0, host: 'test004:27017'},
    {_id: 1, host: 'test010:27017'}]
}
rs.initiate(config);
Config Server Configuration
[test005; test006; test007]

./mongod --configsvr --port 20000 --fork --logpath mongod.log

Mongos Configuration
[test005; test006; test007]

./mongos --configdb test005:20000,test006:20000,test007:20000 --port 30000 --fork --logpath mongos.log
The Route process forwards each request to the actual target server processes and merges the multiple results before returning them to the client. It stores no data or state of its own; it only pulls its information from the Config Servers at startup, and any change on the Config Servers is propagated to every Route process.
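To make the routing concrete, here is a toy sketch of the chunk lookup a router performs: the config servers hold a table of chunks, each covering a half-open range of the shard key and assigned to one shard, and the router answers a request by finding the covering chunk. The ranges are invented for illustration; the shard names echo the s1/s2/s3 names used when the shards are added.

```python
# Hypothetical chunk table for a collection sharded on a numeric key.
CHUNKS = [
    (0,    1000, "s1"),
    (1000, 5000, "s2"),
    (5000, None, "s3"),   # None plays the role of maxKey (open-ended)
]

def route(shard_key):
    """Return the shard whose chunk contains shard_key."""
    for lo, hi, shard in CHUNKS:
        if shard_key >= lo and (hi is None or shard_key < hi):
            return shard
    raise KeyError("no chunk covers %r" % shard_key)

print(route(42))       # s1
print(route(1000))     # s2 (ranges are inclusive-low, exclusive-high)
print(route(999999))   # s3
```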
Configuring the Shard Cluster
1. Connect to the admin database (through a mongos):

./mongo test005:30000/admin

2. Add the shards:

db.runCommand({addshard:"shard1/test002:27017,test008:27017",name:"s1",maxsize:20480});
db.runCommand({addshard:"shard2/test003:27017,test009:27017",name:"s2",maxsize:20480});
db.runCommand({addshard:"shard3/test004:27017,test010:27017",name:"s3",maxsize:20480});

3. Listing shards

db.runCommand({listshards:1})

If the three shards above are all listed, the shards are configured successfully.
4. Enable sharding for the database and the collection:

db.runCommand({enablesharding:"taobao"});
db.runCommand({shardcollection:"taobao.test0",key:{_id:1}});
Usage
Working with the database from the shell
Superuser commands:
1) Switch to the admin database

use admin

2) Add a user or change a password

db.addUser('name','pwd')

3) View the user list

db.system.users.find()

4) Authenticate

db.auth('name','pwd')

5) Remove a user

db.removeUser('name')

6) Show all users

show users

7) Show all databases

show dbs

8) Show all collections

show collections

9) Show the status of each collection

db.printCollectionStats()

10) Show the master-slave replication status

db.printReplicationInfo()

11) Repair the database

db.repairDatabase()

12) Set the profiling level (0=off, 1=slow ops only, 2=all)

db.setProfilingLevel(1)

13) View the profiling output

show profile

14) Copy a database

db.copyDatabase('mail_addr','mail_addr_tmp')

15) Drop a collection

db.mail_addr.drop()

16) Drop the current database

db.dropDatabase()
Insert, delete, update:
1) Insert

db.user.insert({'name':'dump','age':1})
or
db.user.save({'name':'dump','age':1})

Nested objects:

db.foo.save({'name':'dump','address':{'city':'hangzhou','post':310015},'phone':[138888888,13999999999]})

Array values:

db.user_addr.save({'Uid':'dump','Al':['test-1@taobao.com','test-2@taobao.com']})

2) Delete
Delete the users whose name is 'dump':

db.user.remove({'name':'dump'})

Delete everything in the foo collection:

db.foo.remove()
3) Update

// update foo set xx=4 where yy=6
// insert if nothing matches; apply to every matching document
db.foo.update({'yy':6}, {'$set':{'xx':4}}, true, true)

In the shell, the third and fourth arguments are the upsert and multi flags.
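What the two flags do can be shown with a toy in-memory emulation of the update call on a list of dicts (plain Python for illustration, not driver code):

```python
def update(coll, query, new_values, upsert=False, multi=False):
    """Toy emulation of update's upsert/multi flags on a list of dicts."""
    matched = [d for d in coll
               if all(d.get(k) == v for k, v in query.items())]
    for d in (matched if multi else matched[:1]):
        d.update(new_values)                    # like $set
    if not matched and upsert:                  # no match: insert instead
        coll.append({**query, **new_values})

foo = [{'yy': 6, 'xx': 1}, {'yy': 6, 'xx': 2}, {'yy': 7, 'xx': 3}]
update(foo, {'yy': 6}, {'xx': 4}, upsert=True, multi=True)
print(foo)  # both yy=6 documents now carry xx=4

bar = []
update(bar, {'yy': 6}, {'xx': 4}, upsert=True)
print(bar)  # [{'yy': 6, 'xx': 4}] -- the upsert inserted a new document
```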
Queries:

coll.find()
coll.find().limit(10)
coll.find().sort({x:1})
coll.find().sort({x:1}).skip(5).limit(10)
coll.find({x:10})
coll.find({x: {$lt: 10}})
coll.find({}, {y: true})                  // return only the y field
coll.count()

Others:

coll.find({"address.city":"gz"})
coll.find({likes:"math"})                 // matches elements of an array
coll.find({name: /^dump/})                // regular-expression match
coll.find({phone: 13999999999})           // also matches inside an array
coll.find({name: {$exists: true}})
Indexes:
1 (ascending), -1 (descending)

coll.ensureIndex({productid:1})
coll.ensureIndex({district:1, productid:-1})              // compound index
coll.ensureIndex({"address.city":1})                      // index on a nested field
coll.ensureIndex({productid:1}, {unique:true})
coll.ensureIndex({productid:1}, {unique:true, dropDups:true})
coll.getIndexes()
coll.dropIndex({productid:1})
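One thing worth knowing about compound indexes is the prefix rule: a filter can use the index only if its fields cover a leading prefix of the index's key list. A small sketch (plain Python, not server code), reusing the district/productid field names from the commands above:

```python
def usable_prefix_len(index_fields, query_fields):
    """Length of the longest leading prefix of the index that the
    query's filters cover (illustrates the prefix rule only)."""
    qf = set(query_fields)
    n = 0
    for f in index_fields:
        if f not in qf:
            break
        n += 1
    return n

idx = ["district", "productid"]   # as in ensureIndex({district:1, productid:-1})
print(usable_prefix_len(idx, ["district"]))                # 1: index helps
print(usable_prefix_len(idx, ["district", "productid"]))   # 2: fully covered
print(usable_prefix_len(idx, ["productid"]))               # 0: index unusable
```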
MongoDB Drivers
MongoDB has client APIs for a great many languages. Since the dump center is built on top of Hadoop, the focus here is on the Java API, which the test programs later on also use.
MongoDB in Java
Download the MongoDB Java driver and simply drop the jar (mongo-2.3.jar) into your project.
In Java, the Mongo object is thread-safe, and an application should use just one Mongo instance; it maintains an internal connection pool (10 connections by default).
import com.mongodb.*;

try {
    Mongo m = new Mongo("test001", 27017);
    DB db = m.getDB("test");
    if (db.authenticate("name", "pwd".toCharArray())) {
        // authenticated successfully
    }
    DBCollection coll = db.getCollection("test");
    coll.slaveOk();   // allow this collection's reads to go to a slave

    // insert
    BasicDBObject doc = new BasicDBObject();
    doc.put("name", "dump");
    doc.put("type", "database");
    coll.insert(doc);

    // select a single document
    BasicDBObject query = new BasicDBObject();
    query.put("name", "dump");
    DBObject first = coll.findOne(query);

    // select with a cursor
    DBCursor cur = coll.find(query);
    while (cur.hasNext()) {
        cur.next();
    }

    // update (the _id and t1 values here are placeholders)
    DBObject qlist = new BasicDBObject();
    DBObject dblist = new BasicDBObject();
    qlist.put("_id", 1);
    dblist.put("t1", "new-value");
    coll.update(qlist, new BasicDBObject("$set", dblist));

    // delete
    DBObject dlist = new BasicDBObject();
    dlist.put("_id", 1);
    coll.remove(dlist);
} catch (MongoException e) {
    e.printStackTrace();
}
MongoDB Benchmark
Version tested: 1.6.3
The tests insert 1,000,000 / 3,000,000 / 5,000,000 / 10,000,000 rows with a single thread, and 1,000,000 rows per thread with multiple threads.
Insert record format:

{ … }
1) Master-Slave Mode
Insert
Per-thread rows | Run time (s) | Per-thread inserts/s | Total inserts/s | Total rows | Threads |
1000000 | 20 | 50000 | 50000 | 1000000 | 1 |
3000000 | 60 | 50000 | 50000 | 3000000 | 1 |
5000000 | 99 | 50505 | 50505 | 5000000 | 1 |
8000000 | 159 | 50314 | 50314 | 8000000 | 1 |
10000000 | 208 | 48076 | 48076 | 10000000 | 1 |
1000000 | 64 | 15625 | 31250 | 2000000 | 2 |
In MongoDB, only the master node can perform inserts and updates.
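The rate columns in these tables are simply derived: per-thread rate = rows per thread / run time, and the total rate multiplies that by the thread count (truncated to integers, as tabled). As a sanity check:

```python
def rates(rows_per_thread, seconds, threads):
    """Reproduce the two rate columns: ops/s per thread and in total."""
    per_thread = rows_per_thread // seconds   # integer division, as tabled
    return per_thread, per_thread * threads

print(rates(1000000, 20, 1))   # (50000, 50000) -- first insert row
print(rates(1000000, 64, 2))   # (15625, 31250) -- the two-thread row
```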
Update
Record format:

{ … }
Per-thread rows | Run time (s) | Per-thread updates/s | Total updates/s | Total rows | Threads |
1000000 | 96 | 10416 | 10416 | 1000000 | 1 |
3000000 | 287 | 10452 | 10452 | 3000000 | 1 |
1000000 | 188 | 5319 | 15957 | 3000000 | 3 |
1000000 | 351 | 2849 | 14245 | 5000000 | 5 |
Select
Queries use the "_id" field as the key and return the entire record.
a) Client: a single machine, multiple threads
Per-thread rows | Run time (s) | Per-thread selects/s | Total selects/s | Total rows | Threads |
1000000 | 72 | 13888 | 13888 | 1000000 | 1 |
1000000 | 129 | 7751 | 77519 | 10000000 | 10 |
1000000 | 554 | 1805 | 90252 | 50000000 | 50 |
1000000 | 1121 | 892 | 89206 | 100000000 | 100 |
1000000 | 2256 | 443 | 88652 | 200000000 | 200 |
b) Client: distributed, multiple threads
The client program is deployed on 39 machines.
Per-thread rows | Run time (s) | Per-thread selects/s | Total selects/s | Total rows | Threads |
1000000 | 173 | 5780 | 5780*39=225420 | 1000000*39 | 1 |
1000000 | 1402 | 713 | 7132*39=278148 | 10000000*39 | 10 |
500000 | 1406 | 355 | 7112*39=277368 | 10000000*39 | 20 |
200000 | 1433 | 139 | 6978*39=272142 | 10000000*39 | 50 |
2) Replica Set Mode
Insert
Per-thread rows | Run time (s) | Per-thread inserts/s | Total inserts/s | Total rows | Threads |
1000000 | 40 | 25000 | 25000 | 1000000 | 1 |
3000000 | 117 | 25641 | 25641 | 3000000 | 1 |
5000000 | 211 | 23696 | 23696 | 5000000 | 1 |
8000000 | 289 | 27681 | 27681 | 8000000 | 1 |
10000000 | 388 | 25773 | 25773 | 10000000 | 1 |
1000000 | 83 | 12048 | 24096 | 2000000 | 2 |
1000000 | 210 | 4762 | 23809 | 5000000 | 5 |
Update
Per-thread rows | Run time (s) | Per-thread updates/s | Total updates/s | Total rows | Threads |
1000000 | 28 | 35714 | 35714 | 1000000 | 1 |
3000000 | 83 | 36144 | 36144 | 3000000 | 1 |
1000000 | 146 | 6849 | 20547 | 3000000 | 3 |
1000000 | 262 | 3816 | 19083 | 5000000 | 5 |
Select
Queries use the "_id" field as the key and return the entire record.
a) Client: a single machine, multiple threads
Per-thread rows | Run time (s) | Per-thread selects/s | Total selects/s | Total rows | Threads |
1000000 | 198 | 5050 | 5050 | 1000000 | 1 |
1000000 | 264 | 3787 | 37878 | 10000000 | 10 |
1000000 | 436 | 2293 | 114678 | 50000000 | 50 |
1000000 | 754 | 1326 | 132625 | 100000000 | 100 |
1000000 | 1526 | 655 | 131061 | 200000000 | 200 |
b) Client: distributed, multiple threads
The client program is deployed on 39 machines.
Per-thread rows | Run time (s) | Per-thread selects/s | Total selects/s | Total rows | Threads |
1000000 | 216 | 4629 | 4629*39=180531 | 1000000*39 | 1 |
1000000 | 1375 | 729 | 7293*39=284427 | 10000000*39 | 10 |
500000 | 1469 | 340 | 6807*39=265473 | 10000000*39 | 20 |
200000 | 1561 | 128 | 6406*39=249834 | 10000000*39 | 50 |
3) Sharding Mode
Insert
Per-thread rows | Run time (s) | Per-thread inserts/s | Total inserts/s | Total rows | Threads |
1000000 | 58 | 17241 | 17241 | 1000000 | 1 |
3000000 | 180 | 16666 | 16666 | 3000000 | 1 |
5000000 | 373 | 13404 | 13404 | 5000000 | 1 |
2000000 | 234 | 8547 | 17094 | 4000000 | 2 |
2000000 | 447 | 4474 | 22371 | 10000000 | 5 |
Update
Per-thread rows | Run time (s) | Per-thread updates/s | Total updates/s | Total rows | Threads |
1000000 | 38 | 26315 | 26315 | 1000000 | 1 |
3000000 | 115 | 26086 | 26086 | 3000000 | 1 |
1000000 | 64 | 15625 | 46875 | 3000000 | 3 |
1000000 | 93 | 10752 | 53763 | 5000000 | 5 |
Select
Queries use the "_id" field as the key and return the entire record.
a) Client: a single machine, multiple threads
Per-thread rows | Run time (s) | Per-thread selects/s | Total selects/s | Total rows | Threads |
1000000 | 277 | 3610 | 3610 | 1000000 | 1 |
1000000 | 456 | 2192 | 21929 | 10000000 | 10 |
1000000 | 1158 | 863 | 43177 | 50000000 | 50 |
1000000 | 2299 | 434 | 43497 | 100000000 | 100 |
b) Client: distributed, multiple threads
The client program is deployed on 39 machines.
Per-thread rows | Run time (s) | Per-thread selects/s | Total selects/s | Total rows | Threads |
1000000 | 659 | 1517 | 1517*39= 59163 | 1000000*39 | 1 |
1000000 | 8540 | 117 | 1170*39=45630 | 10000000*39 | 10 |
Summary:
Query performance is quite good in both the master-slave and replica-set modes. The difference is failover: in replica-set mode, if the primary dies the system elects a new primary on its own, so subsequent use is unaffected, and when the old primary recovers it automatically rejoins as a secondary. In master-slave mode, once the master dies one of the slaves has to be promoted to master by hand. Note also that a replica set can have at most 7 nodes.
Because query speed dropped noticeably in sharding mode and the runs took too long, only two rounds were tested; presumably its strength only shows at very large data volumes. The numbers above are for reference only — this was just a round of simple tests. Next I plan to dig into the source code, and I welcome discussion with anyone who is interested!
http://www.searchtb.com/2010/12/a-probe-into-the-mongodb.html