I am looking for a database or storage mechanism that lets me write and read data with high performance.
This storage will hold logging-style records of important information from multiple systems. Because the logged data is critical and will be used to show history, reads must be fast. We never update or delete these records and never join them, so I am looking for a solution suited to that workload.
We will probably archive old data in the long run, but that is something we can deal with.
I have looked at various sources to understand the different NoSQL databases, but an expert's opinion is always better :)
Must Have:
1. Fast Read without fail
2. Fast Write without fail
3. Random access Performance
4. Replication: if one node goes down, another should take over immediately
5. Concurrent reads and writes
Good to Have:
1. Content search, e.g. analysing the data for auditing, with or without indexes
Not required:
1. Transactions are not required at all
2. Update never happens
3. Delete never happens
4. Joins are not required
Reference I looked at: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
3 Answers
#1
16
Be sure to consider Aerospike; it dominates in the adtech space, where high-throughput reads and writes are required. Aerospike is frequently touted as having "the speed of Redis with the scalability of Cassandra." For searching/querying, see Aerospike's secondary index documentation.
For more information, see the discussions and articles below:
- Aerospike vs Cassandra
- Aerospike vs Redis and Mongo
- Aerospike Benchmarks
Lastly, verify the performance for yourself with the One million TPS on EC2 instructions.
#2
6
Let me be the advocate for Cassandra.
Disclaimer: I am not saying Cassandra is better than the others, because I don't know MongoDB, Redis, or the rest deeply enough, and I don't want to get into that kind of debate.
The reason I suggest Cassandra is that your needs match what Cassandra offers very well, and the things you list as not required are features that are either not supported in Cassandra (joins, for instance) or considered an anti-pattern (deletes and, in some situations, updates).
From your "Must Have" list, point by point:
- Fast Read without fail: Supported. You can choose the consistency level of each read operation, deciding how important it is to retrieve the freshest information versus how important speed is.
- Fast Write without fail: Same as point 1.
- Random access Performance: In the Cassandra world you have to consider many parameters to get good random access performance, but the most important one that comes to mind is the data model. If you create a data model that scales horizontally (have a look here) and you avoid hotspots, you get what you need. If you model your DB well, you should have O(1) for each operation, since the data is structured around the queries you run.
- Replication: Here Cassandra is even better than you might think. If one node goes down, nothing changes for the cluster and everything(*) keeps working perfectly. Cassandra has no single point of failure. I can tell you that with an older Cassandra version I had an uptime of more than 3 years.
- Concurrent write/read data: Cassandra uses the LWW (last-write-wins) policy to handle concurrent writes on the same key. The system supports multiple concurrent reads and writes and, with newer protocols, also async operations.
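To make the "avoid hotspots" point a bit more concrete, here is a minimal sketch in plain Python (no Cassandra driver involved). It assumes log records carry a source system id and a timestamp; the names `hot_key`, `spread_key`, and `day_bucket` and the sample workload are all hypothetical, invented for illustration. It only counts how many distinct partition keys a day of writes produces under two key choices.

```python
from datetime import datetime, timezone
from typing import Tuple


def day_bucket(ts: datetime) -> str:
    """Day-sized time bucket (UTC), used to cap how large one partition grows."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def hot_key(ts: datetime) -> Tuple[str]:
    """Date-only key: every writer hits the same partition on a given day."""
    return (day_bucket(ts),)


def spread_key(system_id: str, ts: datetime) -> Tuple[str, str]:
    """Compound key: each source system gets its own partition per day."""
    return (system_id, day_bucket(ts))


# Hypothetical workload: 5 systems, each logging one event per hour for a day.
systems = [f"system-{i}" for i in range(5)]
events = [
    (s, datetime(2024, 1, 1, hour, tzinfo=timezone.utc))
    for s in systems
    for hour in range(24)
]

hot_partitions = {hot_key(ts) for _, ts in events}
spread_partitions = {spread_key(s, ts) for s, ts in events}

print(len(hot_partitions))     # partitions absorbing all of the day's writes
print(len(spread_partitions))  # partitions the same writes are spread over
```

With the date-only key, all writes for a day land in one partition, which is the hotspot to avoid. Adding the system id to the key spreads the same writes over several partitions while still letting you read one system's history for a given day with a single key lookup.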
Cassandra offers lots of other interesting features: linear horizontal scaling is the one I appreciate most, but there is also the fact that you can know the instant at which every piece of data was last updated (the LWW timestamp), counters, and so on.
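As a rough illustration of last-write-wins, the toy in-memory store below keeps, for each key, the value with the highest write timestamp. This is not Cassandra's implementation; `LwwStore` and `Cell` are hypothetical names, and breaking timestamp ties by comparing values is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Cell:
    value: str
    write_ts: int  # writer-supplied timestamp, e.g. microseconds since epoch


class LwwStore:
    """Toy key-value store that resolves concurrent writes by timestamp."""

    def __init__(self) -> None:
        self._cells: Dict[str, Cell] = {}

    def write(self, key: str, value: str, write_ts: int) -> None:
        incoming = Cell(value, write_ts)
        current = self._cells.get(key)
        # Keep the cell with the larger timestamp; on a tie, the larger
        # value wins (an assumption of this sketch).
        if current is None or (incoming.write_ts, incoming.value) > (
            current.write_ts,
            current.value,
        ):
            self._cells[key] = incoming

    def read(self, key: str) -> Optional[Cell]:
        return self._cells.get(key)


store = LwwStore()
store.write("order-42", "shipped", write_ts=2_000)
store.write("order-42", "created", write_ts=1_000)  # arrives late, is older
print(store.read("order-42"))
```

The late-arriving older write does not overwrite the newer one, and the stored timestamp tells you when the surviving value was written.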
(*) - as long as you don't use consistency level ALL which, IMHO, should NEVER be used in such a system.
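The trade-off behind that footnote can be sketched with a little arithmetic. The snippet below assumes a replication factor `rf` and the three consistency levels ONE, QUORUM, and ALL, with a quorum being a majority of replicas (`rf // 2 + 1`). The function names are hypothetical helpers, not part of any driver API.

```python
def required_replicas(level: str, rf: int) -> int:
    """How many replicas must acknowledge an operation at this level."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {level}")


def tolerated_failures(level: str, rf: int) -> int:
    """Replicas that can be down while the operation still succeeds."""
    return rf - required_replicas(level, rf)


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    return required_replicas(read_level, rf) + required_replicas(write_level, rf) > rf


for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_replicas(level, 3), tolerated_failures(level, 3))
```

With three replicas, ALL tolerates no failed replica at all, which is why a single node going down would start failing requests at that level. QUORUM for both reads and writes still guarantees overlap while surviving one replica failure.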
#3
4
Here are a few more links on how you can span in-memory and disk storage (DRAM, SSD, and disk) with Aerospike:
http://www.aerospike.com/hybrid-memory/
http://www.aerospike.com/docs/architecture/storage.html
I think everyone is right about matching the specific DB to your specific use case. For instance, Aerospike is optimal for key-value data. For other kinds of data, other options might be better.
By way of analogy, I'll always remember how, decades ago, a sister of mine once borrowed my computer and wrote her term paper in Microsoft Excel. Line after line was a different row of a spreadsheet. It looked ugly as heck, but, uh, okay. She got the task done. She cursed and swore at how difficult it was to edit the thing. No kidding!
Choosing the right NoSQL database for the task will make your job a breeze; deciding on the wrong basic tool for the task at hand could have you cursing a blue streak.
Of course, every vendor's going to defend their product. I think it's best the community answer the question. Here's another Stack Overflow thread answering a similar question:
Has anyone worked with Aerospike? How does it compare to MongoDB?
btw: Do you have any more specific insights for us on what type of problem you are trying to solve?