I'm planning a product that will process updates from multiple data feeds. Input data is estimated at a total of roughly 100 Mbps, arriving as a stream of 100-byte messages. Each message contains several data fields that need to be checked for correlation against the existing data set within the application. If an input message correlates with an existing data record, it updates that record; if not, a new record is created. It is assumed that each record is updated every 3 seconds on average.
The correlation process is assumed to be the bottleneck, so I intend to make our product able to run load-balanced across multiple processes if needed (most likely on separate hardware or VMs), somewhat in the vein of a space-based architecture. I'd then like shared storage between my processes so that all existing data records are visible to all running processes. The shared storage will have to fetch candidate records for correlation through a query/search on some attributes (e.g. elevation). It will also have to support configurable warm redundancy and the ability to store snapshots every 5 minutes for logging.
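For clarity, here is a minimal sketch of the flow I have in mind, assuming Python/pymongo, a hypothetical `tracks` collection, and a toy correlation rule (the field names, tolerance, and rule are placeholders, not a final design):

```python
# Sketch of the correlate-then-update-or-create flow.
# Assumptions: pymongo, a hypothetical "feeds.tracks" collection,
# and a placeholder correlation rule on a "source_id" field.
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
tracks = client["feeds"]["tracks"]

# Index the attribute(s) used to fetch correlation candidates (e.g. elevation).
tracks.create_index([("elevation", ASCENDING)])

def process_message(msg: dict) -> None:
    """Update the correlated record if one exists, otherwise insert a new one."""
    # Fetch candidates via a range query on an attribute such as elevation
    # (the +/- 10 tolerance is arbitrary, for illustration only).
    candidates = tracks.find({
        "elevation": {"$gte": msg["elevation"] - 10,
                      "$lte": msg["elevation"] + 10},
    })
    for candidate in candidates:
        if candidate.get("source_id") == msg.get("source_id"):  # toy correlation rule
            tracks.update_one({"_id": candidate["_id"]}, {"$set": msg})
            return
    tracks.insert_one(msg)  # no correlation found: create a new record
```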
Everything seems to point towards MongoDB, but I'd like confirmation from you that MongoDB will meet my needs. So, do you think it is a go? Thank you.
NB: I am not considering a relational database because we want to keep all coding in our application, instead of having to write stored procedures/functions in a separate environment to optimize the performance of our system. Furthermore, the data is diverse and I don't want to try to normalize it into a schema.
1 Solution
#1
Yes, MongoDB will meet your needs. I think the following aspects of your description are particularly relevant to your DB selection decision:
1. An update happens every 3 seconds
MongoDB has a database-level write lock (usually short-lived) that blocks read operations. This means that you will want to ensure you have enough memory to fit your working set, and you will then generally not run into any write-lock issues. Note that bulk inserts will hold the write lock for longer.
If you are sharding, you will want to consider shard keys that allow for write scaling, i.e. keys that distribute writes across different shards; see the sketch below.
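As a rough sketch only (assuming a sharded cluster is already running behind a mongos router, and the hypothetical `feeds.tracks` collection from your description), a hashed shard key is one way to spread the insert/update load:

```python
# Sketch: distribute writes across shards with a hashed shard key.
# Assumes an existing sharded cluster and a hypothetical feeds.tracks collection.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")  # connect through a mongos router

client.admin.command("enableSharding", "feeds")
client.admin.command(
    "shardCollection",
    "feeds.tracks",
    key={"source_id": "hashed"},  # hashed key spreads writes evenly across shards
)
```

Whether a hashed key or a range-based key is better depends on how you query for correlation candidates, so treat this as a starting point rather than a recommendation.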
2. Shared storage for multiple processes
This is a pretty common scenario; in fact, many MongoDB deployments are expected to be accessed from multiple processes concurrently. Unlike the write lock, the read lock does not block other reads.
3. Warm redundancy
3.热冗余
Supported through MongoDB replication. If you'd like to read from secondary server(s), you will need to set the read preference to secondaryPreferred in your driver; see the example below.
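For example, with pymongo (a sketch assuming a replica set named `rs0` on hypothetical hosts host1/host2/host3), the read preference can be set per collection:

```python
# Sketch: read from secondaries of a replica set for warm redundancy.
# Assumes a replica set "rs0" on hypothetical hosts host1/host2/host3.
from pymongo import MongoClient, ReadPreference

client = MongoClient(
    "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0"
)

# Route reads to a secondary when one is available, falling back to the primary.
tracks = client["feeds"].get_collection(
    "tracks", read_preference=ReadPreference.SECONDARY_PREFERRED
)
doc = tracks.find_one({"elevation": {"$gte": 1000}})
```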