So just a quick best practice question here. How do I know when I should create new collections in MongoDB?
I have an app that queries TV show data. Should each show have its own collection, or should they all be stored in one collection, with the relevant data in the same document? Please explain why you chose the approach you did. (I'm still very new to MongoDB. I'm used to MySQL.)
2 Solutions
#1
The Two Most Popular Approaches to Schema Design in MongoDB
- Embed data into documents and store them in a single collection.
- Normalize data across multiple collections.
Embedding Data
MongoDB has no relational-style joins in ordinary queries (the $lookup aggregation stage, added in version 3.2, is the closest equivalent), and I won't get into all the reasons here. But the main reason we rarely need joins is that we can embed related data in a single hierarchical, JSON-like document. We can think of it as pre-joining the data before we store it. In the relational database world, this amounts to denormalizing our data. In MongoDB, this is about the most routine thing we can do.
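To make the "pre-joined" idea concrete for the TV show case, here is a minimal sketch in plain Python. The collection is modeled as a dict keyed by _id, and every field name (title, network, seasons, episodes, runtime_min) is an assumption made up for illustration, not something from the question.

```python
# Hypothetical TV show document; all field names are illustrative assumptions.
show = {
    "_id": "example-show",
    "title": "Example Show",
    "network": "Example Network",
    "seasons": [
        {
            "number": 1,
            "episodes": [
                {"number": 1, "title": "Pilot", "runtime_min": 45},
                {"number": 2, "title": "Second Episode", "runtime_min": 44},
            ],
        },
    ],
}

# One collection holds every show, modeled here as a dict keyed by _id.
shows = {show["_id"]: show}


def episode_titles(shows, show_id):
    """Read one document and return its episode titles; no second lookup needed."""
    doc = shows[show_id]
    return [ep["title"] for season in doc["seasons"] for ep in season["episodes"]]


titles = episode_titles(shows, "example-show")
```

Everything the app needs about a show comes back from a single read, which is the point of embedding.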
Normalizing Data
Even without relational-style joins, we can still store related data across multiple collections and get to all of it, albeit in a roundabout way. This requires us to store a reference to a key from one collection inside a document in another collection. It sounds similar to relational databases, but MongoDB doesn't enforce any key constraints for us the way most relational databases do. Enforcing them is left entirely up to us. We're good enough to manage it though, right?
Accessing all related data in this way means making at least one query for every collection the data is stored across, unless we combine them server-side with a $lookup aggregation stage. It's up to each of us to decide if we can live with that.
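A rough sketch of what resolving references by hand looks like, again with collections modeled as plain Python dicts. The collection names (shows, episodes), the show_id field, and both helper functions are hypothetical. The orphan check stands in for the key constraint the database will not enforce for us.

```python
# Two hypothetical collections, modeled as dicts keyed by _id.
shows = {
    "show-1": {"_id": "show-1", "title": "Example Show"},
}
episodes = {
    "ep-1": {"_id": "ep-1", "show_id": "show-1", "title": "Pilot"},
    "ep-2": {"_id": "ep-2", "show_id": "show-1", "title": "Second Episode"},
    # Nothing stops a dangling reference: "show-404" does not exist.
    "ep-3": {"_id": "ep-3", "show_id": "show-404", "title": "Orphan"},
}


def show_with_episodes(show_id):
    """Resolve the reference by hand: one lookup per collection."""
    show = shows[show_id]  # lookup 1: the shows collection
    eps = [e for e in episodes.values() if e["show_id"] == show_id]  # lookup 2
    return {**show, "episodes": eps}


def orphaned_episodes():
    """Application-level integrity check, since the database enforces none."""
    return [e["_id"] for e in episodes.values() if e["show_id"] not in shows]


result = show_with_episodes("show-1")
orphans = orphaned_episodes()
```

Against a real server, each of the two lookups would be its own round trip.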
When to Embed Data
- Embed data when that embedded data will be accessed at the same time as the rest of the document. Pre-joining data that is frequently used together reduces the amount of code we have to write to query across multiple collections. It also reduces the number of round trips to the server.
- Embed data when that embedded data only pertains to that single document. Like most rules, we need to give this some thought before blindly following it. If we're storing an address for a user, we don't need to create a separate collection to store addresses just because the user might have a roommate with the same address. Remember, we're not normalizing here, so duplicating data to some degree is ok.
- Embed data when you need "transaction-like" writes. Prior to v4.0, MongoDB did not support multi-document transactions, though it does guarantee that a write to a single document is atomic: it either writes the document or it doesn't. Writes across multiple collections could not be made atomic, and update anomalies could occur in any number of scenarios. Since v4.0 that is no longer the case, but it is still more typical to denormalize data and avoid the need for transactions.
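The roommate case from the second rule can be sketched like this. It is only an illustration in plain Python; the user ids and address fields are made up. Each user document carries its own copy of the address, so changing one user is a write to exactly one document.

```python
import copy

address = {"street": "1 Example Street", "city": "New York"}

# Each user document carries its own copy of the address (hypothetical fields).
users = {
    "alice": {"_id": "alice", "address": copy.deepcopy(address)},
    "bob": {"_id": "bob", "address": copy.deepcopy(address)},
}

# Alice moves out: only her document changes, in a single-document write.
users["alice"]["address"] = {"street": "2 Sample Avenue", "city": "Boston"}
```

Bob's document is untouched, and no shared addresses collection has to be kept consistent.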
When to Normalize Data
- Normalize data when data that applies to many documents changes frequently. Here we're talking about "one to many" relationships. If we have a large number of documents with a city field set to "New York", and all of a sudden the city of New York decides to change its name to "New-New York", then we have to update a lot of documents. Got anomalies? In cases like this, where we suspect other cities will follow suit and change their names, we'd be better off creating a cities collection containing a single document for each city.
- Normalize data when data grows frequently. Under the older MMAPv1 storage engine, a document that grew beyond its allotted space had to be moved on disk, and since the document was bigger each time, the moves only got more expensive. With WiredTiger, the default storage engine since MongoDB 3.2, moves on disk are no longer the concern, but large, ever-growing documents are still more expensive to read, update, and replicate. By normalizing the embedded parts that grow frequently, we keep the parent document small and stable.
- Normalize data when the document is expected to grow larger than 16MB. Documents have a 16MB limit in MongoDB. That's just the way things are. We should start breaking them up into multiple collections if we ever approach that limit.
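The city rename from the first rule can be sketched as a write count comparison. This is plain Python standing in for the database; the document shapes, the city_id reference field, and both helper functions are assumptions for illustration.

```python
# Embedded approach: the city name is copied into every user document.
embedded_users = [{"_id": i, "city": "New York"} for i in range(1000)]


def rename_embedded(users, old, new):
    """Rewrite every matching document; returns how many were touched."""
    touched = 0
    for user in users:
        if user["city"] == old:
            user["city"] = new
            touched += 1
    return touched


# Normalized approach: users reference one document in a cities collection.
cities = {"nyc": {"_id": "nyc", "name": "New York"}}
referencing_users = [{"_id": i, "city_id": "nyc"} for i in range(1000)]


def rename_normalized(cities, city_id, new):
    """Update the single city document; returns how many were touched."""
    cities[city_id]["name"] = new
    return 1


embedded_writes = rename_embedded(embedded_users, "New York", "New-New York")
normalized_writes = rename_normalized(cities, "nyc", "New-New York")
```

The trade-off is that reading a user's city name in the normalized version now takes a second lookup in cities.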
The Most Important Consideration to Schema Design in MongoDB is...
How our applications access and use data. This requires us to think? Ugh! What data is used together? What data is mostly read-only? What data is written frequently? Let your application's data access patterns drive your schema, not the other way around.
#2
The scope you've described is definitely not too much for "one collection". In fact, being able to store everything in a single place is the whole point of a MongoDB collection.
For the most part, you don't want to be thinking about querying across combined tables as you would in SQL. MongoDB lets you avoid thinking in terms of "JOINs". In fact, ordinary find queries don't support them at all; the closest equivalent is the $lookup aggregation stage, added in MongoDB 3.2.
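For the cases where a join really is needed, MongoDB 3.2 and later offer the $lookup aggregation stage, which performs a left outer join against another collection in the same database. Below is a sketch of such a pipeline, written as the list of stage dicts that PyMongo's aggregate() accepts. The collection names (shows, episodes) and field names (title, show_id) are assumptions, and the pipeline is only built here, not run against a server.

```python
# Aggregation pipeline in the shape PyMongo accepts (a list of stage dicts).
# Collection and field names ("episodes", "show_id", "title") are assumptions.
pipeline = [
    {"$match": {"title": "Example Show"}},
    {
        "$lookup": {
            "from": "episodes",         # collection to join against
            "localField": "_id",        # field on the shows document
            "foreignField": "show_id",  # matching field on episode documents
            "as": "episodes",           # name of the output array field
        }
    },
]

# With a live connection this would be run roughly as:
#   db.shows.aggregate(pipeline)
```

If the app needs this on most reads, that is usually a hint that the episodes belong embedded in the show document instead.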
See this slideshare: http://www.slideshare.net/mongodb/migrating-from-rdbms-to-mongodb?related=1
Look specifically at slide 24 onward. Note how a MongoDB schema is meant to replace the multi-table schemas customary in SQL and other RDBMSs.
In MongoDB, a single document typically holds all the information for a record, and records of the same kind are stored together in one collection.
Also see this question: MongoDB query multiple collections at once