I have a question about normalization. Suppose I have an application that deals with songs.
At first I thought about doing it like this:
Songs Table:
id | song_title | album_id | publisher_id | artist_id
Albums Table:
id | album_title | etc...
Publishers Table:
id | publisher_name | etc...
Artists Table:
id | artist_name | etc...
Then I started thinking about normalization. I thought I should remove album_id, publisher_id, and artist_id from the songs table and put them in intermediate tables like this:
Table song_album:
song_id, album_id
Table song_publisher:
song_id, publisher_id
Table song_artist:
song_id, artist_id
Now I can't decide which is the better way. I'm not an expert on database design, so it would be awesome if someone could point me in the right direction.
Are there any performance differences between the two approaches?
Thanks
5 Solutions
#1
3
Forget about performance issues. The question is: does this model represent the data correctly?
The intermediate tables are called "junction tables", and they are useful when you have a many-to-many relationship. For example, if you store the song "We Are the World" in your database, you are going to have many artists for that song. Each of those artists is also responsible for creating many other songs. Therefore, to represent the data correctly, you will have to use junction tables, just as you did in the second version.
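As a rough illustration of the junction-table idea, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the sample rows, and the helper names (artists_for_song, songs_for_artist) are assumptions made for the example; only the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE songs   (id INTEGER PRIMARY KEY, song_title  TEXT NOT NULL);
CREATE TABLE artists (id INTEGER PRIMARY KEY, artist_name TEXT NOT NULL);
-- Junction table: one row per (song, artist) pair.
CREATE TABLE song_artist (
    song_id   INTEGER NOT NULL REFERENCES songs(id),
    artist_id INTEGER NOT NULL REFERENCES artists(id),
    PRIMARY KEY (song_id, artist_id)
);
""")

# Illustrative sample rows (assumed data, not from the question).
conn.executemany("INSERT INTO songs VALUES (?, ?)",
                 [(1, "We Are the World"), (2, "Billie Jean"), (3, "Hello")])
conn.executemany("INSERT INTO artists VALUES (?, ?)",
                 [(1, "Michael Jackson"), (2, "Lionel Richie")])
conn.executemany("INSERT INTO song_artist VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (3, 2)])


def artists_for_song(title):
    """Return the names of all artists linked to a song, sorted by name."""
    rows = conn.execute(
        """SELECT a.artist_name
           FROM songs s
           JOIN song_artist sa ON sa.song_id = s.id
           JOIN artists a      ON a.id = sa.artist_id
           WHERE s.song_title = ?
           ORDER BY a.artist_name""", (title,))
    return [name for (name,) in rows]


def songs_for_artist(name):
    """Return the titles of all songs linked to an artist, sorted by title."""
    rows = conn.execute(
        """SELECT s.song_title
           FROM artists a
           JOIN song_artist sa ON sa.artist_id = a.id
           JOIN songs s        ON s.id = sa.song_id
           WHERE a.artist_name = ?
           ORDER BY s.song_title""", (name,))
    return [title for (title,) in rows]


print(artists_for_song("We Are the World"))
print(songs_for_artist("Michael Jackson"))
```

The composite primary key on (song_id, artist_id) keeps the same pairing from being stored twice, while still letting one song have many artists and one artist have many songs.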
#2
2
That depends. If you can guarantee that a particular song always belongs to exactly one album, go with your first approach. If not, you have an n-to-n relationship and need a join table: that is your second approach. Both are completely OK in terms of normalization.
It is important to design your database so that your real data can be mapped onto it.
Don't worry about performance here. Performance depends more on how well you have optimized your indexes and what your queries look like than on whether you have to do one more join operation (your second approach, the join table, would need one more join in every query that follows the relationship).
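To make the point about indexes and the extra join concrete, here is a small sketch with Python's sqlite3 module. The song_album table comes from the question; the index name, column types, sample rows, and helper functions are assumptions for illustration only. The query plan that SQLite reports depends on its version and on the data, so the sketch prints it rather than promising a particular result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE songs  (id INTEGER PRIMARY KEY, song_title  TEXT NOT NULL);
CREATE TABLE albums (id INTEGER PRIMARY KEY, album_title TEXT NOT NULL);
CREATE TABLE song_album (
    song_id  INTEGER NOT NULL REFERENCES songs(id),
    album_id INTEGER NOT NULL REFERENCES albums(id),
    PRIMARY KEY (song_id, album_id)
);
-- The primary key serves lookups by song_id; this index serves lookups by album_id.
CREATE INDEX idx_song_album_album ON song_album(album_id);
""")

# Hypothetical sample rows: "Song A" appears on two albums.
conn.executemany("INSERT INTO songs VALUES (?, ?)", [(1, "Song A"), (2, "Song B")])
conn.executemany("INSERT INTO albums VALUES (?, ?)",
                 [(1, "Studio Album"), (2, "Greatest Hits")])
conn.executemany("INSERT INTO song_album VALUES (?, ?)", [(1, 1), (2, 1), (1, 2)])

SONGS_ON_ALBUM = """
    SELECT s.song_title
    FROM albums al
    JOIN song_album sa ON sa.album_id = al.id   -- the one extra join
    JOIN songs s       ON s.id = sa.song_id
    WHERE al.album_title = ?
    ORDER BY s.song_title
"""


def songs_on_album(album_title):
    """Return the titles of all songs on an album, sorted by title."""
    return [t for (t,) in conn.execute(SONGS_ON_ALBUM, (album_title,))]


def query_plan(sql, params):
    """Return the 'detail' text of each step SQLite plans for the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the detail text is the last column


print(songs_on_album("Greatest Hits"))
for step in query_plan(SONGS_ON_ALBUM, ("Greatest Hits",)):
    print(step)
```

Checking the plan this way shows whether the join is served by an index or by a full table scan, which usually matters more than the number of joins.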
#3
1
The first structure mixes up the semantics (e.g. recording the publisher for every single song). The second structure allows invalid data into the database (e.g. one song can belong to two albums). Here is what I understood of the problem domain, along with my suggestions for the design:
One album is published by only one publisher, so you don't need to specify the publisher on every single song; you just need to put publisher_id in the Albums table. Also, if you keep artist_id in the Songs table, each of your songs can have only one artist; by putting song_id and artist_id in a linkage table instead, you can have multiple artists for one song (such as when two singers perform a song together). For table names, the singular form is commonly advised.
Here is my suggested design:
Song Table:
id | song_title | album_id | ...
Album Table:
id | album_title | publisher_id | ...
Publisher Table:
id | publisher_name | ...
Artist Table:
id | artist_name | ...
Song_Artist Table:
song_id | artist_id | artist_role | ...
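The design above could be expressed as DDL roughly as follows. This is only a sketch using Python's sqlite3 module: the column types, the NOT NULL and foreign key constraints, the sample rows, and the helper names (publisher_of_song, credits, insert_song) are my assumptions, and the "..." columns are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default
conn.executescript("""
CREATE TABLE publisher (id INTEGER PRIMARY KEY, publisher_name TEXT NOT NULL);
CREATE TABLE album (
    id           INTEGER PRIMARY KEY,
    album_title  TEXT NOT NULL,
    publisher_id INTEGER NOT NULL REFERENCES publisher(id)  -- one publisher per album
);
CREATE TABLE song (
    id         INTEGER PRIMARY KEY,
    song_title TEXT NOT NULL,
    album_id   INTEGER NOT NULL REFERENCES album(id)        -- one album per song
);
CREATE TABLE artist (id INTEGER PRIMARY KEY, artist_name TEXT NOT NULL);
CREATE TABLE song_artist (
    song_id     INTEGER NOT NULL REFERENCES song(id),
    artist_id   INTEGER NOT NULL REFERENCES artist(id),
    artist_role TEXT,
    PRIMARY KEY (song_id, artist_id)                        -- many artists per song
);
""")

# Hypothetical sample rows, committed when the with-block exits.
with conn:
    conn.execute("INSERT INTO publisher VALUES (1, 'Example Records')")
    conn.execute("INSERT INTO album VALUES (1, 'Example Album', 1)")
    conn.execute("INSERT INTO song VALUES (1, 'Duet Song', 1)")
    conn.executemany("INSERT INTO artist VALUES (?, ?)",
                     [(1, "Singer One"), (2, "Singer Two")])
    conn.executemany("INSERT INTO song_artist VALUES (?, ?, ?)",
                     [(1, 1, "lead vocals"), (1, 2, "featured vocals")])


def publisher_of_song(title):
    """Reach the publisher through the album; the song row stores no publisher."""
    row = conn.execute(
        """SELECT p.publisher_name
           FROM song s
           JOIN album a     ON a.id = s.album_id
           JOIN publisher p ON p.id = a.publisher_id
           WHERE s.song_title = ?""", (title,)).fetchone()
    return row[0] if row else None


def credits(title):
    """Return (artist_name, artist_role) pairs for a song, sorted by name."""
    return conn.execute(
        """SELECT ar.artist_name, sa.artist_role
           FROM song s
           JOIN song_artist sa ON sa.song_id = s.id
           JOIN artist ar      ON ar.id = sa.artist_id
           WHERE s.song_title = ?
           ORDER BY ar.artist_name""", (title,)).fetchall()


def insert_song(title, album_id):
    """Insert a song; return False if the album does not exist."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO song (song_title, album_id) VALUES (?, ?)",
                         (title, album_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(publisher_of_song("Duet Song"))
print(credits("Duet Song"))
print(insert_song("Orphan Song", 999))
```

Because album_id lives on the song row, this layout cannot record a song on two albums; that is a deliberate constraint of this design, so it only fits if that rule really holds for your data.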
#4
0
Songs can appear on multiple albums. Think of a greatest hits release. It's important to zoom out of the technical muck and consider the real-world use of an application (or database).
#5
-3
I'd stick with the first one, for two reasons:
- A song is only associated with one album, one publisher and one artist, so you don't need to create separate tables for them (if, for example, a song can have more than one artist, then create the song_artist table).
- It's more efficient. With the second approach you'll need to make some joins.