I am making a database plan, and I am a little confused about normalising to Second Normal Form. I am also worried about the large number of columns, and I can't quite figure out what to do with them.
This is the table I am focusing on, MatchDetails:
Idea 1
[image link]
The Player_ID is a unique primary key for another table, Users. MatchID is a unique primary key for the table Matches. The relationship between Matches and Players is many-to-many.
Would this work as a compound key, in the sense that one player can only have taken part in a particular match once? Do the columns to the right of MatchID have a functional dependency on the compound key, in the sense that they are unique to that compound key?
Idea 2
[image link]
In this example, the Participation_ID is a unique primary key for the table, since there can be multiple instances of the same Player_ID and the same MatchID for various combinations of players and matches.
In this example, I would guess that this table is in Second Normal Form because there is only one primary key, and the match values are unique and are thus functionally dependent? I am a little confused about functional dependency despite trying to read about it here.
Oh and another small thing...
The final thing that I am a little in doubt about is the huge number of columns. All of the information to the right of MatchID is detail about HOW the player (Player_ID) performed in the match (MatchID). Should it be in another table?
Link to other tables if you would like to see the layout so far: http://i.imgur.com/52ax04g.png
Please ignore that MatchID doesn't have an underscore and the other IDs do; it's only an Excel plan!
2 Answers
#1
1
Unless the same player can participate in the same match more than once, you'll have to have a composite key {Player_ID, MatchID}, whether you add another key (such as {Participation_ID}) or not.
Adding a {Participation_ID} key only makes sense if you have some other tables that reference it¹ and you want to make their foreign keys slimmer, or if you use a particularly hostile ORM that requires a non-composite primary key.
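As a rough illustration of the point above, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a cut-down MatchDetails table; the Kills column is a hypothetical stand-in for the real statistics. Even with the surrogate Participation_ID as primary key, {Player_ID, MatchID} is still declared as a key via UNIQUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MatchDetails (
        Participation_ID INTEGER PRIMARY KEY,   -- optional surrogate key
        Player_ID        INTEGER NOT NULL,
        MatchID          INTEGER NOT NULL,
        Kills            INTEGER NOT NULL,      -- hypothetical statistic column
        UNIQUE (Player_ID, MatchID)             -- the composite key is still enforced
    )
""")

conn.execute("INSERT INTO MatchDetails (Player_ID, MatchID, Kills) VALUES (1, 10, 5)")
# Same player, different match: allowed.
conn.execute("INSERT INTO MatchDetails (Player_ID, MatchID, Kills) VALUES (1, 11, 7)")

try:
    # Same player in the same match a second time: rejected by the UNIQUE constraint.
    conn.execute("INSERT INTO MatchDetails (Player_ID, MatchID, Kills) VALUES (1, 10, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM MatchDetails").fetchone()[0]
```

If you drop the surrogate column, the same constraint can simply be written as PRIMARY KEY (Player_ID, MatchID) instead.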
"Do the columns to the right of MatchID have a functional dependency on the compound key?"
Yes.
You can think of a "functional dependency" simply as a way of saying that the relation (a set of tuples) is a function. For a relation to be a function (in the mathematical sense of that word), it must always produce the same "result" for the same "arguments".
If the attributes of the given key are the "arguments" and the rest of the attributes are the "result", then no two different results can ever be produced from the same arguments, simply because the key is unique and therefore any particular combination of values of key attributes² cannot identify more than one tuple.
So all attributes are always functionally dependent on the key. That is always true for any key, otherwise it wouldn't be a key.
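To make the "relation as a function" idea concrete, here is a small Python sketch. The rows and the Kills/Deaths columns are made up for illustration. The key attributes become the dictionary key (the "arguments") and the remaining attributes become the value (the "result").

```python
# Hypothetical rows: (Player_ID, MatchID, Kills, Deaths)
rows = [
    (1, 10, 5, 2),
    (1, 11, 7, 1),
    (2, 10, 3, 4),
]

def as_function(rows, key_width):
    """Build a mapping key -> remaining attributes; fail if the key is not unique."""
    mapping = {}
    for row in rows:
        key, result = row[:key_width], row[key_width:]
        if key in mapping:
            raise ValueError(f"key {key} identifies more than one tuple")
        mapping[key] = result
    return mapping

# With {Player_ID, MatchID} as the key, every row maps to exactly one result.
stats = as_function(rows, key_width=2)
print(stats[(1, 10)])  # (5, 2)
```

Calling as_function(rows, key_width=1) raises ValueError here, because Player_ID alone appears in two rows and so cannot serve as a key for this data.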
The only question is whether some non-key attribute is also dependent on a proper subset of the key attributes. If it is, you have violated the 2NF.³
In your case, if any of the attributes depends on Player_ID alone (or MatchID alone), that would violate the 2NF.
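One way to get a feel for this is to test sample data for partial dependencies. The sketch below is only an illustration: the rows are invented, and Player_Name is a hypothetical column added to show what a violation would look like. Keep in mind that a functional dependency is a rule about all possible data, so sample rows can refute a dependency but never prove one.

```python
def depends_on(rows, determinant, dependent):
    """True if, in these rows, equal determinant values always come with equal dependent values."""
    seen = {}
    for row in rows:
        lhs = tuple(row[c] for c in determinant)
        rhs = tuple(row[c] for c in dependent)
        # setdefault stores rhs the first time lhs is seen, then returns the stored value.
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True

rows = [
    {"Player_ID": 1, "MatchID": 10, "Kills": 5, "Player_Name": "ann"},
    {"Player_ID": 1, "MatchID": 11, "Kills": 7, "Player_Name": "ann"},
    {"Player_ID": 2, "MatchID": 10, "Kills": 3, "Player_Name": "bob"},
]

key = ["Player_ID", "MatchID"]
kills_on_key = depends_on(rows, key, ["Kills"])                    # True: depends on the whole key
kills_on_player = depends_on(rows, ["Player_ID"], ["Kills"])       # False: player 1 has both 5 and 7
name_on_player = depends_on(rows, ["Player_ID"], ["Player_Name"])  # True: candidate partial dependency
```

If a column like Player_Name really is determined by Player_ID alone by design, it belongs in the Users table, not in MatchDetails.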
"The final thing that I am a little in doubt about, is the huge number of columns. All of the information to the right of MatchID are details about HOW the player (Player_ID) performed in the match (MatchID). Should they be in another table?"
Looks like they are where they need to be from the logical standpoint. It is unlikely, but possible, that you might have some physical reasons for vertically partitioning the data.⁴
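For completeness, this is roughly what vertical partitioning could look like if you ever did need it. It is a sketch only, using Python's sqlite3 module; the MatchDetailsExtra table and the Kills and Notes columns are hypothetical. Both tables share the same key, and a join puts the row back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MatchDetails (
        Player_ID INTEGER NOT NULL,
        MatchID   INTEGER NOT NULL,
        Kills     INTEGER NOT NULL,          -- hypothetical frequently read column
        PRIMARY KEY (Player_ID, MatchID)
    );
    CREATE TABLE MatchDetailsExtra (
        Player_ID INTEGER NOT NULL,
        MatchID   INTEGER NOT NULL,
        Notes     TEXT,                      -- hypothetical rarely read column
        PRIMARY KEY (Player_ID, MatchID),
        FOREIGN KEY (Player_ID, MatchID)
            REFERENCES MatchDetails (Player_ID, MatchID)
    );
    INSERT INTO MatchDetails VALUES (1, 10, 5);
    INSERT INTO MatchDetailsExtra VALUES (1, 10, 'rarely read');
""")

# Joining on the shared key reassembles the original wide row.
joined = conn.execute("""
    SELECT d.Player_ID, d.MatchID, d.Kills, e.Notes
    FROM MatchDetails AS d
    JOIN MatchDetailsExtra AS e
      ON d.Player_ID = e.Player_ID AND d.MatchID = e.MatchID
""").fetchall()
```

This changes nothing about normalisation; it is purely a physical storage decision.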
Some unrelated suggestions:
- Use consistent naming: if there is Player_ID, there should be Match_ID, not MatchID (or vice versa). Whoops, I missed your last sentence.
- Use singular for table names, for the same reason singular is typically used for class names in OOP.
¹ Which you don't, as far as I can see.
² A.k.a. "prime" attributes. Strangely enough, a prime attribute does not have to belong to a "primary" key (it can belong to an alternate key), so just saying "key attributes" is probably better terminology, IMHO.
³ Obviously, this is only a concern for composite keys, because if a key has only one attribute, its only proper subset is empty.
⁴ DBMSes can typically handle hundreds or even thousands of columns these days, and this doesn't really qualify as a "huge number of columns".
#2
2
The columns {PlayerID, MatchID} seem to work as a compound key.
The columns to the right of MatchID do have a functional dependency on the (compound) primary key, as long as they represent that player's statistics in one particular match.
If those columns instead represent the player's overall statistics, then they're dependent only on PlayerID, and this design is not in 2NF.
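If you did find such a column, the usual fix is to move it into a table keyed by PlayerID alone. Here is a minimal Python sketch of that decomposition; the rows are invented, and CareerWins is a hypothetical "overall statistic" column.

```python
# Hypothetical un-normalised rows: CareerWins depends on PlayerID only.
rows = [
    {"PlayerID": 1, "MatchID": 10, "Kills": 5, "CareerWins": 40},
    {"PlayerID": 1, "MatchID": 11, "Kills": 7, "CareerWins": 40},
    {"PlayerID": 2, "MatchID": 10, "Kills": 3, "CareerWins": 12},
]

def project(rows, columns):
    """Project rows onto the given columns, dropping duplicate tuples (order preserved)."""
    seen = set()
    out = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(columns, values)))
    return out

# Overall statistics: one row per player, keyed by PlayerID.
player_stats = project(rows, ["PlayerID", "CareerWins"])
# Per-match statistics: one row per (PlayerID, MatchID).
match_details = project(rows, ["PlayerID", "MatchID", "Kills"])
```

After the split, CareerWins is stored once per player, so it can no longer disagree between two matches of the same player.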
The normal forms take into account every candidate key, not just the primary key. The fact that you later add an integer row identifier, ParticipationID, doesn't change anything in my previous paragraphs--the columns {PlayerID, MatchID} still seem to be a (compound) candidate key, and you have to take them into account.
There's no such thing as "I don't have too many columns" normal form. If you need 20 attributes that are functionally dependent on every candidate key, then you need 20 attributes that are functionally dependent on every candidate key.