Suppose I have the following tables:
假设我有以下表格:
____________________ ____________________
| Organisms | | Species |
|--------------------| |--------------------|
|OrganismId (int, PK)| |SpeciesId (int, PK) |
|SpeciesId (int, FK) |∞---------1|Name (varchar) |
|Name (varchar) | |____________________|
|____________________| 1
1 |
| |
| |
∞ ∞
______________________ ____________________ _______________
| OrganismPropsValues | | SpeciesProps | | Props |
|----------------------| |--------------------| |---------------|
|OrganismId (int, FK) | |PropId (int,PK,FK) | ∞-----1|PropId (int,PK)|
|PropId (int, FK) | |SpeciesId(int,PK,FK)| |Name (varchar) |
|Value (varchar) | |____________________| |_______________|
|______________________| 1
∞ |
| |
-----------------------------------------------------------
A quick explanation of what I am trying to represent here: suppose we have a list of species, such as cat, dog, human, etc. We also have a set of properties (abbreviated Props so I could fit it more easily in the diagram) which apply to some but not necessarily all species--for example, this may be tail length (for species with tails), eye color (for those with eyes), etc.
快速解释我在这里要表达的内容:假设我们有一个物种列表,例如猫,狗,人等等。我们还有一组属性(缩写为Props,所以我可以更容易地在图中使用它)适用于某些但不一定适用于所有物种 - 例如,这可能是尾长(对于有尾巴的物种),眼睛颜色(对于有眼睛的人)等。
SpeciesProps is a linker table that defines which properties apply to which species-- so here we would have {Human, Eye Color}, {Dog, Eye Color}, {Cat, Eye Color}, {Dog, Tail Length}, {Cat, Tail Length}. We do not have {Human, Tail Length} because Tail Length is obviously not a valid property to apply to a human.
SpeciesProps是一个链接表,用于定义哪些属性适用于哪些物种 - 所以这里我们会有{人类,眼睛颜色},{狗,眼睛颜色},{猫,眼睛颜色},{狗,尾长},{猫,尾长}。我们没有{Human,Tail Length},因为Tail Length显然不是适用于人类的有效属性。
The Organisms table holds actual "implementations" of the species-- So here we might have {Human, Bob}, {Dog, Rufus}, and {Cat, Felix}.
“生物”表格包含物种的实际“实施” - 所以我们可能会有{人类,鲍勃},{狗,鲁弗斯}和{猫,菲利克斯}。
Here is now my issue: in the OrganismPropsValues table, I want to store the 'values' of the properties for each organism--so for example, for Bob I want to store {Bob, Eye Color, Blue}. For Rufus, I would want to store {Rufus, Eye Color, Brown} and {Rufus, Tail Length, 20} (similar for Felix). My problem however, is that in the schema that I have detailed, it is perfectly possible to store {Bob, Tail Length, 10}, even though the {Human, Tail Length} tuple does not exist in SpeciesProps. How can I modify this schema so I can enforce the constraints defined in SpeciesProps in OrganismPropsValues, while maintaining adequate normalization?
现在我的问题是:在OrganismPropsValues表中,我想存储每个生物体的属性的“值” - 例如,对于Bob我想存储{Bob,Eye Color,Blue}。对于Rufus,我想存储{Rufus,Eye Color,Brown}和{Rufus,Tail Length,20}(类似于Felix)。然而,我的问题是,在我详细描述的模式中,即使在SpeciesProps中不存在{Human,Tail Length}元组,也完全可以存储{Bob,Tail Length,10}。如何修改此模式,以便我可以强制执行OrganismPropsValues中SpeciesProps中定义的约束,同时保持足够的规范化?
4 个解决方案
#1
4
You're implementing the Entity-Attribute-Value antipattern. This can't be a normalized database design, because it's not relational.
您正在实现Entity-Attribute-Value反模式。这不是规范化的数据库设计,因为它不是关系型的。
What I would suggest instead is the Class Table Inheritance design pattern:
我建议的是Class Table Inheritance设计模式:
- Create one table for Organisms, containing properties common to all species.
- 为Organisms创建一个表,其中包含所有物种共有的属性。
-
Create one table per species, containing properties specific to that species. Each of these tables has a 1-to-1 relationship with Organisms, but each property belongs in its own column.
每个物种创建一个表,包含特定于该物种的属性。这些表中的每一个都与Organisms具有一对一的关系,但每个属性都属于它自己的列。
____________________ ____________________ | Organisms | | Species | |--------------------| |--------------------| |OrganismId (int, PK)| |SpeciesId (int, PK) | |SpeciesId (int, FK) |∞---------1|Name (varchar) | |Name (varchar) | |____________________| |____________________| 1 | | 1 ______________________ | HumanOrganism | |----------------------| |OrganismId (int, FK) | |Sex (enum) | |Race (int, FK) | |EyeColor (int, FK) | |.... | |______________________|
This does mean you will create many tables, but consider this as a tradeoff with the many practical benefits to storing properties in a relationally correct way:
这确实意味着您将创建许多表,但将此视为权衡,以便以关系正确的方式存储属性的许多实际好处:
- You can use SQL data types appropriately, instead of treating everything a free-form varchar.
- 您可以适当地使用SQL数据类型,而不是将所有内容都视为*格式的varchar。
- You can use constraints or lookup tables to restrict certain properties by a predefined set of values.
- 您可以使用约束或查找表通过一组预定义的值来限制某些属性。
- You can make properties mandatory (i.e. NOT NULL) or use other constraints.
- 您可以强制使用属性(即NOT NULL)或使用其他约束。
- Data and indexes are stored more efficiently.
- 数据和索引的存储效率更高。
- Queries are easier for you to write and easier for the RDBMS to execute.
- 查询更容易编写,更容易执行RDBMS。
For more on this design, see Martin Fowler's book Patterns of Enterprise Application Architecture, or my presentation Practical Object-Oriented Models in SQL, or my book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming.
有关此设计的更多信息,请参阅Martin Fowler的书籍“企业应用程序架构模式”或我的演示文稿SQL中的实用面向对象模型,或者我的书“SQL反模式:避免数据库编程的陷阱”。
#2
2
Hmm...
Here is one way to do it:
Add SpeciesPropsId into SpeciesProps table.
Replace PropId with SpeciesPropsId in the OrganismPropsValues table.
You will need to change constrains a bit.
Need to add SpeciesProps to OrganismPropsValues constrain.
Need to remove OrganismPropsValues to Props constrain.
嗯......这是一种方法:将SpeciesPropsId添加到SpeciesProps表中。将ProtId替换为OrganismPropsValues表中的SpeciesPropsId。您需要稍微改变约束。需要将SpeciesProps添加到OrganismPropsValues约束。需要删除OrganismPropsValues到Props约束。
Technically you do not have to remove PropId from OrganismPropsValues, but if you keep it it will make data redundat.
从技术上讲,您不必从OrganismPropsValues中删除PropId,但如果保留它将使数据冗余。
#3
2
Whenever you have a diamond-shaped dependency like this, consider putting more emphasis on composite PRIMARY KEYS.
每当你有这样的菱形依赖时,考虑更多地强调复合PRIMARY KEYS。
Specifically, identify the Organism not just by OrganismId
, but by the combination of SpeciesId
and OrganismSubId
(you can still have OrganismId
, but keep it as an alternate key - not show here for brevity).
具体来说,不仅仅通过OrganismId识别生物体,而是通过SpeciesId和OrganismSubId的组合识别(您仍然可以使用OrganismId,但将其保留为备用键 - 为简洁起见,此处不显示)。
Once you do that, your model can be made to look like this:
一旦你这样做,你的模型可以看起来像这样:
The key thing to note here is that SpeciesId
is "propagated" down both edges of this diamond-shaped graph. This is what gives you the desired restriction of not being able "assign a value" to a property that was not "declared" for the given species.
这里要注意的关键是SpeciesId在这个菱形图的两个边缘“传播”。这就是为您提供所需的限制,即无法为未给定物种“声明”的属性“赋值”。
BTW, use singular when naming your tables. Also, consider using natural primary keys (e.g. SpeciesName
instead of SpeciesId
as PK) - if done right it can significantly increase the speed of your JOINs (especially in conjunction with clustering).
BTW,在命名表时使用单数。另外,考虑使用自然主键(例如SpeciesName而不是SpeciesId作为PK) - 如果完成,它可以显着提高JOIN的速度(特别是与聚类一起使用)。
#4
1
Another way to achieve these constraints would be to change the PK of Organism
table by dropping OrganismId
and adding a No
. Then make PK the compound (SpeciesId, No)
. So, "Bob"
would be (Human, 1)
, "Rufus"
would be (Dog, 1)
, etc.
实现这些约束的另一种方法是通过删除OrganismId并添加一个No来改变生物体PK表。然后使PK成为化合物(SpeciesId,No)。所以,“鲍勃”将是(人类,1),“鲁弗斯”将是(狗,1)等。
Then, add in the OrganismPropsValues
table, the SpeciesId
and the No
(removing the OrganismId
.)
然后,添加OrganismPropsValues表,SpeciesId和No(删除OrganismId。)
This will allow to change the FK from OrganismPropsValues
to Props
to reference SpeciesProps
instead:
这将允许将FK从OrganismPropsValues更改为Props,以引用SpeciesProps:
____________________ ____________________
| Organisms | | Species |
|--------------------| |--------------------|
|SpeciesId (int, FK) | |SpeciesId (int, PK) |
|No (int) |∞---------1|Name (varchar) |
|Name (varchar) | |____________________|
|PK (SpeciedId,No) | 1
|____________________| |
1 |
| |
| |
∞ ∞
______________________ ____________________ _______________
| OrganismPropsValues | | SpeciesProps | | Props |
|----------------------| |--------------------| |---------------|
|SpeciesId (int, PK) | |PropId (int,PK,FK) | ∞-----1|PropId (int,PK)|
|No (int, PK) | |SpeciesId(int,PK,FK)| |Name (varchar) |
|PropId (int, PK) | |____________________| |_______________|
|Value (varchar) | 1
|FK (SpeciesId,No) | |
|FK (SpeciesId,PropId) | |
|______________________| |
∞ |
| |
-------------------------------
#1
4
You're implementing the Entity-Attribute-Value antipattern. This can't be a normalized database design, because it's not relational.
您正在实现Entity-Attribute-Value反模式。这不是规范化的数据库设计,因为它不是关系型的。
What I would suggest instead is the Class Table Inheritance design pattern:
我建议的是Class Table Inheritance设计模式:
- Create one table for Organisms, containing properties common to all species.
- 为Organisms创建一个表,其中包含所有物种共有的属性。
-
Create one table per species, containing properties specific to that species. Each of these tables has a 1-to-1 relationship with Organisms, but each property belongs in its own column.
每个物种创建一个表,包含特定于该物种的属性。这些表中的每一个都与Organisms具有一对一的关系,但每个属性都属于它自己的列。
____________________ ____________________ | Organisms | | Species | |--------------------| |--------------------| |OrganismId (int, PK)| |SpeciesId (int, PK) | |SpeciesId (int, FK) |∞---------1|Name (varchar) | |Name (varchar) | |____________________| |____________________| 1 | | 1 ______________________ | HumanOrganism | |----------------------| |OrganismId (int, FK) | |Sex (enum) | |Race (int, FK) | |EyeColor (int, FK) | |.... | |______________________|
This does mean you will create many tables, but consider this as a tradeoff with the many practical benefits to storing properties in a relationally correct way:
这确实意味着您将创建许多表,但将此视为权衡,以便以关系正确的方式存储属性的许多实际好处:
- You can use SQL data types appropriately, instead of treating everything a free-form varchar.
- 您可以适当地使用SQL数据类型,而不是将所有内容都视为*格式的varchar。
- You can use constraints or lookup tables to restrict certain properties by a predefined set of values.
- 您可以使用约束或查找表通过一组预定义的值来限制某些属性。
- You can make properties mandatory (i.e. NOT NULL) or use other constraints.
- 您可以强制使用属性(即NOT NULL)或使用其他约束。
- Data and indexes are stored more efficiently.
- 数据和索引的存储效率更高。
- Queries are easier for you to write and easier for the RDBMS to execute.
- 查询更容易编写,更容易执行RDBMS。
For more on this design, see Martin Fowler's book Patterns of Enterprise Application Architecture, or my presentation Practical Object-Oriented Models in SQL, or my book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming.
有关此设计的更多信息,请参阅Martin Fowler的书籍“企业应用程序架构模式”或我的演示文稿SQL中的实用面向对象模型,或者我的书“SQL反模式:避免数据库编程的陷阱”。
#2
2
Hmm...
Here is one way to do it:
Add SpeciesPropsId into SpeciesProps table.
Replace PropId with SpeciesPropsId in the OrganismPropsValues table.
You will need to change constrains a bit.
Need to add SpeciesProps to OrganismPropsValues constrain.
Need to remove OrganismPropsValues to Props constrain.
嗯......这是一种方法:将SpeciesPropsId添加到SpeciesProps表中。将ProtId替换为OrganismPropsValues表中的SpeciesPropsId。您需要稍微改变约束。需要将SpeciesProps添加到OrganismPropsValues约束。需要删除OrganismPropsValues到Props约束。
Technically you do not have to remove PropId from OrganismPropsValues, but if you keep it it will make data redundat.
从技术上讲,您不必从OrganismPropsValues中删除PropId,但如果保留它将使数据冗余。
#3
2
Whenever you have a diamond-shaped dependency like this, consider putting more emphasis on composite PRIMARY KEYS.
每当你有这样的菱形依赖时,考虑更多地强调复合PRIMARY KEYS。
Specifically, identify the Organism not just by OrganismId
, but by the combination of SpeciesId
and OrganismSubId
(you can still have OrganismId
, but keep it as an alternate key - not show here for brevity).
具体来说,不仅仅通过OrganismId识别生物体,而是通过SpeciesId和OrganismSubId的组合识别(您仍然可以使用OrganismId,但将其保留为备用键 - 为简洁起见,此处不显示)。
Once you do that, your model can be made to look like this:
一旦你这样做,你的模型可以看起来像这样:
The key thing to note here is that SpeciesId
is "propagated" down both edges of this diamond-shaped graph. This is what gives you the desired restriction of not being able "assign a value" to a property that was not "declared" for the given species.
这里要注意的关键是SpeciesId在这个菱形图的两个边缘“传播”。这就是为您提供所需的限制,即无法为未给定物种“声明”的属性“赋值”。
BTW, use singular when naming your tables. Also, consider using natural primary keys (e.g. SpeciesName
instead of SpeciesId
as PK) - if done right it can significantly increase the speed of your JOINs (especially in conjunction with clustering).
BTW,在命名表时使用单数。另外,考虑使用自然主键(例如SpeciesName而不是SpeciesId作为PK) - 如果完成,它可以显着提高JOIN的速度(特别是与聚类一起使用)。
#4
1
Another way to achieve these constraints would be to change the PK of Organism
table by dropping OrganismId
and adding a No
. Then make PK the compound (SpeciesId, No)
. So, "Bob"
would be (Human, 1)
, "Rufus"
would be (Dog, 1)
, etc.
实现这些约束的另一种方法是通过删除OrganismId并添加一个No来改变生物体PK表。然后使PK成为化合物(SpeciesId,No)。所以,“鲍勃”将是(人类,1),“鲁弗斯”将是(狗,1)等。
Then, add in the OrganismPropsValues
table, the SpeciesId
and the No
(removing the OrganismId
.)
然后,添加OrganismPropsValues表,SpeciesId和No(删除OrganismId。)
This will allow to change the FK from OrganismPropsValues
to Props
to reference SpeciesProps
instead:
这将允许将FK从OrganismPropsValues更改为Props,以引用SpeciesProps:
____________________ ____________________
| Organisms | | Species |
|--------------------| |--------------------|
|SpeciesId (int, FK) | |SpeciesId (int, PK) |
|No (int) |∞---------1|Name (varchar) |
|Name (varchar) | |____________________|
|PK (SpeciedId,No) | 1
|____________________| |
1 |
| |
| |
∞ ∞
______________________ ____________________ _______________
| OrganismPropsValues | | SpeciesProps | | Props |
|----------------------| |--------------------| |---------------|
|SpeciesId (int, PK) | |PropId (int,PK,FK) | ∞-----1|PropId (int,PK)|
|No (int, PK) | |SpeciesId(int,PK,FK)| |Name (varchar) |
|PropId (int, PK) | |____________________| |_______________|
|Value (varchar) | 1
|FK (SpeciesId,No) | |
|FK (SpeciesId,PropId) | |
|______________________| |
∞ |
| |
-------------------------------