SQL:在保留约束的同时规范化数据库

时间:2021-09-23 02:12:45

Suppose I have the following tables:

假设我有以下表格:

     ____________________             ____________________
    |     Organisms      |           |       Species      |
    |--------------------|           |--------------------|
    |OrganismId (int, PK)|           |SpeciesId (int, PK) |
    |SpeciesId (int, FK) |∞---------1|Name (varchar)      |
    |Name (varchar)      |           |____________________|
    |____________________|                      1
              1                                 |
              |                                 |
              |                                 |
              ∞                                 ∞
    ______________________        ____________________          _______________
   | OrganismPropsValues  |      |   SpeciesProps     |        |     Props     |
   |----------------------|      |--------------------|        |---------------|
   |OrganismId (int, FK)  |      |PropId (int,PK,FK)  | ∞-----1|PropId (int,PK)|
   |PropId (int, FK)      |      |SpeciesId(int,PK,FK)|        |Name (varchar) |
   |Value (varchar)       |      |____________________|        |_______________|
   |______________________|                                             1
              ∞                                                         |
              |                                                         |
              -----------------------------------------------------------

A quick explanation of what I am trying to represent here: suppose we have a list of species, such as cat, dog, human, etc. We also have a set of properties (abbreviated Props so I could fit it more easily in the diagram) which apply to some but not necessarily all species--for example, this may be tail length (for species with tails), eye color (for those with eyes), etc.

快速解释我在这里要表达的内容:假设我们有一个物种列表,例如猫,狗,人等等。我们还有一组属性(缩写为Props,所以我可以更容易地在图中使用它)适用于某些但不一定适用于所有物种 - 例如,这可能是尾长(对于有尾巴的物种),眼睛颜色(对于有眼睛的人)等。

SpeciesProps is a linker table that defines which properties apply to which species-- so here we would have {Human, Eye Color}, {Dog, Eye Color}, {Cat, Eye Color}, {Dog, Tail Length}, {Cat, Tail Length}. We do not have {Human, Tail Length} because Tail Length is obviously not a valid property to apply to a human.

SpeciesProps是一个链接表,用于定义哪些属性适用于哪些物种 - 所以这里我们会有{人类,眼睛颜色},{狗,眼睛颜色},{猫,眼睛颜色},{狗,尾长},{猫,尾长}。我们没有{Human,Tail Length},因为Tail Length显然不是适用于人类的有效属性。

The Organisms table holds actual "implementations" of the species-- So here we might have {Human, Bob}, {Dog, Rufus}, and {Cat, Felix}.

“生物”表格包含物种的实际“实施” - 所以我们可能会有{人类,鲍勃},{狗,鲁弗斯}和{猫,菲利克斯}。

Here is now my issue: in the OrganismPropsValues table, I want to store the 'values' of the properties for each organism--so for example, for Bob I want to store {Bob, Eye Color, Blue}. For Rufus, I would want to store {Rufus, Eye Color, Brown} and {Rufus, Tail Length, 20} (similar for Felix). My problem however, is that in the schema that I have detailed, it is perfectly possible to store {Bob, Tail Length, 10}, even though the {Human, Tail Length} tuple does not exist in SpeciesProps. How can I modify this schema so I can enforce the constraints defined in SpeciesProps in OrganismPropsValues, while maintaining adequate normalization?

现在我的问题是:在OrganismPropsValues表中,我想存储每个生物体的属性的“值” - 例如,对于Bob我想存储{Bob,Eye Color,Blue}。对于Rufus,我想存储{Rufus,Eye Color,Brown}和{Rufus,Tail Length,20}(类似于Felix)。然而,我的问题是,在我详细描述的模式中,即使在SpeciesProps中不存在{Human,Tail Length}元组,也完全可以存储{Bob,Tail Length,10}。如何修改此模式,以便我可以强制执行OrganismPropsValues中SpeciesProps中定义的约束,同时保持足够的规范化?

4 个解决方案

#1


4  

You're implementing the Entity-Attribute-Value antipattern. This can't be a normalized database design, because it's not relational.

您正在实现Entity-Attribute-Value反模式。这不是规范化的数据库设计,因为它不是关系型的。

What I would suggest instead is the Class Table Inheritance design pattern:

我建议的是Class Table Inheritance设计模式:

  • Create one table for Organisms, containing properties common to all species.
  • 为Organisms创建一个表,其中包含所有物种共有的属性。
  • Create one table per species, containing properties specific to that species. Each of these tables has a 1-to-1 relationship with Organisms, but each property belongs in its own column.

    每个物种创建一个表,包含特定于该物种的属性。这些表中的每一个都与Organisms具有一对一的关系,但每个属性都属于它自己的列。

     ____________________             ____________________
    |     Organisms      |           |       Species      |
    |--------------------|           |--------------------|
    |OrganismId (int, PK)|           |SpeciesId (int, PK) |
    |SpeciesId (int, FK) |∞---------1|Name (varchar)      |
    |Name (varchar)      |           |____________________|
    |____________________|
              1
              |
              |
              1
     ______________________ 
    |    HumanOrganism     |
    |----------------------|
    |OrganismId (int, FK)  |
    |Sex      (enum)       |
    |Race     (int, FK)    |
    |EyeColor (int, FK)    |
    |....                  |
    |______________________|
    

This does mean you will create many tables, but consider this as a tradeoff with the many practical benefits to storing properties in a relationally correct way:

这确实意味着您将创建许多表,但将此视为权衡,以便以关系正确的方式存储属性的许多实际好处:

  • You can use SQL data types appropriately, instead of treating everything a free-form varchar.
  • 您可以适当地使用SQL数据类型,而不是将所有内容都视为*格式的varchar。
  • You can use constraints or lookup tables to restrict certain properties by a predefined set of values.
  • 您可以使用约束或查找表通过一组预定义的值来限制某些属性。
  • You can make properties mandatory (i.e. NOT NULL) or use other constraints.
  • 您可以强制使用属性(即NOT NULL)或使用其他约束。
  • Data and indexes are stored more efficiently.
  • 数据和索引的存储效率更高。
  • Queries are easier for you to write and easier for the RDBMS to execute.
  • 查询更容易编写,更容易执行RDBMS。

For more on this design, see Martin Fowler's book Patterns of Enterprise Application Architecture, or my presentation Practical Object-Oriented Models in SQL, or my book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

有关此设计的更多信息,请参阅Martin Fowler的书籍“企业应用程序架构模式”或我的演示文稿SQL中的实用面向对象模型,或者我的书“SQL反模式:避免数据库编程的陷阱”。

#2


2  

Hmm...
Here is one way to do it:
Add SpeciesPropsId into SpeciesProps table.
Replace PropId with SpeciesPropsId in the OrganismPropsValues table.
You will need to change constrains a bit.
Need to add SpeciesProps to OrganismPropsValues constrain.
Need to remove OrganismPropsValues to Props constrain.

嗯......这是一种方法:将SpeciesPropsId添加到SpeciesProps表中。将ProtId替换为OrganismPropsValues表中的SpeciesPropsId。您需要稍微改变约束。需要将SpeciesProps添加到OrganismPropsValues约束。需要删除OrganismPropsValues到Props约束。

Technically you do not have to remove PropId from OrganismPropsValues, but if you keep it it will make data redundat.

从技术上讲,您不必从OrganismPropsValues中删除PropId,但如果保留它将使数据冗余。

#3


2  

Whenever you have a diamond-shaped dependency like this, consider putting more emphasis on composite PRIMARY KEYS.

每当你有这样的菱形依赖时,考虑更多地强调复合PRIMARY KEYS。

Specifically, identify the Organism not just by OrganismId, but by the combination of SpeciesId and OrganismSubId (you can still have OrganismId, but keep it as an alternate key - not show here for brevity).

具体来说,不仅仅通过OrganismId识别生物体,而是通过SpeciesId和OrganismSubId的组合识别(您仍然可以使用OrganismId,但将其保留为备用键 - 为简洁起见,此处不显示)。

Once you do that, your model can be made to look like this:

一旦你这样做,你的模型可以看起来像这样:

SQL:在保留约束的同时规范化数据库

The key thing to note here is that SpeciesId is "propagated" down both edges of this diamond-shaped graph. This is what gives you the desired restriction of not being able "assign a value" to a property that was not "declared" for the given species.

这里要注意的关键是SpeciesId在这个菱形图的两个边缘“传播”。这就是为您提供所需的限制,即无法为未给定物种“声明”的属性“赋值”。

BTW, use singular when naming your tables. Also, consider using natural primary keys (e.g. SpeciesName instead of SpeciesId as PK) - if done right it can significantly increase the speed of your JOINs (especially in conjunction with clustering).

BTW,在命名表时使用单数。另外,考虑使用自然主键(例如SpeciesName而不是SpeciesId作为PK) - 如果完成,它可以显着提高JOIN的速度(特别是与聚类一起使用)。

#4


1  

Another way to achieve these constraints would be to change the PK of Organism table by dropping OrganismId and adding a No. Then make PK the compound (SpeciesId, No). So, "Bob" would be (Human, 1), "Rufus" would be (Dog, 1), etc.

实现这些约束的另一种方法是通过删除OrganismId并添加一个No来改变生物体PK表。然后使PK成为化合物(SpeciesId,No)。所以,“鲍勃”将是(人类,1),“鲁弗斯”将是(狗,1)等。

Then, add in the OrganismPropsValues table, the SpeciesId and the No (removing the OrganismId.)

然后,添加OrganismPropsValues表,SpeciesId和No(删除OrganismId。)

This will allow to change the FK from OrganismPropsValues to Props to reference SpeciesProps instead:

这将允许将FK从OrganismPropsValues更改为Props,以引用SpeciesProps:

     ____________________             ____________________
    |     Organisms      |           |       Species      |
    |--------------------|           |--------------------|
    |SpeciesId (int, FK) |           |SpeciesId (int, PK) |
    |No (int)            |∞---------1|Name (varchar)      |
    |Name (varchar)      |           |____________________|
    |PK (SpeciedId,No)   |                      1
    |____________________|                      |
              1                                 |
              |                                 |
              |                                 |
              ∞                                 ∞
    ______________________        ____________________          _______________
   | OrganismPropsValues  |      |   SpeciesProps     |        |     Props     |
   |----------------------|      |--------------------|        |---------------|
   |SpeciesId (int, PK)   |      |PropId (int,PK,FK)  | ∞-----1|PropId (int,PK)|
   |No (int, PK)          |      |SpeciesId(int,PK,FK)|        |Name (varchar) |
   |PropId (int, PK)      |      |____________________|        |_______________|
   |Value (varchar)       |                 1
   |FK (SpeciesId,No)     |                 |
   |FK (SpeciesId,PropId) |                 |
   |______________________|                 |
              ∞                             |
              |                             |
              -------------------------------

#1


4  

You're implementing the Entity-Attribute-Value antipattern. This can't be a normalized database design, because it's not relational.

您正在实现Entity-Attribute-Value反模式。这不是规范化的数据库设计,因为它不是关系型的。

What I would suggest instead is the Class Table Inheritance design pattern:

我建议的是Class Table Inheritance设计模式:

  • Create one table for Organisms, containing properties common to all species.
  • 为Organisms创建一个表,其中包含所有物种共有的属性。
  • Create one table per species, containing properties specific to that species. Each of these tables has a 1-to-1 relationship with Organisms, but each property belongs in its own column.

    每个物种创建一个表,包含特定于该物种的属性。这些表中的每一个都与Organisms具有一对一的关系,但每个属性都属于它自己的列。

     ____________________             ____________________
    |     Organisms      |           |       Species      |
    |--------------------|           |--------------------|
    |OrganismId (int, PK)|           |SpeciesId (int, PK) |
    |SpeciesId (int, FK) |∞---------1|Name (varchar)      |
    |Name (varchar)      |           |____________________|
    |____________________|
              1
              |
              |
              1
     ______________________ 
    |    HumanOrganism     |
    |----------------------|
    |OrganismId (int, FK)  |
    |Sex      (enum)       |
    |Race     (int, FK)    |
    |EyeColor (int, FK)    |
    |....                  |
    |______________________|
    

This does mean you will create many tables, but consider this as a tradeoff with the many practical benefits to storing properties in a relationally correct way:

这确实意味着您将创建许多表,但将此视为权衡,以便以关系正确的方式存储属性的许多实际好处:

  • You can use SQL data types appropriately, instead of treating everything a free-form varchar.
  • 您可以适当地使用SQL数据类型,而不是将所有内容都视为*格式的varchar。
  • You can use constraints or lookup tables to restrict certain properties by a predefined set of values.
  • 您可以使用约束或查找表通过一组预定义的值来限制某些属性。
  • You can make properties mandatory (i.e. NOT NULL) or use other constraints.
  • 您可以强制使用属性(即NOT NULL)或使用其他约束。
  • Data and indexes are stored more efficiently.
  • 数据和索引的存储效率更高。
  • Queries are easier for you to write and easier for the RDBMS to execute.
  • 查询更容易编写,更容易执行RDBMS。

For more on this design, see Martin Fowler's book Patterns of Enterprise Application Architecture, or my presentation Practical Object-Oriented Models in SQL, or my book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

有关此设计的更多信息,请参阅Martin Fowler的书籍“企业应用程序架构模式”或我的演示文稿SQL中的实用面向对象模型,或者我的书“SQL反模式:避免数据库编程的陷阱”。

#2


2  

Hmm...
Here is one way to do it:
Add SpeciesPropsId into SpeciesProps table.
Replace PropId with SpeciesPropsId in the OrganismPropsValues table.
You will need to change constrains a bit.
Need to add SpeciesProps to OrganismPropsValues constrain.
Need to remove OrganismPropsValues to Props constrain.

嗯......这是一种方法:将SpeciesPropsId添加到SpeciesProps表中。将ProtId替换为OrganismPropsValues表中的SpeciesPropsId。您需要稍微改变约束。需要将SpeciesProps添加到OrganismPropsValues约束。需要删除OrganismPropsValues到Props约束。

Technically you do not have to remove PropId from OrganismPropsValues, but if you keep it it will make data redundat.

从技术上讲,您不必从OrganismPropsValues中删除PropId,但如果保留它将使数据冗余。

#3


2  

Whenever you have a diamond-shaped dependency like this, consider putting more emphasis on composite PRIMARY KEYS.

每当你有这样的菱形依赖时,考虑更多地强调复合PRIMARY KEYS。

Specifically, identify the Organism not just by OrganismId, but by the combination of SpeciesId and OrganismSubId (you can still have OrganismId, but keep it as an alternate key - not show here for brevity).

具体来说,不仅仅通过OrganismId识别生物体,而是通过SpeciesId和OrganismSubId的组合识别(您仍然可以使用OrganismId,但将其保留为备用键 - 为简洁起见,此处不显示)。

Once you do that, your model can be made to look like this:

一旦你这样做,你的模型可以看起来像这样:

SQL:在保留约束的同时规范化数据库

The key thing to note here is that SpeciesId is "propagated" down both edges of this diamond-shaped graph. This is what gives you the desired restriction of not being able "assign a value" to a property that was not "declared" for the given species.

这里要注意的关键是SpeciesId在这个菱形图的两个边缘“传播”。这就是为您提供所需的限制,即无法为未给定物种“声明”的属性“赋值”。

BTW, use singular when naming your tables. Also, consider using natural primary keys (e.g. SpeciesName instead of SpeciesId as PK) - if done right it can significantly increase the speed of your JOINs (especially in conjunction with clustering).

BTW,在命名表时使用单数。另外,考虑使用自然主键(例如SpeciesName而不是SpeciesId作为PK) - 如果完成,它可以显着提高JOIN的速度(特别是与聚类一起使用)。

#4


1  

Another way to achieve these constraints would be to change the PK of Organism table by dropping OrganismId and adding a No. Then make PK the compound (SpeciesId, No). So, "Bob" would be (Human, 1), "Rufus" would be (Dog, 1), etc.

实现这些约束的另一种方法是通过删除OrganismId并添加一个No来改变生物体PK表。然后使PK成为化合物(SpeciesId,No)。所以,“鲍勃”将是(人类,1),“鲁弗斯”将是(狗,1)等。

Then, add in the OrganismPropsValues table, the SpeciesId and the No (removing the OrganismId.)

然后,添加OrganismPropsValues表,SpeciesId和No(删除OrganismId。)

This will allow to change the FK from OrganismPropsValues to Props to reference SpeciesProps instead:

这将允许将FK从OrganismPropsValues更改为Props,以引用SpeciesProps:

     ____________________             ____________________
    |     Organisms      |           |       Species      |
    |--------------------|           |--------------------|
    |SpeciesId (int, FK) |           |SpeciesId (int, PK) |
    |No (int)            |∞---------1|Name (varchar)      |
    |Name (varchar)      |           |____________________|
    |PK (SpeciedId,No)   |                      1
    |____________________|                      |
              1                                 |
              |                                 |
              |                                 |
              ∞                                 ∞
    ______________________        ____________________          _______________
   | OrganismPropsValues  |      |   SpeciesProps     |        |     Props     |
   |----------------------|      |--------------------|        |---------------|
   |SpeciesId (int, PK)   |      |PropId (int,PK,FK)  | ∞-----1|PropId (int,PK)|
   |No (int, PK)          |      |SpeciesId(int,PK,FK)|        |Name (varchar) |
   |PropId (int, PK)      |      |____________________|        |_______________|
   |Value (varchar)       |                 1
   |FK (SpeciesId,No)     |                 |
   |FK (SpeciesId,PropId) |                 |
   |______________________|                 |
              ∞                             |
              |                             |
              -------------------------------