Normalizing a table to third normal form

Time: 2022-10-03 23:06:53

This question is obviously a homework question. I can't understand my professor and have no idea what he said during the lecture. I need step-by-step instructions to normalize the following table first into 1NF, then 2NF, then 3NF.

I appreciate any help and instruction.

4 answers

#1


18  

Okay, I hope I remember all of them correctly, let's start...

Rules

To make them very short (and not very precise, just to give you a first idea of what it's all about):

  • NF1: A table cell must not contain more than one value.
  • NF2: NF1, plus all non-primary-key columns must depend on all primary key columns.
  • NF3: NF2, plus non-primary-key columns may not depend on each other.

Instructions

  • NF1: find table cells containing more than one value, put those into separate columns.
  • NF2: find columns depending on fewer than all primary key columns, put them into another table which has only those primary key columns they really depend on.
  • NF3: find columns which depend on other non-primary-key columns, in addition to depending on the primary key. Put the dependent columns into another table.

Examples

NF1

A column "state" has values like "WA, Washington". NF1 is violated, because that's two values: abbreviation and name.

Solution: To fulfill NF1, create two columns, STATE_ABBREVIATION and STATE_NAME.

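The NF1 fix above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table names address_raw and address_1nf, the id column, and the second sample row are made up for the example; only the "state" column and the two target column names come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized table: one cell holds both the abbreviation and the name.
# (address_raw, address_1nf and id are hypothetical names for this sketch.)
conn.execute("CREATE TABLE address_raw (id INTEGER PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO address_raw (id, state) VALUES (?, ?)",
    [(1, "WA, Washington"), (2, "OR, Oregon")],
)

# NF1 version: one value per cell.
conn.execute(
    """CREATE TABLE address_1nf (
           id INTEGER PRIMARY KEY,
           STATE_ABBREVIATION TEXT NOT NULL,
           STATE_NAME TEXT NOT NULL)"""
)

# Split each combined value at the first comma and store the two parts separately.
for row_id, state in conn.execute("SELECT id, state FROM address_raw").fetchall():
    abbreviation, name = (part.strip() for part in state.split(",", 1))
    conn.execute(
        "INSERT INTO address_1nf VALUES (?, ?, ?)", (row_id, abbreviation, name)
    )

rows_1nf = conn.execute("SELECT * FROM address_1nf ORDER BY id").fetchall()
print(rows_1nf)
```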

NF2

Imagine you've got a table with these 4 columns, expressing international names of car models:

  • COUNTRY_ID (numeric, part of the primary key)
  • CAR_MODEL_ID (numeric, part of the primary key)
  • COUNTRY_NAME (varchar)
  • CAR_MODEL_NAME (varchar)

The table may have these two data rows:

  • Row 1: COUNTRY_ID=1, CAR_MODEL_ID=5, COUNTRY_NAME=USA, CAR_MODEL_NAME=Fox
  • Row 2: COUNTRY_ID=2, CAR_MODEL_ID=5, COUNTRY_NAME=Germany, CAR_MODEL_NAME=Polo

That is, car model 5 is called "Fox" in the USA, but the same car model is called "Polo" in Germany (I don't remember whether that's actually true).

NF2 is violated, because the country name does not depend on both car model ID and country ID, but only on the country ID.

Solution: To fulfill NF2, move COUNTRY_NAME into a separate table "COUNTRY" with columns COUNTRY_ID (primary key) and COUNTRY_NAME. To get a result set including the country name, you'll need to connect the two tables using a JOIN.

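Here is a minimal sketch of that NF2 split, again using Python's sqlite3 module. The table name CAR_MODEL_LOCAL_NAME is an assumption (the text does not name the original table); COUNTRY and all column names are taken from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- COUNTRY_NAME now lives in its own table, keyed by COUNTRY_ID alone.
    CREATE TABLE COUNTRY (
        COUNTRY_ID   INTEGER PRIMARY KEY,
        COUNTRY_NAME TEXT NOT NULL
    );
    -- CAR_MODEL_LOCAL_NAME is a hypothetical name for the remaining table.
    -- CAR_MODEL_NAME depends on the full composite key, so it stays here.
    CREATE TABLE CAR_MODEL_LOCAL_NAME (
        COUNTRY_ID     INTEGER NOT NULL REFERENCES COUNTRY (COUNTRY_ID),
        CAR_MODEL_ID   INTEGER NOT NULL,
        CAR_MODEL_NAME TEXT NOT NULL,
        PRIMARY KEY (COUNTRY_ID, CAR_MODEL_ID)
    );
    INSERT INTO COUNTRY VALUES (1, 'USA'), (2, 'Germany');
    INSERT INTO CAR_MODEL_LOCAL_NAME VALUES (1, 5, 'Fox'), (2, 5, 'Polo');
    """
)

# The JOIN brings the country name back into the result set.
joined = conn.execute(
    """
    SELECT c.COUNTRY_NAME, m.CAR_MODEL_ID, m.CAR_MODEL_NAME
    FROM CAR_MODEL_LOCAL_NAME AS m
    JOIN COUNTRY AS c ON c.COUNTRY_ID = m.COUNTRY_ID
    ORDER BY m.COUNTRY_ID
    """
).fetchall()
print(joined)
```

Each country name is now stored exactly once, so renaming a country means updating a single row.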

NF3

Say you've got a table with these columns, expressing climatic conditions of states:

  • STATE_ID (varchar, primary key)
  • CLIME_ID (foreign key, ID of a climate zone like "desert", "rainforest", etc.)
  • IS_MOSTLY_DRY (bool)

NF3 is violated, because IS_MOSTLY_DRY only depends on the CLIME_ID (let's at least assume that), but not on the STATE_ID (primary key).

Solution: To fulfill NF3, move the column IS_MOSTLY_DRY into the climate zone table.

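The NF3 fix could look roughly like this in sqlite3. The table names STATE and CLIME, the column CLIME_NAME, and the sample rows are assumptions made for the sketch; the sample data is purely illustrative and not meant as a statement about real climates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- IS_MOSTLY_DRY is a property of the climate zone, so it is stored there.
    CREATE TABLE CLIME (
        CLIME_ID      INTEGER PRIMARY KEY,
        CLIME_NAME    TEXT NOT NULL,      -- hypothetical column
        IS_MOSTLY_DRY INTEGER NOT NULL CHECK (IS_MOSTLY_DRY IN (0, 1))
    );
    -- The state table keeps only the reference to its climate zone.
    CREATE TABLE STATE (
        STATE_ID TEXT PRIMARY KEY,
        CLIME_ID INTEGER NOT NULL REFERENCES CLIME (CLIME_ID)
    );
    -- Illustrative sample rows only.
    INSERT INTO CLIME VALUES (1, 'desert', 1), (2, 'rainforest', 0);
    INSERT INTO STATE VALUES ('AZ', 1), ('NV', 1), ('HI', 2);
    """
)

# "Which states are mostly dry?" is answered through the climate zone.
dry_states = [
    row[0]
    for row in conn.execute(
        """
        SELECT s.STATE_ID
        FROM STATE AS s
        JOIN CLIME AS c ON c.CLIME_ID = s.CLIME_ID
        WHERE c.IS_MOSTLY_DRY = 1
        ORDER BY s.STATE_ID
        """
    )
]
print(dry_states)
```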


Here are some thoughts regarding the actual table given in the exercise:

I apply the above-mentioned NF rules without challenging the primary key columns, although they don't actually make sense, as we will see later.

  • NF1 isn't violated: each cell holds just one value.
  • NF2 is violated by EMP_NM and all the phone numbers, because none of these columns depends on the full primary key. They all depend on EMP_ID (PK), but not on DEPT_CD (PK). I assume that phone numbers stay the same when an employee moves to another department.
  • NF2 is also violated by DEPT_NM, because DEPT_NM does not depend on the full primary key. It depends on DEPT_CD, but not on EMP_ID.
  • NF2 is also violated by all the skill columns, because they are not department-specific but only employee-specific.
  • NF3 is violated by SKILL_NM, because the skill name depends only on the skill code, which is not even part of the composite primary key.
  • SKILL_YRS violates NF3, because it depends on a primary key member (EMP_ID) and a non-primary-key member (SKILL_CD). So it is partly dependent on a non-primary-key attribute.

So if you remove all columns which violate NF2 or NF3, only the primary key remains (EMP_ID and DEPT_CD). That remaining part violates the given business rules: this structure would allow an employee to work in multiple departments at the same time.

Let's review it from a distance. Your data model is about employees, departments, skills and the relationships between these entities. If you normalize that, you'll end up with one table for the employees (containing DEPT_CD as a foreign key), one for the departments, one for the skills, and another one for the relationship between employees and skills, holding the "skill years" for each tuple of EMP_ID and SKILL_CD (my teacher would have called the latter an "associative entity").

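The four-table result described above might be sketched as follows with sqlite3. Treat this as one possible layout, not the expected homework answer: the table names, the column types, the single EMP_PHONE column (the exercise's phone columns are not named in this thread), and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE DEPARTMENT (
        DEPT_CD TEXT PRIMARY KEY,
        DEPT_NM TEXT NOT NULL
    );
    -- One DEPT_CD per employee: an employee works in exactly one department.
    CREATE TABLE EMPLOYEE (
        EMP_ID    INTEGER PRIMARY KEY,
        EMP_NM    TEXT NOT NULL,
        EMP_PHONE TEXT,                   -- hypothetical stand-in for the phone columns
        DEPT_CD   TEXT NOT NULL REFERENCES DEPARTMENT (DEPT_CD)
    );
    CREATE TABLE SKILL (
        SKILL_CD TEXT PRIMARY KEY,
        SKILL_NM TEXT NOT NULL
    );
    -- Associative entity: SKILL_YRS depends on the pair (EMP_ID, SKILL_CD).
    CREATE TABLE EMPLOYEE_SKILL (
        EMP_ID    INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_ID),
        SKILL_CD  TEXT NOT NULL REFERENCES SKILL (SKILL_CD),
        SKILL_YRS INTEGER NOT NULL,
        PRIMARY KEY (EMP_ID, SKILL_CD)
    );
    """
)

# Made-up sample rows.
conn.execute("INSERT INTO DEPARTMENT VALUES ('D1', 'Sales')")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Ann', '555-0100', 'D1')")
conn.execute("INSERT INTO SKILL VALUES ('S1', 'SQL')")
conn.execute("INSERT INTO EMPLOYEE_SKILL VALUES (1, 'S1', 3)")

# The composite key rejects a second SKILL_YRS value for the same employee and skill.
try:
    conn.execute("INSERT INTO EMPLOYEE_SKILL VALUES (1, 'S1', 7)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Joining the four tables reproduces the shape of the original wide table.
report = conn.execute(
    """
    SELECT e.EMP_NM, d.DEPT_NM, s.SKILL_NM, es.SKILL_YRS
    FROM EMPLOYEE_SKILL AS es
    JOIN EMPLOYEE   AS e ON e.EMP_ID   = es.EMP_ID
    JOIN DEPARTMENT AS d ON d.DEPT_CD  = e.DEPT_CD
    JOIN SKILL      AS s ON s.SKILL_CD = es.SKILL_CD
    """
).fetchall()
print(duplicate_rejected, report)
```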

#2


4  

Looking at the first two rows in your table,
and looking at which columns are tagged "PK" in that table,
and assuming that "PK" stands for "Primary Key",
and looking at the values that appear for those two columns in those two rows,
I would recommend that your professor get the hell out of database teaching and not come back until he has educated himself properly on the subject.

This exercise cannot be taken seriously because the problem statement itself contains hopelessly contradictory information.

(Observe that as a consequence, there simply is not any such thing as a "good" or "right" answer to this question !!!)

#3


1  

Another oversimplified answer coming up.

In a 3NF relational table, every nonkey value is determined by the key, the whole key, and nothing but the key (so help me Codd ;)).

1NF: The key. This means that if you specify the key value and a named column, there will be at most one value at the intersection of the row and the column. A multivalue, like a series of values separated by commas, is disallowed, because you can't get directly to the value with just a key and a column name.

2NF: The whole key. If a column that is not part of the key is determined by a proper subset of the key columns, then 2NF is being violated.

3NF: And nothing but the key. If a column is determined by some set of non-key columns, then 3NF is being violated.

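To make "determined by" a bit more concrete, here is a small Python sketch that tests whether one set of columns determines another in some sample rows. The function name determines and the columns K1, K2, X, Y are hypothetical. Note the limitation: sample data can only refute a dependency, it can never prove that one holds in general.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical table with composite key (K1, K2).
rows = [
    {"K1": 1, "K2": "a", "X": "red", "Y": 10},
    {"K1": 1, "K2": "b", "X": "red", "Y": 20},
    {"K1": 2, "K2": "a", "X": "blue", "Y": 30},
]

whole_key_gives_y = determines(rows, ["K1", "K2"], ["Y"])  # Y needs the whole key
part_key_gives_x = determines(rows, ["K1"], ["X"])         # X follows from K1 alone
part_key_gives_y = determines(rows, ["K1"], ["Y"])         # K1 alone does not fix Y
print(whole_key_gives_y, part_key_gives_x, part_key_gives_y)
```

In this sample, X appears to depend on a proper subset of the key, which is the kind of column the 2NF rule says to move out.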

#4


0  

A table is in 3NF only if it is in second normal form, has no transitive dependencies, and all of its non-key attributes depend on the primary key.

Transitive dependency: R=(A,B,C). If A->B and B->C, then A->C.

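The transitive rule can be checked mechanically by computing an attribute closure: start from a set of attributes and keep applying dependencies until nothing new is added. The sketch below is a plain Python illustration; the function name closure and the way dependencies are written as pairs of sets are choices made for this example.

```python
def closure(attributes, dependencies):
    """Return every attribute determined by `attributes` under the given dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # If we already know all of lhs, we also know all of rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# R = (A, B, C) with A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

closure_of_a = closure({"A"}, fds)
closure_of_b = closure({"B"}, fds)
print(sorted(closure_of_a), sorted(closure_of_b))
```

C ends up in the closure of A only by going through B, which is exactly the transitive dependency that 3NF forbids when A is the key and B is a non-key attribute.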
