When creating a database structure, what are good guidelines to follow or good ways to determine how far a database should be normalized? Should you create an un-normalized database and split it apart as the project progresses? Should you create it fully normalized and combine tables as needed for performance?
14 answers
#1
16
You want to start by designing a normalized database up to 3rd normal form. As you develop the business logic layer you may decide you have to denormalize a bit, but never, never go below 3rd normal form. Always keep it 1st and 2nd normal form compliant. Denormalize for simplicity of code, not for performance; use indexes and stored procedures for that :)
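A minimal sketch of the "keep 3NF and add an index" side of this advice, using Python's built-in sqlite3 module. The customers/orders tables and the index name idx_orders_customer are hypothetical, and SQLite has no stored procedures, so only the index half is shown. EXPLAIN QUERY PLAN lets you check that the join uses the index instead of scanning the whole orders table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    total       REAL NOT NULL
);
-- The schema stays normalized; the index is what makes the join cheap.
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")


def plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql, params)]


query = """
    SELECT c.name, o.order_id, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE c.customer_id = ?
"""
for step in plan(query, (1,)):
    print(step)
```

The exact wording of the plan rows varies between SQLite versions, but the step that reads orders should name idx_orders_customer.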
The reason not to "normalize as you go" is that you would have to modify the code you have already written almost every time you change the database design.
There are a couple of good articles:
#2
10
@GrizzlyGuru A wise man once told me "normalize till it hurts, denormalize till it works".
It hasn't failed me yet :)
I disagree about starting with it in un-normalized form, however. In my experience it's been easier to adapt your application to a less normalized database than to a more normalized one. Starting un-normalized can also lead to situations where it's working "well enough", so you never get around to normalizing it (until it's too late!)
#3
7
Normalization means eliminating redundant data. In other words, an un-normalized or denormalized database is one where the same information is repeated in multiple places. This means you have to write more complex update statements to ensure you update the same data everywhere; otherwise you get inconsistent data, which in turn means the output of your queries is unreliable.
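To make the update anomaly concrete, here is a small sketch using Python's sqlite3 with hypothetical tables. In the flat table the email is copied onto every order row, so an UPDATE that misses one row leaves two different emails for the same customer; the normalized design stores the email once, so a single UPDATE is always enough.

```python
import sqlite3


def emails_after_change():
    """Change Ann's email in a flat design and in a normalized design.

    Returns (flat_emails, normalized_emails): the distinct emails each
    design reports for Ann afterwards.
    """
    db = sqlite3.connect(":memory:")

    # Denormalized: the email is copied onto every order row.
    db.execute("CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, "
               "customer_name TEXT, customer_email TEXT, total REAL)")
    db.executemany("INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
                   [(1, "Ann", "ann@old.example", 10.0),
                    (2, "Ann", "ann@old.example", 25.0)])
    # An UPDATE that misses one copy leaves the data contradicting itself.
    db.execute("UPDATE orders_flat SET customer_email = 'ann@new.example' "
               "WHERE order_id = 1")
    flat = [r[0] for r in db.execute(
        "SELECT DISTINCT customer_email FROM orders_flat "
        "WHERE customer_name = 'Ann' ORDER BY customer_email")]

    # Normalized: the email is stored once; orders only reference the customer.
    db.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, "
               "name TEXT, email TEXT)")
    db.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
               "customer_id INTEGER REFERENCES customers(customer_id), total REAL)")
    db.execute("INSERT INTO customers VALUES (1, 'Ann', 'ann@old.example')")
    db.executemany("INSERT INTO orders VALUES (?, 1, ?)", [(1, 10.0), (2, 25.0)])
    db.execute("UPDATE customers SET email = 'ann@new.example' WHERE customer_id = 1")
    normalized = [r[0] for r in db.execute(
        "SELECT DISTINCT c.email FROM orders AS o "
        "JOIN customers AS c ON c.customer_id = o.customer_id ORDER BY c.email")]
    return flat, normalized


print(emails_after_change())
```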
This is a pretty huge problem, so I would say denormalization hurts, not the other way around.
In some cases you may deliberately decide to denormalize specific parts of a database, if you judge that the benefit outweighs the extra work in updating data and the risk of data corruption. One example is data warehouses, where data is aggregated for performance reasons and is often not updated after the initial load, which reduces the risk of inconsistencies.
But in general, be wary of denormalizing for performance. For example, the performance benefit of a denormalized join can typically be achieved with a materialized view (also called an indexed view), which will be as fast as querying a denormalized table but still protects the consistency of the data.
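SQLite (used here through Python's sqlite3) has no materialized or indexed views, so this sketch swaps in a different technique: a summary table kept up to date by triggers. It shows the same idea the answer describes: the database itself maintains the precomputed copy, so application code only ever writes to the normalized tables. All table and trigger names are hypothetical, and only inserts and name changes are handled; a complete version would also need triggers for order updates and deletes.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    total       REAL NOT NULL
);

-- Stand-in for a materialized view: one precomputed row per customer.
CREATE TABLE customer_totals (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    order_total REAL NOT NULL DEFAULT 0
);

-- The triggers, not the application, keep the copy consistent.
CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT INTO customer_totals (customer_id, name)
    VALUES (NEW.customer_id, NEW.name);
END;
CREATE TRIGGER customers_au AFTER UPDATE OF name ON customers BEGIN
    UPDATE customer_totals SET name = NEW.name
    WHERE customer_id = NEW.customer_id;
END;
CREATE TRIGGER orders_ai AFTER INSERT ON orders BEGIN
    UPDATE customer_totals SET order_total = order_total + NEW.total
    WHERE customer_id = NEW.customer_id;
END;
""")

# Usage: the application writes only to the normalized tables...
with db:
    db.execute("INSERT INTO customers VALUES (1, 'Ann')")
    db.executemany("INSERT INTO orders VALUES (?, 1, ?)", [(1, 10.0), (2, 25.0)])
    db.execute("UPDATE customers SET name = 'Ann B' WHERE customer_id = 1")

# ...and reads the precomputed summary without a join or GROUP BY.
print(db.execute("SELECT * FROM customer_totals").fetchall())
```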
#4
3
Jeff has a pretty good overview of his philosophy on his blog: "Maybe Normalization Isn't Normal". The main thing is: don't overdo normalization. But I think an even bigger point to take away is that it probably doesn't matter too much. Unless you're running the next Google, you probably won't notice much of a difference until your application grows.
#5
3
Database normalization, I feel, is an art form.
You don't want to over-normalize your database, because you will end up with too many tables and even queries for simple objects will take longer than they should.
A good rule of thumb I follow is to normalize out information that is repeated over and over again.
For example, if you are creating a contact management application, it would make sense to have Address (Street, City, State, Zip, etc.) as its own table.
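One way to read the Address suggestion, sketched with Python's sqlite3 and assuming (hypothetically) that a contact can have several labeled addresses: the address columns move to their own table keyed back to the contact, instead of being repeated as home_/work_ column groups on the contacts table. All names and rows are made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
CREATE TABLE contacts (
    contact_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
-- One row per address instead of home_street, work_street, ... columns
-- repeated on the contacts table.
CREATE TABLE addresses (
    address_id INTEGER PRIMARY KEY,
    contact_id INTEGER NOT NULL REFERENCES contacts(contact_id),
    label      TEXT NOT NULL,  -- hypothetical, e.g. 'home' or 'work'
    street     TEXT,
    city       TEXT,
    state      TEXT,
    zip        TEXT
);
INSERT INTO contacts VALUES (1, 'Ann');
INSERT INTO addresses (contact_id, label, street, city, state, zip) VALUES
    (1, 'home', '1 Main St', 'Anytown', 'XX', '00001'),
    (1, 'work', '9 Office Rd', 'Anytown', 'XX', '00002');
""")

rows = db.execute("""
    SELECT c.name, a.label, a.city
    FROM contacts AS c
    JOIN addresses AS a ON a.contact_id = c.contact_id
    ORDER BY a.label
""").fetchall()
print(rows)
```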
However, if you have only 2 types of contacts, business or personal, do you need a contact type table if you know you are only ever going to have 2? For me, no.
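A sketch of what skipping the contact type table can look like, assuming SQLite via Python's sqlite3 and hypothetical names: a CHECK constraint limits contact_type to the two known values, so the database still rejects bad data without a lookup table.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# No contact_types lookup table: the two allowed values are enforced inline.
db.execute("""
    CREATE TABLE contacts (
        contact_id   INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        contact_type TEXT NOT NULL
                     CHECK (contact_type IN ('business', 'personal'))
    )
""")


def add_contact(name, contact_type):
    """Insert a contact; return False if the CHECK constraint rejects the type."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO contacts (name, contact_type) VALUES (?, ?)",
                       (name, contact_type))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_contact("Acme", "business"), add_contact("Bob", "vendor"))
```

The trade-off is that adding a third type later means a schema change rather than an INSERT into a lookup table.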
I would start by figuring out the data types you need. Use a modeling program such as Visio to help. You don't want to start with a non-normalized database, because you will eventually normalize anyway. Start by putting objects into their logical groupings; as you see data repeated, move that data into a new table. Keep up that process until you feel the database is designed.
Let testing tell you whether you need to combine tables. A well-written query can make up for any over-normalization.
#6
2
I believe starting with an un-normalized database and moving toward a normalized one as you progress is usually the easiest way to get started. As to how far to normalize, my philosophy is to normalize until it starts to hurt. That may sound a little flippant, but it is generally a good way to gauge how far to take it.
#7
2
Having a normalized database will give you the most flexibility and the easiest maintenance. I always start with a normalized database and denormalize only when there is a real-life problem that needs addressing.
I view this much like code performance: write maintainable, flexible code, and make compromises for performance when you know there is a performance problem.
#8
2
The original poster never said what the database will be used for. If it's going to be any kind of data warehousing project where at some point you will need cubes (OLAP) to process data for some front end, it would be wiser to start off with a star schema (fact tables + dimension tables) rather than looking into normalization. The Kimball books will be of great help in this case.
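A minimal star schema sketch in Python's sqlite3, with one fact table and two dimension tables; all names and rows are made up for illustration. The dimension tables are deliberately denormalized (category sits directly on the product row), and a typical report becomes a fact-to-dimension join plus GROUP BY.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- The fact table holds measures plus foreign keys to each dimension.
CREATE TABLE fact_sales  (date_key    INTEGER REFERENCES dim_date,
                          product_key INTEGER REFERENCES dim_product,
                          quantity    INTEGER,
                          amount      REAL);
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'),
                               (2, 'Gizmo',  'Hardware'),
                               (3, 'Manual', 'Books');
INSERT INTO fact_sales VALUES (20240101, 1, 3, 30.0),
                              (20240101, 3, 1, 15.0),
                              (20240201, 2, 2, 50.0);
""")


def sales_by_month_and_category():
    """Aggregate the fact table by attributes taken from two dimensions."""
    return db.execute("""
        SELECT d.year, d.month, p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date    AS d ON d.date_key    = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.year, d.month, p.category
        ORDER BY d.year, d.month, p.category
    """).fetchall()


print(sales_by_month_and_category())
```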
#9
1
I agree that it is typically better to start out with a normalized DB and then denormalize to solve very specific problems, but I'd probably start at Boyce-Codd Normal Form instead of 3rd Normal Form.
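The difference between 3NF and BCNF can be checked mechanically from functional dependencies. The sketch below is a textbook-style Python helper (not from any library; all function names are hypothetical) that computes attribute closures, lists BCNF violations among the dependencies you pass in, and accepts a violation for 3NF only if every attribute it determines belongs to some candidate key. Candidate keys are found by brute force, which is fine for a handful of attributes. The classic student/course/instructor example is in 3NF but not in BCNF.

```python
from itertools import combinations

# A functional dependency is a (lhs, rhs) pair of attribute-name sets: lhs -> rhs.


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # a superset of a key is a superkey, not a candidate key
            if closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys


def bcnf_violations(schema, fds):
    """Non-trivial dependencies whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]


def is_3nf(schema, fds):
    """3NF tolerates a BCNF violation only if everything it determines is prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return all(rhs - lhs <= prime for lhs, rhs in bcnf_violations(schema, fds))


# Classic example: each instructor teaches one course, and a student takes
# a given course from one instructor.
schema = {"student", "course", "instructor"}
fds = [({"student", "course"}, {"instructor"}),
       ({"instructor"}, {"course"})]
print(bcnf_violations(schema, fds), is_3nf(schema, fds))
```

Decomposing to remove the instructor -> course violation reaches BCNF, but the {student, course} -> instructor dependency can then no longer be enforced within a single table, which is the usual argument for stopping at 3NF in cases like this.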
#10
1
The truth is that "it depends." It depends on a lot of factors including:
- Code (Hand-coded or Tool-driven, like ETL packages)
- Primary Application (Transaction Processing, Data Warehousing, Reporting)
- Type of Database (MySQL, DB2, Oracle, Netezza, etc.)
- Database Architecture (Tabular, Columnar)
- DBA Quality (proactive, reactive, inactive)
- Expected Data Quality (do you want to enforce data quality at the application level or the database level?)
#11
1
I agree that you should normalise as much as possible and only denormalise when it is absolutely necessary for performance. With materialised views or caching schemes, even that is often unnecessary.
The thing to bear in mind is that by normalising your model you are giving the database more information on how to constrain your data, so that you can remove the risk of the update anomalies that can occur in incompletely normalised models.
If you denormalise, then you either need to live with the fact that you may get update anomalies, or you need to implement the constraint validation yourself in your application code. This takes away much of the benefit of using a DBMS, which lets you define these constraints declaratively.
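To illustrate the point about declarative constraints, a sketch with Python's sqlite3 and hypothetical tables: the foreign key and the CHECK constraint are declared once in the schema, and the database rejects bad rows without any validation code in the application. Note that SQLite only enforces foreign keys when PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on for the connection.
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    total       REAL NOT NULL CHECK (total >= 0)
);
INSERT INTO customers VALUES (1, 'Ann');
""")


def try_insert_order(order_id, customer_id, total):
    """Return None on success, or the database's error message on rejection."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute("INSERT INTO orders VALUES (?, ?, ?)",
                       (order_id, customer_id, total))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


print(try_insert_order(1, 1, 10.0))   # valid row
print(try_insert_order(2, 99, 5.0))   # no customer 99: rejected by the FK
print(try_insert_order(3, 1, -1.0))   # negative total: rejected by the CHECK
```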
So assuming the same quality of code, denormalising may not actually give you better performance.
Another thing to mention is that hardware is cheap these days so throwing extra processing power at the problem is often more cost effective than accepting the potential costs of cleaning up corrupted data.
#12
-1
Often if you normalize as far as your other software will let you, you'll be done.
For example, when using Object-Relational mapping technology, you'll have a rich set of semantics for various many-to-one and many-to-many relationships. Under the hood, that produces join tables whose primary key is effectively made of 2 columns. While relatively rare, true normalization can give you relations whose keys need 3 or more columns. In cases like this, I prefer to stick with the O/R mapper and roll my own code to avoid the various DB anomalies.
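A sketch of the two kinds of keys being contrasted, using Python's sqlite3 with hypothetical tables: a two-column join table like those O/R mappers generate for many-to-many relationships, and a ternary relation whose primary key needs all three columns. In general the ternary fact cannot be split into pairwise join tables without losing information.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Typical O/R-mapper many-to-many join table: a two-column key.
CREATE TABLE project_members (
    project_id  INTEGER NOT NULL,
    employee_id INTEGER NOT NULL,
    PRIMARY KEY (project_id, employee_id)
);
-- Ternary fact: "this supplier ships this part to this project".
-- Only the combination of all three columns identifies a row.
CREATE TABLE supplies (
    supplier_id INTEGER NOT NULL,
    part_id     INTEGER NOT NULL,
    project_id  INTEGER NOT NULL,
    PRIMARY KEY (supplier_id, part_id, project_id)
);
""")


def add_supply(supplier_id, part_id, project_id):
    """Insert a supply fact; return False if that exact triple already exists."""
    try:
        with db:
            db.execute("INSERT INTO supplies VALUES (?, ?, ?)",
                       (supplier_id, part_id, project_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_supply(1, 10, 100),  # new triple
      add_supply(1, 10, 200),  # same supplier and part, different project
      add_supply(1, 10, 100))  # exact duplicate: rejected by the 3-column key
```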
#13
-1
Just try to use common sense.
Also, some say (and I have to agree with them) that if you find yourself joining 6 (the magic number) tables together in most of your queries, not counting reporting-related ones, then you might consider denormalizing a bit.
#14
-1
Don't forget The mother of all database normalization debates on Coding Horror (summarized on the High Scalability blog).