MySQL: Many tables or one table with many columns?

Date: 2022-06-17 04:23:50

This is more of a design question. I have one primary key, say the user's ID, and a lot of information associated with that user. Should I split that information into multiple tables by category, or should I have just one table with many columns?

The way I used to do it was to have multiple tables: one table for application usage data, one for profile info, one for back-end tokens, and so on, to keep things organized. Recently someone told me that it's better not to do that, and that a single table with lots of columns is fine. The thing is, all those columns would share the same primary key.

I'm pretty new to database design, so which approach is better, and what are the pros and cons? What's the conventional way of doing it?

7 answers

#1


78  

Any time information is one-to-one (each user has one name and password), it's probably better to keep it in one table, since that reduces the number of joins the database needs to do to retrieve results. I think some databases have a limit on the number of columns per table, but I wouldn't worry about it in normal cases, and you can always split the table later if you need to.

If the data is one-to-many (each user has thousands of rows of usage info), then it should be split into separate tables to reduce duplicate data (duplicate data wastes storage space, cache space, and makes the database harder to maintain).
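
To make the split concrete, here is a minimal sketch of that layout. It uses Python's built-in sqlite3 with an in-memory SQLite database standing in for MySQL, and the table and column names (`users`, `usage_events`, `password_hash`, `action`) are made up for illustration, not taken from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One-to-one facts live directly on the users row.
    CREATE TABLE users (
        user_id       INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        password_hash TEXT NOT NULL
    );
    -- One-to-many facts get their own table, keyed back to users.
    CREATE TABLE usage_events (
        event_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id),
        action   TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'hash-placeholder')")
conn.executemany(
    "INSERT INTO usage_events (user_id, action) VALUES (?, ?)",
    [(1, "login"), (1, "upload"), (1, "logout")],
)

# A profile lookup needs no join.
profile = conn.execute("SELECT name FROM users WHERE user_id = 1").fetchone()

# The user's name is stored once, however many usage rows exist.
event_count = conn.execute(
    "SELECT COUNT(*) FROM usage_events WHERE user_id = 1"
).fetchone()[0]
```

With one flat table, the name and password would have to be repeated on every usage row, which is the duplication described above.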

You might find the Wikipedia article on database normalization interesting, since it discusses the reasons for this in depth:

Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.

Denormalization is also something to be aware of, because there are cases where repeating data is better (since it reduces the amount of work the database needs to do when reading data). I'd highly recommend making your data as normalized as possible to start out, and only denormalize if you're aware of performance problems in specific queries.
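
As a rough illustration of what targeted denormalization can look like, the sketch below keeps a redundant `event_count` column on the user row so that reads can skip the aggregate query. It again uses sqlite3 as a stand-in for MySQL; the schema and the `record_event` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; schema and helper are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id     INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        event_count INTEGER NOT NULL DEFAULT 0  -- denormalized copy of COUNT(*)
    );
    CREATE TABLE usage_events (
        event_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id)
    );
""")
conn.execute("INSERT INTO users (user_id, name) VALUES (1, 'alice')")


def record_event(conn, user_id):
    """Insert an event and bump the redundant counter in one transaction."""
    with conn:  # commits both statements together, or rolls both back
        conn.execute("INSERT INTO usage_events (user_id) VALUES (?)", (user_id,))
        conn.execute(
            "UPDATE users SET event_count = event_count + 1 WHERE user_id = ?",
            (user_id,),
        )


for _ in range(5):
    record_event(conn, 1)

# Normalized read: count the child rows.
normalized = conn.execute(
    "SELECT COUNT(*) FROM usage_events WHERE user_id = 1"
).fetchone()[0]

# Denormalized read: a single-row lookup, no aggregate.
denormalized = conn.execute(
    "SELECT event_count FROM users WHERE user_id = 1"
).fetchone()[0]
```

The price is that every write path now has to keep the copy in sync, which is why it makes sense to reach for this only after a specific query has proven slow.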

#2


10  

One big table is often a poor choice. Related tables are what relational databases were designed to work with. If you index properly and know how to write performant queries, they are going to perform fine.

When a table gets too many columns, you can run into issues with the actual size of the page the database stores the information on. The record can end up being too large for the page, in which case you may not be able to create or update a specific record, which makes users unhappy. Or you may (in SQL Server at least) be allowed some overflow for particular data types (with a set of rules you need to look up if you are doing this), but if many records overflow the page size you can create tremendous performance problems. How MySQL handles pages, and whether you have a problem when the potential page size gets too large, is something you would have to look up in the documentation for that database.
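
For a rough feel of when a wide table gets into trouble in MySQL, here is a back-of-the-envelope estimator. It assumes the documented 65,535-byte row-size limit and InnoDB's 1017-column limit, worst-case utf8mb4 storage (4 bytes per character), and it ignores NULL bitmaps, off-page storage, and InnoDB's stricter per-page limit, so treat it as a sketch rather than what the server actually computes. The column list is invented.

```python
MYSQL_MAX_ROW_BYTES = 65_535  # MySQL's documented maximum row size
INNODB_MAX_COLUMNS = 1017     # InnoDB's documented per-table column limit

# Storage sizes of a few fixed-width integer types, in bytes.
FIXED_WIDTHS = {"TINYINT": 1, "INT": 4, "BIGINT": 8}


def varchar_bytes(chars, bytes_per_char=4):
    """Worst-case bytes for VARCHAR(chars), assuming utf8mb4."""
    data = chars * bytes_per_char
    length_prefix = 1 if data <= 255 else 2  # 1 or 2 length bytes
    return data + length_prefix


def estimate_row_bytes(columns):
    """Sum worst-case widths for (name, type, size) column tuples."""
    total = 0
    for _name, col_type, size in columns:
        if col_type == "VARCHAR":
            total += varchar_bytes(size)
        else:
            total += FIXED_WIDTHS[col_type]
    return total


# Hypothetical wide table: one key plus 100 VARCHAR(255) attributes.
wide_table = [("user_id", "BIGINT", None)] + [
    (f"attr_{i}", "VARCHAR", 255) for i in range(100)
]

estimate = estimate_row_bytes(wide_table)
fits = estimate <= MYSQL_MAX_ROW_BYTES and len(wide_table) <= INNODB_MAX_COLUMNS
print(estimate, fits)
```

In this made-up case the estimate comes out well over the limit, so a table declared that way would need some columns moved to TEXT or split into a second table.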

#3


3  

Ask yourself these questions: If you put everything in one table, will you have multiple rows for that user? If you have to update a user, do you want to keep an audit trail? Can the user have more than one instance of a data element (a phone number, for instance)? Will you have a case where you might want to add an element or set of elements later? If you answer yes, then most likely you want child tables with foreign key relationships.

The pros of parent/child tables are data integrity, performance via indexes (yes, you can index a flat table too), and, IMO, easier maintenance if you need to add a field later, especially if it will be a required field.
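
Here is a small sketch of the parent/child idea using the phone number case. SQLite (via Python's sqlite3) stands in for MySQL, where InnoDB enforces foreign keys without the PRAGMA; the `user_phones` table, its columns, and the index name are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    -- Child table: a user can have any number of phone numbers.
    CREATE TABLE user_phones (
        phone_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
        phone    TEXT NOT NULL
    );
    -- Index the foreign key so per-user lookups stay cheap.
    CREATE INDEX idx_user_phones_user_id ON user_phones(user_id);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO user_phones (user_id, phone) VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)

# Data integrity: a phone for a user that does not exist is refused.
try:
    conn.execute("INSERT INTO user_phones (user_id, phone) VALUES (99, '555-0199')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

phones = [
    row[0]
    for row in conn.execute(
        "SELECT phone FROM user_phones WHERE user_id = 1 ORDER BY phone"
    )
]
```

A flat table would need `phone1`, `phone2`, and so on, and a new column each time someone turns up with one more number.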

The cons: design is harder, and queries become slightly more complex.

But, there are many cases where one big flat table will be appropriate so you have to look at your situation to decide.

#4


2  

I have a good example: an overly normalized database with the following set of relationships:

people -> rel_p2staff -> staff

and

people -> rel_p2prosp -> prospects

Here people has names and personal details, staff has just the staff record details, prospects has just the prospect details, and the rel tables are relationship tables with foreign keys linking people to staff and prospects.

This sort of design carries on through the entire database.

Querying this set of relations means a multi-table join every time, sometimes a join of 8 or more tables. It worked fine until the middle of this year, when it started getting very slow once we passed 40,000 people records.

Indexing and all the low-hanging fruit were used up last year, and all queries are optimized to perfection. This is the end of the road for this particular normalized design, and management has now approved a rebuild of the entire application that depends on it, as well as a restructuring of the database, over a term of 6 months. $$$$ Ouch.

The solution will be to have direct relations: people -> staff and people -> prospects.
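
To show the difference in query shape, here is a sketch of both layouts side by side, with SQLite (via Python's sqlite3) standing in for the real database. The table names `people`, `rel_p2staff`, and `staff` come from the description above; the columns and the `staff_direct` table are guesses made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real database; columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Original shape: people -> rel_p2staff -> staff
    CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, job_title TEXT NOT NULL);
    CREATE TABLE rel_p2staff (
        person_id INTEGER NOT NULL REFERENCES people(person_id),
        staff_id  INTEGER NOT NULL REFERENCES staff(staff_id)
    );

    -- Proposed shape: people -> staff_direct (hypothetical name)
    CREATE TABLE staff_direct (
        staff_id  INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL UNIQUE REFERENCES people(person_id),
        job_title TEXT NOT NULL
    );

    INSERT INTO people VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO staff VALUES (10, 'engineer');
    INSERT INTO rel_p2staff VALUES (1, 10);
    INSERT INTO staff_direct VALUES (10, 1, 'engineer');
""")

# Two joins to get from a person to their staff record.
via_rel_table = conn.execute("""
    SELECT p.name, s.job_title
    FROM people p
    JOIN rel_p2staff r ON r.person_id = p.person_id
    JOIN staff s ON s.staff_id = r.staff_id
""").fetchall()

# One join once staff carries the person_id itself.
direct = conn.execute("""
    SELECT p.name, s.job_title
    FROM people p
    JOIN staff_direct s ON s.person_id = p.person_id
""").fetchall()
```

Both queries return the same rows; the direct version drops one join per relationship, which adds up when a single query chains several of them.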

#5


1  

I've done some database design myself. For me, it depends on how difficult the system is to manage: yes, it's good to have each piece of data in one place only, but it is really hard to write queries against an overly normalized database with lots of records. Just combine the two approaches: use one huge table if you expect massive numbers of records that are hard to maintain, as at Facebook, Gmail, etc., and use separate tables for each set of records in a simple system. This is just my opinion; I hope it helps. Just do it, you can do it :)

#6


0  

The conventional way of doing this would be to use different tables, as in a star schema or snowflake schema. However, I would make the strategy twofold. I believe in the theory that data should only exist in one place, and therefore the schemas I mentioned would work well. However, I also believe that for reporting engines and BI suites a columnar approach would be hugely beneficial, because it is more supportive of reporting needs. Columnar approaches like those from infobright.org offer huge performance gains and compression, which makes using both approaches incredibly useful. A lot of companies are starting to realize that having just one database architecture in the organization does not support the full range of their needs, and a lot of them are adopting more than one database architecture.

#7


-1  

I think having a single table is more effective, but you should make sure that the table is organised so that it shows the relationships, trends, and differences among the variables in the same row. For example, if the table shows the ages and grades of students, you should arrange it so that the highest scorer is clearly differentiated from the lowest scorer and the difference in the students' ages is even.
