Which is more efficient: multiple MySQL tables or one big table?

Date: 2022-07-11 01:12:52

I store various user details in my MySQL database. Originally it was set up as several tables, meaning the data is linked by UserId and output via sometimes complicated queries to display and manipulate it as required. Now that I'm setting up a new system, it almost makes sense to combine all of these tables into one big table of related content.

  • Is this going to be a help or a hindrance?
  • What are the speed considerations when calling, updating, or searching/manipulating the data?

Here's an example of some of my table structure(s):

  • users - UserId, username, email, encrypted password, registration date, ip
  • user_details - cookie data, name, address, contact details, affiliation, demographic data
  • user_activity - contributions, last online, last viewing
  • user_settings - profile display settings
  • user_interests - advertising targetable variables
  • user_levels - access rights
  • user_stats - hits, tallies

Edit: I've upvoted all answers so far, they all have elements that essentially answer my question.

Most of the tables have a 1:1 relationship, which was the main reason for denormalising them.

Are there going to be issues if the table spans 100+ columns and a large portion of these cells are likely to remain empty?

8 answers

#1 (score: 47)

Multiple tables help in the following ways / cases:

(a) If different people are going to be developing applications involving different tables, it makes sense to split them.

(b) If you want to give different people different kinds of permissions on different parts of the data, it may be more convenient to split them. (Of course, you could also define views and grant privileges on them appropriately.)

(c) For moving data between places, especially during development, it may make sense to use separate tables, which result in smaller file sizes.

(d) A smaller footprint may be more comfortable to work with while you develop applications around the data of a single entity.

(e) What you thought of as single-valued data may turn out to be multi-valued in the future. For example, a credit limit is a single-valued field as of now, but tomorrow you may decide to store it as (date from, date to, credit value). Split tables would come in handy then.

My vote would be for multiple tables - with data appropriately split.
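
To make point (e) concrete, here is a minimal sketch of what the split looks like once a single credit-limit column turns into a dated history. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table and column names (credit_limits, date_from, date_to, credit_value) are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all table and column
# names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
-- Once the limit is time-ranged it moves to its own table:
-- one user can now have many (date_from, date_to, credit_value) rows.
CREATE TABLE credit_limits (
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    date_from    TEXT NOT NULL,   -- ISO-8601 dates compare correctly as text
    date_to      TEXT,            -- NULL means "still in effect"
    credit_value INTEGER NOT NULL,
    PRIMARY KEY (user_id, date_from)
);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO credit_limits VALUES (?, ?, ?, ?)",
    [(1, "2022-01-01", "2022-06-30", 500),
     (1, "2022-07-01", None, 800)],
)

def credit_limit_on(user_id, day):
    """Return the credit limit in effect for user_id on an ISO date, or None."""
    row = conn.execute(
        """SELECT credit_value FROM credit_limits
           WHERE user_id = ? AND date_from <= ?
             AND (date_to IS NULL OR date_to >= ?)""",
        (user_id, day, day),
    ).fetchone()
    return row[0] if row else None

print(credit_limit_on(1, "2022-03-15"))  # 500
print(credit_limit_on(1, "2022-07-11"))  # 800
```

With the data already split, the change costs an extra table and a slightly different query; with one wide users table it would instead mean altering that table and migrating its credit-limit column.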

Good luck.

#2 (score: 30)

Combining the tables is called denormalizing.

It may (or may not) help some queries (those that make lots of JOINs) run faster, at the expense of creating a maintenance hell.

MySQL is capable of using only one JOIN method, namely NESTED LOOPS. (That was true when this answer was written; MySQL 8.0.18 and later can also perform hash joins.)

This means that for each record in the driving table, MySQL locates a matching record in the driven table in a loop.
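
As a rough illustration of the nested-loop idea, the sketch below models tables as Python lists of tuples. The data is hypothetical, and a dict stands in for the B-tree index MySQL would use on the driven table's join column, so it shows the shape of the algorithm, not its real cost.

```python
# Toy model: tables are lists of tuples and the data is hypothetical.
users = [(1, "alice"), (2, "bob"), (3, "carol")]   # driving table: (user_id, username)
user_stats = [(1, 42), (3, 7)]                     # driven table: (user_id, hits)

def nested_loop_join(driving, driven):
    """Plain nested loop: for every driving row, scan the whole driven table."""
    out = []
    for user_id, username in driving:          # outer loop over the driving table
        for stat_user_id, hits in driven:      # inner loop over the driven table
            if stat_user_id == user_id:
                out.append((user_id, username, hits))
    return out

def index_nested_loop_join(driving, driven_index):
    """Same inner join, but each inner scan becomes a single index lookup."""
    out = []
    for user_id, username in driving:
        hits = driven_index.get(user_id)       # "locate a matching record"
        if hits is not None:
            out.append((user_id, username, hits))
    return out

# A dict keyed on user_id plays the role of an index on user_stats.user_id
# (it assumes at most one stats row per user, i.e. a 1:1 relationship).
stats_index = {user_id: hits for user_id, hits in user_stats}

print(nested_loop_join(users, user_stats))
print(index_nested_loop_join(users, stats_index))
# both print [(1, 'alice', 42), (3, 'carol', 7)]
```

The per-row lookup in the second function is the "locating a record" step whose cost the answer goes on to discuss; storing hits directly on the users rows would remove it entirely.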

Locating a record is quite a costly operation, which may take dozens of times as long as simply scanning a record.

Moving all your records into one table will help you to get rid of this operation, but the table itself grows larger, and the table scan takes longer.

If you have lots of records in the other tables, the longer table scan can outweigh the benefit of reading the records sequentially.

Maintenance hell, on the other hand, is guaranteed.

#3 (score: 16)

Are all of them 1:1 relationships? I mean, if a user could belong to, say, several different user levels, or if a user's interests are represented as several records in the user_interests table, then merging those tables would be out of the question immediately.

Regarding previous answers about normalization, it must be said that the database normalization rules completely disregard performance and look only at what makes a neat database design. That is often what you want to achieve, but there are times when it makes sense to actively denormalize in pursuit of performance.

All in all, I'd say the question comes down to how many fields there are in the tables, and how often they are accessed. If user activity is often not very interesting, then it might just be a nuisance to always have it on the same record, for performance and maintenance reasons. If some data, like settings, is accessed very often but simply contains too many fields, it might also not be convenient to merge the tables. If you're only interested in the performance gain, you might consider other approaches, such as keeping the settings separate but saving them in a session variable of their own, so that you don't have to query the database for them very often.
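
A minimal sketch of the session-caching idea, assuming a plain dict stands in for whatever per-user session store your framework provides and using sqlite3 in place of MySQL; the user_settings columns shown are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; the user_settings columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_settings (user_id INTEGER PRIMARY KEY, theme TEXT, show_email INTEGER)"
)
conn.execute("INSERT INTO user_settings VALUES (1, 'dark', 0)")

query_count = 0  # counts database round trips so the caching effect is visible

def load_settings(user_id):
    """Fetch one user's settings row from the database."""
    global query_count
    query_count += 1
    row = conn.execute(
        "SELECT theme, show_email FROM user_settings WHERE user_id = ?", (user_id,)
    ).fetchone()
    return {"theme": row[0], "show_email": bool(row[1])} if row else {}

def get_settings(session, user_id):
    """Return settings cached in the session, hitting the database only once."""
    if "settings" not in session:
        session["settings"] = load_settings(user_id)
    return session["settings"]

session = {}  # a plain dict standing in for the framework's per-user session
for _ in range(3):
    settings = get_settings(session, 1)
print(settings, query_count)  # {'theme': 'dark', 'show_email': False} 1
```

Whatever code updates a user's settings should also refresh or clear `session["settings"]`; otherwise the cached copy goes stale.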

#4 (score: 9)

Do all of those tables have a 1-to-1 relationship? For example, will each user row only have one corresponding row in user_stats or user_levels? If so, it might make sense to combine them into one table. If the relationship is not 1 to 1 though, it probably wouldn't make sense to combine (denormalize) them.
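
One way to answer the 1-to-1 question against real data is to look for user IDs that appear more than once in each satellite table. This sketch uses sqlite3 as a stand-in for MySQL (the GROUP BY ... HAVING query is plain SQL that MySQL also accepts); the sample rows, including a user who holds two levels, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE user_stats (user_id INTEGER, hits INTEGER);
CREATE TABLE user_levels (user_id INTEGER, level TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO user_stats VALUES (1, 10), (2, 3);
INSERT INTO user_levels VALUES (1, 'member'), (1, 'moderator'), (2, 'member');
""")

def users_with_multiple_rows(table):
    """Return user_ids that appear more than once in `table` (so it is not 1:1)."""
    # Table names cannot be bound as parameters, so only pass trusted names here.
    sql = f"SELECT user_id FROM {table} GROUP BY user_id HAVING COUNT(*) > 1"
    return [row[0] for row in conn.execute(sql)]

for table in ("user_stats", "user_levels"):
    dupes = users_with_multiple_rows(table)
    if dupes:
        print(table, "has several rows for user_ids", dupes)
    else:
        print(table, "has at most one row per user")
```

This only inspects the current data; what actually guarantees a 1:1 relationship is a PRIMARY KEY or UNIQUE constraint on the satellite table's user_id.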

Keeping them in separate tables vs. one table will probably have little effect on performance, though, unless you have hundreds of thousands or millions of user records. The only real gain you'll get from combining them is simpler queries.

ETA:

If your concern is about having too many columns, then think about what stuff you typically use together and combine those, leaving the rest in a separate table (or several separate tables if needed).

If you look at the way you use the data, my guess is that you'll find that something like 80% of your queries use 20% of that data with the remaining 80% of the data being used only occasionally. Combine that frequently used 20% into one table, and leave the 80% that you don't often use in separate tables and you'll probably have a good compromise.
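
A minimal sketch of that 80/20 split, assuming (hypothetically) that username, email and last-online time are the hot columns and address, affiliation and demographics are the cold ones. The cold table reuses user_id as its primary key, which keeps the relationship at no more than one row per user; SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Frequently used ("hot") columns: read on nearly every request.
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    username    TEXT NOT NULL,
    email       TEXT NOT NULL,
    last_online TEXT
);
-- Rarely used ("cold") columns: same primary key, at most one row per user.
CREATE TABLE user_profile_extra (
    user_id     INTEGER PRIMARY KEY REFERENCES users(user_id),
    address     TEXT,
    affiliation TEXT,
    demographic TEXT
);
INSERT INTO users VALUES (1, 'alice', 'alice@example.com', '2022-07-11');
INSERT INTO users VALUES (2, 'bob', 'bob@example.com', NULL);
INSERT INTO user_profile_extra VALUES (1, '1 Main St', 'ACME', NULL);
""")

# The common case touches only the hot table: no join.
print(conn.execute("SELECT username, email FROM users WHERE user_id = 2").fetchone())

# The occasional full-profile page pays for one LEFT JOIN; users with no
# extra row get NULLs back without any NULLs being stored for them.
full = conn.execute("""
    SELECT u.username, x.address, x.affiliation
    FROM users u LEFT JOIN user_profile_extra x ON x.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()
print(full)  # [('alice', '1 Main St', 'ACME'), ('bob', None, None)]
```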

#5 (score: 6)

Creating one massive table goes against relational database principles. I wouldn't combine them all into one table. You're going to get multiple instances of repeated data. If your user has three interests, for example, you will have 3 rows, with the same user data repeated just to store the three different interests. Definitely go for the multiple 'normalized' table approach. See this Wiki page for database normalization.
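
The duplication this answer warns about shows up as soon as the data changes. A minimal sketch, with sqlite3 standing in for MySQL and hypothetical sample data, comparing an email change in a flat users-plus-interests table with the same change in a normalized pair of tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One flat table: the user's data is repeated on every interest row.
CREATE TABLE users_flat (user_id INTEGER, email TEXT, interest TEXT);
INSERT INTO users_flat VALUES
    (1, 'alice@example.com', 'cycling'),
    (1, 'alice@example.com', 'chess'),
    (1, 'alice@example.com', 'jazz');

-- Normalized: the user's data is stored once, interests in their own table.
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE user_interests (
    user_id  INTEGER REFERENCES users(user_id),
    interest TEXT,
    PRIMARY KEY (user_id, interest)
);
INSERT INTO users VALUES (1, 'alice@example.com');
INSERT INTO user_interests VALUES (1, 'cycling'), (1, 'chess'), (1, 'jazz');
""")

# The same logical change rewrites every duplicate in the flat design,
# but exactly one row in the normalized design.
flat_rows = conn.execute(
    "UPDATE users_flat SET email = 'a@example.org' WHERE user_id = 1").rowcount
norm_rows = conn.execute(
    "UPDATE users SET email = 'a@example.org' WHERE user_id = 1").rowcount
print(flat_rows, norm_rows)  # 3 1
```

In the flat design every copy has to change together; an update that misses one of them leaves the same user with two different email addresses.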

Edit: I have updated my answer, as you have updated your question... I agree with my initial answer even more now since...

a large portion of these cells are likely to remain empty

If, for example, a user doesn't have any interests and you normalize, then you simply won't have a row in the interests table for that user. If you have everything in one massive table, then you will have columns (and apparently a lot of them) that contain just NULLs.
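
A minimal sketch of this point, with sqlite3 standing in for MySQL and hypothetical data: users without interests simply have no rows in user_interests, and they can still be found with a NOT EXISTS query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE user_interests (
    user_id  INTEGER REFERENCES users(user_id),
    interest TEXT,
    PRIMARY KEY (user_id, interest)
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
-- Only alice has interests; bob and carol get no rows at all, not NULL columns.
INSERT INTO user_interests VALUES (1, 'cycling'), (1, 'chess');
""")

# Users with no interests are found by the absence of rows.
no_interests = [name for (name,) in conn.execute("""
    SELECT u.username FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM user_interests i WHERE i.user_id = u.user_id)
    ORDER BY u.user_id
""")]
stored = conn.execute("SELECT COUNT(*) FROM user_interests").fetchone()[0]
print(no_interests, stored)  # ['bob', 'carol'] 2
```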

I have worked for a telephony company where there were tons of tables, and getting data could require many joins. When the performance of reading from these tables was critical, procedures were created that could generate a flat table (i.e. a denormalized table) requiring no joins, calculations, etc., which reports could point to. These were then used in conjunction with SQL Server Agent to run the job at set intervals (e.g. a weekly view of some stats would be rebuilt once a week, and so on).
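
A minimal sketch of such a rebuild job, assuming a hypothetical weekly_user_report table built from simplified versions of the question's users, user_stats and user_activity tables. sqlite3 stands in for the real server, and the scheduling itself (SQL Server Agent in the answer) is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE user_stats (user_id INTEGER PRIMARY KEY, hits INTEGER);
CREATE TABLE user_activity (user_id INTEGER, contributed_on TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO user_stats VALUES (1, 120), (2, 15);
INSERT INTO user_activity VALUES (1, '2022-07-01'), (1, '2022-07-04'), (2, '2022-07-05');
""")

def rebuild_weekly_report(conn):
    """Recreate a denormalized report table so readers need no joins or aggregates.

    Meant to run on a schedule (cron or MySQL's event scheduler could play the
    role SQL Server Agent played in the answer).
    """
    conn.execute("DROP TABLE IF EXISTS weekly_user_report")
    conn.execute("""
        CREATE TABLE weekly_user_report AS
        SELECT u.user_id, u.username, s.hits,
               COUNT(a.contributed_on) AS contributions
        FROM users u
        LEFT JOIN user_stats s ON s.user_id = u.user_id
        LEFT JOIN user_activity a ON a.user_id = u.user_id
        GROUP BY u.user_id, u.username, s.hits
    """)
    conn.commit()

rebuild_weekly_report(conn)
# Reports read the flat table directly: no joins, no aggregation.
report = conn.execute(
    "SELECT username, hits, contributions FROM weekly_user_report ORDER BY user_id"
).fetchall()
print(report)  # [('alice', 120, 2), ('bob', 15, 1)]
```

On MySQL, DDL statements commit implicitly, so a drop-and-recreate like this is not atomic for readers; a common variation is to build the new report under a temporary name and swap it in with a single RENAME TABLE statement.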

#6 (score: 6)

Why not use the same approach WordPress does? Have a users table with the basic user information that everyone has, and then add a "user_meta" table that can hold essentially any key/value pair associated with the user ID. If you need all the meta information for a user, you can just add that to your query, and you don't have to run the extra query when it isn't needed, e.g. for logging in. This approach also leaves your table open to new user features, such as storing their Twitter handle or each individual interest. You also won't have to deal with a maze of associated IDs, because you have one table that rules all metadata, limiting it to one association instead of 50.

WordPress does this specifically to allow features to be added via plugins, which makes your project more scalable and means you won't need a complete database overhaul when you add a new feature.
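
A minimal sketch of the key/value layout, loosely modeled on WordPress's wp_usermeta table (meta_key, meta_value) but with hypothetical names, and using sqlite3 in place of MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    email    TEXT NOT NULL
);
-- One row per (user, key, value); new attributes need no schema change.
CREATE TABLE user_meta (
    meta_id    INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    meta_key   TEXT NOT NULL,
    meta_value TEXT
);
CREATE INDEX idx_user_meta_user_key ON user_meta (user_id, meta_key);
""")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'alice@example.com')")
conn.executemany(
    "INSERT INTO user_meta (user_id, meta_key, meta_value) VALUES (?, ?, ?)",
    [(1, "twitter_handle", "@alice"), (1, "interest", "cycling"), (1, "interest", "chess")],
)

def get_meta(user_id, key):
    """Return all values stored under `key` for a user (a key may repeat)."""
    rows = conn.execute(
        "SELECT meta_value FROM user_meta WHERE user_id = ? AND meta_key = ? ORDER BY meta_id",
        (user_id, key),
    )
    return [value for (value,) in rows]

# Logging in needs only the users table; metadata is fetched on demand.
print(conn.execute("SELECT username FROM users WHERE user_id = 1").fetchone()[0])  # alice
print(get_meta(1, "twitter_handle"), get_meta(1, "interest"))  # ['@alice'] ['cycling', 'chess']
```

The trade-off is that meta_value is untyped text, so the database cannot enforce types or constraints per key, and filtering users on several meta values at once needs one join (or subquery) per key.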

#7 (score: 2)

I think this is one of those "it depends" situations. Having multiple tables is cleaner and probably theoretically better. But when you have to join 6-7 tables to get information about a single user, you might start to rethink that approach.

#8 (score: 1)

I would say it depends on what the other tables really mean. Does user_details contain more than one row per user, and so on? What level of normalization is best suited to your needs depends on your requirements.

If you have one table with good indexes, that would probably be faster, but on the other hand it's probably more difficult to maintain.

To me it looks like you could skip User_Details as a separate table, as it probably has a 1:1 relation with Users. But the rest probably have a lot of rows per user?
