I have a table with all registered members, with columns like uid, username, and last_action_time.
I also have a table that keeps track of who has been online in the past 5 minutes. It is populated by a cronjob that pulls rows from members whose last_action_time is less than 5 minutes ago.
Question: Should my online table include username or not? I'm asking because I could JOIN both tables to obtain this data, or I could store the username in the online table and not have to join. My concern is that I would have duplicate data stored in two tables, and that seems wrong.
5 Solutions
#1
1
If you haven't run into performance issues, DO NOT denormalize. There is a good saying: "normalize until it hurts, denormalize until it works". In your case, the normalized schema (joining the users table) works. And databases are designed to handle huge amounts of data.
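To make the normalized option concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the question; the column types, the sample rows, and the helper name online_usernames are assumptions made for illustration, and the asker's actual database engine is not stated.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (
        uid INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        last_action_time INTEGER NOT NULL  -- assumed to be a Unix timestamp
    );
    -- Normalized: the online table stores only the uid.
    CREATE TABLE online (
        uid INTEGER PRIMARY KEY REFERENCES members(uid)
    );
""")
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [(1, "alice", 1000), (2, "bob", 2000), (3, "carol", 3000)],
)
conn.executemany("INSERT INTO online VALUES (?)", [(2,), (3,)])


def online_usernames(conn):
    """Return usernames of online members by joining on the primary key."""
    rows = conn.execute("""
        SELECT m.username
        FROM online AS o
        JOIN members AS m ON m.uid = o.uid
        ORDER BY m.username
    """).fetchall()
    return [row[0] for row in rows]


print(online_usernames(conn))  # ['bob', 'carol']
```

The join here is a primary-key lookup on members for each row in online, which is the kind of query a database is built to do cheaply.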
#2
1
This approach is called denormalization. Sometimes, to keep a SELECT query fast, we have to duplicate some data across tables. In this case I believe it is a good choice if you have a lot of data in both tables.
#3
1
You just hit on a very valid question: when does it make sense to duplicate data?
I could rewrite your question as: when does it make sense to use a cache? Caches need maintenance: you have to keep them up to date yourself, and they use up some extra space (although that is negligible in this case). But they have one advantage: a performance increase.
In the example you mentioned, you need to see whether that performance increase is actually worth it and whether it outweighs the additional work of having and maintaining a cache.
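As a rough illustration of that maintenance cost, the sketch below (Python with the built-in sqlite3 module) shows a duplicated username going stale after a rename. The schema and sample data are assumptions for illustration only. In the asker's setup the cronjob would presumably refresh the copy on its next run, so the staleness would be bounded by the cron interval.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (uid INTEGER PRIMARY KEY, username TEXT NOT NULL);
    -- Denormalized: online keeps its own copy of the username.
    CREATE TABLE online (uid INTEGER PRIMARY KEY, username TEXT NOT NULL);
    INSERT INTO members VALUES (1, 'alice');
    INSERT INTO online VALUES (1, 'alice');
""")

# The member renames herself; only the source table is updated.
conn.execute("UPDATE members SET username = 'alice2' WHERE uid = 1")

# Reading the duplicated column returns the old name.
cached = conn.execute(
    "SELECT username FROM online WHERE uid = 1"
).fetchone()[0]

# Joining back to members always returns the current name.
joined = conn.execute("""
    SELECT m.username
    FROM online AS o
    JOIN members AS m ON m.uid = o.uid
    WHERE o.uid = 1
""").fetchone()[0]

print(cached, joined)  # alice alice2
```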
My gut feeling is that your database isn't gigantic, so joining every time should cost the server very little. I'd go with the join.
Hope it helps
#4
0
You shouldn't store the username in the online table. There shouldn't be any performance issue. Just use a join every time to get the username.
Plus, you don't need the online table at all. Why don't you query the members table directly for the users whose last_action_time is less than 5 minutes ago?
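A minimal sketch of that suggestion, using Python's built-in sqlite3 module. It assumes last_action_time is stored as a Unix timestamp in seconds; the index name, the helper currently_online, and the sample rows are hypothetical.

```python
import sqlite3

ONLINE_WINDOW_SECONDS = 5 * 60  # "online" means active in the last 5 minutes


def currently_online(conn, now):
    """Return (uid, username) for members active within the window before `now`."""
    cutoff = now - ONLINE_WINDOW_SECONDS
    return conn.execute(
        "SELECT uid, username FROM members "
        "WHERE last_action_time > ? ORDER BY uid",
        (cutoff,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE members (
        uid INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        last_action_time INTEGER NOT NULL  -- assumed Unix timestamp
    )
""")
# An index on last_action_time lets the range filter avoid a full table scan.
conn.execute("CREATE INDEX idx_members_last_action ON members(last_action_time)")

now = 10_000  # fixed "current time" so the example is reproducible
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [
        (1, "alice", now - 30),   # active 30 seconds ago
        (2, "bob", now - 600),    # active 10 minutes ago
        (3, "carol", now - 299),  # just inside the window
    ],
)

print(currently_online(conn, now))  # [(1, 'alice'), (3, 'carol')]
```

With this approach there is no second table to populate, so the cronjob and the duplicate-data question both go away.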
#5
0
A user ID would be an integer (i.e. 4 bytes). A username, I would imagine, is up to 16 bytes. How many users are there? How often does a username change? These are the questions to consider.
I would just store the username. I would have thought that once a username is registered, it is fixed for the duration.
It is difficult to answer these questions without a little background; performance issues are difficult to reason about when the depth and breadth, usage, etc. are not known.