SQL: optimizing duplicate records in the result of multiple joins

Time: 2021-04-10 00:23:51

For example, I have 3 tables. The first is 'Users', which stores each user's name. The second is 'Location', which stores users' addresses - typically 1 address per user. And the third is 'Messages', where every user typically has a bunch of records.
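
For reference, here is roughly what these tables look like (just a sketch of the assumed schema - the column names are taken from the query and the PS below, the types are illustrative):

CREATE TABLE Users (
    id   INT PRIMARY KEY,
    name VARCHAR(255)
);

CREATE TABLE Location (
    user_id INT,           -- references Users.id; typically one address per user
    address VARCHAR(255)
);

CREATE TABLE Messages (
    user_id   INT,         -- references Users.id; many messages per user
    message   TEXT,
    time      DATETIME,
    recipient VARCHAR(255),
    deleted   TINYINT,
    medium    VARCHAR(255)
);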

And when joining these three tables like this:

SELECT Users.name, Location.address, Messages.message FROM Users
LEFT JOIN Location ON Location.user_id = Users.id
LEFT JOIN Messages ON Messages.user_id = Users.id
WHERE blah blah

The results will contain many duplicate records, because the 'Messages' table has many records for each user, and these duplicates slow down fetching. So I'm looking for a way to optimize this. For example, I tried GROUP_CONCAT() with GROUP BY Users.id - but when the result of GROUP_CONCAT() gets relatively long, GROUP_CONCAT() starts to return NULL. I can't get past it; I've tried setting group_concat_max_len and max_allowed_packet to high values - all with no luck.
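
For reference, this is roughly how I raised the limit for the current session (the exact value is just an example); as far as I understand, exceeding group_concat_max_len should only truncate the result rather than turn it into NULL, which is part of why this is confusing:

-- raise the per-session limit before running the query (the default is only 1024 bytes)
SET SESSION group_concat_max_len = 1024 * 1024;
-- check that the new value is actually in effect
SHOW VARIABLES LIKE 'group_concat_max_len';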

Well, does anybody have any thoughts on this?

PS: Probably an important note: in my real case, instead of just one column 'message', I have many columns, and many distinct rows with them. My 'Messages' table has columns like 'message', 'time', 'recipient', 'deleted', 'medium', etc., and my GROUP_CONCAT() contains all these fields.
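
In other words, the aggregate looks roughly like this (a simplified sketch with only a few of the columns):

SELECT Users.name,
       GROUP_CONCAT(Messages.message, ' ', Messages.time, ' ', Messages.recipient
                    SEPARATOR '\n') AS messages
FROM Users
LEFT JOIN Messages ON Messages.user_id = Users.id
GROUP BY Users.id;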

UPD: It seems GROUP_CONCAT() drops all results if even one record turns out to be NULL. For example, when using GROUP_CONCAT(Messages.message, Messages.time) and the time in one row happens to be NULL, it returns NULL.
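
For illustration, a minimal sketch of the pattern in question; wrapping the nullable column in COALESCE() should at least keep those rows from being dropped, though I'm not sure it addresses the underlying issue:

SELECT Users.name,
       -- GROUP_CONCAT(Messages.message, ' ', Messages.time) skips every row where time is NULL,
       -- because the per-row concatenation becomes NULL; COALESCE() keeps those rows:
       GROUP_CONCAT(Messages.message, ' ', COALESCE(Messages.time, 'n/a')) AS messages
FROM Users
LEFT JOIN Messages ON Messages.user_id = Users.id
GROUP BY Users.id;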

3 Solutions

#1

In this case, you may actually benefit from a document-storage database like Mongo for storing Messages.

#2

You probably want group_concat(distinct):

SELECT Users.name, group_concat(distinct Location.address) as locations,
       group_concat(distinct Messages.message) as messages
FROM Users
LEFT JOIN Location ON Location.user_id = Users.id
LEFT JOIN Messages ON Messages.user_id = Users.id
WHERE blah blah
group by users.name
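
If duplicate user names are possible, it is probably safer to group by the id; a variant of the same idea with explicit separators (a sketch, untested against the original schema):

SELECT Users.name,
       group_concat(distinct Location.address SEPARATOR '; ') as locations,
       group_concat(distinct Messages.message SEPARATOR '\n') as messages
FROM Users
LEFT JOIN Location ON Location.user_id = Users.id
LEFT JOIN Messages ON Messages.user_id = Users.id
group by Users.id, Users.name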

#3

"The results will contain many duplicate records, because the 'Messages' table has many records for each user."

By "duplicate", do mean that for each unique message there will be a row, and that that row will contain values for user name and location that exist in other rows? Are you asking for a way to smush all the messages into one, so that there's only one row for each user+location? For speed??

If this is a question of performance, I would be interested to hear how it was measured, and what would be fast enough. I also wonder, should you succeed, how you'll distinguish the messages.
