Is MySQL naturally slow at this kind of query, or is it misconfigured?

Time: 2022-09-20 19:01:53

The following query is intended to retrieve a list of a user's unread messages. It involves 3 tables: recipients maps users to message IDs, messages contains the messages themselves, and message_readers records which users have read which messages.

The query reliably takes 4.9 seconds. This is seriously hurting our performance, and is especially worrisome since we hope the database will eventually be several orders of magnitude larger. Granted, it's an inherently heavy query, but the data set is tiny, and intuitively it seems that it should be much faster. The server has enough memory (32 GB) that the entire database should be loaded in RAM at all times, and there's nothing else running on the box.

The tables are all tiny:

recipients: 23581
messages: 9679
message_readers: 2685

The query itself:

SELECT 
    m.*
FROM 
    messages m
INNER JOIN recipients r ON r.message_id = m.id
LEFT JOIN message_readers mr ON mr.message_id = m.id
WHERE
    r.id = $user_id
    AND (mr.read_by_id IS NULL OR mr.read_by_id <> $user_id)

The explain plan is pretty straightforward:

+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
| id | select_type | table | type   | possible_keys                     | key                               | key_len | ref                            | rows  | Extra       |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
|  1 | SIMPLE      | r     | ref    | index_recipients_on_id            | index_recipients_on_id            | 768     | const                          | 11908 | Using where |
|  1 | SIMPLE      | m     | eq_ref | PRIMARY                           | PRIMARY                           | 4       | db.r.message_id                |     1 | Using index |
|  1 | SIMPLE      | mr    | ALL    | NULL                              | NULL                              | NULL    | NULL                           |  2498 | Using where |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+

There IS an index on message_readers.read_by_id, but I guess it can't really use it because of the IS NULL condition.

I'm using all default settings except for the following:

key_buffer=4G
query_cache_limit = 256M
query_cache_size = 1G
innodb_buffer_pool_size=12G

Thanks!

6 solutions

#1


4  

Assuming that message_readers is a subset of recipients, I recommend making the following changes:

  1. Get rid of the message_readers table and replace it with a flag on the recipients table. This will eliminate the null check and remove a join.

  2. It probably already is, but make sure your clustered index for recipients is id, message_id rather than message_id, id, since nearly all searches for messages will be based on the recipients.

Here is the SELECT that results:

SELECT
    r.whatever,
    m.whatever,
    -- ...
FROM
    recipients r
    INNER JOIN messages m ON m.id = r.message_id
WHERE
    r.id = $user_id
    AND r.read_flag = 'N'
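
The flag approach can be tried end to end with a small sketch. Everything beyond the table and column names in the answer is an assumption here: SQLite (through Python's sqlite3) stands in for MySQL, user IDs are integers, and the sample rows and the mark_read / unread_ids helpers are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; types and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE recipients (
        id         INTEGER NOT NULL,           -- user id, as in the question
        message_id INTEGER NOT NULL REFERENCES messages(id),
        read_flag  TEXT NOT NULL DEFAULT 'N',  -- 'N' = unread, 'Y' = read
        PRIMARY KEY (id, message_id)           -- user first, as the answer suggests
    );
    INSERT INTO messages VALUES (1, 'hello'), (2, 'status'), (3, 'lunch?');
    INSERT INTO recipients (id, message_id) VALUES (7, 1), (7, 2), (8, 2), (8, 3);
""")

def mark_read(user_id, message_id):
    """Flip the flag instead of inserting a row into message_readers."""
    conn.execute(
        "UPDATE recipients SET read_flag = 'Y' WHERE id = ? AND message_id = ?",
        (user_id, message_id),
    )

def unread_ids(user_id):
    """Return the ids of messages the user has not read yet."""
    rows = conn.execute(
        """
        SELECT m.id
        FROM recipients r
        INNER JOIN messages m ON m.id = r.message_id
        WHERE r.id = ? AND r.read_flag = 'N'
        ORDER BY m.id
        """,
        (user_id,),
    )
    return [row[0] for row in rows]

mark_read(7, 1)
print(unread_ids(7))  # [2]
print(unread_ids(8))  # [2, 3]
```

With the flag on recipients, reading a message becomes a single-row UPDATE and the unread list needs no outer join at all.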

UPDATE

Here is the correct version of your query using the existing schema:

SELECT
    r.whatever,
    m.whatever,
    -- ...
FROM
    recipients r
    INNER JOIN messages m ON r.message_id = m.id
    LEFT JOIN message_readers mr ON mr.read_by_id = r.id 
                                 AND mr.message_id = m.id
WHERE
    r.id = $user_id
    AND mr.read_by_id IS NULL

This assumes that your clustered indexes are what would be expected:

recipients: id, message_id
messages: id
message_readers: read_by_id, message_id
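
To see why the join condition matters, here is a minimal sketch that runs the query from the question next to the corrected one on a few rows. SQLite (via Python's sqlite3) stands in for MySQL, user IDs are assumed to be integers, and the sample data and the ids helper are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY);
    CREATE TABLE recipients (id INTEGER, message_id INTEGER,
                             PRIMARY KEY (id, message_id));
    CREATE TABLE message_readers (read_by_id INTEGER, message_id INTEGER,
                                  PRIMARY KEY (read_by_id, message_id));
    INSERT INTO messages VALUES (1), (2), (3);
    INSERT INTO recipients VALUES (7, 1), (7, 2), (7, 3), (8, 1), (8, 2), (9, 2);
    -- user 7 has read message 1; users 8 and 9 have read others
    INSERT INTO message_readers VALUES (7, 1), (8, 1), (8, 2), (9, 2);
""")

# Query from the question: joins every reader row of the message.
ORIGINAL = """
    SELECT m.id
    FROM messages m
    INNER JOIN recipients r ON r.message_id = m.id
    LEFT JOIN message_readers mr ON mr.message_id = m.id
    WHERE r.id = ? AND (mr.read_by_id IS NULL OR mr.read_by_id <> ?)
    ORDER BY m.id
"""

# Corrected query: joins only this user's reader row, keeps the misses.
CORRECTED = """
    SELECT m.id
    FROM recipients r
    INNER JOIN messages m ON r.message_id = m.id
    LEFT JOIN message_readers mr ON mr.read_by_id = r.id
                                 AND mr.message_id = m.id
    WHERE r.id = ? AND mr.read_by_id IS NULL
    ORDER BY m.id
"""

def ids(sql, params):
    return [row[0] for row in conn.execute(sql, params)]

original = ids(ORIGINAL, (7, 7))
corrected = ids(CORRECTED, (7,))
print(original)   # [1, 2, 2, 3]
print(corrected)  # [2, 3]
```

On this data the original query reports message 1 as unread (because user 8 also read it) and lists message 2 twice (once per other reader), while the corrected join returns each unread message exactly once.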

#2


1  

You can get rid of the IS NULL-condition when you rewrite your query like this:

SELECT 
    count(m.id)
FROM 
    messages m
INNER JOIN recipients r ON r.message_id = m.id
WHERE r.id = $user_id
  AND NOT EXISTS
         (SELECT mr.id 
            FROM message_readers mr 
           WHERE mr.message_id = m.id
             AND mr.read_by_id = $user_id)

Basically this reads like: get all messages for the recipient that are not in message_readers for that user. It also describes the problem more simply.
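
A quick way to try the NOT EXISTS form is the sketch below. It is not MySQL: SQLite (via Python's sqlite3) stands in for it, user IDs are assumed to be integers, the rows are invented, and unread_count is a hypothetical helper. The subquery selects a constant rather than mr.id, which makes no difference to EXISTS.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY);
    CREATE TABLE recipients (id INTEGER, message_id INTEGER,
                             PRIMARY KEY (id, message_id));
    CREATE TABLE message_readers (read_by_id INTEGER, message_id INTEGER,
                                  PRIMARY KEY (read_by_id, message_id));
    INSERT INTO messages VALUES (1), (2), (3);
    INSERT INTO recipients VALUES (7, 1), (7, 2), (7, 3), (8, 1), (8, 2);
    INSERT INTO message_readers VALUES (7, 1), (8, 1), (8, 2);
""")

def unread_count(user_id):
    """Count messages addressed to the user with no matching reader row."""
    row = conn.execute(
        """
        SELECT count(m.id)
        FROM messages m
        INNER JOIN recipients r ON r.message_id = m.id
        WHERE r.id = :uid
          AND NOT EXISTS (SELECT 1
                          FROM message_readers mr
                          WHERE mr.message_id = m.id
                            AND mr.read_by_id = :uid)
        """,
        {"uid": user_id},
    ).fetchone()
    return row[0]

print(unread_count(7))  # 2
print(unread_count(8))  # 0
```

The composite key on (read_by_id, message_id) lets the subquery be answered by a single index lookup per message, which is the point of avoiding the scan on message_readers.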

#3


1  

Assuming you just want the count (as shown in your query), what happens if you change the joins like so?

I use MSSQL and this has the potential to speed it up. I've never used MySQL, but it should work, shouldn't it?

SELECT     count(m.id)
FROM       messages m
INNER JOIN recipients r ON r.message_id = m.id AND r.id = $user_id
LEFT JOIN  message_readers mr ON mr.message_id = m.id AND (mr.read_by_id IS NULL OR mr.read_by_id <> $user_id)

EDIT: What about this for a mad idea? I thought you could split out the OR into two separate left joins and then take the record if either of those returns something.

SELECT     count(m.id)
FROM       messages m
LEFT JOIN  recipients r ON r.message_id = m.id AND r.id = $user_id
LEFT JOIN  message_readers mr ON mr.message_id = m.id AND mr.read_by_id IS NULL
LEFT JOIN  message_readers mr2 ON mr2.message_id = m.id AND mr2.read_by_id <> $user_id
WHERE      COALESCE(mr.message_id, mr2.message_id) IS NOT NULL

#4


1  

Unless I am missing something, you don't appear to need the messages table at all. What you really want is the number of message ids that appear for this user in recipients, and do not appear for this user in message_readers.

If I'm right above, you can accomplish what you want with a MINUS:

SELECT count(message_id)
  FROM (
        SELECT r.message_id  
          FROM recipients r 
         WHERE r.id = $user_id
        MINUS
        SELECT mr.message_id
          FROM message_readers mr
         WHERE mr.read_by_id = $user_id
       )

This avoids joins entirely. Now if you do indeed need data from the messages table for your production query, you can join the messages table to this subquery (or stick it in an IN clause).

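It's possible that I'm off base here, as my experience is in Oracle-land. MySQL does not support MINUS; the standard equivalent, EXCEPT, is available from MySQL 8.0.31, and on older versions a NOT EXISTS or LEFT JOIN ... IS NULL rewrite does the same job.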
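
The set-difference idea can be sketched with EXCEPT, the standard SQL spelling of Oracle's MINUS. This is only an illustration under assumptions: SQLite (via Python's sqlite3) stands in for the real server, user IDs are integers, the rows are invented, and unread_count is a hypothetical helper. The derived table is given an alias, which MySQL requires.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipients (id INTEGER, message_id INTEGER,
                             PRIMARY KEY (id, message_id));
    CREATE TABLE message_readers (read_by_id INTEGER, message_id INTEGER,
                                  PRIMARY KEY (read_by_id, message_id));
    INSERT INTO recipients VALUES (7, 1), (7, 2), (7, 3), (8, 1), (8, 2);
    INSERT INTO message_readers VALUES (7, 1), (8, 1), (8, 2);
""")

def unread_count(user_id):
    """Messages addressed to the user minus messages the user has read."""
    row = conn.execute(
        """
        SELECT count(message_id)
        FROM (
            SELECT r.message_id
            FROM recipients r
            WHERE r.id = :uid
            EXCEPT
            SELECT mr.message_id
            FROM message_readers mr
            WHERE mr.read_by_id = :uid
        ) AS unread
        """,
        {"uid": user_id},
    ).fetchone()
    return row[0]

print(unread_count(7))  # 2
print(unread_count(8))  # 0
```

Note that EXCEPT has set semantics, so duplicate message ids would be collapsed before counting; with one recipients row per user and message that changes nothing.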

#5


1  

What's the query time for

select distinct message_id
  from message_readers
 where read_by_id <> $user_id

Note: a NULL read_by_id is not matched by this, since comparing NULL with <> yields NULL rather than true; the "is null" case needs its own IS NULL test.

If this is fast then try this:

SELECT count(m.id)
FROM messages m
INNER JOIN recipients r ON r.message_id = m.id
where r.id = $user_id
and m.id in (
    select distinct message_id
      from message_readers
     where read_by_id <> $user_id)
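
How <> treats NULL is easy to check directly. The sketch below uses SQLite through Python's sqlite3 as a stand-in for MySQL (both follow the standard three-valued logic here); the table contents, including the NULL row, are invented for the test.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message_readers (message_id INTEGER, read_by_id INTEGER);
    INSERT INTO message_readers VALUES (1, 7), (2, 8), (3, NULL);
""")

# A comparison with NULL is neither true nor false: it evaluates to NULL.
null_compare = conn.execute("SELECT NULL <> 7").fetchone()[0]

# A plain <> filter therefore silently drops the NULL row ...
without_null = [row[0] for row in conn.execute(
    "SELECT message_id FROM message_readers "
    "WHERE read_by_id <> 7 ORDER BY message_id")]

# ... and it only comes back with an explicit IS NULL test.
with_null = [row[0] for row in conn.execute(
    "SELECT message_id FROM message_readers "
    "WHERE read_by_id <> 7 OR read_by_id IS NULL ORDER BY message_id")]

print(null_compare)  # None
print(without_null)  # [2]
print(with_null)     # [2, 3]
```

The same reasoning applies to the IN subquery above: a message that nobody has read has no row in message_readers at all, so it cannot appear in the subquery result.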

Original answer didn't work: Try including message_id and id in a covering index on recipients and see what happens.

#6


1  

A comment: count(m.id) counts non-null values, but m.id is never null, so the null check is redundant. Try this instead:

SELECT count(*)
FROM 
messages m
INNER JOIN recipients r ON r.message_id = m.id  
left join 
(
    select m.id
    FROM messages m
    INNER JOIN message_readers mr 
    ON mr.message_id = m.id     
    and (mr.read_by_id <> $user_id or mr.read_by_id IS NULL)        
)as sub 
on sub.id = m.id        
WHERE r.id = $user_id

One doubt about your business logic, which may well be correct: why can all users read incoming messages (mr.read_by_id IS NULL), and why can a message be read by others or have no specific receiver (mr.read_by_id <> $user_id)?

It's a pool, I guess.

A better approach is to replace the inner join in the subquery with an EXISTS. Note that "mr.read_by_id IS NULL" is not necessary: if mr.read_by_id is null, then "mr.read_by_id = $user_id" is not true.

SELECT count(*)
FROM 
messages m
INNER JOIN recipients r ON r.message_id = m.id  
left join 
(
    select m.id
    FROM messages m
            where not exists(select * from message_readers mr 
    where mr.message_id = m.id      
    and mr.read_by_id = $user_id)
)as sub 
on sub.id = m.id        
WHERE r.id = $user_id
