I have a table of sent SMS text messages which must be joined to a delivery receipt table to get the latest status of each message.
There are 997,148 sent text messages.
I am running this query:
SELECT
    m.id,
    m.user_id,
    m.api_key,
    m.to,
    m.message,
    m.sender_id,
    m.route,
    m.submission_reference,
    m.unique_submission_reference,
    m.reason_code,
    m.timestamp,
    d.id AS dlrid,
    d.dlr_status
FROM messages_sent m
LEFT JOIN delivery_receipts d
    ON d.message_id = m.id
    AND d.id = (SELECT MAX(id) FROM delivery_receipts WHERE message_id = m.id)
This returns 997,148 rows, including the latest status of each message.
It takes 22.8688 seconds to execute.
Here is the SQL for messages_sent:
CREATE TABLE IF NOT EXISTS `messages_sent` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL,
`api_key` varchar(40) NOT NULL,
`to` varchar(15) NOT NULL,
`message` text NOT NULL,
`type` enum('sms','mms') NOT NULL DEFAULT 'sms',
`sender_id` varchar(15) NOT NULL,
`route` tinyint(1) unsigned NOT NULL,
`supplier` tinyint(1) unsigned NOT NULL,
`submission_reference` varchar(40) NOT NULL,
`unique_submission_reference` varchar(40) NOT NULL,
`reason_code` tinyint(1) unsigned NOT NULL,
`reason` text NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `api_key` (`api_key`),
KEY `sender_id` (`sender_id`),
KEY `route` (`route`),
KEY `submission_reference` (`submission_reference`),
KEY `reason_code` (`reason_code`),
KEY `timestamp` (`timestamp`),
KEY `to` (`to`),
KEY `unique_submission_reference` (`unique_submission_reference`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1000342 ;
And for delivery_receipts:
CREATE TABLE IF NOT EXISTS `delivery_receipts` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`message_id` int(10) unsigned NOT NULL,
`dlr_id` bigint(20) unsigned NOT NULL,
`dlr_status` tinyint(2) unsigned NOT NULL,
`dlr_substatus` tinyint(2) unsigned NOT NULL,
`dlr_final` tinyint(1) unsigned NOT NULL,
`dlr_refid` varchar(40) NOT NULL,
`dlr_phone` varchar(12) NOT NULL,
`dlr_charge` tinyint(3) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `message_id` (`message_id`),
KEY `dlr_status` (`dlr_status`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1468592 ;
Here is an EXPLAIN of the SQL:
3 Solutions
#1
4
There is a trick.
Instead of picking the MAX element with a subquery, you join the table of interest twice, like this:
SELECT
    m.id,
    m.user_id,
    m.api_key,
    m.to,
    m.message,
    m.sender_id,
    m.route,
    m.submission_reference,
    m.unique_submission_reference,
    m.reason_code,
    m.timestamp,
    d.id AS dlrid,
    d.dlr_status
FROM messages_sent m
JOIN delivery_receipts d
    ON d.message_id = m.id
LEFT JOIN delivery_receipts d1
    ON d1.message_id = m.id
    AND d1.id > d.id
WHERE d1.id IS NULL
The second time the table is joined, it carries an additional condition: the field you want the MAX of must be higher than in the first copy of the table. The WHERE clause then filters out every row except those for which no higher row exists.
This way only the max rows remain.
I changed your LEFT JOIN to JOIN. I'm not sure if you need LEFT JOIN there. Even if you do, it should still work.
Surprisingly, this is much faster than the subquery.
You might want to try another variant of the same idea:
SELECT
    m.id,
    m.user_id,
    m.api_key,
    m.to,
    m.message,
    m.sender_id,
    m.route,
    m.submission_reference,
    m.unique_submission_reference,
    m.reason_code,
    m.timestamp,
    d.id AS dlrid,
    d.dlr_status
FROM messages_sent m
JOIN (
    SELECT d0.*
    FROM delivery_receipts d0
    LEFT JOIN delivery_receipts d1
        ON d1.message_id = d0.message_id
        AND d1.id > d0.id
    WHERE d1.id IS NULL
) d
    ON d.message_id = m.id
Make sure you have a multicolumn index on the fields message_id and id in the table delivery_receipts, perhaps like this:
ALTER TABLE `delivery_receipts`
ADD INDEX `idx` ( `message_id` , `id` );
#2
0
The slowdown seems large, but I'm afraid there is not much room for improvement if you need to stick with this query.
One problem is the reporting of d.dlr_status. Try removing it from the list of reported columns and see if the query time improves.
You would get the best possible performance if everything was stored in messages_sent. The table won't be normalized anymore, but it's an option if you need performance. To achieve this, create columns in messages_sent for the latest receipt's id and dlr_status, and add appropriate INSERT, UPDATE and DELETE triggers to delivery_receipts. The triggers would update the corresponding columns in messages_sent; it's a trade-off between query time and update time.
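A minimal sketch of the INSERT side of that idea follows. The column names last_dlr_id and last_dlr_status and the trigger name are assumptions chosen for illustration; the UPDATE and DELETE triggers, and a one-off backfill of existing rows, are left out.

```sql
-- Hypothetical denormalized columns holding the latest receipt per message.
ALTER TABLE messages_sent
    ADD COLUMN last_dlr_id int(10) unsigned NULL,
    ADD COLUMN last_dlr_status tinyint(2) unsigned NULL;

-- Copy each new receipt into messages_sent, but only if it is newer than
-- the one already stored (or if nothing is stored yet).
CREATE TRIGGER delivery_receipts_after_insert
AFTER INSERT ON delivery_receipts
FOR EACH ROW
    UPDATE messages_sent
    SET `timestamp` = `timestamp`,  -- keep ON UPDATE CURRENT_TIMESTAMP from firing
        last_dlr_id = NEW.id,
        last_dlr_status = NEW.dlr_status
    WHERE id = NEW.message_id
      AND (last_dlr_id IS NULL OR last_dlr_id < NEW.id);

-- The report then needs no join at all (remaining columns as in the question).
SELECT
    m.id,
    m.user_id,
    m.last_dlr_id AS dlrid,
    m.last_dlr_status AS dlr_status
FROM messages_sent m;
```

Setting `timestamp` to itself matters here: the column is declared ON UPDATE CURRENT_TIMESTAMP, so without it every incoming receipt would overwrite the message's timestamp.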
#3
0
You can "cache" part of the computation in the delivery_receipts table: just add an is_last_status boolean column to it. Using simple triggers, you can update the value on every insert of a new receipt.
Then the select query becomes much simpler:
SELECT
    m.id,
    m.user_id,
    m.api_key,
    m.to,
    m.message,
    m.sender_id,
    m.route,
    m.submission_reference,
    m.unique_submission_reference,
    m.reason_code,
    m.timestamp,
    d.id AS dlrid,
    d.dlr_status
FROM messages_sent m
LEFT JOIN delivery_receipts d
    ON d.message_id = m.id
    -- kept in ON rather than WHERE, so messages without any receipt still appear
    AND d.is_last_status = true
If MySQL supported partial indexes, the query could be sped up even more.
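One caveat, as far as I know: a MySQL trigger cannot issue an UPDATE against the same table that fired it, so clearing the flag on older receipts from a trigger on delivery_receipts itself would fail. The sketch below therefore assumes the flag is maintained by the code that inserts receipts. The index name message_last and the literal values (message id 12345 and the receipt fields) are placeholders.

```sql
-- One-off schema change; is_last_status is the column named above.
ALTER TABLE delivery_receipts
    ADD COLUMN is_last_status tinyint(1) unsigned NOT NULL DEFAULT 0,
    ADD INDEX message_last (message_id, is_last_status);

-- Backfill: flag the highest id per message. The derived table is
-- materialized first, which lets the UPDATE read and write the same table.
UPDATE delivery_receipts d
JOIN (
    SELECT message_id, MAX(id) AS max_id
    FROM delivery_receipts
    GROUP BY message_id
) latest
    ON latest.max_id = d.id
SET d.is_last_status = 1;

-- For each incoming receipt: clear the old flag, then insert the new row
-- already flagged as the latest.
UPDATE delivery_receipts
SET is_last_status = 0
WHERE message_id = 12345
  AND is_last_status = 1;

INSERT INTO delivery_receipts
    (message_id, dlr_id, dlr_status, dlr_substatus, dlr_final,
     dlr_refid, dlr_phone, dlr_charge, is_last_status)
VALUES
    (12345, 1, 1, 0, 0, '', '', 0, 1);
```

Because the tables are MyISAM, the two statements are not atomic; a reader running between them may briefly see no flagged row for that message. Wrapping them in LOCK TABLES ... UNLOCK TABLES is one way to close that window.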