SQL查询：加快大型表的速度

We have a table with about 25,000,000 rows called 'events' having the following schema:

我们有一个包含大约25,000,000行的表，称为“事件”，具有以下模式：

TABLE events
- campaign_id   : int(10)
- city      : varchar(60)
- country_code  : varchar(2)

The following query takes VERY long (> 2000 seconds):

以下查询需要很长时间（> 2000秒）：

SELECT COUNT(*) AS counted_events, country_code
FROM events
WHERE campaign_id` in (597) 
GROUPY BY city, country_code
ORDER BY counted_events

We found out that it's because of the GROUP BY part.

我们发现这是因为GROUP BY部分。

There is already an index idx_campaign_id_city_country_code on (campaign_id, city, country_code) which is used.

已经使用了（campaign_id，city，country_code）上的索引idx_campaign_id_city_country_code。

Maybe someone can suggest a good solution to speed it up?

也许有人可以提出一个很好的解决方案来加速它？

Update:

更新：

'Explain' shows that out of many possible index MySql uses this one: 'idx_campaign_id_city_country_code', for rows it shows: '471304' and for 'Extra' it shows: 'Using where; Using temporary; Using filesort' –

'Explain'表明，在许多可能的索引中，MySql使用了这个：'idx_campaign_id_city_country_code'，对于它显示的行：'471304'和'Extra'它显示：'使用where;使用临时;使用filesort' -

Here is the whole result of EXPLAIN:

这是EXPLAIN的整个结果：

id: '1'
id：'1'
select_type: 'SIMPLE'
select_type：'SIMPLE'
table: 'events'
表：'事件'
type: 'ref'
输入：'ref'
possible_keys: 'index_campaign,idx_campaignid_paid,idx_city_country_code,idx_city_country_code_campaign_id,idx_cid,idx_campaign_id_city_country_code'
possible_keys：'index_campaign，idx_campaignid_paid，idx_city_country_code，idx_city_country_code_campaign_id，idx_cid，idx_campaign_id_city_country_code'
key: 'idx_campaign_id_city_country_code'
key：'idx_campaign_id_city_country_code'
key_len: '4'
key_len：'4'
ref: 'const'
ref：'const'
rows: '471304'
行：'471304'
Extra: 'Using where; Using temporary; Using filesort'
额外：'使用地点;使用临时;使用filesort'

UPDATE:

更新：

Ok, I think it has been solved:

好的，我认为它已经解决了：

Looking at the pasted query here again I realized that I forget to mention here that there was one more column in the SELECT called 'country_name'. So the query was very slow then (including country_name), but I'll just leave it out and now the performance of the query is absolutely ok. Sorry for that mistake!

再次查看粘贴的查询，我意识到我忘记在这里提到SELECT中还有一个名为'country_name'的列。所以查询非常慢（包括country_name），但我会把它留下来，现在查询的性能绝对可以。抱歉，这个错误！

So thank you for all your helpful comments, I'll upvote all the good answers! There were some really helpful additions, that I probably also we apply (like changing types etc).

非常感谢您的所有有用的评论，我将提供所有好的答案！有一些非常有用的补充，我可能也应用（如改变类型等）。

4 个解决方案

#1

without seeing what EXPLAIN says it's a long distance shot, anyway:

无论如何，没有看到EXPLAIN说它是远距离射击：

make an index on (city,country_code)
在（city，country_code）上建立索引
see if there's a way to use partitioning, your table is getting rather huge
看看是否有办法使用分区，你的表变得相当庞大
if country code is always 2 chars change it to char
如果国家代码总是2个字符，则将其更改为char
change numeric indexes to unsigned int
将数字索引更改为unsigned int

post entire EXPLAIN output

发布整个EXPLAIN输出

#2

don't use IN() - better use:

不要使用IN（） - 更好地使用：

WHERE campaign_id = 597
OR campaign_id = 231
OR ....

afaik IN() is very slow.

afaik IN（）很慢。

update: like nik0lias commented - IN() is faster than concatenating OR conditions.

更新：像nik0lias一样评论 - IN（）比连接OR条件更快。

#3

Some ideas:

一些想法：

Given the nature and size of the table it would be a great candidate for partitioned tables by country. This way the events of every country would be stored in a different physical table even if it behaves as a virtual big table

鉴于表格的性质和大小，它将成为按国家/地区分区表格的绝佳选择。这样，每个国家的事件都将存储在不同的物理表中，即使它表现为虚拟大表
Is country code an string? May be you have a country_id that could be easier to sort. (It may force you to create or change indexes)

国家代码是字符串吗？可能你有一个country_id，可以更容易排序。（它可能会强制您创建或更改索引）
Are you really using the city in the group by?

你是真的在小组中使用这座城市吗？

#4

partitioning - especially by country will not help
分区 - 特别是按国家划分也无济于事
column IN (const-list) is not slow, it is in fact a case with special optimization
列IN（const-list）并不慢，实际上是特殊优化的情况

The problem is, that MySQL doesn't use the index for sorting. I cannot say why, because it should. Could be a bug.

问题是，MySQL不使用索引进行排序。我不能说为什么，因为它应该。可能是一个错误。

The best strategy to execute this query is to scan that sub-tree of the index where event_id=597. Since the index is then sorted by city_id, country_code no extra sorting is needed and rows can be counted while scanning.

执行此查询的最佳策略是扫描索引的子树，其中event_id = 597。由于索引按city_id排序，因此country_code不需要额外排序，扫描时可以计算行数。

So the indexes are already optimal for this query. MySQL is just not using them correctly.

因此索引已经是此查询的最佳选择。 MySQL没有正确使用它们。

I'm getting more information off line. It seems this is not a database problem at all, but

我正在离线获取更多信息。看起来这根本不是数据库问题，但是

the schema is not normalized. The table contains not only country_code, but also country_name (this should be in an extra table).
架构未规范化。该表不仅包含country_code，还包含country_name（这应该在一个额外的表中）。
the real query contains country_name in the select list. But since that column is not indexed, MySQL cannot use an index scan.
真实查询在选择列表中包含country_name。但由于该列未编入索引，因此MySQL无法使用索引扫描。

As soon as country_name is dropped from the select list, the query reverts to an index-only scan ("using index" in EXPLAIN output) and is blazingly fast.

从选择列表中删除country_name后，查询将恢复为仅索引扫描（EXPLAIN输出中的“using index”）并且速度非常快。

#1