Why doesn't an index speed up the query?

Time: 2021-11-19 18:55:50

I have two tables, users and posts, with 500k records each.

I want to find users who have written between 100 and 200 posts.

My query is:

SELECT u.accountid, COUNT(*)
FROM users u
JOIN posts p
ON u.accountid = p.owneruserid
GROUP BY u.accountid
HAVING COUNT(*) BETWEEN 100 AND 200;

I get the answer in about a second.

I added indexes on the accountid and owneruserid columns of the users and posts tables, respectively, but the query didn't get any faster. Why?

2 Solutions

#1


3  

HAVING COUNT(*) BETWEEN 100 AND 200;

This part is the key to explaining why the indexes are futile.

We only want the groups whose member count is between 100 and 200, which means we need an exact count of members for every group. Second, there are no restrictions (e.g. a WHERE clause), so to get every group and its count we have to go through all the records in the table.

Indexes, e.g. a B-tree index, help to find the right element (row) based on an index condition. If the data is ordered (and an index provides that order), we can use binary search to find the desired subset. But in our case we need to scan all the records, so it does not matter whether they are ordered or not.

That's why the index does not speed up the query.
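
To make the lookup-versus-scan distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The database engine in the question is not stated, so SQLite is only a stand-in; the body column, the index name idx_posts_owner, and the tiny data set are assumptions made for illustration. The exact wording of the plan differs between SQLite versions, but a selective WHERE is reported as a SEARCH on the index while the GROUP BY/HAVING query is reported as a SCAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only owneruserid comes from the question.
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, owneruserid INTEGER, body TEXT)"
)
conn.execute("CREATE INDEX idx_posts_owner ON posts (owneruserid)")
# 1,000 posts spread evenly over 10 made-up users (ids 0..9).
conn.executemany(
    "INSERT INTO posts (owneruserid, body) VALUES (?, ?)",
    [(i % 10, "text") for i in range(1000)],
)


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# A selective condition: the index lets the engine jump to the matching rows.
lookup_plan = plan("SELECT body FROM posts WHERE owneruserid = 5")

# The aggregate from the question: every row has to be counted.
aggregate_plan = plan(
    "SELECT owneruserid, COUNT(*) FROM posts "
    "GROUP BY owneruserid HAVING COUNT(*) BETWEEN 100 AND 200"
)

print(lookup_plan)
print(aggregate_plan)
```

Other engines have their own EXPLAIN output, so it is worth running the equivalent command on the real database rather than relying on this sketch.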

#2


1  

You can simplify the query to:

SELECT p.owneruserid, COUNT(*)
FROM posts p
GROUP BY p.owneruserid
HAVING COUNT(*) BETWEEN 100 AND 200;

The index on posts(owneruserid) should work for this query. It is a covering index for the query, so the query might be a wee bit faster.

Overall, the query seems to require scanning all the data in posts for the aggregation. The HAVING cannot take advantage of an index. However, the query can use the covering index to reduce I/O.
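
As a rough check of the two claims above, the following sketch (again using SQLite through Python's sqlite3 module as a stand-in engine) runs both the original join and the simplified query on a small made-up data set and inspects the plan for the simplified one. The body column, the index name, and the three users are assumptions. Note that the two queries only return the same rows if every owneruserid in posts refers to an existing user, which is assumed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (accountid INTEGER PRIMARY KEY);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, owneruserid INTEGER, body TEXT);
    CREATE INDEX idx_posts_owner ON posts (owneruserid);
    """
)
# Made-up data: user 1 has 150 posts, user 2 has 50, user 3 has 250.
post_counts = {1: 150, 2: 50, 3: 250}
conn.executemany(
    "INSERT INTO users (accountid) VALUES (?)", [(u,) for u in post_counts]
)
conn.executemany(
    "INSERT INTO posts (owneruserid, body) VALUES (?, ?)",
    [(u, "text") for u, n in post_counts.items() for _ in range(n)],
)

join_sql = """
    SELECT u.accountid, COUNT(*)
    FROM users u
    JOIN posts p ON u.accountid = p.owneruserid
    GROUP BY u.accountid
    HAVING COUNT(*) BETWEEN 100 AND 200
"""
simple_sql = """
    SELECT p.owneruserid, COUNT(*)
    FROM posts p
    GROUP BY p.owneruserid
    HAVING COUNT(*) BETWEEN 100 AND 200
"""

join_rows = conn.execute(join_sql).fetchall()
simple_rows = conn.execute(simple_sql).fetchall()

# The simplified query only reads owneruserid, so the index alone can answer it.
plan_detail = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + simple_sql)
)

print(join_rows, simple_rows)
print(plan_detail)
```

Even with a covering index the engine still visits every index entry, so any gain comes from reading a narrower structure than the full table, not from skipping rows.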
