如何加快这个SQL索引查询？

Given the following SQL table :

给定以下SQL表：

Employee(ssn, name, dept, manager, salary)

员工（ssn，姓名，部门，经理，薪水）

You discover that the following query is significantly slower than expected. There is an index on salary, and you have verified that the query plan is using it.

您发现以下查询明显慢于预期。有一个薪水索引，您已验证查询计划正在使用它。

SELECT * 
FROM Employee
WHERE salary = 48000

Please give a possible reason why this query is slower than expected, and provide a tuning solution that addresses that reason.

请说明此查询比预期慢的可能原因，并提供解决该原因的调优解决方案。

I have two ideas for why this query is slower than expected. One is that we are trying to SELECT * instead of SELECT Employee.salary which would slow down the query as we must search across all columns instead of one. Another idea is that the index on salary is non-clustered, and we want to use a clustered index, as the company could be very large and it would make sense to organize the table by the salary field.

我有两个想法，为什么这个查询比预期慢。一个是我们尝试SELECT *而不是SELECT Employee.salary，这会减慢查询速度，因为我们必须搜索所有列而不是一列。另一个想法是工资指数是非聚集的，我们想要使用聚集索引，因为公司可能非常大，并且通过工资字段组织表是有意义的。

Would either of those two solutions speed up this query? I.e. either change SELECT * to SELECT Employee.salary or explicitly set the index on salary to be clustered?

这两个解决方案中的任何一个都会加速这个查询吗即要么将SELECT *更改为SELECT Employee.salary，要么显式设置要聚集的工资的索引？

2 个解决方案

#1

What indexes do you have now?

你现在有什么指数？

Is it really "slow"? What evidence do you have?

它真的“慢”吗？你有什么证据？

Comments on "SELECT * instead of SELECT Employee.salary" --

评论“SELECT *而不是SELECT Employee.salary” -

* is bad form because tomorrow you might add a column, thereby breaking any code that is expecting a certain number of columns in a certain order.
*是不好的形式，因为明天您可能会添加一列，从而破坏任何期望按特定顺序排列一定数量的列的代码。
Dealing with * versus salary does not happen until after the row(s) is located.
在找到行之后才会处理*与工资的比较。
Locating the row(s) is the costly part.
找到行是昂贵的部分。
On the other hand, if you have INDEX(salary) and only look at salary then the index is "covering". That means that the "data" (the other columns) does not need to be fetched. Hence, faster. But this is probably beyond what your teacher has told you about yet.
另一方面，如果你有INDEX（薪水），只看工资，那么指数就是“覆盖”。这意味着不需要获取“数据”（其他列）。因此，更快。但这可能超出了老师告诉你的内容。

Comments on "the index on salary is non-clustered, and we want to use a clustered index" --

评论“工资指数是非集群的，我们想要使用聚集索引” -

In MySQL (not necessarily in other RDBMSs), InnoDB has exactly one PRIMARY KEY and it is always UNIQUE and "clustered".
在MySQL中（不一定在其他RDBMS中），InnoDB只有一个PRIMARY KEY，它始终是UNIQUE和“clustered”。
That is, "clustered" implies "unique", which seems inappropriate for "salary".
也就是说，“聚集”意味着“独特”，这似乎不适合“薪水”。
In InnoDB a "secondary key" implicitly includes the column(s) of the PK (ssn?), with which it can reach over into the data.
在InnoDB中，“二级密钥”隐含地包括PK（ssn？）的列，它可以用来到达数据中。

"verified that the query plan" -- Have you learned about EXPLAIN SELECT ...?

“验证了查询计划” - 您是否了解了EXPLAIN SELECT ...？

More Tips on creating the optimal index for a given SELECT.

有关为给定SELECT创建最佳索引的更多提示。

#2

I will try to be as simple as I can be ,

我会尽量简单，

You can not simply make salary a clustered index unless you make it a unique or primary which is kind of both stupid and senseless because two person can have same salary.

你不能简单地将薪水作为聚集指数，除非你把它作为一个独特的或主要的，既有愚蠢又无意义，因为两个人可以有相同的工资。

There can be only one clustered index per table according to MYSQL documentation. Database by default elects primary key for being clustered index .

根据MYSQL文档，每个表只能有一个聚簇索引。默认情况下，数据库选择主键作为聚簇索引。

If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.

如果没有为表定义PRIMARY KEY，MySQL将找到第一个UNIQUE索引，其中所有键列都是NOT NULL，而InnoDB将它用作聚簇索引。

To speed up your query I have a few suggestions , go for secondary indexes,

为了加快您的查询，我有一些建议，去二级索引，

If you want to search a salary by direct value then hash based indexes are a better option, if MYSQL supports that already.

如果您想通过直接值搜索薪水，那么基于哈希的索引是更好的选择，如果MYSQL已经支持了。

If you want to search a value using greater than , less than or some range ,then B-tree indexes are better choice.

如果要使用大于，小于或某个范围搜索值，则B树索引是更好的选择。

The first option is faster than the second one , but is limited to only equality operator.

第一个选项比第二个选项更快，但仅限于相等运算符。

Hope it helps.

希望能帮助到你。

#1