How do I get MySQL to use an index for a view query?

Date: 2022-04-05 00:09:39

I'm working on a Java EE web project with a MySQL database. We need a view that summarizes data from 3 tables with over 3M rows overall. Each table was created with an index, but I haven't found a way to take advantage of those indexes when running a conditional SELECT against a view that we created with [group by].

I've been told that using views in MySQL is not a good idea, because you can't create an index on a view in MySQL the way you can in Oracle. But in some tests I ran, indexes can be used in a SELECT against a view. Maybe I created those views the wrong way.

I'll use an example to describe my problem.

We have a table that records high scores in NBA games, with an index on column [happened_in]:

CREATE  TABLE `highscores` (
   `tbl_id` int(11) NOT NULL auto_increment,
   `happened_in` int(4) default NULL,
   `player` int(3) default NULL,
   `score` int(3) default NULL,
   PRIMARY KEY  (`tbl_id`),
   KEY `index_happened_in` (`happened_in`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Insert data (8 rows):

INSERT INTO highscores(happened_in, player, score)
VALUES (2006, 24, 61),(2006, 24, 44),(2006, 24, 81),
(1998, 23, 51),(1997, 23, 46),(2006, 3, 55),(2007, 24, 34), (2008, 24, 37);

Then I create a view to see the highest score that Kobe Bryant got in each year:

CREATE OR REPLACE VIEW v_kobe_highscores
AS
   SELECT player, max(score) AS highest_score, happened_in
   FROM highscores
   WHERE player = 24
   GROUP BY happened_in;

I wrote a conditional statement to see the highest score that Kobe got in 2006:

select * from v_kobe_highscores where happened_in = 2006;

When I ran EXPLAIN on it in Toad for MySQL, I found that MySQL scans all rows to form the view and then applies the condition to the result, without using the index on [happened_in].

explain select * from v_kobe_highscores where happened_in = 2006;

The view that we use in our project is built on tables with millions of rows. Scanning all the rows of those tables on every retrieval from the view is unacceptable. Please help! Thanks!

@zerkms Here is the result of my test on real-life data. I don't see much difference between the two. I think @spencer7593 has a point: the MySQL optimizer doesn't "push" that predicate down into the view query.

3 Answers

#1


32  

How do you get MySQL to use an index for a view query? The short answer: provide an index that MySQL can use.

In this case, the optimum index is likely a "covering" index:

... ON highscores (player, happened_in, score)

It's likely that MySQL will use that index, and the EXPLAIN will show "Using index", due to the WHERE player = 24 (an equality predicate on the leading column in the index). The GROUP BY happened_in (the second column in the index) may allow MySQL to use the index to avoid a sort operation. Including the score column in the index allows the query to be satisfied entirely from the index, without having to visit (look up) the data pages referenced by the index.
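
As a minimal sketch, the covering index described above could be created and checked as follows. The index name idx_player_year_score is a hypothetical choice for illustration (the answer leaves the name elided); the table, the columns, and the view query come from the question.

```sql
-- Hypothetical index name; the column order (player, happened_in, score)
-- is the one suggested in the answer.
CREATE INDEX idx_player_year_score
    ON highscores (player, happened_in, score);

-- Run the view's defining query on its own and inspect the plan.
-- With a realistic number of rows, the Extra column would be expected
-- to show "Using index" if the optimizer picks the covering index.
EXPLAIN
SELECT player, MAX(score) AS highest_score, happened_in
  FROM highscores
 WHERE player = 24
 GROUP BY happened_in;
```

The exact EXPLAIN output depends on the MySQL version and on table statistics, so treat the expected "Using index" as something to verify rather than a guarantee.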

That's the quick answer. The longer answer is that MySQL is very unlikely to use an index with a leading column of happened_in for the view query.


Why the view causes a performance issue

One of the issues you have with the MySQL view is that MySQL does not "push" the predicate from the outer query down into the view query.

Your outer query specifies WHERE happened_in = 2006. The MySQL optimizer does not consider that predicate when it runs the inner "view query". The query for the view gets executed separately, before the outer query. The result set from that execution gets "materialized"; that is, the results are stored as an intermediate MyISAM table. (MySQL calls it a "derived table", and that name makes sense once you understand the operations that MySQL performs.)

The bottom line is that the index you have defined on happened_in is not being used by MySQL when it runs the query that forms the view definition.

After the intermediate "derived table" is created, THEN the outer query is executed, using that "derived table" as a rowsource. It's when that outer query runs that the happened_in = 2006 predicate is evaluated.

Note that all of the rows from the view query are stored, which (in your case) is a row for EVERY value of happened_in, not just the one you specify an equality predicate on in the outer query.
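
To make the two-step processing concrete, the statements below approximate by hand what the answer describes MySQL doing internally. This is only an illustration: the temporary table name tmp_kobe_highscores is made up, and the real intermediate "derived table" is created by MySQL itself, not by the user.

```sql
-- Step 1: the view query runs on its own; the outer predicate is not involved.
CREATE TEMPORARY TABLE tmp_kobe_highscores AS
SELECT player, MAX(score) AS highest_score, happened_in
  FROM highscores
 WHERE player = 24
 GROUP BY happened_in;          -- one row per happened_in value

-- Step 2: the outer query filters the materialized rows.
SELECT *
  FROM tmp_kobe_highscores
 WHERE happened_in = 2006;      -- evaluated only now, against the intermediate table

DROP TEMPORARY TABLE tmp_kobe_highscores;
```

With the eight sample rows from the question, step 1 stores three rows (for 2006, 2007 and 2008) and step 2 keeps only the 2006 row, with highest_score 81. On tables with millions of rows, step 1 is where the cost accumulates.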

The way that view queries are processed may be "unexpected" by some, and this is one reason that using "views" in MySQL can lead to performance problems, as compared to the way view queries are processed by other relational databases.


Improving performance of the view query with a suitable covering index

Given your view definition and your query, about the best you are going to get would be a "Using index" access method for the view query. To get that, you'd need a covering index, e.g.

... ON highscores (player, happened_in, score).

That's likely to be the most beneficial index (performance wise) for your existing view definition and your existing query. The player column is the leading column because you have an equality predicate on that column in the view query. The happened_in column is next, because you've got a GROUP BY operation on that column, and MySQL is going to be able to use this index to optimize the GROUP BY operation. We also include the score column, because that is the only other column referenced in your query. That makes the index a "covering" index, because MySQL can satisfy that query directly from index pages, without a need to visit any pages in the underlying table. And that's as good as we're going to get out of that query plan: "Using index" with no "Using filesort".


Compare performance to standalone query with no derived table

You could compare the execution plan for your query against the view vs. an equivalent standalone query:

SELECT player
     , MAX(score) AS highest_score
     , happened_in
 FROM highscores
WHERE player = 24
  AND happened_in = 2006
GROUP
   BY player
    , happened_in

The standalone query can also make use of a covering index e.g.

... ON highscores (player, happened_in, score)

but without a need to materialize an intermediate MyISAM table.
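
A sketch of the comparison suggested above: run EXPLAIN on both forms and read the plans side by side. It assumes the view from the question exists under the name used in the question's SELECT; nothing else is introduced.

```sql
-- Plan for the query against the view
-- (per the answer, the predicate is applied after materialization).
EXPLAIN
SELECT * FROM v_kobe_highscores WHERE happened_in = 2006;

-- Plan for the equivalent standalone query
-- (both predicates reach the base table directly).
EXPLAIN
SELECT player, MAX(score) AS highest_score, happened_in
  FROM highscores
 WHERE player = 24
   AND happened_in = 2006
 GROUP BY player, happened_in;
```

If the view is materialized as the answer describes, the first plan would be expected to contain an extra row for the derived table, while the second should touch only highscores. The exact columns and wording of the output vary between MySQL versions.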


I am not sure that any of the previous provides a direct answer to the question you were asking.

Q: How do I get MySQL to use an INDEX for view query?

A: Define a suitable INDEX that the view query can use.

The short answer is to provide a "covering index" (an index that includes all columns referenced in the view query). The leading columns in that index should be the columns that are referenced with equality predicates (in your case, the column player would be a leading column because you have a player = 24 predicate in the query). Also, the columns referenced in the GROUP BY should follow as the next leading columns in the index, which allows MySQL to optimize the GROUP BY operation by making use of the index rather than using a sort operation.

The key point here is that the view query is basically a standalone query; the results from that query get stored in an intermediate "derived" table (a MyISAM table that gets created when a query against the view is run).

Using views in MySQL is not necessarily a "bad idea", but I would strongly caution those who choose to use views within MySQL to be AWARE of how MySQL processes queries that reference those views. And the way MySQL processes view queries differs (significantly) from the way view queries are handled by other databases (e.g. Oracle, SQL Server).

#2


2  

Creating the composite index with player + happened_in (in this particular order) columns is the best you can do in this case.

PS: don't test the MySQL optimizer's behaviour on such a small number of rows, because it's likely to prefer a full scan over indexes. If you want to see what will happen in real life, fill the table with a realistic amount of data.
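
One way to get a realistically sized test table, sketched under assumptions: the statement below doubles the row count each time it runs and fills the new rows with random values. The value ranges are arbitrary choices for illustration and are not taken from the original data.

```sql
-- Each run doubles the row count; the new rows get random values.
-- The ranges are assumptions made for this sketch only.
INSERT INTO highscores (happened_in, player, score)
SELECT 1990 + FLOOR(RAND() * 20),   -- years 1990..2009
       1 + FLOOR(RAND() * 50),      -- player numbers 1..50
       30 + FLOOR(RAND() * 60)      -- scores 30..89
  FROM highscores;

-- Refresh index statistics before looking at plans again.
ANALYZE TABLE highscores;
```

Starting from the 8 sample rows, running the INSERT 17 times gives 8 * 2^17 = 1,048,576 rows, which is closer to the scale the question is about.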

#3


0  

This doesn't directly answer the question, but it is a directly related workaround for others running into this issue. This achieves the same benefits of using a view, while minimizing the disadvantages.

I set up a PHP function to which I can send parameters: the things to push inside the derived table to maximize index usage, rather than using them in a join or WHERE clause outside a view. In the function you can formulate the SQL syntax for a derived table and return that syntax. Then in the calling program, you can do something like this:

$table = tablesyntax(parameters);
select field1, field2 from {$table} as x... + other SQL

Thus you get the encapsulation benefits of the view, the ability to call it as if it is a view, but not the index limitations.
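
The same idea can be sketched in plain SQL with a server-side prepared statement instead of a PHP function. This is a substitute technique, not the code from this answer, and the statement name kobe_stmt and the user variables are hypothetical. The point it illustrates is the same: the parameters go inside the derived table, so they reach the base table and its indexes.

```sql
-- The predicates are placed inside the derived table,
-- so they are applied to highscores directly.
PREPARE kobe_stmt FROM
  'SELECT x.player, x.highest_score, x.happened_in
     FROM (SELECT player, MAX(score) AS highest_score, happened_in
             FROM highscores
            WHERE player = ?
              AND happened_in = ?
            GROUP BY player, happened_in) AS x';

SET @player = 24, @year = 2006;
EXECUTE kobe_stmt USING @player, @year;

DEALLOCATE PREPARE kobe_stmt;
```

A prepared statement only lives for the current session, so it does not give the schema-level encapsulation of a view; it is closer to the function-generated derived table described above.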
