Optimizing a SQL query on SQLite3 using indexes

Date: 2022-09-17 23:05:13

I'm trying to optimize a SQL query by creating indexes, to get the best possible performance.

Table definition

CREATE TABLE Mots (
  numero            INTEGER NOT NULL, 
  fk_dictionnaires integer(5) NOT NULL, 
  mot              varchar(50) NOT NULL, 
  ponderation      integer(20) NOT NULL,
  drapeau varchar(1) NOT NULL,
  CONSTRAINT pk_mots PRIMARY KEY(numero),
  CONSTRAINT uk_dico_mot_mots UNIQUE(fk_dictionnaires, mot),
  CONSTRAINT fk_mots_dictionnaires FOREIGN KEY(fk_dictionnaires) REFERENCES Dictionnaires(numero)
  );

Indexes definition

CREATE INDEX idx_dictionnaires ON mots(fk_dictionnaires DESC);
CREATE INDEX idx_mots_ponderation ON mots(ponderation);
CREATE UNIQUE INDEX idx_mots_unique ON mots(fk_dictionnaires, mot);

SQL query:

SELECT numero, mot, ponderation, drapeau 
FROM mots 
WHERE mot LIKE 'ar%' 
   AND fk_dictionnaires=1 
   AND LENGTH(mot)>=4 
   ORDER BY ponderation DESC 
LIMIT 5;

Query Plan

0|0|0|SEARCH TABLE mots USING INDEX idx_dictionnaires (fk_dictionnaires=?) (~2 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY

The indexes I defined don't seem to be used, and the query takes (according to .timer):

CPU Time: user 0.078001 sys 0.015600

However, when I remove the fk_dictionnaires=1 condition, my indexes are used correctly and the query takes around 0.000000-0.01XXXXXX sec:

0|0|0|SCAN TABLE mots USING INDEX idx_mots_ponderation (~250000 rows)

I found some similar questions on * but none of the answers helped me.

How can I improve performance by using indexes and/or by changing the SQL query? Thanks in advance.

1 answer

#1


5  

SQLite seems to think that the idx_dictionnaires index is very sparse and concludes that if it scans using idx_dictionnaires, it will only have to examine a couple of rows. However, the performance results you quote suggest that it must be examining more than just a couple rows. First, why don't you try ANALYZE mots, so SQLite will have up-to-date information on the cardinality of each index available?
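
As a rough sketch of what ANALYZE gives the planner, the snippet below rebuilds the question's table in an in-memory database and reads back the statistics that ANALYZE stores in sqlite_stat1. The sample rows and the helper names (build_sample_db, index_stats) are made up for illustration, and the foreign key is left out because the Dictionnaires table is not shown in the question.

```python
import sqlite3

# Schema from the question. Assumption: the foreign key is omitted because
# the Dictionnaires table is not part of the post.
SCHEMA = """
CREATE TABLE Mots (
  numero           INTEGER NOT NULL,
  fk_dictionnaires integer(5) NOT NULL,
  mot              varchar(50) NOT NULL,
  ponderation      integer(20) NOT NULL,
  drapeau          varchar(1) NOT NULL,
  CONSTRAINT pk_mots PRIMARY KEY(numero),
  CONSTRAINT uk_dico_mot_mots UNIQUE(fk_dictionnaires, mot)
);
CREATE INDEX idx_dictionnaires ON mots(fk_dictionnaires DESC);
CREATE INDEX idx_mots_ponderation ON mots(ponderation);
CREATE UNIQUE INDEX idx_mots_unique ON mots(fk_dictionnaires, mot);
"""


def build_sample_db(rows_per_dictionary=500):
    """Return an in-memory database holding made-up words for 2 dictionaries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        ((dico - 1) * rows_per_dictionary + i + 1,  # numero
         dico,                                      # fk_dictionnaires
         f"mot{i:05d}",                             # mot
         dico * 100000 + i,                         # ponderation
         "N")                                       # drapeau
        for dico in (1, 2)
        for i in range(rows_per_dictionary)
    ]
    conn.executemany("INSERT INTO mots VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def index_stats(conn):
    """Run ANALYZE on the table and return {index name: stat string}."""
    conn.execute("ANALYZE mots")
    cursor = conn.execute(
        "SELECT idx, stat FROM sqlite_stat1 WHERE lower(tbl) = 'mots'"
    )
    return dict(cursor)


if __name__ == "__main__":
    for name, stat in index_stats(build_sample_db()).items():
        # The stat string starts with the number of rows in the index,
        # followed by the average number of rows per distinct prefix.
        print(name, stat)
```

If the second number reported for idx_dictionnaires is large, the planner now knows that fk_dictionnaires=1 matches many rows, which is the information it was missing in the "(~2 rows)" estimate.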

Here is something else which might help, from the SQLite documentation:


Terms of the WHERE clause can be manually disqualified for use with indices by prepending a unary + operator to the column name. The unary + is a no-op and will not slow down the evaluation of the test specified by the term. But it will prevent the term from constraining an index. So, in the example above, if the query were rewritten as:

SELECT z FROM ex2 WHERE +x=5 AND y=6;

The + operator on the x column will prevent that term from constraining an index. This would force the use of the ex2i2 index.

Note that the unary + operator also removes type affinity from an expression, and in some cases this can cause subtle changes in the meaning of an expression. In the example above, if column x has TEXT affinity then the comparison "x=5" will be done as text. But the + operator removes the affinity. So the comparison "+x=5" will compare the text in column x with the numeric value 5 and will always be false.


If ANALYZE mots isn't enough to help SQLite choose the best index to use, you can use this feature to force it to use the index you want.
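
Applied to the query from the question, the unary + trick might look like the sketch below. It is only an illustration: the rows are made up, the foreign key is omitted, and the helper name top_words is hypothetical. Which plan SQLite falls back to once the term is disqualified depends on the SQLite version and on the available statistics, so the code only checks that fk_dictionnaires no longer constrains an index.

```python
import sqlite3

# Schema from the question (foreign key omitted; Dictionnaires is not shown).
SCHEMA = """
CREATE TABLE Mots (
  numero           INTEGER NOT NULL,
  fk_dictionnaires integer(5) NOT NULL,
  mot              varchar(50) NOT NULL,
  ponderation      integer(20) NOT NULL,
  drapeau          varchar(1) NOT NULL,
  CONSTRAINT pk_mots PRIMARY KEY(numero),
  CONSTRAINT uk_dico_mot_mots UNIQUE(fk_dictionnaires, mot)
);
CREATE INDEX idx_dictionnaires ON mots(fk_dictionnaires DESC);
CREATE INDEX idx_mots_ponderation ON mots(ponderation);
"""


def build_sample_db(rows_per_dictionary=500):
    """Return an in-memory database with made-up words, half starting with 'ar'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        ((dico - 1) * rows_per_dictionary + i + 1,
         dico,
         ("ar" if i % 2 == 0 else "zz") + f"{i:04d}",
         dico * 100000 + i,  # unique, so the ORDER BY has no ties
         "N")
        for dico in (1, 2)
        for i in range(rows_per_dictionary)
    ]
    conn.executemany("INSERT INTO mots VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def top_words(conn, fk_term):
    """Run the question's query with fk_term as the left side of '=1'.

    Returns (plan detail lines, result rows).
    """
    sql = (
        "SELECT numero, mot, ponderation, drapeau FROM mots "
        f"WHERE mot LIKE 'ar%' AND {fk_term}=1 AND LENGTH(mot)>=4 "
        "ORDER BY ponderation DESC LIMIT 5"
    )
    # The last column of EXPLAIN QUERY PLAN is the human-readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    return plan, conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    for term in ("fk_dictionnaires", "+fk_dictionnaires"):
        plan, rows = top_words(conn, term)
        print(term, "->", plan)
```

Because fk_dictionnaires is an integer column compared with an integer literal, the affinity caveat quoted above should not change the result here; both variants return the same rows.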

You could also try compound indexes -- it looks like you already defined one on fk_dictionnaires,mot, but SQLite isn't using it. For the "fast" query, SQLite seemed to prefer using the index on ponderation, to avoid sorting the rows at the end of the query. If you add an index on fk_dictionnaires,ponderation DESC, and SQLite actually uses it, it could pick out the rows which match fk_dictionnaires=1 without a table scan and avoid sorting at the end.
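
A minimal sketch of that compound index, assuming the schema from the question (foreign key omitted) and a made-up index name, idx_dico_ponderation. It compares the query plan before and after the index is created; the table is left empty because, without ANALYZE data, the plan depends only on the schema and on SQLite's default estimates.

```python
import sqlite3

# Schema from the question (foreign key omitted; Dictionnaires is not shown).
SCHEMA = """
CREATE TABLE Mots (
  numero           INTEGER NOT NULL,
  fk_dictionnaires integer(5) NOT NULL,
  mot              varchar(50) NOT NULL,
  ponderation      integer(20) NOT NULL,
  drapeau          varchar(1) NOT NULL,
  CONSTRAINT pk_mots PRIMARY KEY(numero),
  CONSTRAINT uk_dico_mot_mots UNIQUE(fk_dictionnaires, mot)
);
CREATE INDEX idx_dictionnaires ON mots(fk_dictionnaires DESC);
CREATE INDEX idx_mots_ponderation ON mots(ponderation);
"""

QUERY = (
    "SELECT numero, mot, ponderation, drapeau FROM mots "
    "WHERE mot LIKE 'ar%' AND fk_dictionnaires=1 AND LENGTH(mot)>=4 "
    "ORDER BY ponderation DESC LIMIT 5"
)


def query_plan(conn):
    """Return the detail column of EXPLAIN QUERY PLAN for the question's query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


def plans_before_and_after():
    """Return (plan without the compound index, plan with it)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    before = query_plan(conn)
    # Hypothetical index name; the columns are the ones suggested above.
    conn.execute(
        "CREATE INDEX idx_dico_ponderation "
        "ON mots(fk_dictionnaires, ponderation DESC)"
    )
    after = query_plan(conn)
    return before, after


if __name__ == "__main__":
    before, after = plans_before_and_after()
    print("before:", before)
    print("after: ", after)
```

The thing to look for in the second plan is that the "USE TEMP B-TREE FOR ORDER BY" step is gone: the rows for fk_dictionnaires=1 are read from the index already sorted by ponderation, so the LIMIT 5 can stop after the first five matches.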


POSTSCRIPT: The compound index I suggested above "fixed" the OP's performance problem, but he also asked how and why it works. @AGeiser, I'll use a brief illustration to try to help you understand DB indexes intuitively:

Imagine you need to find all the people in your town whose surnames start with "A". You have a directory of all the names, but they are in random order. What do you do? You have no choice but to read through the whole directory, and pick out the ones which start with "A". Sounds like a lot of work, right? (This is like a DB table with no indexes.)

But what if somebody gives you a phone book, with all the names in alphabetical order? Now you can just find the first and last entries which start with "A" (using something like a binary search), and take all the entries in that range. You don't have to even look at all the other names in the book. This will be way faster. (This is like a DB table with an index; in this case, call it an index on last_name,first_name.)
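
The phone book lookup can be imitated in a few lines of Python, using a sorted list in place of the index and the bisect module in place of the B-tree search. This is only an analogy (a real index is a B-tree on disk, not a list), and the names in the sample data are invented.

```python
from bisect import bisect_left


def entries_with_prefix(phone_book, prefix):
    """Return the entries whose last name starts with prefix.

    phone_book must be a list of (last_name, first_name) tuples sorted in
    ascending order, like an index on last_name,first_name.
    """
    # First entry that is >= (prefix,): the start of the range.
    start = bisect_left(phone_book, (prefix,))
    # First entry past every string that begins with prefix: the end.
    # chr(0x10FFFF) is the largest code point, used here as a sentinel.
    end = bisect_left(phone_book, (prefix + chr(0x10FFFF),))
    # Everything in between matches; nothing outside the range is examined.
    return phone_book[start:end]


if __name__ == "__main__":
    book = sorted([
        ("Adams", "Zoe"),
        ("Baker", "Al"),
        ("Abbott", "Ann"),
        ("Young", "Bo"),
        ("Allen", "Cy"),
    ])
    print(entries_with_prefix(book, "A"))
```

Two binary searches find both ends of the range, so the cost grows with the logarithm of the book's size plus the number of matches, rather than with the size of the whole book.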

Now what if you want all the people whose names start with "A", but in the case that two people have the same name, you want them to be ordered by postal code? Even if you get the needed names quickly using the "phone book" (i.e. an index on last_name,first_name), you will still have to sort them all manually... so it starts sounding like a lot of work again. What could make this job really easy?

It would take another "phone book" -- but one in which the entries are ordered first by name, and then by postal code. With a "phone book" like that, you could quickly select the range of entries which you need, and you wouldn't even need to sort them -- they would already be in the desired order. (This is an index on last_name,first_name,postal_code.)

I think this illustration should make it clear how indexes can help SELECT queries, not just by reducing the number of rows which must be examined, but also by (potentially) eliminating the need for a separate "sort" phase after the needed rows are found. Hopefully it also makes it clear that a compound index on a,b is completely different from one on b,a. I could go on giving more "phone book" examples, but this answer would become so long that it would be more like a blog post. To build your intuition on which indexes are likely to benefit a query, I recommend the book "SQL Antipatterns" by Bill Karwin, published by Pragmatic Bookshelf (especially chapter 13, "Index Shotgun").
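
To see the a,b versus b,a difference in SQLite itself, here is a small sketch with a throwaway table t(a, b, c) and an index named idx_demo, both invented for this example. It asks for the plan of the same WHERE a = 1 query under each column order. The exact wording of the plan varies between SQLite versions, so the code only distinguishes SEARCH (the index narrows the rows) from SCAN (every row is visited).

```python
import sqlite3


def plan_for(index_columns, where_sql):
    """Return the query plan text for SELECT c FROM t WHERE <where_sql>,
    with a single index on the given comma-separated columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c TEXT)")
    conn.execute(f"CREATE INDEX idx_demo ON t({index_columns})")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT c FROM t WHERE " + where_sql
    )
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


if __name__ == "__main__":
    # a is the leading column: the index can be searched for a = 1.
    print("index on a,b:", plan_for("a, b", "a = 1"))
    # a is the second column: entries with a = 1 are scattered through
    # the index, so it cannot be searched for them.
    print("index on b,a:", plan_for("b, a", "a = 1"))
```

It is the same situation as looking for everyone with a given first name in a phone book sorted by last name: the book is sorted, just not in an order that helps.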
