How to create a SQL index to improve ORDER BY performance

Time: 2022-06-01 17:46:05

I have some SQL similar to the following, which joins four tables and then orders the results by the "status" column of the first:

SELECT * 
 FROM a, b, c, d 
 WHERE b.aid=a.id AND c.id=a.cid AND a.did=d.id AND a.did='XXX'
 ORDER BY a.status

It works. However, it's slow. I've worked out this is because of the ORDER BY clause and the lack of any index on table "a".

All four tables have the PRIMARY KEYs set on the "id" column.

So, I know I need to add an index to table a which includes the "status" column but what else does it need to include? Should "bid", "cid" and "did" be in there too?

I've tried to ask this in a general SQL sense but, if it's important, the target is SQLite for use with Gears.

Thanks in advance,

Jake (noob)

3 answers

#1


4  

I would say it's slow because the engine is doing scans all over the place instead of seeks. Did you mean to do SELECT a.* instead? That would be faster as well, since SELECT * here is equivalent to SELECT a.*, b.*, c.*, d.*.

You will probably get better results if you put a separate index on each of these columns:

  • a.did (so that a.did = 'XXX' is a seek instead of a scan, also helps a.did = d.id)
  • a.cid (for a.cid = c.id)
  • b.aid (for a.id = b.aid)

You could also try adding status, in ascending order, as a second column on the first and second indexes for additional performance; it doesn't hurt.
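
For reference, here is a rough sketch of what those index definitions could look like in SQLite, driven from Python's built-in sqlite3 module so it can be run as-is. The question only gives column names, so the column types, the rest of the schema, and the index names (idx_a_did_status and so on) are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical schema: only the column names come from the question;
# the types and the absence of other columns are assumptions.
SCHEMA = """
CREATE TABLE d (id TEXT PRIMARY KEY);
CREATE TABLE c (id INTEGER PRIMARY KEY);
CREATE TABLE a (id INTEGER PRIMARY KEY, cid INTEGER, did TEXT, status INTEGER);
CREATE TABLE b (id INTEGER PRIMARY KEY, aid INTEGER);
"""

# One index per filtered/joined column; status is appended (ascending)
# to the two indexes on table a. Index names are made up.
INDEXES = """
CREATE INDEX idx_a_did_status ON a (did, status ASC);
CREATE INDEX idx_a_cid_status ON a (cid, status ASC);
CREATE INDEX idx_b_aid ON b (aid);
"""


def build_database():
    """Create an in-memory database with the schema and the indexes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript(INDEXES)
    return conn


def index_names(conn):
    """List explicitly created indexes (automatic ones have no SQL text)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND sql IS NOT NULL ORDER BY name"
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(index_names(build_database()))
```

Whether the planner actually picks any of these depends on your data, so treat this as a starting point to measure rather than a guaranteed fix.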

#2


0  

I'd be curious as to how you worked out that the problem is 'the ORDER BY clause and the lack of any index on table "a".' I find this a little suspicious because, as you later say, table a does have an index: the one on its primary key.

Looking at the nature of the query and what I can guess about the nature of the data, I would think that this query would generally produce relatively few results compared to the size of the tables it's using, and that thus the ORDER BY would be extremely cheap. Of course, this is just a guess.

Whether an index will even help at all is dependent on the data in the table. What indices your query optimizer will use when doing a query is dependent on a lot of different factors, one of the big ones being the expected number of results produced from a lookup.

One thing that would help a lot is if you would post the output of EXPLAINing your query.
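
In case it is useful, here is a minimal sketch of how that output could be collected for SQLite using EXPLAIN QUERY PLAN through Python's sqlite3 module. The schema (column types in particular), the helper names query_plan and make_connection, and the index name idx_a_did_status are assumptions for illustration; the exact wording of the plan lines differs between SQLite versions, so no expected output is shown.

```python
import sqlite3

# Hypothetical schema: column names are from the question, types are guesses.
SCHEMA = """
CREATE TABLE d (id TEXT PRIMARY KEY);
CREATE TABLE c (id INTEGER PRIMARY KEY);
CREATE TABLE a (id INTEGER PRIMARY KEY, cid INTEGER, did TEXT, status INTEGER);
CREATE TABLE b (id INTEGER PRIMARY KEY, aid INTEGER);
"""

# The query from the question, unchanged apart from whitespace.
QUERY = """
SELECT *
  FROM a, b, c, d
 WHERE b.aid = a.id AND c.id = a.cid AND a.did = d.id AND a.did = 'XXX'
 ORDER BY a.status
"""


def make_connection():
    """Open an in-memory database containing the empty sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def query_plan(conn, sql):
    """Return the human-readable lines of EXPLAIN QUERY PLAN for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The descriptive text is the last column of each result row.
    return [row[-1] for row in rows]


if __name__ == "__main__":
    conn = make_connection()
    print("Before adding an index:")
    for line in query_plan(conn, QUERY):
        print("  ", line)

    # Made-up index name; compare the plan before and after.
    conn.execute("CREATE INDEX idx_a_did_status ON a (did, status)")
    print("After adding an index on a (did, status):")
    for line in query_plan(conn, QUERY):
        print("  ", line)
```

Lines that mention a scan of a table, or a temporary B-tree for the ORDER BY, are the ones worth posting and looking at first.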

#3


0  

Have you tried joins?

SELECT *
 FROM a
 INNER JOIN b ON a.id = b.aid
 INNER JOIN c ON a.cid = c.id
 INNER JOIN d ON a.did = d.id
 WHERE a.did = 'XXX'
 ORDER BY a.status

The correct use of joins (left, right, inner, outer) depends on the structure of the tables.
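
As a quick sanity check, the sketch below runs the comma-style query from the question and the explicit INNER JOIN version against a few made-up rows in SQLite and compares the results. The schema, the sample data, and the helper name run_both are all invented for illustration. It only checks that both forms return the same rows in the same order; it does not measure speed.

```python
import sqlite3

# Hypothetical schema and sample rows; only the column names are from the question.
SETUP = """
CREATE TABLE d (id TEXT PRIMARY KEY);
CREATE TABLE c (id INTEGER PRIMARY KEY);
CREATE TABLE a (id INTEGER PRIMARY KEY, cid INTEGER, did TEXT, status INTEGER);
CREATE TABLE b (id INTEGER PRIMARY KEY, aid INTEGER);

INSERT INTO d (id) VALUES ('XXX'), ('YYY');
INSERT INTO c (id) VALUES (1), (2);
INSERT INTO a (id, cid, did, status) VALUES
    (1, 1, 'XXX', 30),
    (2, 2, 'XXX', 10),
    (3, 1, 'YYY', 20),
    (4, 2, 'XXX', 20);
INSERT INTO b (id, aid) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
"""

# Comma-style join, as written in the question.
IMPLICIT_JOIN = """
SELECT * FROM a, b, c, d
 WHERE b.aid = a.id AND c.id = a.cid AND a.did = d.id AND a.did = 'XXX'
 ORDER BY a.status
"""

# Explicit INNER JOIN form, as suggested in this answer.
EXPLICIT_JOIN = """
SELECT * FROM a
 INNER JOIN b ON a.id = b.aid
 INNER JOIN c ON a.cid = c.id
 INNER JOIN d ON a.did = d.id
 WHERE a.did = 'XXX'
 ORDER BY a.status
"""


def run_both():
    """Run both query forms on the same sample data and return their rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    implicit = conn.execute(IMPLICIT_JOIN).fetchall()
    explicit = conn.execute(EXPLICIT_JOIN).fetchall()
    conn.close()
    return implicit, explicit


if __name__ == "__main__":
    implicit, explicit = run_both()
    print("same rows:", implicit == explicit)
    for row in explicit:
        print(row)
```

The status values are distinct within the matching rows on purpose, so that the ORDER BY gives a single well-defined order to compare.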

Hope this helps.
