Optimizing a query that uses multiple left joins on the same tables

Date: 2022-05-09 08:35:42

I've come across a query that is taking "too long". The query has 50+ left joins across 10 or so tables. To give a brief overview of the database model: each joined table stores data for a particular data type (e.g. date_fields, integer_fields, text_fields), and each has a column for the value, a "datafield" id, and a ticket id. The query is built programmatically from an association table between a "ticket" and its "data fields".

The join statements look something like the following:

...FROM tickets t
LEFT JOIN ticket_text_fields t001 ON(t.id=t001.ticket_id AND t001.textfield_id=7)
...
LEFT JOIN ticket_date_fields t056 ON(t.id=t056.ticket_id AND t056.datafield_id=434)

Running EXPLAIN on the query shows the following:

id  select_type  table  type  possible_keys                   key             key_len  ref    rows  Extra
1   SIMPLE       t      ref   idx_dataset_id                  idx_dataset_id  5        const  2871  Using where; Using temporary; Using filesort
1   SIMPLE       t001   ref   idx_ticket_id,idx_datafield_id  idx_ticket_id   5        t.id   5
...
1   SIMPLE       t056   ref   idx_ticket_id,idx_datafield_id  idx_ticket_id   5        t.id   8

What direction can I take to tune this query? All the indexes seem to be in place. Perhaps the number of rows examined in the t table (tickets), 2871, should be reduced. How many left joins are too many? Should each datafield table be joined only once and then queried for the data that is required?

2 answers

#1


7  

You're using a variation of the terrible antipattern called Entity-Attribute-Value. You're storing attributes on separate rows, so if you want to reconstruct something that looks like a conventional row of data, you need to make one join per attribute.
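
To see why the join count grows with the attribute count, here is a rough sketch of how a query of the question's shape might be generated. The function name `build_join_clauses` and the `(table, id_column, field_id)` input format are hypothetical, not taken from the asker's code; only the table and column names come from the question.

```python
def build_join_clauses(fields):
    """Return one LEFT JOIN clause per (table, id_column, field_id) entry."""
    clauses = []
    for n, (table, id_column, field_id) in enumerate(fields, start=1):
        alias = f"t{n:03d}"  # t001, t002, ... mirroring the aliases in the question
        # Table and column names are assumed to come from a trusted, fixed list;
        # the field id is forced to an integer before being interpolated.
        clauses.append(
            f"LEFT JOIN {table} {alias} "
            f"ON (t.id = {alias}.ticket_id AND {alias}.{id_column} = {int(field_id)})"
        )
    return "\n".join(clauses)


# Two attributes produce two joins; 56 attributes would produce 56.
fields = [
    ("ticket_text_fields", "textfield_id", 7),
    ("ticket_date_fields", "datafield_id", 434),
]
sql = "SELECT t.id FROM tickets t\n" + build_join_clauses(fields)
print(sql)
```

Every attribute added to the association table adds one more join to the generated statement.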

It's not surprising this creates a query with 50 joins. This is far too many for most databases to run efficiently (you haven't identified which database you're using). Eventually you'll want a few more attributes and you might exceed some architectural limit of the database on the number of joins it can do.

The solution is: don't reconstruct the row in SQL.

Instead, query the attributes as multiple rows rather than trying to combine them into a single row.

SELECT ... FROM tickets t
INNER JOIN ticket_text_fields f ON t.id=f.ticket_id
WHERE f.textfield_id IN (7, 8, 9, ...)
UNION ALL
SELECT ... FROM tickets t
INNER JOIN ticket_date_fields d ON t.id=d.ticket_id
WHERE d.datafield_id IN (434, 435, 436, ...)

Then you have to write a function in your application that loops over the resulting rowset and collects the attributes one by one into an object in application space, so you can use it as if it were a single entity.
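
A minimal sketch of that application-side step, using Python with an in-memory SQLite database so that it runs on its own. The `value` column, the `'text'`/`'date'` discriminator column, the sample rows, and the function name `collect_tickets` are all assumptions made for illustration; the question does not show the real column list or name the database.

```python
import sqlite3
from collections import defaultdict


def collect_tickets(rows):
    """Group (ticket_id, field_kind, field_id, value) rows into one dict per ticket."""
    tickets = defaultdict(dict)
    for ticket_id, field_kind, field_id, value in rows:
        # Key on (kind, id): text and date field ids are assumed to be
        # separate id spaces that could otherwise collide.
        tickets[ticket_id][(field_kind, field_id)] = value
    return dict(tickets)


# Hypothetical, cut-down version of the schema described in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (id INTEGER PRIMARY KEY);
    CREATE TABLE ticket_text_fields (ticket_id INTEGER, textfield_id INTEGER, value TEXT);
    CREATE TABLE ticket_date_fields (ticket_id INTEGER, datafield_id INTEGER, value TEXT);
    INSERT INTO tickets VALUES (1), (2);
    INSERT INTO ticket_text_fields VALUES (1, 7, 'printer jam'), (2, 7, 'login fails');
    INSERT INTO ticket_date_fields VALUES (1, 434, '2022-05-01');
""")

# One row per attribute, combined with UNION ALL instead of one join per attribute.
rows = conn.execute("""
    SELECT t.id, 'text', f.textfield_id, f.value
    FROM tickets t
    INNER JOIN ticket_text_fields f ON t.id = f.ticket_id
    WHERE f.textfield_id IN (7, 8, 9)
    UNION ALL
    SELECT t.id, 'date', d.datafield_id, d.value
    FROM tickets t
    INNER JOIN ticket_date_fields d ON t.id = d.ticket_id
    WHERE d.datafield_id IN (434, 435, 436)
""").fetchall()

tickets = collect_tickets(rows)
print(tickets[1])
```

Because the joins are INNER, a ticket with none of the requested attributes produces no rows and so does not appear in the result; the application has to treat a missing key as "no value".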

#2


0  

For a clearer query, I would use something like this:

SELECT ... FROM tickets AS t
JOIN ticket_text_fields AS txt ON t.id = txt.ticket_id
JOIN ticket_date_fields AS dt ON t.id = dt.ticket_id
WHERE txt.textfield_id IN (...)
AND dt.datafield_id IN (...)

The joins would probably need to be LEFT joins, but that depends on the structure of your data.
This query has no UNION and only two joins.
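
One thing worth checking before adopting this form is how many rows it returns per ticket. The sketch below runs the two-join query against an in-memory SQLite database; the sample data and the `value` column are made up, and the column names follow the question (`textfield_id`, `datafield_id`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (id INTEGER PRIMARY KEY);
    CREATE TABLE ticket_text_fields (ticket_id INTEGER, textfield_id INTEGER, value TEXT);
    CREATE TABLE ticket_date_fields (ticket_id INTEGER, datafield_id INTEGER, value TEXT);
    INSERT INTO tickets VALUES (1);
    INSERT INTO ticket_text_fields VALUES (1, 7, 'a'), (1, 8, 'b');
    INSERT INTO ticket_date_fields VALUES
        (1, 434, '2022-05-01'), (1, 435, '2022-05-02'), (1, 436, '2022-05-03');
""")

# Both child tables are joined to the same ticket row, so every text row
# is paired with every date row for that ticket.
joined_rows = conn.execute("""
    SELECT t.id, txt.textfield_id, dt.datafield_id
    FROM tickets AS t
    JOIN ticket_text_fields AS txt ON t.id = txt.ticket_id
    JOIN ticket_date_fields AS dt ON t.id = dt.ticket_id
    WHERE txt.textfield_id IN (7, 8)
      AND dt.datafield_id IN (434, 435, 436)
""").fetchall()

print(len(joined_rows))  # 2 text rows x 3 date rows for ticket 1
```

With 2 text attributes and 3 date attributes the ticket comes back as 6 rows, so the application still has to de-duplicate values when rebuilding the entity.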
