How to optimize a query with multiple joins?

Date: 2022-04-14 04:17:19

I have a simple but long query that counts the rows of a result, and it takes about 14 seconds. The count on the main table alone takes less than a second, but after the multiple joins the delay becomes too high. The query is as follows:

Select  Count(Distinct visits.id) As Count_id
    From  visits
    Left Join  clients_locations  ON visits.client_location_id = clients_locations.id
    Left Join  clients  ON clients_locations.client_id = clients.id
    Left Join  locations  ON clients_locations.location_id = locations.id
    Left Join  users  ON visits.user_id = users.id
    Left Join  potentialities  ON clients_locations.potentiality = potentialities.id
    Left Join  classes  ON clients_locations.class = classes.id
    Left Join  professions  ON clients.profession_id = professions.id
    Inner Join  specialties  ON clients.specialty_id = specialties.id
    Left Join  districts  ON locations.district_id = districts.id
    Left Join  provinces  ON districts.province_id = provinces.id
    Left Join  locations_types  ON locations.location_type_id = locations_types.id
    Left Join  areas  ON clients_locations.area_id = areas.id
    Left Join  calls  ON calls.visit_id = visits.id 

The output of EXPLAIN is:

+---+---+---+---+---+---+---+---+---+---+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+---+---+---+---+---+---+---+---+---+---+
| 1 | SIMPLE | specialties | index | PRIMARY | specialty_name | 52 | NULL | 53 | Using index |
| 1 | SIMPLE | clients | ref | PRIMARY,specialty | specialty | 4 | crm_db.specialties.id | 143 |  |
| 1 | SIMPLE | clients_locations | ref | PRIMARY,client_id | client_id | 4 | crm_db.clients.id | 1 |  |
| 1 | SIMPLE | locations | eq_ref | PRIMARY | PRIMARY | 4 | crm_db.clients_locations.location_id | 1 |  |
| 1 | SIMPLE | districts | eq_ref | PRIMARY | PRIMARY | 4 | crm_db.locations.district_id | 1 | Using where |
| 1 | SIMPLE | visits | ref | unique_visit,client_location_id | unique_visit | 4 | crm_db.clients_locations.id | 4 | Using index |
| 1 | SIMPLE | calls | ref | call_unique,visit_id | call_unique | 4 | crm_db.visits.id | 1 | Using index |
+---+---+---+---+---+---+---+---+---+---+

Update 1: The query above is used with a dynamic WHERE clause ($sql = $sql . "Where " . $whereFilter), but I submitted it here in simplified form. So please do not consider an answer that just eliminates the joins :)

Update 2: Here is an example of the dynamic filtering:

$temp = $this->province_id;
if ($temp != null) {
    $whereFilter = $whereFilter . " and provinces.id In ($temp) ";
}

But in the startup case, which is our case here, there is no WHERE clause.

6 answers

#1


7  

A left join always returns at least one row for each row of the first table, and returns several rows when there are several matching rows. But because you are counting distinct visit ids, left joining to another table makes no difference: the count is the same as just counting the rows of visits. Thus the only joins that affect the result are inner joins, so you can remove all "completely" left joined tables without affecting the result.

What I mean by "completely" is that some left joined tables are effectively inner joined: the inner join to specialties requires the join to clients to succeed, so that join is effectively an inner join too, which in turn requires the join to clients_locations to succeed, making it an inner join as well.

Your query (as posted) can be reduced to:

Select Count(Distinct visits.id) As Count_id
From visits
Join clients_locations ON visits.client_location_id = clients_locations.id
Join clients ON clients_locations.client_id = clients.id
Join specialties ON clients.specialty_id = specialties.id
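
The equivalence claimed above can be checked on a toy data set. The following is only a sketch in Python using the standard sqlite3 module, not the asker's MySQL setup: the tables are a hypothetical miniature of the schema with just the join columns, and the sample rows are invented. It runs a shortened version of the original query and the reduced query and compares the counts.

```python
import sqlite3

# Hypothetical miniature of the schema: only the columns the joins touch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE specialties (id INTEGER PRIMARY KEY);
    CREATE TABLE clients (id INTEGER PRIMARY KEY, specialty_id INTEGER);
    CREATE TABLE clients_locations (id INTEGER PRIMARY KEY, client_id INTEGER);
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, client_location_id INTEGER, user_id INTEGER);
    CREATE TABLE calls (id INTEGER PRIMARY KEY, visit_id INTEGER);

    INSERT INTO specialties VALUES (1);
    INSERT INTO clients VALUES (1, 1), (2, NULL);            -- client 2 has no specialty
    INSERT INTO clients_locations VALUES (10, 1), (20, 2);
    INSERT INTO users VALUES (7);
    INSERT INTO visits VALUES (100, 10, 7), (101, 10, NULL), (102, 20, 7);
    INSERT INTO calls VALUES (1, 100), (2, 100), (3, 101);   -- visit 100 has two calls
""")

# Shortened form of the original query: left joins plus the one inner join.
full_query = """
    SELECT COUNT(DISTINCT visits.id)
    FROM visits
    LEFT JOIN clients_locations ON visits.client_location_id = clients_locations.id
    LEFT JOIN clients ON clients_locations.client_id = clients.id
    LEFT JOIN users ON visits.user_id = users.id
    INNER JOIN specialties ON clients.specialty_id = specialties.id
    LEFT JOIN calls ON calls.visit_id = visits.id
"""

# Reduced form: only the joins on the path to the inner-joined table.
reduced_query = """
    SELECT COUNT(DISTINCT visits.id)
    FROM visits
    JOIN clients_locations ON visits.client_location_id = clients_locations.id
    JOIN clients ON clients_locations.client_id = clients.id
    JOIN specialties ON clients.specialty_id = specialties.id
"""

full_count = conn.execute(full_query).fetchone()[0]
reduced_count = conn.execute(reduced_query).fetchone()[0]
total_visits = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(full_count, reduced_count, total_visits)
```

In this sample both queries count the same visits, while the plain row count of visits is higher, because one visit belongs to a client without a specialty and the inner join filters it out. Whether that difference exists in the real data depends on whether every client has a specialty.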

Removing all those unnecessary joins will greatly improve the runtime of your query, not only because there are fewer joins to make, but also because the intermediate row set could be enormous: its size is the product of the matches in all the tables (not the sum).

For maximum performance, create covering indexes on all id-and-fk columns:

create index visits_id_client_location_id on visits(id, client_location_id);
create index clients_locations_id_client_id on clients_locations(id, client_id);
create index clients_id_specialty_id on clients(id, specialty_id);

so index-only scans can be used where possible. I assume there are indexes on the PK columns.

#2


3  

You don't seem to have any (or much) intentional filtering. If you want to know the number of visits referred to in calls, I would propose:

select count(distinct c.visit_id)
from calls c;

#3


3  

To optimize the whole process, you can dynamically construct the pre-WHERE SQL according to the filters you are going to apply. For example:


    // Base select and the join that every filter needs
    $preSQL = "Select Count(Distinct visits.id) As Count_id From visits ";
    $preSQL .= "Left Join clients_locations ON visits.client_location_id = clients_locations.id ";
    $whereFilter = "";

    // Filtering by province_id: add the joins only when the filter is used
    $temp = $this->province_id;
    if ($temp != null) {
        $preSQL .= "Left Join locations ON clients_locations.location_id = locations.id ";
        $preSQL .= "Left Join districts ON locations.district_id = districts.id ";
        $preSQL .= "Left Join provinces ON districts.province_id = provinces.id ";
        $whereFilter = "provinces.id In ($temp) ";
    }

    // Append the WHERE clause only if at least one filter was set
    $sql = $preSQL . ($whereFilter != "" ? "Where " . $whereFilter : "");
    // ...

If you are using multiple filters, you can put all the inner/left join strings in an array and then, after analysing the request, construct your $preSQL using the minimum number of joins.
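
The array-of-joins idea might look roughly like the sketch below. It is written in Python rather than PHP purely for illustration, and everything in it is an assumption layered on the question's schema: the JOINS and REQUIRES registries, the build_count_query helper, and the use of ? placeholders for parameter binding (the original interpolates $temp directly into the SQL string).

```python
# Hypothetical registry of join strings, keyed by the table each one adds.
JOINS = {
    "clients_locations": "Left Join clients_locations ON visits.client_location_id = clients_locations.id",
    "clients": "Left Join clients ON clients_locations.client_id = clients.id",
    "locations": "Left Join locations ON clients_locations.location_id = locations.id",
    "districts": "Left Join districts ON locations.district_id = districts.id",
    "provinces": "Left Join provinces ON districts.province_id = provinces.id",
}

# For each filterable table, the chain of joins it needs, in dependency order.
REQUIRES = {
    "provinces": ["clients_locations", "locations", "districts", "provinces"],
    "clients": ["clients_locations", "clients"],
}


def build_count_query(filters):
    """Build the count query with only the joins the active filters need.

    `filters` maps a table name to a list of ids to keep; empty lists are ignored.
    Returns the SQL text and the list of values to bind to its placeholders.
    """
    needed, conditions, params = [], [], []
    for table, ids in filters.items():
        if not ids:
            continue
        # Add each required join once, keeping dependency order.
        for dep in REQUIRES[table]:
            if dep not in needed:
                needed.append(dep)
        placeholders = ", ".join("?" for _ in ids)
        conditions.append(f"{table}.id In ({placeholders})")
        params.extend(ids)

    sql = "Select Count(Distinct visits.id) As Count_id From visits"
    for table in needed:
        sql += " " + JOINS[table]
    if conditions:
        sql += " Where " + " and ".join(conditions)
    return sql, params


# Startup case: no filters, so no joins and no WHERE clause at all.
startup_sql, startup_params = build_count_query({})
# Province filter: only the four joins on the path to provinces are added.
province_sql, province_params = build_count_query({"provinces": [3, 5]})
print(startup_sql)
print(province_sql, province_params)
```

With no filters the query degenerates to a plain count over visits, which is the fast case the question describes. Note that a filter on a left joined table already discards the unmatched rows, so the left joins on a filtered path behave like inner joins.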

#4


1  

Use COUNT(CASE WHEN visit_id!="" THEN 1 END) as visit.

Hope this will help

#5


1  

Isn't it just:

SELECT COUNT(id)
FROM visits

because every left outer join still returns a visits.id when there are no matching clients, ..., calls, and the ids ought to be unique?

A different hint: the one inner join is only effective when a client exists. Generally, inner joins should be placed as close as possible to the source table, so in your example it would have been best on the line right after "Left Join clients".

#6


0  

I did not fully understand your idea, especially the INNER JOIN that turns some of the LEFT JOINs into INNER JOINs; it seems strange, but let's try a solution:

LEFT JOINs often perform poorly, and I think you need these tables only when you use them in the WHERE clause. So you can add them with INNER JOIN only when a filter actually uses them. For example:

$query = "Select Count(Distinct visits.id) As Count_id From visits ";

// Add the joins only when the province filter is actually used
if ($temp != null) {
    $query .= " INNER JOIN clients_locations ON visits.client_location_id = clients_locations.id ";
    $query .= " INNER JOIN locations ON clients_locations.location_id = locations.id ";
    $query .= " INNER JOIN districts ON locations.district_id = districts.id ";
    $query .= " INNER JOIN provinces ON districts.province_id = provinces.id ";
    $whereFilter .= " and provinces.id In ($temp) ";
}

I think it will help your performance and work as you need.
