具有多个表JOIN的SQL SELECT DISTINCT性能

时间:2022-05-25 04:20:57

I am building a database containing information on university courses. Each course can be associated with

我正在建立一个包含大学课程信息的数据库。每门课程都可以与之相关联

  • one or many authors
  • 一个或多个作者

  • one or many disciplines
  • 一个或多个学科

  • one or many institutions
  • 一个或多个机构

  • one or many levels
  • 一个或多个级别

My database contains the following tables:

我的数据库包含以下表格:

  • course (cou_id, cou_name, cou_number, cou_year, cou_term)
  • 课程(cou_id,cou_name,cou_number,cou_year,cou_term)

  • author (aut_id, aut_last)
  • 作者(aut_id,aut_last)

  • discipline (dis_id, dis_name)
  • 纪律(dis_id,dis_name)

  • institution (ins_id, ins_name, ins_classification)
  • 机构(ins_id,ins_name,ins_classification)

  • level (lev_id, lev_name)
  • 级别(lev_id,lev_name)

  • authorcourse (linking table)
  • authorcourse(链接表)

  • coursediscipline (linking table)
  • coursediscipline(链接表)

  • courseinstitution (linking table)
  • courseinstitution(链接表)

  • courselevel (linking table)
  • courselevel(链接表)

In order to retrieve ALL courses from the database (and the corresponding author,discipline, institution, and level information), I use the following query:

为了从数据库中检索所有课程(以及相应的作者,学科,机构和级别信息),我使用以下查询:

SELECT DISTINCT aut_last, c.cou_id, cou_name, cou_number, cou_year, cou_term, dis_name, ins_name, ins_classification, lev_name
    FROM authorcourse ac1
    INNER JOIN authorcourse ac2        
    ON ac1.cou_id = ac2.cou_id        
    INNER JOIN author a
    ON ac2.aut_id=a.aut_id
    INNER JOIN course c
    ON ac2.cou_id = c.cou_id
    INNER JOIN coursediscipline cd1
    ON ac2.cou_id = cd1.cou_id
    INNER JOIN coursediscipline cd2
    ON cd1.cou_id = cd2.cou_id
    INNER JOIN discipline d
    ON cd2.dis_id = d.dis_id
    INNER JOIN courseinstitution ci1
    ON ac2.cou_id = ci1.cou_id
    INNER JOIN courseinstitution ci2
    ON ci1.cou_id = ci2.cou_id
    INNER JOIN institution i
    ON ci2.ins_id = i.ins_id
    INNER JOIN courselevel cl1
    ON ac2.cou_id = cl1.cou_id
    INNER JOIN courselevel cl2
    ON cl1.cou_id = cl2.cou_id
    INNER JOIN level l
    ON cl2.lev_id = l.lev_id

This query works well when there are 15 courses in the database with “simple” relationships. e.g.:

当数据库中有15个具有“简单”关系的课程时,此查询很有效。例如。:

 cou_name = 'course1', cou_number = 'C1', cou_year = '1999', cou_term = 'summer'
 aut_last = 'Doe1'
 dis_name = 'discipline1'
 ins_name = 'institution1', ins_classification = 'classification1'
 lev_name = 'level1'

-->Showing rows 0 - 14 ( 15 total, Query took 0.0118 sec) EXPLAIN produces the following table:

- >显示行0 - 14(总共15行,查询花了0.0118秒)EXPLAIN生成下表:

id select_type table type   possible_keys          key     key_len ref             rows Extra
1  SIMPLE      ac1   index  cou_id                 aut_id  2       NULL            15   Using index; Using  temporary
1  SIMPLE      ac2   ref    PRIMARY,aut_id,cou_id  cou_id  2       ccdb.ac1.cou_id 1    Using index
1  SIMPLE      a     eq_ref PRIMARY                PRIMARY 2       ccdb.ac2.aut_id 1    
1  SIMPLE      c     eq_ref PRIMARY                PRIMARY 2       ccdb.ac2.cou_id 1    Using where
1  SIMPLE      cd1   ref    PRIMARY,cou_id         PRIMARY 2       ccdb.ac1.cou_id 1    Using index
1  SIMPLE      cd2   ref    PRIMARY,cou_id,dis_id  PRIMARY 2       ccdb.ac2.cou_id 1    Using where; Using index
1  SIMPLE      d     eq_ref PRIMARY                PRIMARY 2       ccdb.cd2.dis_id 1    
1  SIMPLE      ci1   ref    PRIMARY,cou_id         PRIMARY 2       ccdb.ac2.cou_id 1    Using where; Using index
1  SIMPLE      ci2   ref    PRIMARY,cou_id,ins_id  PRIMARY 2       ccdb.ac2.cou_id 1    Using where; Using index
1  SIMPLE      i     eq_ref PRIMARY                PRIMARY 2       ccdb.ci2.ins_id 1    
1  SIMPLE      cl1   ref    PRIMARY,cou_id         PRIMARY 2       ccdb.cd1.cou_id 1    Using where; Using index
1  SIMPLE      cl2   ref    PRIMARY,cou_id,lev_id  PRIMARY 2       ccdb.cl1.cou_id 1    Using where; Using index
1  SIMPLE      l     eq_ref PRIMARY                PRIMARY 2       ccdb.cl2.lev_id 1    

Problem: Performance dramatically decreases when there are 15 courses with multiple relationships. Example course:

问题:当有15个具有多种关系的课程时,性能会急剧下降。示例课程:

cou_name = 'course1', cou_number = 'C1', cou_year = '1999', cou_term = 'summer'
aut_last = 'Doe1', 'Doe', 'Doe3', 'Doe4'
dis_name = 'discipline1', 'discipline2', 'discipline3', 'discipline4'
ins_name = 'institution1'(ins_classification = 'classification1'),     'institution2'(ins_classification = 'classification2'), 'institution3'(ins_classification =  'classification3'), 'institution4' (ins_classification = 'classification4')
lev_name = 'level1', 'level2', 'level3', 'level4'

-->Showing rows 0 - 29 ( 3,840 total, Query took 14.7039 sec) EXPLAIN produces the following table:

- >显示行0 - 29(总共3,840,查询花了14.7039秒)EXPLAIN生成下表:

 id select_type table type   possible_keys         key     key_len ref             rows Extra
 1  SIMPLE      c     ALL    PRIMARY               NULL    NULL    NULL            15   Using temporary
 1  SIMPLE      ac1   ref    PRIMARY,aut_id,cou_id cou_id  2       ccdb.c.cou_id   2    Using index
 1  SIMPLE      a     eq_ref PRIMARY               PRIMARY 2       ccdb.ac1.aut_id 1    
 1  SIMPLE      ac2   ref    cou_id                cou_id  2       ccdb.c.cou_id   2    Using index
 1  SIMPLE      cd1   ref    PRIMARY,cou_id        cou_id  2       ccdb.ac1.cou_id 2    Using where; Using index
 1  SIMPLE      cd2   ref    PRIMARY,cou_id,dis_id cou_id  2       ccdb.c.cou_id   2    Using index
 1  SIMPLE      d     eq_ref PRIMARY               PRIMARY 2       ccdb.cd2.dis_id 1    
 1  SIMPLE      ci1   ref    PRIMARY,cou_id        cou_id  2       ccdb.ac1.cou_id 2    Using where; Using index
 1  SIMPLE      ci2   ref    PRIMARY,cou_id,ins_id cou_id  2       ccdb.ac2.cou_id 2    Using where; Using index
 1  SIMPLE      i     eq_ref PRIMARY               PRIMARY 2       ccdb.ci2.ins_id 1    
 1  SIMPLE      cl1   ref    PRIMARY,cou_id        cou_id  2       ccdb.c.cou_id   2    Using index
 1  SIMPLE      cl2   ref    PRIMARY,cou_id,lev_id cou_id  2       ccdb.ci2.cou_id 2    Using where; Using index
 1  SIMPLE      l     eq_ref PRIMARY               PRIMARY 2       ccdb.cl2.lev_id 1    

When run through my PHP website, I get the following error “Fatal error: Maximum execution time of 30 seconds exceeded in …”

当我的PHP网站运行时,我收到以下错误“致命错误:超过30秒的最大执行时间......”

Question: How can I speed up this query? I tried several different combinations of the joins and (as you can see in the EXPLAIN results) I indexed all columns I considered potentially relevant.

问题:如何加快此查询?我尝试了几种不同的连接组合(正如您在EXPLAIN结果中看到的那样),我将所有认为可能相关的列编入索引。

Any help would be greatly appreciated.

任何帮助将不胜感激。

3 个解决方案

#1


1  

IT looks to me asif your pulling all this data for the 'course view' type detail page ?

如果您在“课程视图”类型详细信息页面中提取所有这些数据,IT会向我看来?

If so I would say, once a course has been made in the database, how often are the number of authors, disciplines, institutions and levels going to change ?

如果是这样,我会说,一旦在数据库中开设了一门课程,作者,学科,机构和水平的数量会多久发生变化?

IF there never going to change from the time there set, then when it is set, also set it in a totally denormalised table like this:

如果从那里设置的时间永远不会改变,那么当它被设置时,也将它设置在一个完全非规范化的表中,如下所示:

courseView (cou_id, cou_name, cou_number, cou_year, cou_term, data)

courseView(cou_id,cou_name,cou_number,cou_year,cou_term,data)

.. and in 'data', you just put in a serialised array of all the data. Ugly, but it's going to be fast.

..并且在'data'中,您只需输入所有数据的序列化数组。丑陋,但它会很快。

Then, when you search by course id to pull up the one, you can search only one row, one index and pull up all the data instantly.

然后,当您按课程ID搜索以提取一个时,您只能搜索一行,一个索引并立即提取所有数据。

..

Also, if your going to let people search by Authors, then you can still do this with the normalised table with a simple query like normal.

此外,如果您要让人们按作者搜索,那么您仍然可以使用标准化表格执行此操作,并使用正常的简单查询。

#2


1  

You appear to have unnecessary joins in your query. I believe you could get the same by doing the following. It would probably improve your query's performance.

您的查询中似乎有不必要的联接。我相信你可以通过以下方式获得同样的结果。它可能会提高您的查询性能。

SELECT DISTINCT aut_last, c.cou_id, cou_name, cou_number, cou_year, cou_term, dis_name, ins_name, ins_classification, lev_name
    FROM authorcourse ac 
    INNER JOIN author a
    ON ac.aut_id=a.aut_id
    INNER JOIN course c
    ON ac.cou_id = c.cou_id
    INNER JOIN coursediscipline cd 
    ON ac.cou_id = cd.cou_id
    INNER JOIN discipline d
    ON cd.dis_id = d.dis_id
    INNER JOIN courseinstitution ci 
    ON ac.cou_id = ci.cou_id
    INNER JOIN institution i
    ON ci.ins_id = i.ins_id
    INNER JOIN courselevel cl 
    ON ac.cou_id = cl.cou_id
    INNER JOIN level l
    ON cl.lev_id = l.lev_id

You had redundant joins to the same tables which did not seem to be accomplishing anything.

你有相同的表的冗余连接,似乎没有完成任何事情。

#3


0  

I dont believe there's enough information to resolve your problem outright. By looking at your explain statement, it looks like you have all the indexes setup correctly, but the number of rows that you're getting is the concern.

我不相信有足够的信息可以彻底解决你的问题。通过查看您的explain语句,看起来您已正确设置所有索引,但是您获得的行数是关注点。

The culprit is most likely due to a table scan on a really large table is being compounded by the number of rows you're getting. From my experience when I encounter these issues, I pinpoint where the slowless occurs and work from there. Some strategies to use are temporary tables or subqueries (which break them up into smaller, more managable queries.

罪魁祸首很可能是由于在一张非常大的桌子上进行的桌面扫描正在加剧你所获得的行数。根据我遇到这些问题时的经验,我确定了慢速发生的位置并从那里开始工作。一些使用的策略是临时表或子查询(将它们分解为更小,更易管理的查询。

Also to get rid of that PHP fatal error problem, you should be able use a try / catch exception block to handle that gracefully.

另外,要摆脱PHP致命错误问题,您应该能够使用try / catch异常块来优雅地处理它。

#1


1  

IT looks to me asif your pulling all this data for the 'course view' type detail page ?

如果您在“课程视图”类型详细信息页面中提取所有这些数据,IT会向我看来?

If so I would say, once a course has been made in the database, how often are the number of authors, disciplines, institutions and levels going to change ?

如果是这样,我会说,一旦在数据库中开设了一门课程,作者,学科,机构和水平的数量会多久发生变化?

IF there never going to change from the time there set, then when it is set, also set it in a totally denormalised table like this:

如果从那里设置的时间永远不会改变,那么当它被设置时,也将它设置在一个完全非规范化的表中,如下所示:

courseView (cou_id, cou_name, cou_number, cou_year, cou_term, data)

courseView(cou_id,cou_name,cou_number,cou_year,cou_term,data)

.. and in 'data', you just put in a serialised array of all the data. Ugly, but it's going to be fast.

..并且在'data'中,您只需输入所有数据的序列化数组。丑陋,但它会很快。

Then, when you search by course id to pull up the one, you can search only one row, one index and pull up all the data instantly.

然后,当您按课程ID搜索以提取一个时,您只能搜索一行,一个索引并立即提取所有数据。

..

Also, if your going to let people search by Authors, then you can still do this with the normalised table with a simple query like normal.

此外,如果您要让人们按作者搜索,那么您仍然可以使用标准化表格执行此操作,并使用正常的简单查询。

#2


1  

You appear to have unnecessary joins in your query. I believe you could get the same by doing the following. It would probably improve your query's performance.

您的查询中似乎有不必要的联接。我相信你可以通过以下方式获得同样的结果。它可能会提高您的查询性能。

SELECT DISTINCT aut_last, c.cou_id, cou_name, cou_number, cou_year, cou_term, dis_name, ins_name, ins_classification, lev_name
    FROM authorcourse ac 
    INNER JOIN author a
    ON ac.aut_id=a.aut_id
    INNER JOIN course c
    ON ac.cou_id = c.cou_id
    INNER JOIN coursediscipline cd 
    ON ac.cou_id = cd.cou_id
    INNER JOIN discipline d
    ON cd.dis_id = d.dis_id
    INNER JOIN courseinstitution ci 
    ON ac.cou_id = ci.cou_id
    INNER JOIN institution i
    ON ci.ins_id = i.ins_id
    INNER JOIN courselevel cl 
    ON ac.cou_id = cl.cou_id
    INNER JOIN level l
    ON cl.lev_id = l.lev_id

You had redundant joins to the same tables which did not seem to be accomplishing anything.

你有相同的表的冗余连接,似乎没有完成任何事情。

#3


0  

I dont believe there's enough information to resolve your problem outright. By looking at your explain statement, it looks like you have all the indexes setup correctly, but the number of rows that you're getting is the concern.

我不相信有足够的信息可以彻底解决你的问题。通过查看您的explain语句,看起来您已正确设置所有索引,但是您获得的行数是关注点。

The culprit is most likely due to a table scan on a really large table is being compounded by the number of rows you're getting. From my experience when I encounter these issues, I pinpoint where the slowless occurs and work from there. Some strategies to use are temporary tables or subqueries (which break them up into smaller, more managable queries.

罪魁祸首很可能是由于在一张非常大的桌子上进行的桌面扫描正在加剧你所获得的行数。根据我遇到这些问题时的经验,我确定了慢速发生的位置并从那里开始工作。一些使用的策略是临时表或子查询(将它们分解为更小,更易管理的查询。

Also to get rid of that PHP fatal error problem, you should be able use a try / catch exception block to handle that gracefully.

另外,要摆脱PHP致命错误问题,您应该能够使用try / catch异常块来优雅地处理它。