一个非常大的SQL查询的问题

时间:2022-07-10 03:44:12

Problem:

问题:

I'm creating a Trac report that shows how many tickets are at each stage in our development cycle per chapter of our library. A tickets represents a single piece of work, usually an individual routine.

我正在创建一个Trac报告,显示我们图书馆每一章的开发周期中每个阶段有多少张门票。一张票代表一项工作,通常是一个单独的程序。

For instance how many tickets for the upcoming release (milestone) are at the peer review stage for chapter X.

例如,即将发布的版本(里程碑)在第十章的同行评审阶段有多少张票。

There are 10 development stages and 47 chapters.

有10个发展阶段和47个章节。

The given MySQL query is for all 10 development stages but for only one chapter and is 25 lines long, the whole query for all chapters is therefore over 1200 lines.

给定的MySQL查询适用于所有10个开发阶段,但仅适用于一个章节,且长度为25行,因此所有章节的整个查询超过1200行。

The error given by Trac is KeyError: 'numrows' where the query gets to large.

Trac给出的错误是KeyError:查询变大的“numrows”。

When entering the query directly into MySQL the error given is Out of resources when opening file (Errcode: 24) (23)

当直接将查询输入MySQL时,打开文件时,所给出的错误是没有资源的(Errcode: 24) (23)

Question :

问题:

  • Refactoring - can this be done 'better' sql gurus, are there some clever tricks/advance techniques?

    重构——这能做得更好吗? sql大师们,有一些聪明的技巧/先进技术吗?

  • Approach - do I need a different approach completely?

    方法——我需要完全不同的方法吗?

  • Configuration - can MySQL and/or Trac be configured to accept very large queries

    配置——可以将MySQL和/或Trac配置为接受非常大的查询吗

Notes:

注:

The data in the tables is small, the query doesn't take long to execute when it's under the apparent size limitations.

表中的数据比较小,当查询处于明显的大小限制时,执行时间不会太长。

The query is passed from the Trac system to MySQL which places some restrictions on what can be done, for example only a single query can be sent from trac to generate a report.

查询从Trac系统传递到MySQL,这对可执行的操作有一些限制,例如只有一个查询可以从Trac发送到生成报告。

An example of what a Trac report looks like can be seen here.

一个关于Trac报告的例子可以在这里看到。

The %c%* in the query is just the unique string I use for replacing the actual chapters when the query is generated via a script.

查询中的%c%*仅仅是通过脚本生成查询时用于替换实际章节的惟一字符串。

SELECT '%c%' as Chapter,
(SELECT count(ticket.id) AS Matches FROM engine.ticket INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%' AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine' AND ticket.status IN ('new','assigned') ) AS 'New',
(SELECT count(ticket.id) AS Matches FROM engine.ticket INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%' AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine' AND ticket.status='document_interface' ) AS 'Document\
 Interface',
(SELECT count(ticket.id) AS Matches FROM engine.ticket INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%' AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine' AND ticket.status='interface_development' ) AS 'Inter\
face Development',
(SELECT count(ticket.id) AS Matches FROM engine.ticket INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%' AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine' AND ticket.status='interface_check' ) AS 'Interface C\
heck',
(SELECT count(ticket.id) AS Matches FROM engine.ticket INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%' AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine' AND ticket.status='document_routine' ) AS 'Document R\
outine',
(SELECT count(ticket.id) AS Matches FROM engine.ticket INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%' AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine' AND ticket.status='full_development' ) AS 'Full Devel\
opment',
(SELECT count(ticket.id) AS Matches FROM engine.ticket INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%' AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine' AND ticket.status='peer_review_1' ) AS 'Peer Review O\
ne',
(SELECT count(ticket.id) AS Matches FROM engine.ticket INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%'AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine' AND ticket.status='peer_review_2' ) AS 'Peer Review Tw\
o',
(SELECT count(ticket.id) AS Matches FROM engine.ticket INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%' AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine' AND ticket.status='qa' ) AS 'QA',
(SELECT count(ticket.id) AS Matches FROM engine.ticket INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%'AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine' AND ticket.status='closed' ) AS 'Closed',
count(id) AS Total,
ticket.id AS _id
FROM engine.ticket
INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%' AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine'

5 个解决方案

#1


2  

Not going to completely rewrite it... but here's my suggestion:

不会完全重写……但这是我的建议:

SELECT '%c%' as Chapter,
    SUM(CASE WHEN ticket.status IN ('new','assigned') THEN 1 ELSE 0 END) as `New`,
    ...
    SUM(CASE WHEN ticket.status='closed' THEN 1 ELSE 0 END) as 'Closed',
    count(id) AS Total,
    ticket.id AS _id
FROM engine.ticket
INNER JOIN engine.ticket_custom 
    ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' 
    AND ticket_custom.value LIKE '%c%' 
    AND type='New material' 
    AND milestone='1.1.12' 
    AND component NOT LIKE 'internal_engine'
GROUP BY ticket.id
;

#2


2  

Instead of making a subquery for every count, use a case to count from the data that already is fetched for the query:

不要对每个计数进行子查询,而是使用case从查询中已经获取的数据中进行计数:

select '%c%' as Chapter,
  sum(case when ticket.status IN ('new','assigned') then 1 else 0 end) as 'New',
  sum(case when ticket.status='document_interface' then 1 else 0 end) as 'DocumentInterface',
  sum(case when ticket.status='interface_development' then 1 else 0 end) as 'Interface Development',
  sum(case when ticket.status='interface_check' then 1 else 0 end) as 'Interface Check',
  sum(case when ticket.status='document_routine' then 1 else 0 end) as 'Document Routine',
  sum(case when ticket.status='full_development' then 1 else 0 end) as 'Full Development',
  sum(case when ticket.status='peer_review_1' then 1 else 0 end) as 'Peer Review One',
  sum(case when ticket.status='peer_review_2' then 1 else 0 end) as 'Peer Review Two',
  sum(case when ticket.status='qa' then 1 else 0 end) as 'QA',
  sum(case when ticket.status='closed' then 1 else 0 end) as 'Closed',
  count(id) as Total,
  ticket.id as _id
from
  engine.ticket
  inner join engine.ticket_custom on ticket.id = ticket_custom.ticket
where
  ticket_custom.name='chapter' and
  ticket_custom.value LIKE '%c%' and
  type='New material' and
  milestone='1.1.12' and
  component NOT LIKE 'internal_engine'

#3


1  

When it comes to generating complex reports like this through Trac, it's better to not use a report at all. Reports are OK for relatively simple queries, but they become unwieldy when you are organizing that many different stages and chapters.

当涉及到通过Trac来生成复杂的报告时,最好不要使用报告。对于相对简单的查询,报表是可以接受的,但是当您组织许多不同的阶段和章节时,报表就变得难以处理了。

Instead, try creating your "report" using a wiki page. This will give you more control over layout and presentation, plus you can avoid writing SQL altogether. Here's some sample wiki code for a faux report that shows all non-closed tickets grouped by milestone and then by status:

相反,尝试使用wiki页面创建“报告”。这将使您能够更好地控制布局和表示,并且您可以避免编写SQL。下面是一个假报告的wiki代码示例,它显示了所有按里程碑分组、然后按状态分组的非封闭票据:

= Custom Report =
My custom report, as a wiki page

== Tickets for Milestone A ==
[[TicketQuery(milestone=MilestoneA,status!=closed,group=status,format=table)]]

== Tickets for Milestone B ==
[[TicketQuery(milestone=MilestoneB,status!=closed,group=status,format=table)]]

...

I don't know how you have your stages and chapters defined, so you'll need to tweak the query parameters to use the appropriate ticket fields. You can also play around with format and other options to tweak the output format.

我不知道您是如何定义阶段和章节的,所以您需要调整查询参数以使用适当的票证字段。您还可以使用格式和其他选项来调整输出格式。

That's just a simple example. The TicketQuery macro is capable of generating more complex reports. For an example of something that's probably closer (complexity-wise) to what you are looking for, take a look at the Trac project's release notes. The entire "Detailed List of Changes" section is generated with a single TicketQuery macro (hit the 'edit' button to see how they did it).

这只是一个简单的例子。TicketQuery宏能够生成更复杂的报告。对于一个可能更接近(复杂性方面)您正在寻找的东西的示例,请查看Trac项目的发布说明。完整的“更改列表”部分是通过一个TicketQuery宏生成的(点击“编辑”按钮,看看他们是如何做到的)。


Another example

另一个例子

To re-create the chart that you linked to, you can do something like this:

为了重新创建你链接到的图表,你可以这样做:

||= **Id** =||= **Enhancements** =||= **Defects** =||= **Tasks** =||
||[milestone:v1.0 v1.0] || [[TicketQuery(milestone=v1.0,type=enhancement,format=count)]] || [[TicketQuery(milestone=v1.0,type=defect,format=count)]] || [[TicketQuery(milestone=v1.0,type=task,format=count)]] ||
... repeat for each milestone ...

Essentially, using format=count in the macro gives you what the database is returning for one of the sum(case ...) statements in Guffa's answer.

本质上,在宏中使用format=count为您提供了Guffa的答案中的sum(case…)语句返回的内容。

The big benefit of doing this using macros instead of direct SQL is that it becomes database-agnostic. You don't have to worry about differences between database engines (mysql, sqlite, etc), changes in Trac's database layout, special code for handling custom fields, etc.

使用宏而不是直接SQL来实现这一点的最大好处是,它与数据库无关。您不必担心数据库引擎(mysql、sqlite等)、Trac数据库布局的更改、处理自定义字段的特殊代码等之间的差异。

#4


0  

Yeek, that's ugly. Do you HAVE to fetch all that data in a single row? That many subqueries is going to slam the server. Can you not do a standard grouped fetch and do the pivot table stuff client-side?

Yeek,丑陋。你必须在一行中获取所有的数据吗?那么多的子查询会使服务器崩溃。你能不能做一个标准的分组取回,并做数据透视表的客户端?

e.g.

如。

SELECT count(ticket.id) AS Matches, ticket_custom.name, ticket.status
FROM engine.ticket
INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%' AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine' AND ticket.status='qa' AND (ticket.status IN (........))
GROUP BY ticket.id, ticket_custom.name

then, in pseudo-code:

然后,在伪代码:

data = array()
while(row = fetch($result)) {
   data[ticket.id]][ticket.status] = row[ticket.status];
}

This way you'd be running only one single query, and doing some minor extra work in the fetch logic to recreate what your original query is laboring to do.

这样,您将只运行一个查询,并在fetch逻辑中做一些次要的额外工作,以重新创建您的原始查询要做的工作。

#5


0  

I ended up writing my own trac reporting system in php. This allows a lot more flexibility to create the kind of reports I need and doesn't involve ridiculous mysql queries. Too intergrate the script with the trac project the report items i.e the number of tickets at stage QA in chapter XX are hyperlinks to the tickets themselves in trac: http://myhost.co.uk/trac-project/query?id=10&id=15. Also the NavAddPluggin allows the main trac navigation bar to be customized so a menu option that links to the script can be added.

最后我用php编写了我自己的trac报告系统。这使得创建我需要的报告具有更大的灵活性,并且不涉及荒谬的mysql查询。太过将脚本与trac项目报告项目i交织在一起。第XX章QA阶段的票数是trac中的票的超链接:http://myhost.co.uk/trac-project/query?id=10&id=15。此外,NavAddPluggin还允许定制主trac导航条,以便添加到脚本的菜单选项。

#1


2  

Not going to completely rewrite it... but here's my suggestion:

不会完全重写……但这是我的建议:

SELECT '%c%' as Chapter,
    SUM(CASE WHEN ticket.status IN ('new','assigned') THEN 1 ELSE 0 END) as `New`,
    ...
    SUM(CASE WHEN ticket.status='closed' THEN 1 ELSE 0 END) as 'Closed',
    count(id) AS Total,
    ticket.id AS _id
FROM engine.ticket
INNER JOIN engine.ticket_custom 
    ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' 
    AND ticket_custom.value LIKE '%c%' 
    AND type='New material' 
    AND milestone='1.1.12' 
    AND component NOT LIKE 'internal_engine'
GROUP BY ticket.id
;

#2


2  

Instead of making a subquery for every count, use a case to count from the data that already is fetched for the query:

不要对每个计数进行子查询,而是使用case从查询中已经获取的数据中进行计数:

select '%c%' as Chapter,
  sum(case when ticket.status IN ('new','assigned') then 1 else 0 end) as 'New',
  sum(case when ticket.status='document_interface' then 1 else 0 end) as 'DocumentInterface',
  sum(case when ticket.status='interface_development' then 1 else 0 end) as 'Interface Development',
  sum(case when ticket.status='interface_check' then 1 else 0 end) as 'Interface Check',
  sum(case when ticket.status='document_routine' then 1 else 0 end) as 'Document Routine',
  sum(case when ticket.status='full_development' then 1 else 0 end) as 'Full Development',
  sum(case when ticket.status='peer_review_1' then 1 else 0 end) as 'Peer Review One',
  sum(case when ticket.status='peer_review_2' then 1 else 0 end) as 'Peer Review Two',
  sum(case when ticket.status='qa' then 1 else 0 end) as 'QA',
  sum(case when ticket.status='closed' then 1 else 0 end) as 'Closed',
  count(id) as Total,
  ticket.id as _id
from
  engine.ticket
  inner join engine.ticket_custom on ticket.id = ticket_custom.ticket
where
  ticket_custom.name='chapter' and
  ticket_custom.value LIKE '%c%' and
  type='New material' and
  milestone='1.1.12' and
  component NOT LIKE 'internal_engine'

#3


1  

When it comes to generating complex reports like this through Trac, it's better to not use a report at all. Reports are OK for relatively simple queries, but they become unwieldy when you are organizing that many different stages and chapters.

当涉及到通过Trac来生成复杂的报告时,最好不要使用报告。对于相对简单的查询,报表是可以接受的,但是当您组织许多不同的阶段和章节时,报表就变得难以处理了。

Instead, try creating your "report" using a wiki page. This will give you more control over layout and presentation, plus you can avoid writing SQL altogether. Here's some sample wiki code for a faux report that shows all non-closed tickets grouped by milestone and then by status:

相反,尝试使用wiki页面创建“报告”。这将使您能够更好地控制布局和表示,并且您可以避免编写SQL。下面是一个假报告的wiki代码示例,它显示了所有按里程碑分组、然后按状态分组的非封闭票据:

= Custom Report =
My custom report, as a wiki page

== Tickets for Milestone A ==
[[TicketQuery(milestone=MilestoneA,status!=closed,group=status,format=table)]]

== Tickets for Milestone B ==
[[TicketQuery(milestone=MilestoneB,status!=closed,group=status,format=table)]]

...

I don't know how you have your stages and chapters defined, so you'll need to tweak the query parameters to use the appropriate ticket fields. You can also play around with format and other options to tweak the output format.

我不知道您是如何定义阶段和章节的,所以您需要调整查询参数以使用适当的票证字段。您还可以使用格式和其他选项来调整输出格式。

That's just a simple example. The TicketQuery macro is capable of generating more complex reports. For an example of something that's probably closer (complexity-wise) to what you are looking for, take a look at the Trac project's release notes. The entire "Detailed List of Changes" section is generated with a single TicketQuery macro (hit the 'edit' button to see how they did it).

这只是一个简单的例子。TicketQuery宏能够生成更复杂的报告。对于一个可能更接近(复杂性方面)您正在寻找的东西的示例,请查看Trac项目的发布说明。完整的“更改列表”部分是通过一个TicketQuery宏生成的(点击“编辑”按钮,看看他们是如何做到的)。


Another example

另一个例子

To re-create the chart that you linked to, you can do something like this:

为了重新创建你链接到的图表,你可以这样做:

||= **Id** =||= **Enhancements** =||= **Defects** =||= **Tasks** =||
||[milestone:v1.0 v1.0] || [[TicketQuery(milestone=v1.0,type=enhancement,format=count)]] || [[TicketQuery(milestone=v1.0,type=defect,format=count)]] || [[TicketQuery(milestone=v1.0,type=task,format=count)]] ||
... repeat for each milestone ...

Essentially, using format=count in the macro gives you what the database is returning for one of the sum(case ...) statements in Guffa's answer.

本质上,在宏中使用format=count为您提供了Guffa的答案中的sum(case…)语句返回的内容。

The big benefit of doing this using macros instead of direct SQL is that it becomes database-agnostic. You don't have to worry about differences between database engines (mysql, sqlite, etc), changes in Trac's database layout, special code for handling custom fields, etc.

使用宏而不是直接SQL来实现这一点的最大好处是,它与数据库无关。您不必担心数据库引擎(mysql、sqlite等)、Trac数据库布局的更改、处理自定义字段的特殊代码等之间的差异。

#4


0  

Yeek, that's ugly. Do you HAVE to fetch all that data in a single row? That many subqueries is going to slam the server. Can you not do a standard grouped fetch and do the pivot table stuff client-side?

Yeek,丑陋。你必须在一行中获取所有的数据吗?那么多的子查询会使服务器崩溃。你能不能做一个标准的分组取回,并做数据透视表的客户端?

e.g.

如。

SELECT count(ticket.id) AS Matches, ticket_custom.name, ticket.status
FROM engine.ticket
INNER JOIN engine.ticket_custom ON ticket.id = ticket_custom.ticket
WHERE ticket_custom.name='chapter' AND ticket_custom.value LIKE '%c%' AND type='New material' AND milestone='1.1.12' AND component NOT LIKE 'internal_engine' AND ticket.status='qa' AND (ticket.status IN (........))
GROUP BY ticket.id, ticket_custom.name

then, in pseudo-code:

然后,在伪代码:

data = array()
while(row = fetch($result)) {
   data[ticket.id]][ticket.status] = row[ticket.status];
}

This way you'd be running only one single query, and doing some minor extra work in the fetch logic to recreate what your original query is laboring to do.

这样,您将只运行一个查询,并在fetch逻辑中做一些次要的额外工作,以重新创建您的原始查询要做的工作。

#5


0  

I ended up writing my own trac reporting system in php. This allows a lot more flexibility to create the kind of reports I need and doesn't involve ridiculous mysql queries. Too intergrate the script with the trac project the report items i.e the number of tickets at stage QA in chapter XX are hyperlinks to the tickets themselves in trac: http://myhost.co.uk/trac-project/query?id=10&id=15. Also the NavAddPluggin allows the main trac navigation bar to be customized so a menu option that links to the script can be added.

最后我用php编写了我自己的trac报告系统。这使得创建我需要的报告具有更大的灵活性,并且不涉及荒谬的mysql查询。太过将脚本与trac项目报告项目i交织在一起。第XX章QA阶段的票数是trac中的票的超链接:http://myhost.co.uk/trac-project/query?id=10&id=15。此外,NavAddPluggin还允许定制主trac导航条,以便添加到脚本的菜单选项。