Optimizing an UPDATE command that aggregates an entire table for each row

Time: 2021-05-06 23:07:10

I have 2 tables, company and investment, and I need a column in the company table containing the number of investments each company has.

This is a SQLite database, by the way.

I tried the following query:


UPDATE company SET numlinks = (SELECT count(*) 
                                   FROM investment 
                                   WHERE investment.company_name = company.name);

I'm pretty sure the query is right. If I run it for a single company, the row is updated correctly. But given that I have over 300K rows, the query starts running and seems to take a long time.

When running it for a single company with the .timer ON command, the CPU time used is about 0.03 (I'm not sure of the units; I guess it's in seconds).

Any ideas on how I could make this faster?


2 solutions

#1


1  

This is the easiest solution.


alter table company
drop column numlinks

One of the first principles of database normalization is to not store calculated values. When you want to display the number of links, query it when you need it.


select company.name, other_stuff, count(*) links
from company join investment on company.name = investment.company_name
group by company.name, other_stuff
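
One caveat with the inner join above is that companies with no investments drop out of the result, whereas the original UPDATE would have given them a count of 0. A minimal sketch of a variant that keeps them, run through Python's built-in sqlite3 module; the sample rows and the amount column are made up for illustration, and only company.name and investment.company_name come from the question:

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (name TEXT PRIMARY KEY);
    CREATE TABLE investment (company_name TEXT, amount REAL);
    INSERT INTO company VALUES ('Acme'), ('Globex'), ('Initech');
    INSERT INTO investment VALUES ('Acme', 10), ('Acme', 20), ('Globex', 5);
""")

# LEFT JOIN keeps companies that have no investments. Counting a column
# from investment (rather than *) gives those companies 0 instead of 1,
# because COUNT(column) skips NULLs.
rows = conn.execute("""
    SELECT company.name, COUNT(investment.company_name) AS links
    FROM company
    LEFT JOIN investment ON investment.company_name = company.name
    GROUP BY company.name
    ORDER BY company.name
""").fetchall()

print(rows)  # [('Acme', 2), ('Globex', 1), ('Initech', 0)]
```

An index on investment.company_name would likely help this query as well, since the join looks up investments by that column.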

As an aside, with your current design, you are in trouble if two companies have the same name. That's why name fields are rarely used to identify a record.


If you have trouble understanding this answer, I've heard good things about the book Database Design for Mere Mortals.

#2


1  

What you would like to do is calculate the summaries once and then join them into the update statement. Unfortunately, SQLite does not support joins in an UPDATE statement (UPDATE ... FROM only arrived in version 3.33.0), except through this kind of correlated subquery syntax.

One way to make this faster is to create a temporary table with:


create temporary table TemporaryTable as
select company_name, count(*) as cnt
from investment
group by company_name;

And then do the same thing with this table:


update company set numlinks = (select cnt from TemporaryTable
                               where TemporaryTable.company_name = company.name);

The performance advantage is twofold. First, you don't have to redo the aggregation for each row. More importantly, you can create an index on company_name in the temporary table, so each per-row lookup becomes an index search instead of a full scan, which speeds up the query significantly.
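
Put together, the temporary-table approach might look like the sketch below, again using Python's built-in sqlite3 module. The table name investment_counts, the index name, the amount column, and the sample rows are assumptions for illustration. COALESCE is an addition not in the answer above: without it, companies with no investments would end up with NULL rather than 0.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (name TEXT PRIMARY KEY, numlinks INTEGER);
    CREATE TABLE investment (company_name TEXT, amount REAL);
    INSERT INTO company (name) VALUES ('Acme'), ('Globex'), ('Initech');
    INSERT INTO investment VALUES ('Acme', 10), ('Acme', 20), ('Globex', 5);

    -- Step 1: aggregate the investment table once.
    CREATE TEMPORARY TABLE investment_counts AS
        SELECT company_name, COUNT(*) AS cnt
        FROM investment
        GROUP BY company_name;

    -- Step 2: index the summary so each lookup is a search, not a scan.
    CREATE INDEX idx_investment_counts_name
        ON investment_counts (company_name);

    -- Step 3: the correlated subquery now reads one precomputed row
    -- per company; COALESCE turns "no matching row" into 0.
    UPDATE company
    SET numlinks = COALESCE(
        (SELECT cnt FROM investment_counts
         WHERE investment_counts.company_name = company.name), 0);
""")

result = conn.execute(
    "SELECT name, numlinks FROM company ORDER BY name"
).fetchall()
print(result)  # [('Acme', 2), ('Globex', 1), ('Initech', 0)]
```

A simpler alternative that may be enough on its own is to index investment (company_name) directly and keep the original UPDATE, since the lack of such an index is what forces a full scan of investment for every company row.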
