How can I improve COUNT performance without indexing more fields?

Time: 2022-12-07 04:17:47

There are over 2 million records in the table.

I want to count how many checked rows in the table have an error, and how many rows have been checked in total.

I run two queries:

SELECT count(*) as CountError FROM table WHERE checked = 1 AND error != ''


SELECT count(*) as Checked FROM table WHERE checked = 1

The performance is really slow: it takes about 5 minutes to get the result. How can I improve this?

I already have an index on the status field for UPDATE performance.

If I index the checked field, UPDATE performance will be affected, which I do not want.

UPDATEs happen more often than SELECTs.

The table is InnoDB.

2 solutions

#1


3  

You can check whether computing both counts in a single query is faster:

select
  count(*) as Checked,
  sum(case when error != '' then 1 else 0 end) as CountError
from table
where checked = 1

However, the difference will probably not be much to talk about, because the query still has to scan the table. If you really want a difference, you need to add an index. Consider what the impact would really mean, and run an actual test to get a feel for how large it is. If the UPDATE gets 10% slower and the SELECT gets 100000% faster, it might still be worth it.
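One way to run the test suggested above, sketched in MySQL under a few assumptions: the table name `records`, the copy `records_test`, and the index name `idx_checked` are placeholders that do not appear in the thread, and the experiment runs on a copy so the live table is untouched.

```sql
-- Assumption: the real table is called `records`; the thread only says "table".
-- Work on a copy so production UPDATE traffic is not affected by the experiment.
CREATE TABLE records_test LIKE records;
INSERT INTO records_test SELECT * FROM records;

-- Baseline plan for the count, before adding the index.
EXPLAIN SELECT COUNT(*) AS Checked FROM records_test WHERE checked = 1;

-- Add the index under test (idx_checked is an arbitrary name).
ALTER TABLE records_test ADD INDEX idx_checked (checked);

-- Same query again: check whether `key` shows idx_checked and
-- `Extra` shows "Using index", meaning the count is served from the index.
EXPLAIN SELECT COUNT(*) AS Checked FROM records_test WHERE checked = 1;
```

To see the write cost, time a representative batch of UPDATE statements against `records_test` before and after the `ALTER TABLE`. Note that an index on `checked` alone only covers the plain count; the error count still has to read the rows, because `error` is not part of the index.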

#2


0  

Your problem here is simply that your checked field is either 1 or 0, which means that MySQL needs to do a table scan even if you have a key: with only two distinct values it cannot efficiently determine where the split between 0 and 1 is, especially over a large number of rows.

The main advice I would offer is the one you don't want, which is to index checked, as then SELECT SUM(checked) AS Checked FROM table WHERE checked=1 would be able to use the index without hitting the table.

Ultimately though, that's not a trivial query. You may wish to look at some way of archiving counts. If you have a date or timestamp column, you could set up a daily task that stores the count(*) values for the previous day. That in turn would leave you fewer rows to parse on the fly.
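A minimal sketch of that archiving idea in MySQL. Everything named here is an assumption for illustration: the table name `records`, a timestamp column `created_at`, and the summary table `daily_counts` are not in the thread. It also assumes that a row's checked and error values no longer change once its day is over; otherwise the stored counts go stale.

```sql
-- Hypothetical summary table: one row of archived counts per day.
CREATE TABLE daily_counts (
  count_date   DATE         NOT NULL PRIMARY KEY,
  checked_rows INT UNSIGNED NOT NULL,
  error_rows   INT UNSIGNED NOT NULL
);

-- Daily task (cron or similar): store yesterday's counts.
-- COALESCE turns the NULL that SUM returns for an empty day into 0.
INSERT INTO daily_counts (count_date, checked_rows, error_rows)
SELECT CURRENT_DATE - INTERVAL 1 DAY,
       COUNT(*),
       COALESCE(SUM(CASE WHEN error != '' THEN 1 ELSE 0 END), 0)
FROM records
WHERE checked = 1
  AND created_at >= CURRENT_DATE - INTERVAL 1 DAY
  AND created_at <  CURRENT_DATE;

-- Reporting query: archived totals plus today's rows counted live.
SELECT
  (SELECT COALESCE(SUM(checked_rows), 0) FROM daily_counts)
  + (SELECT COUNT(*) FROM records
     WHERE checked = 1 AND created_at >= CURRENT_DATE) AS Checked;
```

The live part only stays cheap if MySQL can narrow the scan by `created_at`, which in practice means that column is indexed; whether that index is more acceptable than one on `checked` depends on the UPDATE workload.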

Without further information about the exact purpose of this table, the reason why you won't allow an index on that column, and so on, it is hard to suggest anything more helpful than the above plus throwing hardware at it.
