Creating an index for several different queries on a 4+ million row table

Time: 2021-09-01 21:23:15

The table currently has 4+ million rows (~50 GB) and is growing rapidly.

We want to exclude rows where EndTime is invalid (less than StartTime), because there are at least 1,000 rows where it's zero.

My question is: what kind of index would be best for these three queries? I'm guessing maybe a composite index with EndTime first and StartTime second?

The StartTime and EndTime fields both contain unix timestamps like: 1401951888


SELECT AVG(EndTime-StartTime) FROM sessions WHERE EndTime>StartTime;
SELECT MAX(EndTime-StartTime) FROM sessions WHERE EndTime>StartTime;
SELECT MIN(EndTime-StartTime) FROM sessions WHERE EndTime>StartTime;

+----------------------+------------+------+-----+---------+-------+
| Field                | Type       | Null | Key | Default | Extra |
+----------------------+------------+------+-----+---------+-------+
| Uuid                 | char(36)   | NO   | PRI | NULL    |       |
| StartTime            | int(11)    | YES  |     | NULL    |       |
| EndTime              | int(11)    | YES  |     | NULL    |       |
+----------------------+------------+------+-----+---------+-------+

1 Answer

#1


> The table currently has 4+ million rows (~50 GB) and is growing rapidly.

4M rows with just those 3 columns and it's 50 GB? That works out to roughly 12 KB per row. Wow... is there a problem somewhere?

> We want to exclude rows where EndTime is invalid (less than StartTime), because there are at least 1,000 rows where it's zero.

Since there are no other conditions, the query will have to process the entire table, minus the 1,000 or so invalid rows. Therefore, any index will be useless for filtering.

The exception is when the table has many more columns than you showed. In that case, an index on StartTime and EndTime would be much smaller than the table on disk and therefore much faster to scan. That is the only benefit it would bring.
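
As a rough sketch of that idea (the index name idx_sessions_times is made up here, and this assumes the real table is much wider than the three columns shown), an index covering both timestamp columns would let MySQL answer the queries from the index alone. Since every matching row has to be read anyway, the three aggregates can also be computed in a single pass:

```sql
-- Hypothetical index name; covers both columns the queries touch.
CREATE INDEX idx_sessions_times ON sessions (StartTime, EndTime);

-- One scan instead of three: all aggregates share the same WHERE clause.
SELECT AVG(EndTime - StartTime) AS avg_duration,
       MAX(EndTime - StartTime) AS max_duration,
       MIN(EndTime - StartTime) AS min_duration
FROM sessions
WHERE EndTime > StartTime;
```

Running EXPLAIN on the query should show whether the optimizer picked the index; "Using index" in the Extra column would indicate an index-only scan.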

Now, recent versions of MySQL let you index an expression: MySQL 5.7 supports indexes on generated (virtual) columns, and MySQL 8.0.13 adds functional key parts. Therefore, you can create an index on:

EndTime - StartTime

If your MAX() and MIN() queries use the index, they will be nearly instantaneous, since finding the min/max in a sorted index only needs to look at the first or last matching entry (a single B-tree descent instead of a scan). However, your AVG() will, of course, still have to examine all matching rows to compute the average.
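
A minimal sketch of the generated-column approach, assuming MySQL 5.7 or later; the column name Duration and the index name idx_sessions_duration are invented for illustration. Filtering on Duration > 0 is equivalent to EndTime > StartTime and is written that way so the optimizer can match the condition to the index:

```sql
-- Virtual column: computed on read, not stored in the row.
ALTER TABLE sessions
  ADD COLUMN Duration INT GENERATED ALWAYS AS (EndTime - StartTime) VIRTUAL;

-- Index on the generated column (hypothetical name).
CREATE INDEX idx_sessions_duration ON sessions (Duration);

-- MAX reads the last index entry; MIN seeks to the first entry above 0.
SELECT MAX(Duration) FROM sessions WHERE Duration > 0;
SELECT MIN(Duration) FROM sessions WHERE Duration > 0;

-- AVG still visits every matching entry, but only in the narrower index.
SELECT AVG(Duration) FROM sessions WHERE Duration > 0;
```

On MySQL 8.0.13 or later, a functional key part such as CREATE INDEX idx_sessions_duration ON sessions ((EndTime - StartTime)) should achieve the same without adding a named column. Check the plan with EXPLAIN either way, since whether the index is actually used depends on the optimizer.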
