Here is the query:
在这里查询:
SELECT name, SUM( `count` ) AS Total
FROM `identdb`
WHERE MBRCONTAINS( GEOMFROMTEXT( 'LineString(34.4 -119.9, 34.5 -119.8)' ) , latlng )
AND MOD( DAYOFYEAR( CURDATE( ) ) - DAYOFYEAR( `date` ) +365, 365 ) <=14
OR MOD( DAYOFYEAR( `date` ) - DAYOFYEAR( CURDATE( ) ) +365, 365 ) <=14
AND MBRCONTAINS( GEOMFROMTEXT( 'LineString(34.4 -119.9, 34.5 -119.8)' ) , latlng )
GROUP BY `name`
It essentially finds any rows where the day of year is plus or minus 14 of today's day, and rows that the latlng spatial column is in the rectangle.
它本质上是找到任何一天的正负14的行,以及latlng空间列在矩形中的行。
Here is what my database looks like:
我的数据库是这样的:
# Column Type Collation
1 name varchar(66) utf8_general_ci
2 count tinyint(3)
3 date date
4 latlng geometry
5 lat1 varchar(15) latin1_swedish_ci
6 long1 varchar(15) latin1_swedish_ci
Keyname Type Unique Packed Column Cardinality Collation Null Comment
PRIMARY BTREE Yes No name 0 A
count 0 A
date 0 A
lat1 0 A
long1 6976936 A
sp_index SPATIAL No No latlng (32) 0 A
There are 7 million records and the query is taking about 7 seconds. I have no clue how to speed this up, thanks in advance!
有700万条记录,查询大约需要7秒。我不知道如何加快这个速度,谢谢!
EXPLAIN:
解释:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE identdb ALL sp_index NULL NULL NULL 6976936 Using where; Using temporary; Using filesort
UPDATED explanation of query: I believe MBRCONTAINS creates a rectangle where I can compare whether the latlng spatial point is inside or not. The date part is finding dayofyear + or - 14 days. It is using modular arithmetic so that it won't mess up around the new years. I had to put the MBRCONTAINS part in twice because of the use of OR.
更新的查询解释:我认为MBRCONTAINS创建了一个矩形,我可以在其中比较latlng空间点是否在内部。日期部分是找到一年中的一天+或- 14天。它使用的是模块化算法,这样在新的一年里就不会出问题。由于OR的使用,我不得不将MBRCONTAINS部分放入两次。
My needs of the query are to find find all name
s that have a day of the year + or - 14 days, and are within the given lat/long pairs, and then total the counts for each.
我的查询需要找到所有的名称,这些名称有一年中的一天+或- 14天,并且在给定的lat/long对中,然后计算每个名称的计数。
I'm dumb at this stuff so please correct me if I'm doing something dumb. Thanks guys!
我在这方面很笨,所以如果我在做傻事,请纠正我。谢谢你们了!
1 个解决方案
#1
6
Rewrite it so that your calculations happen once per query, rather than once per row by expressing your predicates such that the column is not part of the calculation.
重写它,使计算在每个查询中发生一次,而不是在每行中进行一次,方法是表示谓词,使列不属于计算。
For example, this expression:
例如,这个表达式:
MOD( DAYOFYEAR( CURDATE( ) ) - DAYOFYEAR( `date` ) +365, 365 ) <= 14
which requires 7 millions calculations on date
, can be expressed as
这需要700万次计算,可以表示为
`date` between SUBDATE( CURDATE( ), 14) and ADDDATE( CURDATE( ), 14)
which requires only 1 calculation and further would allow an index on the date
column to be used.
That change alone will speed up your query.
这只需要1个计算,进一步将允许使用日期列上的索引。这个更改将加快查询速度。
if you don't have an index on date, put one and your query will fly:
如果你没有一个索引的日期,放一个,你的查询将会飞:
create index mytable_date on mytable(`date`);
I don't know what MBRCONTAINS
does, but try to refactor it too so that the column value is not part of the calculation.
我不知道MBRCONTAINS是怎么做的,但是尝试重构它,使列值不属于计算的一部分。
#1
6
Rewrite it so that your calculations happen once per query, rather than once per row by expressing your predicates such that the column is not part of the calculation.
重写它,使计算在每个查询中发生一次,而不是在每行中进行一次,方法是表示谓词,使列不属于计算。
For example, this expression:
例如,这个表达式:
MOD( DAYOFYEAR( CURDATE( ) ) - DAYOFYEAR( `date` ) +365, 365 ) <= 14
which requires 7 millions calculations on date
, can be expressed as
这需要700万次计算,可以表示为
`date` between SUBDATE( CURDATE( ), 14) and ADDDATE( CURDATE( ), 14)
which requires only 1 calculation and further would allow an index on the date
column to be used.
That change alone will speed up your query.
这只需要1个计算,进一步将允许使用日期列上的索引。这个更改将加快查询速度。
if you don't have an index on date, put one and your query will fly:
如果你没有一个索引的日期,放一个,你的查询将会飞:
create index mytable_date on mytable(`date`);
I don't know what MBRCONTAINS
does, but try to refactor it too so that the column value is not part of the calculation.
我不知道MBRCONTAINS是怎么做的,但是尝试重构它,使列值不属于计算的一部分。