How can you join between a table with a sparse number of dates and another table with an exhaustive number of dates such that the gaps between the sparse dates take the values of the previous sparse date?
如何在具有稀疏日期数的表和另一个具有详尽日期数的表之间进行连接,以使稀疏日期之间的间隙采用上一个稀疏日期的值?
Illustrative example:
PRICE table (sparse dates):
date itemid price
2008-12-04 1 $1
2008-12-11 1 $3
2008-12-15 1 $7
VOLUME table (exhaustive dates):
date itemid volume_amt
2008-12-04 1 12345
2008-12-05 1 23456
2008-12-08 1 34567
2008-12-09 1 ...
2008-12-10 1
2008-12-11 1
2008-12-12 1
2008-12-15 1
2008-12-16 1
2008-12-17 1
2008-12-18 1
Desired result:
date price volume_amt
2008-12-04 $1 12345
2008-12-05 $1 23456
2008-12-08 $1 34567
2008-12-09 $1 ...
2008-12-10 $1
2008-12-11 $3
2008-12-12 $3
2008-12-15 $7
2008-12-16 $7
2008-12-17 $7
2008-12-18 $7
Update:
A couple people have suggested a correlated subquery that accomplishes the desired result. (Correlated subquery = a subquery that contains a reference to the outer query.)
有几个人建议使用相关的子查询来完成所需的结果。 (Correlated subquery =包含对外部查询的引用的子查询。)
This will work; however, I should have noted that the platform I'm using is MySQL, for which correlated subqueries are poorly optimized. Any way to do it without using a correlated subquery?
这将有效;但是,我应该注意到我使用的平台是MySQL,其相关子查询的优化程度很低。没有使用相关子查询的任何方法吗?
5 个解决方案
#1
This isn't as simple as a single LEFT OUTER JOIN to the sparse table, because you want the NULLs left by the outer join to be filled with the most recent price.
这不像稀疏表的单个LEFT OUTER JOIN那么简单,因为您希望外部联接留下的NULL用最近的价格填充。
EXPLAIN SELECT v.`date`, v.volume_amt, p1.item_id, p1.price
FROM Volume v JOIN Price p1
ON (v.`date` >= p1.`date` AND v.item_id = p1.item_id)
LEFT OUTER JOIN Price p2
ON (v.`date` >= p2.`date` AND v.item_id = p2.item_id
AND p1.`date` < p2.`date`)
WHERE p2.item_id IS NULL;
This query matches Volume to all rows in Price that are earlier, and then uses another join to make sure we find only the most recent price.
此查询将Volume与Price之前的所有行匹配,然后使用另一个联接以确保我们只找到最近的价格。
I tested this on MySQL 5.0.51. It uses neither correlated subqueries nor group by.
我在MySQL 5.0.51上测试了这个。它既不使用相关子查询也不使用group by。
edit: Updated the query to match to item_id as well as date. This seems to work too. I created an index on (date)
and an index on (date, item_id)
and the EXPLAIN plan was identical. An index on (item_id, date)
may be better in this case. Here's the EXPLAIN output for that:
编辑:更新了查询以匹配item_id和日期。这似乎也有效。我在(日期)上创建了一个索引,在(date,item_id)上创建了索引,并且EXPLAIN计划是相同的。在这种情况下,(item_id,date)的索引可能更好。这是EXPLAIN输出:
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
| 1 | SIMPLE | p1 | ALL | item_id | NULL | NULL | NULL | 6 | |
| 1 | SIMPLE | v | ref | item_id | item_id | 22 | test.p1.item_id | 3 | Using where |
| 1 | SIMPLE | p2 | ref | item_id | item_id | 22 | test.v.item_id | 1 | Using where; Using index; Not exists |
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
But I have a very small data set, and the optimization may depend on larger data sets. You should experiment, analyzing the optimization using a larger data set.
但我有一个非常小的数据集,优化可能依赖于更大的数据集。您应该尝试使用更大的数据集分析优化。
edit: I pasted the wrong EXPLAIN output before. The one above is corrected, and shows better use of the (item_id, date)
index.
编辑:之前我粘贴了错误的EXPLAIN输出。上面的一个更正,并显示更好地使用(item_id,date)索引。
#2
Assuming there is only 1 price per date/itemid:
假设每个日期/ itemid只有1个价格:
select v.date, v.itemid, p.price
from volume v
join price p on p.itemid = v.item_id
where p.date = (select max(p2.date) from price p2
where p2.itemid = v.itemid
and p2.date <= v.date);
#3
SELECT v.date, p.price, v.volume
FROM volume v
LEFT JOIN Price p ON p.itemID=v.itemID
AND p.[date] = (
SELECT MAX([date] )
FROM price p2
WHERE p2.[date] <= v.[date] AND p2.itemid= v.itemid
GROUP BY p2.[date]
)
#4
SELECT Volume.date, volume.itemid, price.price, volume.volume_amt
FROM Volume
LEFT OUTER JOIN Price
ON Volume.date = Price.date
Probably. My SQL-fu is weak
大概。我的SQL-fu很弱
#5
This method works in Oracle. Don't know about other databases, and you didn't specify. If this exact syntax doesn't work in your database, I would guess there are similar techniques.
此方法适用于Oracle。不知道其他数据库,你没有指定。如果这个确切的语法在您的数据库中不起作用,我猜有类似的技术。
dev> select * from price;
AS_OF ID AMOUNT
----------- ---------- ----------
04-Dec-2008 1 1
11-Dec-2008 1 2
15-Dec-2008 1 3
dev> select * from volume;
DAY ID VOLUME
----------- ---------- ----------
05-Dec-2008 1 1
06-Dec-2008 1 2
07-Dec-2008 1 3
08-Dec-2008 1 4
09-Dec-2008 1 5
10-Dec-2008 1 6
11-Dec-2008 1 7
12-Dec-2008 1 8
13-Dec-2008 1 9
14-Dec-2008 1 10
15-Dec-2008 1 11
16-Dec-2008 1 12
17-Dec-2008 1 13
18-Dec-2008 1 14
19-Dec-2008 1 15
20-Dec-2008 1 16
21-Dec-2008 1 17
22-Dec-2008 1 18
23-Dec-2008 1 19
dev> select day, volume, amount from (
2 select day, volume, (select max(as_of) from price p where p.id = v.id and as_of <= day) price_as_of
3 from volume v
4 )
5 join price on as_of = price_as_of
6 order by day;
DAY VOLUME AMOUNT
----------- ---------- ----------
05-Dec-2008 1 1
06-Dec-2008 2 1
07-Dec-2008 3 1
08-Dec-2008 4 1
09-Dec-2008 5 1
10-Dec-2008 6 1
11-Dec-2008 7 2
12-Dec-2008 8 2
13-Dec-2008 9 2
14-Dec-2008 10 2
15-Dec-2008 11 3
16-Dec-2008 12 3
17-Dec-2008 13 3
18-Dec-2008 14 3
19-Dec-2008 15 3
20-Dec-2008 16 3
21-Dec-2008 17 3
22-Dec-2008 18 3
23-Dec-2008 19 3
#1
This isn't as simple as a single LEFT OUTER JOIN to the sparse table, because you want the NULLs left by the outer join to be filled with the most recent price.
这不像稀疏表的单个LEFT OUTER JOIN那么简单,因为您希望外部联接留下的NULL用最近的价格填充。
EXPLAIN SELECT v.`date`, v.volume_amt, p1.item_id, p1.price
FROM Volume v JOIN Price p1
ON (v.`date` >= p1.`date` AND v.item_id = p1.item_id)
LEFT OUTER JOIN Price p2
ON (v.`date` >= p2.`date` AND v.item_id = p2.item_id
AND p1.`date` < p2.`date`)
WHERE p2.item_id IS NULL;
This query matches Volume to all rows in Price that are earlier, and then uses another join to make sure we find only the most recent price.
此查询将Volume与Price之前的所有行匹配,然后使用另一个联接以确保我们只找到最近的价格。
I tested this on MySQL 5.0.51. It uses neither correlated subqueries nor group by.
我在MySQL 5.0.51上测试了这个。它既不使用相关子查询也不使用group by。
edit: Updated the query to match to item_id as well as date. This seems to work too. I created an index on (date)
and an index on (date, item_id)
and the EXPLAIN plan was identical. An index on (item_id, date)
may be better in this case. Here's the EXPLAIN output for that:
编辑:更新了查询以匹配item_id和日期。这似乎也有效。我在(日期)上创建了一个索引,在(date,item_id)上创建了索引,并且EXPLAIN计划是相同的。在这种情况下,(item_id,date)的索引可能更好。这是EXPLAIN输出:
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
| 1 | SIMPLE | p1 | ALL | item_id | NULL | NULL | NULL | 6 | |
| 1 | SIMPLE | v | ref | item_id | item_id | 22 | test.p1.item_id | 3 | Using where |
| 1 | SIMPLE | p2 | ref | item_id | item_id | 22 | test.v.item_id | 1 | Using where; Using index; Not exists |
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
But I have a very small data set, and the optimization may depend on larger data sets. You should experiment, analyzing the optimization using a larger data set.
但我有一个非常小的数据集,优化可能依赖于更大的数据集。您应该尝试使用更大的数据集分析优化。
edit: I pasted the wrong EXPLAIN output before. The one above is corrected, and shows better use of the (item_id, date)
index.
编辑:之前我粘贴了错误的EXPLAIN输出。上面的一个更正,并显示更好地使用(item_id,date)索引。
#2
Assuming there is only 1 price per date/itemid:
假设每个日期/ itemid只有1个价格:
select v.date, v.itemid, p.price
from volume v
join price p on p.itemid = v.item_id
where p.date = (select max(p2.date) from price p2
where p2.itemid = v.itemid
and p2.date <= v.date);
#3
SELECT v.date, p.price, v.volume
FROM volume v
LEFT JOIN Price p ON p.itemID=v.itemID
AND p.[date] = (
SELECT MAX([date] )
FROM price p2
WHERE p2.[date] <= v.[date] AND p2.itemid= v.itemid
GROUP BY p2.[date]
)
#4
SELECT Volume.date, volume.itemid, price.price, volume.volume_amt
FROM Volume
LEFT OUTER JOIN Price
ON Volume.date = Price.date
Probably. My SQL-fu is weak
大概。我的SQL-fu很弱
#5
This method works in Oracle. Don't know about other databases, and you didn't specify. If this exact syntax doesn't work in your database, I would guess there are similar techniques.
此方法适用于Oracle。不知道其他数据库,你没有指定。如果这个确切的语法在您的数据库中不起作用,我猜有类似的技术。
dev> select * from price;
AS_OF ID AMOUNT
----------- ---------- ----------
04-Dec-2008 1 1
11-Dec-2008 1 2
15-Dec-2008 1 3
dev> select * from volume;
DAY ID VOLUME
----------- ---------- ----------
05-Dec-2008 1 1
06-Dec-2008 1 2
07-Dec-2008 1 3
08-Dec-2008 1 4
09-Dec-2008 1 5
10-Dec-2008 1 6
11-Dec-2008 1 7
12-Dec-2008 1 8
13-Dec-2008 1 9
14-Dec-2008 1 10
15-Dec-2008 1 11
16-Dec-2008 1 12
17-Dec-2008 1 13
18-Dec-2008 1 14
19-Dec-2008 1 15
20-Dec-2008 1 16
21-Dec-2008 1 17
22-Dec-2008 1 18
23-Dec-2008 1 19
dev> select day, volume, amount from (
2 select day, volume, (select max(as_of) from price p where p.id = v.id and as_of <= day) price_as_of
3 from volume v
4 )
5 join price on as_of = price_as_of
6 order by day;
DAY VOLUME AMOUNT
----------- ---------- ----------
05-Dec-2008 1 1
06-Dec-2008 2 1
07-Dec-2008 3 1
08-Dec-2008 4 1
09-Dec-2008 5 1
10-Dec-2008 6 1
11-Dec-2008 7 2
12-Dec-2008 8 2
13-Dec-2008 9 2
14-Dec-2008 10 2
15-Dec-2008 11 3
16-Dec-2008 12 3
17-Dec-2008 13 3
18-Dec-2008 14 3
19-Dec-2008 15 3
20-Dec-2008 16 3
21-Dec-2008 17 3
22-Dec-2008 18 3
23-Dec-2008 19 3