I have a structure like this in a few tables: id, [...], validfrom, validto.
The id is a NUMBER, and the validfrom and validto columns are of type DATE. Any given date should not result in more than one row per id.
So this is a correct example:
id, validfrom, validto
1, 2000-01-01, 2000-02-20
1, 2000-02-21, 2000-03-02
1, 2000-03-03, 2099-12-31
However, there seem to be some issues where certain dates return more than one row. Something like this (which is corrupt data):
id, validfrom, validto
1, 2001-01-01, 2001-02-20
1, 2001-01-15, 2001-03-02
1, 2001-03-03, 2099-12-31
So in the above example, any date between 2001-01-15 and 2001-02-20 would return two rows.
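To make the rule concrete: a row is corrupt when, for the same id, its validfrom falls on or before the validto of an earlier-starting row. Both bounds are treated as inclusive, as in the correct example where one period ends 2000-02-20 and the next starts 2000-02-21. Here is a minimal in-memory Python sketch of that check, assuming the rows have already been fetched as (id, validfrom, validto) tuples; find_overlaps is a hypothetical helper, not part of any library.

```python
from collections import defaultdict
from datetime import date


def find_overlaps(rows):
    """Hypothetical helper: return (id, earlier_period, later_period) for every
    period whose validfrom is on or before the latest validto seen so far for
    the same id. Both bounds are treated as inclusive."""
    by_id = defaultdict(list)
    for row_id, valid_from, valid_to in rows:
        by_id[row_id].append((valid_from, valid_to))

    problems = []
    for row_id, periods in by_id.items():
        periods.sort()  # order by validfrom, then validto
        latest = None   # earlier period with the greatest validto so far
        for period in periods:
            if latest is not None and period[0] <= latest[1]:
                problems.append((row_id, latest, period))
            if latest is None or period[1] > latest[1]:
                latest = period
    return problems


# The corrupt example from the question.
corrupt = [
    (1, date(2001, 1, 1), date(2001, 2, 20)),
    (1, date(2001, 1, 15), date(2001, 3, 2)),
    (1, date(2001, 3, 3), date(2099, 12, 31)),
]
for problem in find_overlaps(corrupt):
    print(problem)
```

Because the periods are sorted by validfrom, each row only has to be compared against the longest-running earlier period. As a result, each corrupt row is reported once rather than against every row it overlaps.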
How would I construct a script that finds all these corrupt rows?
3 answers
#1
Just to find them, assuming validfrom is less than validto in every row:
select a.*, b.*
from your_table a
join your_table b
on (a.id = b.id and
--overlapping
greatest(a.validfrom, b.validfrom) <= least(a.validto, b.validto) and
--exclude join the same row.
a.rowid <> b.rowid
)
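This finds only intersecting intervals. For two intervals that do not intersect, one interval's validfrom is always greater than the other interval's validto, so greatest(validfrom) > least(validto) and the pair is not returned.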
UPDATE: I replaced the condition
not (a.validto=b.validto and a.validfrom=b.validfrom)
with
a.rowid <> b.rowid
because the query now also reports duplicate rows (rows with identical validfrom and validto). (Thanks wolfi)
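You can try this self-join without an Oracle instance. The sketch translates it to Python's built-in sqlite3 module under a few assumptions: the table name and data come from the question's corrupt example, dates are stored as ISO-8601 text (which compares correctly as strings), and SQLite's multi-argument max()/min() stand in for Oracle's greatest()/least().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, validfrom TEXT, validto TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, "2001-01-01", "2001-02-20"),
     (1, "2001-01-15", "2001-03-02"),
     (1, "2001-03-03", "2099-12-31")],
)

# Same join as the answer; max()/min() with two arguments are SQLite's
# scalar equivalents of Oracle's greatest()/least().
rows = conn.execute("""
    SELECT a.id, a.validfrom, a.validto, b.validfrom, b.validto
    FROM your_table a
    JOIN your_table b
      ON a.id = b.id
     AND max(a.validfrom, b.validfrom) <= min(a.validto, b.validto)  -- overlapping
     AND a.rowid <> b.rowid                                           -- not the same row
    ORDER BY a.validfrom
""").fetchall()
for row in rows:
    print(row)
```

The join condition is symmetric, so each overlapping pair comes back twice: once as (a, b) and once as (b, a).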
#2
Finding overlapping time spans is a nightmare. It is very easy to get wrong, and I know of no simple, good solution. In theory, Oracle has solved this with the data type WM_PERIOD, which may or may not be installed/available in your database. But it's not pretty either:
SELECT *
FROM your_table a JOIN your_table b USING (id)
WHERE a.rowid < b.rowid
AND wm_overlaps(wm_period(a.validfrom, a.validto),
wm_period(b.validfrom, b.validto))=1;
Result:
1 2001-01-01 2001-02-20 2001-01-15 2001-03-02
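Unlike a.rowid <> b.rowid, the a.rowid < b.rowid filter in this answer makes each overlapping pair appear only once. SQLite has no wm_overlaps, so this Python/sqlite3 sketch substitutes plain inclusive comparisons (a.validfrom <= b.validto AND b.validfrom <= a.validto). That substitution is an assumption, and WM_PERIOD's own boundary semantics may differ. On the question's corrupt data it returns the single row shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, validfrom TEXT, validto TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [(1, "2001-01-01", "2001-02-20"),
     (1, "2001-01-15", "2001-03-02"),
     (1, "2001-03-03", "2099-12-31")],
)

rows = conn.execute("""
    SELECT id, a.validfrom, a.validto, b.validfrom, b.validto
    FROM your_table a JOIN your_table b USING (id)
    WHERE a.rowid < b.rowid          -- keep each pair only once
      -- inclusive overlap test standing in for wm_overlaps(...) = 1
      AND a.validfrom <= b.validto
      AND b.validfrom <= a.validto
""").fetchall()
print(rows)
```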
#3
This looks for rows that overlap another row with the same id, including rows that are exact repeats:
select *
from YourTable yt1
where -- Overlapping rows exist
exists
(
select *
from YourTable yt2
where yt1.id = yt2.id
-- Rows overlap
and yt1.validfrom <= yt2.validto
and yt2.validfrom <= yt1.validto
-- Rows must be distinct
and yt1.rowid <> yt2.rowid
)
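To see the "repeated rows" case, this Python/sqlite3 sketch runs the EXISTS query against the question's correct example plus one hypothetical exact duplicate of the last row. Dates are stored as ISO-8601 text, which is an assumption of the sketch. The EXISTS form returns each offending row once rather than pairs, and the rowid <> condition is what lets two identical rows match each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, validfrom TEXT, validto TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, "2000-01-01", "2000-02-20"),
     (1, "2000-02-21", "2000-03-02"),
     (1, "2000-03-03", "2099-12-31"),
     (1, "2000-03-03", "2099-12-31")],  # hypothetical exact duplicate
)

rows = conn.execute("""
    SELECT yt1.rowid, yt1.id, yt1.validfrom, yt1.validto
    FROM YourTable yt1
    WHERE EXISTS (
        SELECT *
        FROM YourTable yt2
        WHERE yt1.id = yt2.id
          AND yt1.validfrom <= yt2.validto   -- rows overlap
          AND yt2.validfrom <= yt1.validto
          AND yt1.rowid <> yt2.rowid         -- but are distinct rows
    )
    ORDER BY yt1.rowid
""").fetchall()
for row in rows:
    print(row)
```

Only the two identical rows are returned. The three original periods touch but do not overlap, so they pass the check.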