I am using T-SQL.
I have a simple table called mytable. It looks like this:
ID Num Date
1 0 2015-01-01 00:00:00
1 0 2015-01-02 00:00:00
1 1 2015-01-03 00:00:00
1 2 2015-01-04 00:00:00
2 0 2015-01-01 00:00:00
2 1 2015-02-01 00:00:00
2 0 2015-03-01 00:00:00
3 1 2014-01-01 00:00:00
3 2 2014-01-02 00:00:00
4 2 2015-02-01 00:00:00
4 0 2015-02-02 00:00:00
4 2 2015-02-05 00:00:00
The rule for this table is simple: once a value of 1 or 2 has been entered for an ID, any later value for that ID (chronologically speaking) cannot be 0. A later 0 is a data entry error and must be fixed by changing the 0 to a 2.
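To try the queries in this thread without touching real data, the sample rows can be loaded into a temporary table. This is a minimal repro sketch. The name #mytable is a hypothetical stand-in for mytable, and the column types (INT, DATETIME) are assumptions because the question does not show the table definition.

```sql
-- Hypothetical repro of the sample data; #mytable stands in for mytable.
-- Column types are assumed: the question does not give the real schema.
IF OBJECT_ID('tempdb..#mytable') IS NOT NULL
    DROP TABLE #mytable;

CREATE TABLE #mytable (
    ID     INT      NOT NULL,
    Num    INT      NOT NULL,
    [Date] DATETIME NOT NULL
);

-- 'yyyymmdd' literals convert to DATETIME the same way under any
-- language or DATEFORMAT setting, unlike 'yyyy-mm-dd'.
INSERT INTO #mytable (ID, Num, [Date]) VALUES
    (1, 0, '20150101'), (1, 0, '20150102'), (1, 1, '20150103'), (1, 2, '20150104'),
    (2, 0, '20150101'), (2, 1, '20150201'), (2, 0, '20150301'),
    (3, 1, '20140101'), (3, 2, '20140102'),
    (4, 2, '20150201'), (4, 0, '20150202'), (4, 2, '20150205');

-- Each ID's history in chronological order, which is what the rule is about.
SELECT ID, Num, [Date]
FROM #mytable
ORDER BY ID, [Date];
```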
So, in the simplified example above, there are errors for IDs 2 and 4.
For person 2, somebody keyed in a 0 on 2015-03-01 00:00:00, after a 1 on 2015-02-01; for person 4, somebody keyed in a 0 on 2015-02-02 00:00:00, after a 2 on 2015-02-01.
I am new to SQL and honestly would rather just export the whole thing as a csv, open it in R, find the problems, and then update values with an update statement back in the database. But I feel like this is an opportunity to get better at SQL -- unfortunately, I'm stuck.
Here I need some way to compare rows within a table to each other, grouped by ID, while also taking this chronological ordering into account. I've tried a Cartesian join with a CASE statement, which didn't work. Any help would be greatly appreciated.
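One way to compare each row with the earlier rows for the same ID, without a self-join, is a windowed running maximum: for every row, take MAX(Num) over the rows before it in the same ID, ordered by Date. This is a different technique from the answers below and is offered only as a sketch. It assumes SQL Server 2012 or later (ORDER BY and ROWS framing on aggregate window functions), a hypothetical temp table #mytable holding the sample rows, and a hypothetical result table #flagged. If two rows for the same ID share a Date, ROWS framing orders them arbitrarily.

```sql
-- Hypothetical stand-in for mytable, loaded with the sample rows.
IF OBJECT_ID('tempdb..#mytable') IS NOT NULL
    DROP TABLE #mytable;
IF OBJECT_ID('tempdb..#flagged') IS NOT NULL
    DROP TABLE #flagged;

CREATE TABLE #mytable (ID INT NOT NULL, Num INT NOT NULL, [Date] DATETIME NOT NULL);

INSERT INTO #mytable (ID, Num, [Date]) VALUES
    (1, 0, '20150101'), (1, 0, '20150102'), (1, 1, '20150103'), (1, 2, '20150104'),
    (2, 0, '20150101'), (2, 1, '20150201'), (2, 0, '20150301'),
    (3, 1, '20140101'), (3, 2, '20140102'),
    (4, 2, '20150201'), (4, 0, '20150202'), (4, 2, '20150205');

-- max_prior = highest Num recorded strictly earlier for the same ID
-- (NULL for an ID's first row, because the frame is empty).
WITH ordered AS (
    SELECT ID, Num, [Date],
           MAX(Num) OVER (PARTITION BY ID
                          ORDER BY [Date]
                          ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS max_prior
    FROM #mytable
)
SELECT ID, Num, [Date], max_prior
INTO #flagged
FROM ordered
WHERE Num = 0
  AND max_prior >= 1;   -- a 0 after an earlier 1 or 2 breaks the rule

SELECT ID, Num, [Date], max_prior
FROM #flagged
ORDER BY ID, [Date];
```

On the sample data this flags the same two rows the question describes (ID 2 on 2015-03-01 and ID 4 on 2015-02-02).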
2 answers
#1
This query will select all problematic records:
SELECT *
FROM mytable AS t
WHERE t.Num = 0
  AND EXISTS (SELECT 1
              FROM mytable AS p
              WHERE p.Num IN (1, 2)
                AND p.ID = t.ID
                AND p.Date < t.Date)
It selects all Num = 0 records that have a preceding Num = 1 or Num = 2 record for the same ID.
Output:
ID Num Date
------------------
2 0 2015-03-01
4 0 2015-02-02
To update the table, simply run:
UPDATE t
SET Num = 2
FROM mytable AS t
WHERE t.Num = 0
  AND EXISTS (SELECT 1
              FROM mytable AS p
              WHERE p.Num IN (1, 2)
                AND p.ID = t.ID
                AND p.Date < t.Date)
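Because the UPDATE changes data in place, it can help to run it inside a transaction and have it report exactly which rows it changed. The sketch below does that with the OUTPUT clause on a hypothetical temp copy of the data (#mytable, column types assumed), using alias-qualified columns so it is clear which copy of the table each column belongs to.

```sql
-- Hypothetical temp copy of mytable with the sample rows (column types assumed).
IF OBJECT_ID('tempdb..#mytable') IS NOT NULL
    DROP TABLE #mytable;

CREATE TABLE #mytable (ID INT NOT NULL, Num INT NOT NULL, [Date] DATETIME NOT NULL);

INSERT INTO #mytable (ID, Num, [Date]) VALUES
    (1, 0, '20150101'), (1, 0, '20150102'), (1, 1, '20150103'), (1, 2, '20150104'),
    (2, 0, '20150101'), (2, 1, '20150201'), (2, 0, '20150301'),
    (3, 1, '20140101'), (3, 2, '20140102'),
    (4, 2, '20150201'), (4, 0, '20150202'), (4, 2, '20150205');

BEGIN TRANSACTION;

-- Same fix as the answer; OUTPUT lists each changed row with its old and new Num.
UPDATE t
SET Num = 2
OUTPUT inserted.ID, inserted.[Date], deleted.Num AS old_num, inserted.Num AS new_num
FROM #mytable AS t
WHERE t.Num = 0
  AND EXISTS (SELECT 1
              FROM #mytable AS p
              WHERE p.Num IN (1, 2)
                AND p.ID = t.ID
                AND p.[Date] < t.[Date]);

-- While testing, use ROLLBACK TRANSACTION instead if the OUTPUT rows look wrong.
COMMIT TRANSACTION;
```

One caveat if this is pointed at the real table: OUTPUT without INTO is not allowed when the target table has an enabled trigger, in which case OUTPUT ... INTO a table variable is needed instead.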
#2
You can join a table back to itself and put the logic in, like this:
select *
from mytable t
join mytable p on t.id = p.id
and t.date > p.date
and t.num < p.num
This returns "extra" (duplicate) rows when a record has more than one earlier row with a higher num. To fix this, you can group by:
select id, Date, max(priornum) as max_prior
from (
select t.id, t.Date, p.num as priornum
from mytable t
join mytable p on t.id = p.id
and t.date > p.date
and t.num < p.num
) sub
group by id, Date
Or use OVER with DISTINCT (on server versions that support windowed aggregates):
select distinct t.id, t.num, t.Date,
max(p.num) OVER (partition by t.id, t.Date) as max_prior
from mytable t
join mytable p on t.id = p.id
and t.date > p.date
and t.num < p.num
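Note that the join condition t.num < p.num is broader than the rule in the question: it also returns a 1 recorded after a 2 for the same ID, which the question does not treat as an error. The sample data has no such case, so the output there is unaffected. If only later 0s should be flagged, a tightened sketch follows. It uses a hypothetical #mytable with the sample rows plus a hypothetical ID 5 (a 2 followed by a 1); with the original condition, ID 5's Num = 1 row would also be returned, while here it is not. #strict is likewise a hypothetical name.

```sql
-- Hypothetical data: the sample rows plus ID 5, where a 1 follows a 2.
IF OBJECT_ID('tempdb..#mytable') IS NOT NULL
    DROP TABLE #mytable;
IF OBJECT_ID('tempdb..#strict') IS NOT NULL
    DROP TABLE #strict;

CREATE TABLE #mytable (ID INT NOT NULL, Num INT NOT NULL, [Date] DATETIME NOT NULL);

INSERT INTO #mytable (ID, Num, [Date]) VALUES
    (1, 0, '20150101'), (1, 0, '20150102'), (1, 1, '20150103'), (1, 2, '20150104'),
    (2, 0, '20150101'), (2, 1, '20150201'), (2, 0, '20150301'),
    (3, 1, '20140101'), (3, 2, '20140102'),
    (4, 2, '20150201'), (4, 0, '20150202'), (4, 2, '20150205'),
    (5, 2, '20150101'), (5, 1, '20150102');

-- Only a 0 that follows an earlier 1 or 2 for the same ID is flagged;
-- GROUP BY collapses the duplicates a self-join produces.
SELECT t.ID, t.Num, t.[Date], MAX(p.Num) AS max_prior
INTO #strict
FROM #mytable AS t
JOIN #mytable AS p
  ON  p.ID = t.ID
  AND p.[Date] < t.[Date]
  AND p.Num IN (1, 2)
WHERE t.Num = 0
GROUP BY t.ID, t.Num, t.[Date];

SELECT ID, Num, [Date], max_prior
FROM #strict
ORDER BY ID, [Date];
```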