I have a somewhat-complex set of rules that I need to run against a table. The problem is as follows: I have a table that stores medical records and I need to identify the first site that a person goes to after their discharge date. The discharge date is the end_date with a location of 'initial' (this will be the first row for every group). The table is grouped by ID and sorted in the format shown below.
我有一组复杂的规则,需要对表运行。问题是这样的:我有一个存储医疗记录的表格,我需要确定一个人出院后去的第一个地点。出院日期是end_date,其位置为“initial”(这将是每个组的第一行)。该表按ID分组并按照如下所示的格式进行排序。
There are 3 rules: (1) within the group of ID, if any of the rows have a begin_date that matches the first rows end_date, return that location as the first site (if there are two rows that meet this condition, either are correct, the first instance is preferred). (2) if the first option does not exist, then if there is an instance that the patient had location 'Health', then return 'Health'. (3) else, if conditions 1 and 2 do not exist, then return 'Home'
在ID组中有3条规则:(1)如果任何行都有与第一行end_date匹配的begin_date,则将该位置作为第一个站点返回(如果有两行满足此条件,其中任何一行都是正确的,则首选第一个实例)。(2)如果第一个选项不存在,则如果有一个实例表明患者有“健康”位置,则返回“健康”。(3)如果条件1和2不存在,则返回“Home”
table
表
ID color begin_date end_date location
1 red 2017-01-01 2017-01-07 initial
1 green 2017-01-05 2017-01-07 nursing
1 blue 2017-01-07 2017-01-15 rehab
1 red 2017-01-11 2017-01-22 Health
2 red 2017-02-22 2017-02-26 initial
2 green 2017-02-26 2017-02-28 nursing
2 blue 2017-02-26 2017-02-28 rehab
3 red 2017-03-11 2017-03-22 initial
4 red 2017-04-01 2017-04-07 initial
4 green 2017-04-05 2017-04-07 nursing
4 blue 2017-04-10 2017-04-15 Health
final result:
最终结果:
ID first_site
1 rehab
2 nursing
3 home
4 Health
In sql-server 2008 my attempt: (side note: I considered adding a helper column which would be the end_date of the 'initial' location to each row so its easier to compare within a row. Not sure if this is necessary). I appreciate any guidance!
在sql-server 2008中,我尝试:(顺便说一下:我考虑过为每一行添加一个helper列,该列将是“初始”位置的end_date,以便在一行中进行比较。不确定这是否必要)。我很欣赏任何指导!
SELECT
ID,
OVER( PARTITION ID CASE WHEN end_date[0] = begin_date THEN location
WHEN location = 'Health' THEN 'Health'
ELSE 'Home' end) AS [first_site]
FROM table
In python, I was able to get this answer with:
在python中,我能够得到这样的答案:
def conditions(x):
#compare each group first
val = x.loc[x['begin_date'] == x['end_date'].iloc[0], 'location']
#if at least one match (not return empty `Series` get first value)
if not val.empty:
return val.iloc[0]
#if value is empty, check if value 'Health' exists within the group
elif (x['location'] == 'Health').any():
return 'Health'
else:
return 'Home'
final = df.groupby('ID').apply(conditions).reset_index(name='first_site')
2 个解决方案
#1
2
This can be achieved with a windowed function to rank the visits that start on the same day as the initial
end and then a couple simple join
s:
这可以通过一个窗口函数来实现,该函数对从开始的同一天开始的访问进行排序,然后进行几个简单的连接:
declare @t table(ID int,color varchar(20),begin_date date,end_date date,location varchar(20));
insert into @t values(1,'red','20170101','20170107','initial'),(1,'green','20170105','20170107','nursing'),(1,'blue','20170107','20170115','rehab'),(1,'red','20170111','20170122','Health'),(2,'red','20170222','20170226','initial'),(2,'green','20170226','20170228','nursing'),(2,'blue','20170226','20170228','rehab'),(3,'red','20170311','20170322','initial'),(4,'red','20170401','20170407','initial'),(4,'green','20170405','20170407','nursing'),(4,'blue','20170410','20170415','Health');
with d as
(
select ID
,color
,begin_date
,end_date
,location
,row_number() over (partition by ID
,begin_date
order by case when location = 'initial' then '29990101' else begin_date end
) as r
from @t
)
select i.ID
,isnull(d.location,isnull(h.location,'Home')) as first_site
from d as i
left join d
on i.end_date = d.begin_date
and d.r = 1
left join d as h
on i.ID = h.ID
and h.location = 'Health'
where i.location = 'initial'
;
Output:
输出:
+----+------------+
| ID | first_site |
+----+------------+
| 1 | rehab |
| 2 | nursing |
| 3 | Home |
| 4 | Health |
+----+------------+
#2
2
declare @example table (
ExampleID int identity(1,1) not null primary key clustered
, ID int not null
, Color nvarchar(255) not null
, BeginDate date not null
, EndDate date not null
, Loc nvarchar(255) not null
);
insert into @example (ID, color, begindate, enddate, loc)
select 1, 'red' , '2017-01-01', '2017-01-07', 'initial' union all
select 1, 'green' , '2017-01-05', '2017-01-07', 'nursing' union all
select 1, 'blue' , '2017-01-07', '2017-01-15', 'rehab' union all
select 1, 'red' , '2017-01-11', '2017-01-22', 'Health' union all
select 2, 'red' , '2017-02-22', '2017-02-26', 'initial' union all
select 2, 'green' , '2017-02-26', '2017-02-28', 'nursing' union all
select 2, 'blue' , '2017-02-26', '2017-02-28', 'rehab' union all
select 3, 'red' , '2017-03-11', '2017-03-22', 'initial' union all
select 4, 'red' , '2017-04-01', '2017-04-07', 'initial' union all
select 4, 'green' , '2017-04-05', '2017-04-07', 'nursing' union all
select 4, 'blue' , '2017-04-10', '2017-04-15', 'Health';
with cte as (
select a.ID
, a.Color
, a.BeginDate
, a.EndDate
, b.Loc
, rank() over(partition by a.ID order by a.ID, a.begindate, b.enddate desc, b.loc) Ranking
from @example a
left join @example b
on a.EndDate = b.BeginDate
)
, cte2 as (
select id
, Loc
from @example
where loc = 'health'
)
select a.ID
, COALESCE(a.loc, b.loc, 'Home') as Loc
from cte a
left join cte2 b
on a.id = b.id
where Ranking = 1
Output:
输出:
ID Loc
1 rehab
2 nursing
3 home
4 Health
#1
2
This can be achieved with a windowed function to rank the visits that start on the same day as the initial
end and then a couple simple join
s:
这可以通过一个窗口函数来实现,该函数对从开始的同一天开始的访问进行排序,然后进行几个简单的连接:
declare @t table(ID int,color varchar(20),begin_date date,end_date date,location varchar(20));
insert into @t values(1,'red','20170101','20170107','initial'),(1,'green','20170105','20170107','nursing'),(1,'blue','20170107','20170115','rehab'),(1,'red','20170111','20170122','Health'),(2,'red','20170222','20170226','initial'),(2,'green','20170226','20170228','nursing'),(2,'blue','20170226','20170228','rehab'),(3,'red','20170311','20170322','initial'),(4,'red','20170401','20170407','initial'),(4,'green','20170405','20170407','nursing'),(4,'blue','20170410','20170415','Health');
with d as
(
select ID
,color
,begin_date
,end_date
,location
,row_number() over (partition by ID
,begin_date
order by case when location = 'initial' then '29990101' else begin_date end
) as r
from @t
)
select i.ID
,isnull(d.location,isnull(h.location,'Home')) as first_site
from d as i
left join d
on i.end_date = d.begin_date
and d.r = 1
left join d as h
on i.ID = h.ID
and h.location = 'Health'
where i.location = 'initial'
;
Output:
输出:
+----+------------+
| ID | first_site |
+----+------------+
| 1 | rehab |
| 2 | nursing |
| 3 | Home |
| 4 | Health |
+----+------------+
#2
2
declare @example table (
ExampleID int identity(1,1) not null primary key clustered
, ID int not null
, Color nvarchar(255) not null
, BeginDate date not null
, EndDate date not null
, Loc nvarchar(255) not null
);
insert into @example (ID, color, begindate, enddate, loc)
select 1, 'red' , '2017-01-01', '2017-01-07', 'initial' union all
select 1, 'green' , '2017-01-05', '2017-01-07', 'nursing' union all
select 1, 'blue' , '2017-01-07', '2017-01-15', 'rehab' union all
select 1, 'red' , '2017-01-11', '2017-01-22', 'Health' union all
select 2, 'red' , '2017-02-22', '2017-02-26', 'initial' union all
select 2, 'green' , '2017-02-26', '2017-02-28', 'nursing' union all
select 2, 'blue' , '2017-02-26', '2017-02-28', 'rehab' union all
select 3, 'red' , '2017-03-11', '2017-03-22', 'initial' union all
select 4, 'red' , '2017-04-01', '2017-04-07', 'initial' union all
select 4, 'green' , '2017-04-05', '2017-04-07', 'nursing' union all
select 4, 'blue' , '2017-04-10', '2017-04-15', 'Health';
with cte as (
select a.ID
, a.Color
, a.BeginDate
, a.EndDate
, b.Loc
, rank() over(partition by a.ID order by a.ID, a.begindate, b.enddate desc, b.loc) Ranking
from @example a
left join @example b
on a.EndDate = b.BeginDate
)
, cte2 as (
select id
, Loc
from @example
where loc = 'health'
)
select a.ID
, COALESCE(a.loc, b.loc, 'Home') as Loc
from cte a
left join cte2 b
on a.id = b.id
where Ranking = 1
Output:
输出:
ID Loc
1 rehab
2 nursing
3 home
4 Health