Is there a way to look back at the previous row and calculate a new variable? As long as the previous row belongs to the same case, I want to compute (previous change) - (current change) and attribute the result to the previous row's 'ChangeEvent' in new columns.
Here is my DataFrame:
>>> df
ChangeEvent StartEvent case change open
0 Homeless Homeless 1 2014-03-08 00:00:00 2014-02-08
1 other Homeless 1 2014-04-08 00:00:00 2014-02-08
2 Homeless Homeless 1 2014-05-08 00:00:00 2014-02-08
3 Jail Homeless 1 2014-06-08 00:00:00 2014-02-08
4 Jail Jail 2 2014-06-08 00:00:00 2014-02-08
I'd like to add columns like these:
Jail Homeless case
0 6 1
0 30 1
0 0 1
... and so on
Here is the code that builds df:
import pandas as pd
import datetime as DT

# Five rows: case 1 has four change events, case 2 has one
d = {'case' : pd.Series([1,1,1,1,2]),
'open' : pd.Series([DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2),DT.datetime(2014, 3, 2),DT.datetime(2014, 3, 2),DT.datetime(2014, 3, 2)]),
'change' : pd.Series([DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),DT.datetime(2014, 5, 8),DT.datetime(2014, 6, 8),DT.datetime(2014, 6, 8)]),
'StartEvent' : pd.Series(['Homeless','Homeless','Homeless','Homeless','Jail']),
'ChangeEvent' : pd.Series(['Homeless','irrelevant','Homeless','Jail','Jail']),
'close' : pd.Series([DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2),DT.datetime(2015, 3, 2),DT.datetime(2015, 3, 2),DT.datetime(2015, 3, 2)])}
df = pd.DataFrame(d)
Answer:
To get the previous row's value, use the shift method:
In [11]: df.change.shift(1)
Out[11]:
0 NaT
1 2014-03-08
2 2014-04-08
3 2014-05-08
4 2014-06-08
Name: change, dtype: datetime64[ns]
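A minimal self-contained sketch of the same shift call, assuming only the `case` and `change` columns from the question's build (rebuilt here with `pd.to_datetime` instead of `datetime.datetime`, which gives the same datetime64 column):

```python
import pandas as pd

# Only the columns needed for this step, rebuilt from the question's data.
df = pd.DataFrame({
    "case": [1, 1, 1, 1, 2],
    "change": pd.to_datetime(
        ["2014-03-08", "2014-04-08", "2014-05-08", "2014-06-08", "2014-06-08"]
    ),
})

# shift(1) moves every value down one row, so row i now holds row i-1's value.
# The first row has no predecessor and becomes NaT.
prev_change = df["change"].shift(1)
print(prev_change)
```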
Now you can subtract these columns. Note: this was done with pandas 0.13.1. Datetime handling has had a lot of work recently, so your mileage may vary with older versions.
In [12]: df.change.shift(1) - df.change
Out[12]:
0 NaT
1 -31 days
2 -30 days
3 -31 days
4 0 days
Name: change, dtype: timedelta64[ns]
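If plain numbers are more convenient than timedeltas, the `.dt.days` accessor converts each difference to a day count. This is a sketch on the same rebuilt columns. Note that row 4 (case 2) is compared with row 3 (case 1), which is why the result still needs to be restricted to each case:

```python
import pandas as pd

df = pd.DataFrame({
    "case": [1, 1, 1, 1, 2],
    "change": pd.to_datetime(
        ["2014-03-08", "2014-04-08", "2014-05-08", "2014-06-08", "2014-06-08"]
    ),
})

# (previous change) - (current change) gives a timedelta64 Series, NaT in row 0.
gap = df["change"].shift(1) - df["change"]

# .dt.days turns each timedelta into a number of days (NaN where the gap is NaT).
gap_days = gap.dt.days
print(gap_days)
```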
To keep the comparison within each case, apply this to each group:
In [13]: df.groupby('case')['change'].apply(lambda x: x.shift(1) - x)
Out[13]:
0 NaT
1 -31 days
2 -30 days
3 -31 days
4 NaT
dtype: timedelta64[ns]
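Depending on the pandas version and the `group_keys` setting, `apply` may prepend the group key to the result's index. Calling `shift` directly on the groupby object avoids `apply` and keeps the original index. The sketch below uses that, then shows one possible reading of "attribute it to the previous ChangeEvent in new columns": one column per previous event, holding the gap on the row where it was measured. The names `gap_days` and `prev_event` and the wide layout are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "case": [1, 1, 1, 1, 2],
    "change": pd.to_datetime(
        ["2014-03-08", "2014-04-08", "2014-05-08", "2014-06-08", "2014-06-08"]
    ),
    "ChangeEvent": ["Homeless", "irrelevant", "Homeless", "Jail", "Jail"],
})

grouped = df.groupby("case")
# groupby(...).shift keeps df's index, and the first row of each case gets
# NaT/NaN instead of the last value from the previous case.
prev_change = grouped["change"].shift(1)
df["prev_event"] = grouped["ChangeEvent"].shift(1)

# (previous change) - (current change), in days, within each case only.
df["gap_days"] = (prev_change - df["change"]).dt.days

# Hypothetical wide layout: one column per previous ChangeEvent.
# Rows without a previous event in the same case are left out of the pivot.
has_prev = df["prev_event"].notna()
wide = df[has_prev].pivot(columns="prev_event", values="gap_days")
df = df.join(wide)
print(df)
```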