It seems that the columns get reordered (sorted by column label) when calling pandas.DataFrame.groupby().shift(). The sort parameter of groupby only controls the order of the groups (rows), not the columns.
Here is an example:
import pandas as pd
df = pd.DataFrame({'A': ['group1', 'group1', 'group2', 'group2', 'group3', 'group3'],
                   'E': ['a', 'b', 'c', 'd', 'e', 'f'],
                   'B': [10, 12, 10, 25, 10, 12],
                   'C': [100, 102, 100, 250, 100, 102],
                   'D': [1, 2, 3, 4, 5, 6]
                   })
df.set_index('A', inplace=True)
df = df[['E', 'C', 'D', 'B']]
df
# E C D B
# A
#group1 a 100 1 10
#group1 b 102 2 12
#group2 c 100 3 10
#group2 d 250 4 25
#group3 e 100 5 10
#group3 f 102 6 12
Starting from here, I want to get:
# E C D B C_s D_s B_s
# A
#group1 a 100 1 10 102.0 2.0 12.0
#group1 b 102 2 12 NaN NaN NaN
#group2 c 100 3 10 250.0 4.0 25.0
#group2 d 250 4 25 NaN NaN NaN
#group3 e 100 5 10 102.0 6.0 12.0
#group3 f 102 6 12 NaN NaN NaN
But this assignment:
df[['C_s','D_s','B_s']]= df.groupby(level='A')[['C','D','B']].shift(-1)
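results in the following, where C_s holds the shifted B values, D_s the shifted C values and B_s the shifted D values: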
# E C D B C_s D_s B_s
# A
#group1 a 100 1 10 12.0 102.0 2.0
#group1 b 102 2 12 NaN NaN NaN
#group2 c 100 3 10 25.0 250.0 4.0
#group2 d 250 4 25 NaN NaN NaN
#group3 e 100 5 10 12.0 102.0 6.0
#group3 f 102 6 12 NaN NaN NaN
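The reordering does damage here because assigning a DataFrame to a list of column labels pairs each new label with the right-hand column at the same position; the right-hand column names play no part in the pairing. A minimal sketch with two hypothetical toy frames, left and right:

```python
import pandas as pd

# Hypothetical toy data, only to illustrate how list-of-labels assignment pairs columns
left = pd.DataFrame({"x": [1, 2]})
right = pd.DataFrame({"b": [10, 20], "a": [30, 40]})

# Labels are matched to right's columns by position, not by name:
# 'a_new' receives right['b'] and 'b_new' receives right['a'].
left[["a_new", "b_new"]] = right
print(left)
```

So once groupby().shift() hands back the columns as B, C, D, their values land under C_s, D_s and B_s in that positional order, which is exactly the output above.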
Sorting the columns alphabetically first works around the problem and keeps each shifted column paired with its source column:
df = df.sort_index(axis=1)
df[['B_s','C_s','D_s']]= df.groupby(level='A')[['B','C','D']].shift(-1).sort_index(axis=1)
df
# B C D E B_s C_s D_s
# A
#group1 10 100 1 a 12.0 102.0 2.0
#group1 12 102 2 b NaN NaN NaN
#group2 10 100 3 c 25.0 250.0 4.0
#group2 25 250 4 d NaN NaN NaN
#group3 10 100 5 e 12.0 102.0 6.0
#group3 12 102 6 f NaN NaN NaN
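A workaround that does not depend on the column order of the groupby result at all is to assign each shifted column by name. This is a minimal sketch; the cols list and the loop are illustrative choices, not taken from the post:

```python
import pandas as pd

# Same data as in the question, built with the column order E, C, D, B directly
idx = pd.Index(['group1', 'group1', 'group2', 'group2', 'group3', 'group3'], name='A')
df = pd.DataFrame({'E': list('abcdef'),
                   'C': [100, 102, 100, 250, 100, 102],
                   'D': [1, 2, 3, 4, 5, 6],
                   'B': [10, 12, 10, 25, 10, 12]}, index=idx)

cols = ['C', 'D', 'B']
shifted = df.groupby(level='A')[cols].shift(-1)

# Look up each shifted column by name, so the column order of `shifted` is irrelevant.
# shift keeps df's index, so each Series is assigned without realignment.
for col in cols:
    df[f'{col}_s'] = shifted[col]

print(df)
```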
Why are the columns reordered in the first place?
Answer:
In my opinion it is a bug.
A custom lambda function passed to apply works:
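# Select columns with a list; a tuple key such as ['C','D','B'] is rejected by pandas >= 2.0.
# group_keys=False keeps the original index (pandas >= 2.0 would otherwise prepend the group keys).
df[['C_s','D_s','B_s']] = df.groupby(level='A', group_keys=False)[['C','D','B']].apply(lambda x: x.shift(-1))
print(df)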
E C D B C_s D_s B_s
A
group1 a 100 1 10 102.0 2.0 12.0
group1 b 102 2 12 NaN NaN NaN
group2 c 100 3 10 250.0 4.0 25.0
group2 d 250 4 25 NaN NaN NaN
group3 e 100 5 10 102.0 6.0 12.0
group3 f 102 6 12 NaN NaN NaN
Thank you @cᴏʟᴅsᴘᴇᴇᴅ for another solution:
df[['C_s','D_s','B_s']] = (df.groupby(level='A', group_keys=False)[['C','D','B']]
                           .apply(pd.DataFrame.shift, periods=-1))
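Another variant that should keep the requested column order is groupby().transform(), which returns a frame with the same index and columns as the selection. A minimal sketch, assuming a recent pandas version; selecting shifted[cols] by name before the positional assignment guards against any reordering:

```python
import pandas as pd

# Same data as in the question, built with the column order E, C, D, B directly
idx = pd.Index(['group1', 'group1', 'group2', 'group2', 'group3', 'group3'], name='A')
df = pd.DataFrame({'E': list('abcdef'),
                   'C': [100, 102, 100, 250, 100, 102],
                   'D': [1, 2, 3, 4, 5, 6],
                   'B': [10, 12, 10, 25, 10, 12]}, index=idx)

cols = ['C', 'D', 'B']
# transform applies the function per group and returns a frame aligned with df
shifted = df.groupby(level='A')[cols].transform(lambda g: g.shift(-1))

# Select by name first so the positional assignment pairs C_s<-C, D_s<-D, B_s<-B
df[[f'{c}_s' for c in cols]] = shifted[cols]
print(df)
```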