I want to map (via dictionary) part of a MultiIndex DataFrame to a column. Is there a way to do that in a single step?
For example, with the following sample DataFrame:
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([['A', 'B', 'C'], np.arange(1, 11, 1)], names=['Name', 'Num'])
df = pd.DataFrame(np.random.randn(30), index=idx, columns=['Vals'])
and sample map:
a = list('abcdefghijk')
m = {}
# Map the integers 0 through 10 to the letters 'a' through 'k'.
for n in range(0, 11):
    m[n] = a[n]
I want to create a column X containing the letter associated with the second index level:
df.assign(X=m[df.index.get_level_values('Num').values])
But that doesn't work, and neither does:
df['X'] = df.index.map(lambda x: m[x[1]])
3 Solutions
#1
2
Access the second level with get_level_values, convert it to a Series, and call map or replace:
df['X'] = df.index.get_level_values(1).to_series().map(m).values
Or,
df['X'] = df.index.get_level_values(1).to_series().replace(m).values
Alternatively (inspired by OP), you can call map on the result of df.index.get_level_values and pass a callable (in this case, m.get):
df['X'] = df.index.get_level_values(1).map(m.get)
df
               Vals  X
Name Num
A    1     2.731237  b
     2     0.180595  c
     3    -1.428064  d
     4    -0.622806  e
     5     0.948709  f
     6    -1.383310  g
     7     0.177631  h
     8    -1.071445  i
     9    -0.183859  j
     10    1.480641  k
B    1    -1.036380  b
     2     1.031757  c
     3     0.542989  d
     4    -0.933676  e
     5    -0.540661  f
     6    -0.506969  g
     7     0.572705  h
     8    -1.363675  i
     9    -0.588765  j
     10    0.998691  k
C    1    -0.471536  b
     2    -1.361124  c
     3    -0.382200  d
     4     0.694174  e
     5     1.077779  f
     6    -0.501285  g
     7     0.961986  h
     8    -0.285009  i
     9     1.385881  j
     10    1.490152  k
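In the to_series() versions, I have to call .values because the resulting Series is labelled by the Num values alone, which does not line up with df's MultiIndex. Passing the raw array lets me assign the result back to the dataframe without index alignment issues.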
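To make the alignment point concrete, here is a small sketch that rebuilds the question's frame and compares the two indexes. The variable names idx and mapped are mine, not from the original post, and the random values will differ from the output shown above.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame and map from the question.
idx = pd.MultiIndex.from_product(
    [['A', 'B', 'C'], np.arange(1, 11)], names=['Name', 'Num']
)
df = pd.DataFrame(np.random.randn(30), index=idx, columns=['Vals'])
m = dict(enumerate('abcdefghijk'))  # same content as the question's loop

# to_series() yields a Series labelled by the Num values only.
mapped = df.index.get_level_values('Num').to_series().map(m)

series_levels = mapped.index.nlevels  # flat index
frame_levels = df.index.nlevels       # (Name, Num) MultiIndex

# .values strips the labels, so the assignment is purely positional.
df['X'] = mapped.values
```

Because the two indexes have different shapes, assigning the Series directly would ask pandas to align them by label, which is exactly what .values sidesteps.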
#2
3
Here is another shorthand that works:
df['X'] = df.index.map(lambda x: m.get(x[1]))
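It is not invalid to use a dictionary in a lambda like that; it's just that (apparently) looking the value up with index notation (e.g., m[x[1]]) doesn't work in this situation. One real difference between the two forms is that m.get(key) returns None for a missing key, whereas m[key] raises a KeyError.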
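For what it's worth, here is a minimal sketch of how the two lookup styles differ when a key is absent. The small index and the missing key 99 are made up for illustration, and I am not claiming this is what went wrong for the OP.

```python
import pandas as pd

# Hypothetical index whose second level contains a value (99) not in the map.
idx = pd.MultiIndex.from_product([['A', 'B'], [1, 2, 99]], names=['Name', 'Num'])
m = {1: 'b', 2: 'c'}

# Iterating a MultiIndex yields (Name, Num) tuples, as in the lambda above.
# dict.get quietly returns None when the key is missing.
via_get = [m.get(t[1]) for t in idx]

# Index notation raises KeyError on the first missing key instead.
try:
    via_brackets = [m[t[1]] for t in idx]
except KeyError as exc:
    missing_key = exc.args[0]
```

So m.get is the more forgiving choice when the map might not cover every index value, at the cost of silently producing None entries.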
#3
2
Use rename on the second index level, then assign the renamed level back:
df['New'] = df.rename(index=m, level=1).index.get_level_values(1)
df
               Vals  New
Name Num
A    1    -0.906266  b
     2     0.321047  c
     3     0.227720  d
     4     3.040522  e
     5     0.604392  f
     6     1.394153  g
     7    -0.640342  h
     8    -0.812858  i
     9    -1.142764  j
     10    0.744968  k
B    1     0.956003  b
     2     0.064266  c
     3     0.042286  d
     4    -1.089578  e
     5     0.534922  f
     6    -0.545524  g
     7     0.102778  h
     8    -1.691460  i
     9    -1.980935  j
     10    1.226609  k
C    1     0.871654  b
     2     0.396818  c
     3     0.691537  d
     4     1.923429  e
     5     0.239363  f
     6    -0.669168  g
     7    -0.168082  h
     8     0.209918  i
     9     0.205527  j
     10    0.490754  k
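One thing worth knowing if the dictionary does not cover every value: as far as I know, rename leaves unmatched labels as they are, while map with a dict gives NaN for them. A rough sketch with a deliberately incomplete map (partial is a made-up name, and the small frame is only for illustration):

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([['A', 'B'], [1, 2, 3]], names=['Name', 'Num'])
df = pd.DataFrame({'Vals': np.arange(6.0)}, index=idx)
partial = {1: 'b', 2: 'c'}  # hypothetical map with no entry for 3

# rename only touches labels found in the dict; 3 is kept unchanged.
renamed_labels = list(df.rename(index=partial, level=1).index.get_level_values(1))

# Index.map with a dict yields NaN where the key is missing.
mapped_labels = list(df.index.get_level_values(1).map(partial))
```

With the full map from the question, both approaches give the same column.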