Let's say I have a MultiIndex Series s:
>>> s
     values
a b
1 2     0.1
3 6     0.3
4 4     0.7
I want to apply a function that uses the index of each row:
def f(x):
    # conditions or computations using the indexes
    if x.index[0] and ...:
        other = sum(x.index) + ...
    return something
How can I do s.apply(f) for such a function? What is the recommended way to perform this kind of operation? I expect to obtain a new Series with the same MultiIndex, holding the values that result from applying this function to each row.
5 Answers
#1
31
I don't believe apply has access to the index; it passes each value to your function as a plain NumPy scalar, not as a Series, as you can see:
In [27]: s.apply(lambda x: type(x))
Out[27]:
a  b
1  2    <type 'numpy.float64'>
3  6    <type 'numpy.float64'>
4  4    <type 'numpy.float64'>
To get around this limitation, promote the index levels to columns, apply your function, and recreate a Series with the original index.
Series(s.reset_index().apply(f, axis=1).values, index=s.index)
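As a rough illustration of the one-liner above, here is a minimal sketch on the Series from the question. The Series name "values" and the rule inside f are assumptions made up for the example; they are not part of the original question.

```python
import pandas as pd

# The MultiIndex Series from the question (the name "values" is assumed).
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index, name="values")


def f(row):
    # Hypothetical rule: when level "a" is greater than 1, add both
    # index levels to the value; otherwise keep the value unchanged.
    if row["a"] > 1:
        return row["values"] + row["a"] + row["b"]
    return row["values"]


# reset_index() turns the levels "a" and "b" into ordinary columns, so
# each row passed to f carries the index levels as well as the value.
result = pd.Series(s.reset_index().apply(f, axis=1).values, index=s.index)
print(result)
```

The result keeps the original MultiIndex because the index is reattached explicitly in the last step.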
Other approaches might use s.index.get_level_values, which often gets a little ugly in my opinion, or an explicit loop over s.items() (iterrows() exists only on DataFrame), which is likely to be slower, perhaps depending on exactly what f does.
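The two alternatives mentioned above could look roughly like this. It is only a sketch: the rule "value + a + b" is a made-up stand-in for whatever f actually computes.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

# Vectorized: extract each index level as an array and combine it
# with the values in a single expression.
a = s.index.get_level_values("a").to_numpy()
b = s.index.get_level_values("b").to_numpy()
vectorized = s + a + b

# Explicit loop: for a MultiIndex, items() yields (index tuple, value) pairs.
looped = pd.Series(
    [value + key[0] + key[1] for key, value in s.items()],
    index=s.index,
)

print(vectorized)
```

The vectorized form only works when the logic can be written as array operations; the loop accepts arbitrary Python code but runs row by row.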
#2
8
Convert it to a DataFrame and return scalars if you want (so the result is a Series).
Setup
In [11]: s = pd.Series([1, 2, 3], dtype='float64', index=['a', 'b', 'c'])

In [12]: s
Out[12]:
a    1
b    2
c    3
dtype: float64
Printing function
In [13]: def f(x):
   ....:     print(type(x), x)
   ....:     return x
   ....:
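In [14]: pd.DataFrame(s).apply(f)
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
Out[14]:
   0
a  1
b  2
c  3

Older pandas versions call f twice on the first column, which is why the output appears twice; newer versions call it once.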
Since you can return anything here, just return scalars. With axis=1, each row is passed as a Series whose name attribute holds that row's index label:
In [15]: pd.DataFrame(s).apply(lambda x: 5 if x.name == 'a' else x[0], axis=1)
Out[15]:
a    5
b    2
c    3
dtype: float64
#3
3
Convert to a DataFrame and apply along the rows. You can access the index label as x.name; x is now a Series with one value.
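s.to_frame(0).apply(f, axis=1)[0]

The trailing [0] selects column 0 and is only needed when f returns the row itself (a Series); if f returns a scalar, apply already yields a Series, so drop it.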
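For the MultiIndex Series in the question, the same idea might be sketched as follows. The rule inside f is hypothetical. Because f returns a scalar here, apply already returns a Series and no trailing [0] is needed.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)


def f(row):
    # With a MultiIndex, row.name is the tuple of level values for this row.
    a, b = row.name
    # Hypothetical rule: scale the value by the sum of the index levels.
    # row[0] is the single value, stored under the column label 0.
    return row[0] * (a + b)


# to_frame(0) makes a one-column DataFrame whose column is labelled 0.
result = s.to_frame(0).apply(f, axis=1)
print(result)
```

The result is a Series carrying the original MultiIndex, which is what the question asks for.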
#4
0
You may find it faster to use where rather than apply here:
In [11]: s = pd.Series([1., 2., 3.], index=['a', 'b', 'c'])

In [12]: s.where(s.index != 'a', 5)
Out[12]:
a    5
b    2
c    3
dtype: float64
You can also apply NumPy-style logic and functions to any of the parts:
In [13]: (2 * s + 1).where((s.index == 'b') | (s.index == 'c'), -s)
Out[13]:
a   -1
b    5
c    7
dtype: float64

In [14]: (2 * s + 1).where(s.index != 'a', -s)
Out[14]:
a   -1
b    5
c    7
dtype: float64
I recommend testing for speed, since the efficiency relative to apply will depend on the function. That said, I find that apply calls are more readable...
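The where approach carries over to the MultiIndex case from the question by building the condition from the index levels. This is a sketch only; the condition a < b and the replacement value 0.0 are made-up examples.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

a = s.index.get_level_values("a")
b = s.index.get_level_values("b")

# Keep the value where level "a" is smaller than level "b";
# everywhere else, replace it with 0.0.
result = s.where(a < b, 0.0)
print(result)
```

No Python-level function is called per row here, which is where any speed advantage over apply would come from.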
#5
0
You can access the whole row as an argument inside the function if you use DataFrame.apply() instead of Series.apply().
import numpy as np
import pandas as pd

def f1(row):
    # 0 if the value in column 'I' is below 0.5, otherwise 1
    if row['I'] < 0.5:
        return 0
    else:
        return 1

def f2(row):
    # invert the flag stored in column 'N1'
    if row['N1'] == 1:
        return 0
    else:
        return 1

df4 = pd.DataFrame(np.random.rand(6, 1), columns=list('I'))
df4['N1'] = df4.apply(f1, axis=1)  # axis=1 passes each row to f1
df4['N2'] = df4.apply(f2, axis=1)