Accessing the index in pandas.Series.apply

Time: 2022-04-20 13:28:14

Let's say I have a MultiIndex Series s:

>>> s
     values
a b
1 2  0.1 
3 6  0.3
4 4  0.7

and I want to apply a function which uses the index of the row:

def f(x):
   # conditions or computations using the indexes
   if x.index[0] and ...: 
   other = sum(x.index) + ...
   return something

How can I do s.apply(f) for such a function? What is the recommended way to perform this kind of operation? I expect to get a new Series with the same MultiIndex, holding the value this function returns for each row.

5 solutions

#1


31  

I don't believe Series.apply has access to the index; it passes each element to the function as a scalar (a numpy object), not as a Series, as you can see:

In [27]: s.apply(lambda x: type(x))
Out[27]: 
a  b
1  2    <type 'numpy.float64'>
3  6    <type 'numpy.float64'>
4  4    <type 'numpy.float64'>

To get around this limitation, promote the indexes to columns, apply your function, and recreate a Series with the original index.

pd.Series(s.reset_index().apply(f, axis=1).values, index=s.index)
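
A minimal sketch of that recipe on the question's data, assuming the index levels are named 'a' and 'b' and the Series is named 'values' (both read off the printed table, not stated in the question); the body of f is a hypothetical placeholder for "computations using the indexes":

```python
import pandas as pd

# Rebuild the question's MultiIndex Series (level names and Series name are assumptions).
idx = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=idx, name="values")

# Hypothetical row function: after reset_index the index levels are ordinary
# columns, so each row exposes row["a"], row["b"] and row["values"].
def f(row):
    if row["a"] < row["b"]:
        return row["values"] + row["a"] + row["b"]
    return row["values"]

# Apply row-wise on the flattened frame, then restore the original MultiIndex.
result = pd.Series(s.reset_index().apply(f, axis=1).values, index=s.index)
print(result)
```

If the Series has no name, reset_index() labels the value column 0 instead of "values".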

Other approaches might use s.index.get_level_values, which in my opinion often gets a little ugly, or an explicit row-by-row loop (for a Series, s.items(); iterrows() is a DataFrame method), which is likely to be slower, depending on exactly what f does.
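
As a rough sketch of those two alternatives, again rebuilding the question's data with assumed level names 'a' and 'b'; the computation (value plus the sum of the index levels, echoing sum(x.index) in the question) is a hypothetical stand-in:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=idx)

# Vectorized: pull each index level out as an array and combine it with the values.
a = s.index.get_level_values("a").to_numpy()
b = s.index.get_level_values("b").to_numpy()
vectorized = s + a + b  # keeps s's MultiIndex

# Explicit loop: Series.items() yields (label, value) pairs; with a
# MultiIndex each label is a tuple such as (1, 2).
looped = pd.Series([value + sum(label) for label, value in s.items()], index=s.index)

print(vectorized)
print(looped)
```

The vectorized form usually scales better, while the loop keeps arbitrary Python logic per row.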

#2


8  

Convert it to a DataFrame; return scalars from the function if you want the result to be a Series.

Setup

In [11]: s = pd.Series([1, 2, 3], dtype='float64', index=['a', 'b', 'c'])

In [12]: s
Out[12]: 
a    1
b    2
c    3
dtype: float64

Printing function

In [13]: def f(x):
   ....:     print(type(x), x)
   ....:     return x
   ....: 

In [14]: pd.DataFrame(s).apply(f)
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
Out[14]: 
   0
a  1
b  2
c  3

Since you can return anything here, just return the scalars (access the index via the name attribute)

In [15]: pd.DataFrame(s).apply(lambda x: 5 if x.name == 'a' else x.iloc[0], axis=1)
Out[15]: 
a    5
b    2
c    3
dtype: float64

#3


3  

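Convert to a DataFrame and apply along the rows (axis=1). Inside the function, x is now a Series with a single value, and you can access the row's index label as x.name.

s.to_frame(0).apply(f, axis=1)[0]

The trailing [0] selects column 0, which is needed when f returns the row as a Series; if f returns a scalar, apply already returns a Series and the [0] should be dropped.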
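
A small sketch of this on the question's MultiIndex (level names 'a' and 'b' assumed), showing that x.name is the full index tuple; the rule inside f is hypothetical:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=idx)

# Hypothetical row function. With axis=1 each row arrives as a one-element
# Series: x.name is the row's index label (a tuple for a MultiIndex) and
# x.iloc[0] is the value.
def f(x):
    a, b = x.name
    return x.iloc[0] * 10 if a == b else x.iloc[0]

# f returns a scalar, so apply already yields a Series on the original MultiIndex.
result = s.to_frame(0).apply(f, axis=1)
print(result)
```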

#4


0  

You may find it faster to use where rather than apply here:

In [11]: s = pd.Series([1., 2., 3.], index=['a' ,'b', 'c'])

In [12]: s.where(s.index != 'a', 5)
Out[12]: 
a    5
b    2
c    3
dtype: float64

You can also apply numpy-style logic/functions to any of the parts:

In [13]: (2 * s + 1).where((s.index == 'b') | (s.index == 'c'), -s)
Out[13]: 
a   -1
b    5
c    7
dtype: float64

In [14]: (2 * s + 1).where(s.index != 'a', -s)
Out[14]: 
a   -1
b    5
c    7
dtype: float64
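
The same where pattern carries over to a MultiIndex by building the boolean masks from individual levels. A sketch, assuming level names 'a' and 'b' as in the question's table; the conditions themselves are illustrative:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=idx)

# Index levels as arrays; comparing them gives boolean numpy masks.
a = s.index.get_level_values("a")
b = s.index.get_level_values("b")

# Keep the values where a != b; elsewhere substitute 5.
kept = s.where(a != b, 5)

# One expression where a < b, a different one elsewhere.
combined = (2 * s + 1).where(a < b, -s)

print(kept)
print(combined)
```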

I recommend timing both approaches, since how where compares with apply depends on the function. That said, I find apply calls more readable.
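
One way to run that comparison is the standard timeit module. This is a sketch with a synthetic Series and a made-up rule (replace values at even index labels with 5.0); no timings are claimed here, since they depend on the machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

n = 5_000
s = pd.Series(np.arange(n, dtype="float64"), index=np.arange(n))

def with_apply():
    # Row-wise: each row is a one-element Series whose .name is the index label.
    return s.to_frame(0).apply(lambda x: 5.0 if x.name % 2 == 0 else x.iloc[0], axis=1)

def with_where():
    # Vectorized: the condition is evaluated on the whole index at once.
    return s.where(s.index % 2 != 0, 5.0)

# Check the two approaches agree before comparing their speed.
assert with_apply().equals(with_where())

for fn in (with_apply, with_where):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```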

#5


0  

You can access the whole row as an argument inside the function if you use DataFrame.apply() instead of Series.apply().

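import pandas as pd
import numpy as np

# N1: 0 if the row's 'I' value is below 0.5, else 1
def f1(row):
    if row['I'] < 0.5:
        return 0
    else:
        return 1

# N2: the inverse of N1 (0 where N1 is 1, else 1)
def f2(row):
    if row['N1'] == 1:
        return 0
    else:
        return 1

# Six random values in a single column 'I'
df4 = pd.DataFrame(np.random.rand(6, 1), columns=list('I'))
# axis=1 passes each row to the function as a Series
df4['N1'] = df4.apply(f1, axis=1)
df4['N2'] = df4.apply(f2, axis=1)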