Subtract the first row from all rows of a pandas DataFrame

Time: 2021-08-16 21:23:37

I have a pandas dataframe:

import numpy as np
import pandas as pd
a = pd.DataFrame(np.random.rand(5, 6) * 10, index=pd.date_range(start='2005', periods=5, freq='A'))
a.columns = pd.MultiIndex.from_product([('A','B'),('a','b','c')])

I want to subtract the row a['2005'] from a. To do that I've tried this:

In [22]:

a - a.ix['2005']

Out[22]:
    A   B
    a   b   c   a   b   c
2005-12-31  0   0   0   0   0   0
2006-12-31  NaN     NaN     NaN     NaN     NaN     NaN
2007-12-31  NaN     NaN     NaN     NaN     NaN     NaN
2008-12-31  NaN     NaN     NaN     NaN     NaN     NaN
2009-12-31  NaN     NaN     NaN     NaN     NaN     NaN

Which obviously doesn't work because pandas is lining up the index while doing the operation. This works:

In [24]:

pd.DataFrame(a.values - a['2005'].values, index=a.index, columns=a.columns)

Out[24]:
    A   B
    a   b   c   a   b   c
2005-12-31  0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
2006-12-31  -3.326761   -7.164628   8.188518    -0.863177   0.519587    -3.281982
2007-12-31  3.529531    -4.719756   8.444488    1.355366    7.468361    -4.023797
2008-12-31  3.139185    -8.420257   1.465101    -2.942519   1.219060    -5.146019
2009-12-31  -3.459710   0.519435    -1.049617   -2.779370   4.792227    -1.922461
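
To make the difference between the two attempts concrete, here is a minimal sketch on a smaller, hypothetical frame (the names df, idx, one_row, aligned and unlabeled are illustrative and do not appear in the code above). It assumes a reasonably recent pandas. Subtracting a one-row DataFrame aligns on the index labels, so only the matching row gets a value; subtracting its .values drops the labels and lets NumPy broadcast the row over every row.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame with a yearly DatetimeIndex (not the question's data).
idx = pd.to_datetime(["2005-12-31", "2006-12-31", "2007-12-31"])
df = pd.DataFrame(np.arange(6.0).reshape(3, 2), index=idx, columns=["x", "y"])

# A one-row DataFrame that still carries the 2005 index label.
one_row = df.iloc[[0]]

# DataFrame - DataFrame aligns on index AND columns: only the 2005 row
# finds a partner, every other row becomes NaN.
aligned = df - one_row

# Stripping the labels with .values hands the subtraction to NumPy,
# which broadcasts the (1, 2) row over all three rows.
unlabeled = pd.DataFrame(df.values - one_row.values,
                         index=df.index, columns=df.columns)

print(aligned)
print(unlabeled)
```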

But I don't want to have to build a new DataFrame every time I do this kind of operation. I've tried the apply() method like this: a.apply(lambda x: x-a['2005'].values), but I get ValueError: cannot copy sequence with size 6 to array axis with dimension 5, so I'm not really sure how to proceed. Is there a simple way to do this that I'm not seeing? I think there should be an easy way to do this in place, so that you don't have to construct a new DataFrame each time. I also tried the sub() method, but the subtraction is only applied to the first row, whereas I want to subtract the first row from each row in the DataFrame.

2 answers

#1


5  

Pandas is great for aligning by index. So when you want Pandas to ignore the index, you need to drop the index. You can do that by converting the DataFrame a.loc['2005'] to a 1-dimensional NumPy array:

In [56]: a - a.loc['2005'].values.squeeze()
Out[56]: 
                   A                             B                    
                   a         b         c         a         b         c
2005-12-31  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2006-12-31  0.325968  1.314776 -0.789328 -0.344669 -2.518857  7.361711
2007-12-31  0.084203  2.234445 -2.838454 -6.176795 -3.645513  8.955443
2008-12-31  3.798700  0.299529  1.303325 -2.770126 -1.284188  3.093806
2009-12-31  1.520930  2.660040  0.846996 -9.437851 -2.886603  6.705391

The squeeze method converts the NumPy array a.loc['2005'].values, of shape (1, 6), to an array of shape (6,). This allows the array to be broadcast (during the subtraction) as desired.
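
The shape change can be checked directly. The following is a small sketch under assumed data: the frame is rebuilt with deterministic values and three rows instead of the question's random five, and row_2d, row_1d and result are illustrative names. It assumes a pandas version in which .loc['2005'] on a DatetimeIndex returns a one-row DataFrame.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the question's frame (assumed data, 3 rows).
idx = pd.to_datetime(["2005-12-31", "2006-12-31", "2007-12-31"])
cols = pd.MultiIndex.from_product([("A", "B"), ("a", "b", "c")])
a = pd.DataFrame(np.arange(18.0).reshape(3, 6), index=idx, columns=cols)

# Partial-string lookup returns a one-row DataFrame, so .values is 2-D.
row_2d = a.loc["2005"].values
# squeeze() drops the length-1 axis and leaves a flat array.
row_1d = row_2d.squeeze()

print(row_2d.shape)  # (1, 6)
print(row_1d.shape)  # (6,)

# A 1-D array with one entry per column is subtracted from every row;
# the index and the MultiIndex columns of `a` are preserved.
result = a - row_1d
print(result)
```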

#2


2  

Here is a more verbose, step-by-step breakdown of how to do this.

First make a simple DataFrame to make it easier to understand.

import numpy as np
import pandas as pd
#make a simple DataFrame
df = pd.DataFrame(np.fromfunction(lambda i, j: i+1 , (3, 3), dtype=int))

It looks like this:

# 1 1 1
# 2 2 2
# 3 3 3

Now get the values from the first row

first_row = df.iloc[[0]].values[0]

Now use apply() to subtract the first row from every row.

df.apply(lambda row: row - first_row, axis=1)

The result looks like this. Note that the first row (all 1s) has been subtracted from every row:

#  0 0 0
#  1 1 1
#  2 2 2
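
For comparison, the steps above can be put together and checked against two vectorized spellings. This is only a sketch: the variable names via_apply, via_series and via_sub are illustrative, and the two alternatives are not part of the answer above. They rely on the fact that subtracting a Series from a DataFrame aligns the Series index with the column labels, which is what is wanted when the Series is a row of the same frame.

```python
import numpy as np
import pandas as pd

# Same toy frame as in the answer: rows of 1s, 2s and 3s.
df = pd.DataFrame(np.fromfunction(lambda i, j: i + 1, (3, 3), dtype=int))
first_row = df.iloc[0].values

# Row-by-row version from the answer.
via_apply = df.apply(lambda row: row - first_row, axis=1)

# Vectorized alternatives (assumed equivalents, not from the answer):
# the first row as a Series is matched against the column labels.
via_series = df - df.iloc[0]
via_sub = df.sub(df.iloc[0], axis="columns")

print(via_apply)
print(via_series)
print(via_sub)
```

On a frame this small the difference does not matter, but apply with axis=1 calls the lambda once per row, so the vectorized forms are likely to scale better.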
