Adding two columns together

Time: 2022-06-28 15:09:26

When I use this syntax, it creates a Series rather than adding a column to my new dataframe (sum). Please help.

My code:


sum = data['variance'] = data.budget + data.actual

My data (in dataframe df) currently has everything except the budget - actual column; I want to create a variance column:

    cluster     date    budget  actual          | budget - actual
0   a   2014-01-01 00:00:00     11000   10000       1000
1   a   2014-02-01 00:00:00     1200    1000
2   a   2014-03-01 00:00:00     200     100
3   b   2014-04-01 00:00:00     200     300
4   b   2014-05-01 00:00:00     400     450
5   c   2014-06-01 00:00:00     700     1000
6   c   2014-07-01 00:00:00     1200    1000
7   c   2014-08-01 00:00:00     200     100
8   c   2014-09-01 00:00:00     200     300

1 Answer

#1


I think you've misunderstood some Python syntax; the following performs two assignments:

In [11]: a = b = 1

In [12]: a
Out[12]: 1

In [13]: b
Out[13]: 1
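
To make the chained form concrete, here is a minimal sketch in plain Python (the names a and b follow the session above; same_object is only an illustrative name). The right-hand side is evaluated once and then bound to each target from left to right, so both names refer to the same object:

```python
# Chained assignment: the right-hand side is evaluated once,
# then bound to each target from left to right.
a = b = [1, 2]

# Both names point at the very same list object.
same_object = a is b

# Mutating through one name is therefore visible through the other.
a.append(3)
print(same_object, b)
```

With an immutable value such as 1 the sharing is harmless, which is why the integer example looks like two independent assignments.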

So in your code it was as if you were doing:


sum = df['budget'] + df['actual']  # a Series
# and
df['variance'] = df['budget'] + df['actual']  # assigned to a column

The latter creates a new column for df:


In [21]: df
Out[21]:
  cluster                 date  budget  actual
0       a  2014-01-01 00:00:00   11000   10000
1       a  2014-02-01 00:00:00    1200    1000
2       a  2014-03-01 00:00:00     200     100
3       b  2014-04-01 00:00:00     200     300
4       b  2014-05-01 00:00:00     400     450
5       c  2014-06-01 00:00:00     700    1000
6       c  2014-07-01 00:00:00    1200    1000
7       c  2014-08-01 00:00:00     200     100
8       c  2014-09-01 00:00:00     200     300

In [22]: df['variance'] = df['budget'] + df['actual']

In [23]: df
Out[23]:
  cluster                 date  budget  actual  variance
0       a  2014-01-01 00:00:00   11000   10000     21000
1       a  2014-02-01 00:00:00    1200    1000      2200
2       a  2014-03-01 00:00:00     200     100       300
3       b  2014-04-01 00:00:00     200     300       500
4       b  2014-05-01 00:00:00     400     450       850
5       c  2014-06-01 00:00:00     700    1000      1700
6       c  2014-07-01 00:00:00    1200    1000      2200
7       c  2014-08-01 00:00:00     200     100       300
8       c  2014-09-01 00:00:00     200     300       500
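
For reference, a self-contained sketch of the same idea, assuming pandas is installed and using only the first three rows of the data above. The names total, with_diff and difference are illustrative and not from the question. Since the question's table header reads "budget - actual", the subtraction variant is shown too, in case that is what was actually wanted:

```python
import pandas as pd

# First three rows of the question's data (date column omitted for brevity).
df = pd.DataFrame({
    "cluster": ["a", "a", "a"],
    "budget": [11000, 1200, 200],
    "actual": [10000, 1000, 100],
})

# Adding two columns yields a Series; it only becomes a column
# once it is assigned back to the dataframe under a column label.
total = df["budget"] + df["actual"]
df["variance"] = total

# Assumption: if "variance" should be budget minus actual, subtract instead.
# assign() returns a new dataframe and leaves df itself unchanged.
with_diff = df.assign(difference=df["budget"] - df["actual"])

print(df)
print(with_diff)
```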

As an aside, you shouldn't use sum as a variable name, as that shadows the built-in sum function.
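
A small sketch of what the shadowing looks like in practice (demo, message and total are illustrative names). Once sum is rebound to a non-callable value in a scope, calling it there fails, although the built-in remains reachable through the builtins module:

```python
import builtins


def demo():
    # Rebinding 'sum' inside this function hides the built-in here.
    sum = 10 + 5
    try:
        sum([1, 2, 3])  # 'sum' is now an int, so calling it fails
    except TypeError as exc:
        return str(exc)


message = demo()

# The built-in itself is untouched and can still be reached explicitly.
total = builtins.sum([1, 2, 3])
print(message, total)
```

Choosing a different name, such as total, avoids the problem entirely.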
