Assign column values from other columns in a pandas DataFrame

Time: 2022-03-13 22:57:41

How do I set columns in my DataFrame equal to another column in the rows where a condition is met?

Update
The problem
I need to assign values to many columns (sometimes taking the value from another column in the same row) when the condition is met.

The condition is not the problem.

I need an efficient way to do this:

df.loc[some condition it doesn't matter,
['a','b','c','d','e','f','g','x','y']]=df['z'],1,3,4,5,6,7,8,df['p']

Simplified example data

import pandas as pd

d = {'var': pd.Series([10, 61]),
     'c': pd.Series([100, 0]),
     'z': pd.Series(['x', 'x']),
     'y': pd.Series([None, None]),
     'x': pd.Series([None, None])}
df = pd.DataFrame(d)

Condition: var is not missing and its first digit is less than 5.
Result: set df.x = df.z and df.y = 1.

Here is pseudocode that doesn't work, but shows what I want:

df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['x','y']]=df['z'],1

but I get

ValueError: cannot set using a list-like indexer with a different length than the value

ideal output

     c  var     x  z     y
0  100   10     x  x     1
1    0   61  None  x  None

The code below works, but it is too inefficient because I need to assign values to many columns and each assignment recomputes the condition.

df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['x']]=df['z']
df.loc[((df['var'].dropna().astype(str).str[0].astype(int) < 5)),
['y']]=1

2 solutions

#1


1  

You can work row wise:

import pandas as pd

def f(row):
    # pd.notna also catches NaN, which "is not None" would let through
    if pd.notna(row['var']) and int(str(row['var'])[0]) < 5:
        row[['x', 'y']] = row['z'], 1
    return row

>>> df.apply(f, axis=1)
     c  var     x   y  z
0  100   10     x   1  x
1    0   61  None NaN  x

To overwrite the original df:

df = df.apply(f, axis=1)

#2


2  

This is one way of doing it:

import pandas as pd

d = {'var': pd.Series([1, 6]),
     'c': pd.Series([100, 0]),
     'z': pd.Series(['x', 'x']),
     'y': pd.Series([None, None]),
     'x': pd.Series([None, None])}
df = pd.DataFrame(d)

# Condition 1: var is not missing
cond1 = df['var'].notna()
# Condition 2: first digit is less than 5 (a missing value becomes 'nan' and fails the test)
cond2 = df['var'].astype(str).str[0].isin(list('01234'))
mask = cond1 & cond2
# .ix was removed in pandas 1.0; .loc is the label-based replacement
df.loc[mask, 'x'] = df.loc[mask, 'z']
df.loc[mask, 'y'] = 1
print(df)

Output:

     c  var     x     y  z
0  100    1     x     1  x
1    0    6  None  None  x

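As you can see, the Boolean mask is applied on both sides of the assignment, and the scalar 1 is broadcast down the y column for the selected rows. With .loc the right-hand mask is optional, because pandas aligns a Series on the index, but it makes the intent explicit. Splitting the steps into multiple lines is probably cleaner than one combined assignment.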
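To illustrate the alignment and broadcasting behaviour described above, here is a minimal sketch that reuses the question's sample values. The first-digit test via `isin` is one assumed way to build the mask, not the only one.

```python
import pandas as pd

df = pd.DataFrame({
    "var": [10, 61],
    "z": ["x", "x"],
    "x": [None, None],
    "y": [None, None],
})

# First digit of var below 5; a missing value would become "nan" and fail the test.
mask = df["var"].astype(str).str[0].isin(list("01234"))

# The whole column is passed on the right; pandas aligns it on the index
# and writes only the rows selected by the mask.
df.loc[mask, "x"] = df["z"]

# A scalar is broadcast to every selected row.
df.loc[mask, "y"] = 1

print(df)
```

Computing the mask once and reusing it also avoids repeating the condition for every target column.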

Edit, after the question was updated: more generally, since some assignments copy values from other columns and others just broadcast a constant down a column, you can do it in two steps:

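# Copied columns: .to_numpy() drops the labels; otherwise pandas aligns 'z','p' against 'a','y' and writes NaN
df.loc[conds, ['a','y']] = df.loc[conds, ['z','p']].to_numpy()
# Constants: one value per target column, broadcast down the selected rows
df.loc[conds, ['b','c','d','e','f','g','x']] = [1,3,4,5,6,7,8]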
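The two-step pattern can be tried end to end with a small frame. This is only a sketch: the data, the `targets` list and the way `conds` is built are assumptions made for illustration, and the target columns are created as empty object columns first.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "var": [10, 61, 23],
    "z": ["x", "y", "w"],
    "p": [0.1, 0.2, 0.3],
})
targets = ["a", "b", "c", "d", "e", "f", "g", "x", "y"]
for col in targets:
    df[col] = None  # empty object columns to be filled conditionally

# Assumed condition: first digit of var is less than 5.
conds = df["var"].astype(str).str[0].isin(list("01234"))

# Step 1: copy z -> a and p -> y. The right-hand side is converted to an
# array so that values are matched by position, not by column label.
df.loc[conds, ["a", "y"]] = df.loc[conds, ["z", "p"]].to_numpy()

# Step 2: one constant per target column, broadcast down the selected rows.
df.loc[conds, ["b", "c", "d", "e", "f", "g", "x"]] = [1, 3, 4, 5, 6, 7, 8]

print(df[targets])
```

Rows that fail the condition (here the row with var 61) keep their missing values in every target column.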

You may profile and see if this is efficient enough for your use case.
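One way to profile is `timeit` from the standard library. The sketch below compares a row-wise `apply` with the masked `.loc` assignment on synthetic data. The helper names `make_frame`, `row_wise` and `masked`, the frame size and the repeat count are all assumptions chosen for illustration.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n=2_000):
    """Build a synthetic frame shaped like the question's example (hypothetical data)."""
    rng = np.random.default_rng(0)
    return pd.DataFrame({
        "var": rng.integers(10, 100, size=n),
        "z": ["x"] * n,
        "x": [None] * n,
        "y": [None] * n,
    })


def row_wise(df):
    """Row-by-row version: a Python function call for every row."""
    def f(row):
        if pd.notna(row["var"]) and int(str(row["var"])[0]) < 5:
            row["x"] = row["z"]
            row["y"] = 1
        return row
    return df.apply(f, axis=1)


def masked(df):
    """Vectorised version: build the mask once, then assign with .loc."""
    out = df.copy()
    mask = out["var"].astype(str).str[0].isin(list("01234"))
    out.loc[mask, "x"] = out.loc[mask, "z"]
    out.loc[mask, "y"] = 1
    return out


if __name__ == "__main__":
    df = make_frame()
    for fn in (row_wise, masked):
        seconds = timeit.timeit(lambda: fn(df), number=3)
        print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

The masked version usually comes out well ahead, but the gap depends on frame size and dtypes, so it is worth measuring on your own data.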
