Pandas DataFrame: add a column based on multiple if statements

Date: 2021-12-12 22:54:41

I'm quite new to Python and Pandas so this might be an obvious question.

I have a DataFrame containing ages, and I want to create a new column with an age band. I can use a lambda to express a single if/else, but I want multiple conditions, e.g. if age < 18 then 'under 18', elif age < 40 then 'under 40', else '>40'.

I don't think I can do this with a lambda, but I'm not sure how else to do it. Here is my code so far:

import pandas as pd
import numpy as np

d = {'Age' : pd.Series([36., 42., 6., 66., 38.]) }

df = pd.DataFrame(d)

df['Age_Group'] = df['Age'].map(lambda x: '<18' if x < 18 else '>18')

print(df)
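A lambda is limited to a single expression, but Series.map accepts any function, so the if/elif/else from the question can live in an ordinary named function. A minimal sketch, assuming the question's labels (the function name age_band is hypothetical):

```python
import pandas as pd

def age_band(age):
    """Return an age-band label using ordinary if/elif/else (hypothetical helper)."""
    if age < 18:
        return 'under 18'
    elif age < 40:
        return 'under 40'
    else:
        return '>40'

df = pd.DataFrame({'Age': [36., 42., 6., 66., 38.]})
# Series.map calls age_band once for each value in the column
df['Age_Group'] = df['Age'].map(age_band)
print(df)
```

If you prefer a one-liner, conditional expressions can be nested inside a lambda: lambda x: 'under 18' if x < 18 else ('under 40' if x < 40 else '>40'). Note that a missing (NaN) age fails both comparisons and falls into the else branch.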

2 answers

#1

pandas DataFrames support boolean indexing, which makes this kind of conditional assignment straightforward.

What you are trying to do can be done simply with:

# Set a default value (kept only by rows matching none of the conditions below, e.g. missing ages)
df['Age_Group'] = '<40'
# Set Age_Group for all rows where Age is 40 or more
df.loc[df['Age'] >= 40, 'Age_Group'] = '>40'
# Set Age_Group for all rows where Age is at least 18 and under 40
df.loc[(df['Age'] >= 18) & (df['Age'] < 40), 'Age_Group'] = '>18'
# Set Age_Group for all rows where Age is under 18
df.loc[df['Age'] < 18, 'Age_Group'] = '<18'

Boolean indexing like this lets you select, and assign to, exactly the rows that match a condition.

For more complex conditions, wrap each comparison in parentheses and combine them with a boolean operator such as '&' (and) or '|' (or). The parentheses are required because '&' and '|' bind more tightly than comparison operators such as '<' and '>'.

The second condition above, which sets '>18', is an example of this.
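A small sketch of combining masks with '&' and '|'; the 65 cut-off is a hypothetical threshold chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Age': [36., 42., 6., 66., 38.]})

# Each comparison is parenthesised: & and | bind more tightly than < and >=,
# so without parentheses the & would be applied to 18 and df['Age'] first.
adult_under_40 = (df['Age'] >= 18) & (df['Age'] < 40)   # and
child_or_senior = (df['Age'] < 18) | (df['Age'] >= 65)  # or; 65 is a hypothetical cut-off

print(df[adult_under_40])   # rows 0 and 4 (ages 36 and 38)
print(df[child_or_senior])  # rows 2 and 3 (ages 6 and 66)
```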

Edit:

You can read more about DataFrame indexing and boolean conditions here:

http://pandas.pydata.org/pandas-docs/dev/indexing.html#index-objects

Edit:

To see how it works:

>>> d = {'Age' : pd.Series([36., 42., 6., 66., 38.]) }
>>> df = pd.DataFrame(d)
>>> df
   Age
0   36
1   42
2    6
3   66
4   38
>>> df['Age_Group'] = '<40'
>>> df.loc[df['Age'] >= 40, 'Age_Group'] = '>40'
>>> df.loc[(df['Age'] >= 18) & (df['Age'] < 40), 'Age_Group'] = '>18'
>>> df.loc[df['Age'] < 18, 'Age_Group'] = '<18'
>>> df
   Age Age_Group
0   36       >18
1   42       >40
2    6       <18
3   66       >40
4   38       >18

Edit:

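To do this without chained indexing, use .loc [EdChum's approach]. Chained assignment such as df['Age_Group'][mask] = ... triggers warnings in recent pandas versions and, under pandas' Copy-on-Write mode, does not modify df at all.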

>>> df['Age_Group'] = '<40'
>>> df.loc[df['Age'] >= 40, 'Age_Group'] = '>40'
>>> df.loc[(df['Age'] >= 18) & (df['Age'] < 40), 'Age_Group'] = '>18'
>>> df.loc[df['Age'] < 18, 'Age_Group'] = '<18'
>>> df
   Age Age_Group
0   36       >18
1   42       >40
2    6       <18
3   66       >40
4   38       >18
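For fixed bands like these, pandas also offers pd.cut, which bins a numeric column in a single call. This is a different technique from the masks above; a sketch assuming the question's labels and that an age of exactly 40 belongs to the top band:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [36., 42., 6., 66., 38.]})

# right=False closes each bin on the left: [-inf, 18), [18, 40), [40, inf)
df['Age_Group'] = pd.cut(
    df['Age'],
    bins=[-np.inf, 18, 40, np.inf],
    labels=['under 18', 'under 40', '>40'],
    right=False,
)
print(df)
```

The resulting column has a categorical dtype, and missing ages come out as NaN rather than falling into a default band.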

#2

You can also use a nested np.where():

df['Age_group'] = np.where(df.Age<18, 'under 18',
                           np.where(df.Age<40,'under 40', '>40'))
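np.select is a flat alternative to nesting np.where: it takes a list of conditions and a matching list of choices and, like if/elif, uses the first condition that is True for each row. A sketch reusing the question's labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [36., 42., 6., 66., 38.]})

# Conditions are checked in order; the first True one picks the label
conditions = [df['Age'] < 18, df['Age'] < 40]
choices = ['under 18', 'under 40']
# Rows matching no condition get the default label
df['Age_group'] = np.select(conditions, choices, default='>40')
print(df)
```

Adding another band only means appending one condition and one choice, instead of adding another level of nesting.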