Pandas: Replace column values in a DataFrame

Time: 2021-06-28 21:44:20

I'm trying to replace the values in one column of a dataframe. The column ('female') only contains the values 'female' and 'male'.

I have tried the following:

w['female']['female']='1'
w['female']['male']='0' 

But I get back an exact copy of the previous results.

Ideally, I would like output equivalent to applying the following logic element-wise:

if w['female'] =='female':
    w['female'] = '1';
else:
    w['female'] = '0';

I've looked through the gotchas documentation (http://pandas.pydata.org/pandas-docs/stable/gotchas.html) but cannot figure out why nothing happens.

Any help will be appreciated.

8 Solutions

#1


116  

If I understand right, you want something like this:

w['female'] = w['female'].map({'female': 1, 'male': 0})

(Here I convert the values to numbers instead of strings containing numbers. You can convert them to "1" and "0", if you really want, but I'm not sure why you'd want that.)

Your code doesn't work because using ['female'] on a column (the second 'female' in your w['female']['female']) doesn't mean "select rows where the value is 'female'". It selects rows whose index label is 'female', and your DataFrame may not have any.
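
The difference between label lookup and value lookup can be seen in a minimal sketch. The three-row frame below is made up for illustration and assumes the default integer index:

```python
import pandas as pd

w = pd.DataFrame({'female': ['female', 'male', 'female']})

# ['female'] on the column looks up an index *label*; the default
# integer index (0, 1, 2) has no such label.
has_label = 'female' in w['female'].index
print(has_label)  # False

# Selecting by *value* needs a boolean mask instead.
mask = w['female'] == 'female'
print(mask.tolist())  # [True, False, True]

# map() translates every value through the dictionary.
w['female'] = w['female'].map({'female': 1, 'male': 0})
print(w['female'].tolist())  # [1, 0, 1]
```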

#2


65  

You can edit a subset of a dataframe by using loc:

df.loc[<row selection>, <column selection>]

In this case:

w.loc[w.female != 'female', 'female'] = 0
w.loc[w.female == 'female', 'female'] = 1
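
A possible variant of the same loc pattern writes the codes into a separate column, so strings and integers never share one column. The column name is_female and the sample data are assumptions for illustration, not part of the original answer:

```python
import pandas as pd

w = pd.DataFrame({'female': ['female', 'male', 'female']})

# Hypothetical new column: start every row at 0 ...
w['is_female'] = 0
# ... then set 1 only on the rows selected by the boolean mask.
w.loc[w['female'] == 'female', 'is_female'] = 1
print(w['is_female'].tolist())  # [1, 0, 1]
```

The original column is left untouched and can be dropped afterwards if it is no longer needed.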

#3


19  

w.female.replace(to_replace=dict(female=1, male=0), inplace=True)

See pandas.DataFrame.replace() docs.

#4


16  

Slight variation:

w.female.replace(['male', 'female'], [0, 1], inplace=True)

#5


11  

This should also work:

w.female[w.female == 'female'] = 1 
w.female[w.female == 'male']   = 0

#6


5  

Alternatively there is the built-in function pd.get_dummies for these kinds of assignments:

w['female'] = pd.get_dummies(w['female'],drop_first = True)

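This gives you a data frame with one column for each value that occurs in w['female'], of which you drop the first (because you can infer it from the one that is left). The remaining column is automatically named after the value it encodes. Note that the columns are ordered alphabetically, so with 'female' and 'male' the dropped column is 'female' and the remaining column flags the 'male' rows; invert it if you need 1 for 'female'.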

This is especially useful if you have categorical variables with more than two possible values. The function creates as many dummy variables as are needed to distinguish between all cases. Be careful in that case not to assign the entire data frame to a single column. Instead, if w['female'] could be 'male', 'female' or 'neutral', do something like this:

w = pd.concat([w, pd.get_dummies(w['female'], drop_first=True)], axis=1)
w.drop('female', axis=1, inplace=True)

You are then left with two new columns that hold the dummy coding of 'female', and the original column of strings is gone.
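
As a rough sketch of the three-category case, the example below uses made-up data and passes dtype=int so the dummy columns hold 0/1 rather than booleans. That is a choice made here, not something the answer above specifies:

```python
import pandas as pd

w = pd.DataFrame({'female': ['female', 'male', 'neutral', 'female']})

# One indicator column per category, sorted alphabetically;
# drop_first removes the first one ('female').
dummies = pd.get_dummies(w['female'], drop_first=True, dtype=int)
print(dummies.columns.tolist())  # ['male', 'neutral']

# Attach the indicators and drop the original string column.
w = pd.concat([w, dummies], axis=1).drop(columns='female')
print(w)
```

A row with 0 in both remaining columns corresponds to the dropped category, 'female'.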

#7


4  

You can also use apply with the dictionary's .get method, i.e. w['female'] = w['female'].apply({'male': 0, 'female': 1}.get):

import pandas as pd

w = pd.DataFrame({'female': ['female', 'male', 'female']})
print(w)

Dataframe w:

   female
0  female
1    male
2  female

Using apply to replace values from the dictionary:

w['female'] = w['female'].apply({'male':0, 'female':1}.get)
print(w)

Result:

   female
0       1
1       0
2       1 

Note: apply with a dictionary should only be used if every possible value of the column is defined in the dictionary; otherwise, values that are not in the dictionary end up as missing values.
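
The caveat can be illustrated with a small sketch. The value 'other' and the fallback code -1 are arbitrary choices for the example:

```python
import pandas as pd

codes = {'male': 0, 'female': 1}
s = pd.Series(['female', 'male', 'other'])  # 'other' is not in the dictionary

# dict.get returns None for unknown keys, so that row ends up missing.
mapped = s.apply(codes.get)
print(mapped.isna().tolist())  # [False, False, True]

# Supplying a fallback value keeps the column fully populated.
with_default = s.apply(lambda v: codes.get(v, -1))
print(with_default.tolist())  # [1, 0, -1]
```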

#8


1  

There is also a function in pandas called factorize which you can use to automatically do this type of work. It converts labels to numbers: ['male', 'female', 'male'] -> [0, 1, 0]. See this answer for more information.
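
A minimal sketch of factorize on the list from the example above, wrapped in a Series for illustration. With the default settings, codes are assigned in order of first appearance, so which label gets 0 depends on the data:

```python
import pandas as pd

s = pd.Series(['male', 'female', 'male'])

# codes: one integer per row; uniques: the label behind each code.
codes, uniques = s.factorize()
print(codes.tolist())    # [0, 1, 0]
print(uniques.tolist())  # ['male', 'female']
```

If a specific coding is required (for example 'female' as 1 regardless of row order), an explicit dictionary with map is the more predictable choice.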
