Pandas: replace a value in specified columns

I need to apply a function to a subset of columns in a DataFrame. Consider the following toy example:

import pandas as pd
pdf = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [5, 6, 7]})
arb_cols = ['a', 'b']

What I want to do is this:

[pdf[c] = pdf[c].apply(lambda x: 99 if x == 2 else x) for c in arb_cols]

But this is bad syntax. Is it possible to accomplish such a task without a for loop?

3 solutions

#1

With mask

pdf.mask(pdf.loc[:, arb_cols] == 2, 99).assign(c=pdf.c)
Out[1190]:
    a   b  c
0   1  99  5
1  99   3  6
2   3   4  7

Or with assign

pdf.assign(**pdf.loc[:, arb_cols].mask(pdf.loc[:, arb_cols] == 2, 99))
Out[1193]:
    a   b  c
0   1  99  5
1  99   3  6
2   3   4  7
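A possible variation on the two snippets above, sketched here rather than taken from the original answer: mask only the selected columns and write them back in place. Because column c never enters the mask, there is nothing to restore afterwards. The name subset is introduced only for this illustration.

```python
import pandas as pd

pdf = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [5, 6, 7]})
arb_cols = ['a', 'b']

# Work on the selected columns only; 'c' is never touched.
subset = pdf[arb_cols]

# Wherever the condition is True, mask() substitutes 99.
pdf[arb_cols] = subset.mask(subset == 2, 99)

print(pdf)
```

Unlike the assign versions, this modifies pdf itself instead of returning a new frame.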

#2

Do not use pd.Series.apply when you can use vectorised functions.

For example, the below should be efficient for larger dataframes even though there is an outer loop:

for col in arb_cols:
    pdf.loc[pdf[col] == 2, col] = 99

Another option is to use pd.DataFrame.replace:

pdf[arb_cols] = pdf[arb_cols].replace(2, 99)
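As a side note, replace also accepts a nested dict of the form {column: {old: new}}, which restricts the replacement to named columns without slicing first. The following is a minimal sketch on the same toy data; the name result is chosen here for illustration only.

```python
import pandas as pd

pdf = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [5, 6, 7]})
arb_cols = ['a', 'b']

# Build {'a': {2: 99}, 'b': {2: 99}}: for each listed column,
# look for the value 2 and replace it with 99.
mapping = {col: {2: 99} for col in arb_cols}

# replace() returns a new DataFrame; pdf itself is left unchanged.
result = pdf.replace(mapping)

print(result)
```

This form is convenient when different columns need different replacement values, since each column gets its own inner dict.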

Yet another option is to use numpy.where:

import numpy as np
pdf[arb_cols] = np.where(pdf[arb_cols] == 2, 99, pdf[arb_cols])
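To check that the three options in this answer agree, one could run each on a fresh copy of the toy frame and compare the results. This is only a sketch: make_frame and the via_* names are hypothetical helpers introduced for the comparison.

```python
import numpy as np
import pandas as pd

def make_frame():
    # Fresh copy of the toy data, so each approach starts from the same state.
    return pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [5, 6, 7]})

arb_cols = ['a', 'b']

# 1. Boolean indexing with .loc inside a loop over columns
via_loop = make_frame()
for col in arb_cols:
    via_loop.loc[via_loop[col] == 2, col] = 99

# 2. DataFrame.replace on the column subset
via_replace = make_frame()
via_replace[arb_cols] = via_replace[arb_cols].replace(2, 99)

# 3. numpy.where on the column subset
via_where = make_frame()
via_where[arb_cols] = np.where(via_where[arb_cols] == 2, 99, via_where[arb_cols])

# Compare the underlying values of the three results
print((via_loop.values == via_replace.values).all())
print((via_loop.values == via_where.values).all())
```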

#3

If you need to apply a custom function, applymap is probably the better choice for this case:

pdf[arb_cols] = pdf[arb_cols].applymap(lambda x : 99 if x == 2 else x)
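Note that pandas 2.1 deprecated DataFrame.applymap in favour of DataFrame.map. A version-tolerant sketch might pick whichever method is available; replace_two and elementwise are names made up for this example.

```python
import pandas as pd

pdf = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [5, 6, 7]})
arb_cols = ['a', 'b']

def replace_two(x):
    # Custom element-wise function: swap 2 for 99, leave everything else alone.
    return 99 if x == 2 else x

subset = pdf[arb_cols]

# DataFrame.map exists from pandas 2.1 onwards; fall back to applymap on older versions.
elementwise = subset.map if hasattr(pd.DataFrame, "map") else subset.applymap

pdf[arb_cols] = elementwise(replace_two)
print(pdf)
```

For a simple equality test like this one, the vectorised options from the other answers are likely to be faster; an element-wise function mainly pays off when the logic cannot be expressed with vectorised operations.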
