Python 2.7: Unable to create null values using np.where and np.nan

Date: 2022-03-20 20:20:29

I'm having difficulty with the following issue. I use np.where to compare two consecutive row values within a column (position) and assign the result to a new column (a null value should be created if the condition is false). Afterwards, I am unable to use the fillna method to replace the null values with the values of the other newly created columns.

When I use df.isnull().sum() to check for null values, the results show no null values in the newly created columns (even though I used np.nan).

In summary, I want to merge the values within the 3 columns: clear lap, overtaken, overtook.


df['clear lap?'] = np.where((df['position'] == df['position'].shift()), str("clear"), np.nan)
df['overtaken'] = np.where((df['position'] > df['position'].shift()), str("got overtaken"), np.nan)
df['overtook'] = np.where((df['position'] < df['position'].shift()), str("overtook"), np.nan)

df['clear lap?'].fillna(df['overtaken'], inplace=True)
df['clear lap?'].fillna(df['overtook'], inplace=True)


2 Answers

#1


1  

Let's try an experiment.


>>> import numpy as np
>>> v = np.random.choice(2, 10)
>>> v
array([0, 0, 1, 1, 0, 0, 0, 1, 1, 0])

>>> np.where(v, 'overtook', np.nan)
array(['nan', 'nan', 'overtook', 'overtook', 'nan', 'nan', 'nan',
       'overtook', 'overtook', 'nan'],
      dtype='<U32')

Because np.where returns an array with a single, homogeneous dtype, mixing a string with np.nan forces a string dtype. The np.nan values are coerced to strings, so you get the text 'nan' instead of a real NaN. That is also why isnull() reports no null values.
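The coercion described above can be checked directly. This is a minimal sketch: the `cond` array is made-up sample data, and using `None` as the fallback value is an assumption added for illustration, not something proposed in the answer.

```python
import numpy as np
import pandas as pd

# Made-up condition array for illustration.
cond = np.array([False, True, False])

# Mixing a str and a float makes NumPy pick a string dtype for the result,
# so the "missing" slots hold the text 'nan'.
coerced = np.where(cond, "overtook", np.nan)

# pandas sees ordinary strings here, so nothing is counted as null.
null_count = int(pd.Series(coerced).isnull().sum())

# Assumption for illustration: with None as the fallback, the result is an
# object array and the missing slots stay real missing values.
kept = np.where(cond, "overtook", None)
kept_null_count = int(pd.Series(kept).isnull().sum())

print(coerced.dtype.kind, null_count, kept_null_count)
```

Here `coerced.dtype.kind` is `'U'` (a unicode string array), which is the quickest way to spot the problem in your own columns.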

One workaround is to perform the substitution on a pd.Series object with boolean masks, like this:

>>> import pandas as pd
>>> s = pd.Series(v).astype(object)  # object dtype so strings can be assigned
>>> m = s.gt(0)
>>> s[m] = 'overtook'
>>> s[~m] = np.nan
>>> s
0         NaN
1         NaN
2    overtook
3    overtook
4         NaN
5         NaN
6         NaN
7    overtook
8    overtook
9         NaN
dtype: object
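Applied to the original goal of merging the three labels into one column, the same mask-based idea might look like the sketch below. The sample `position` values and the `status` column name are assumptions for illustration, since the question does not show its data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the question's real DataFrame is not shown.
df = pd.DataFrame({"position": [3, 3, 4, 2, 2, 5]})

prev = df["position"].shift()

# Start from an all-NaN object column, then fill each case with a boolean mask.
status = pd.Series(np.nan, index=df.index, dtype=object)
status[df["position"] == prev] = "clear"
status[df["position"] > prev] = "got overtaken"
status[df["position"] < prev] = "overtook"

# 'status' is a hypothetical column name standing in for the merged result.
df["status"] = status
print(df)
```

Because the three conditions are mutually exclusive, one column is enough and no fillna step is needed. The first row stays NaN because it has no previous row to compare with.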

#2


0  

COLDSPEED already explained what is happening. I found a similar question: "Numpy NaN returning as 'nan'".

As suggested by jezrael in the question above, try using

df = df.replace('nan', np.nan)

to fix this if you still want to use np.where.

Also, I think you could use

df.isin(["nan", np.nan])

or

df['clear lap?'].isin(["nan", np.nan])

to check whether any 'nan' strings were created by accident in a Series or in your DataFrame.
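Putting the replace suggestion together with the question's code, a minimal sketch could look like this. The sample `position` values are an assumption for illustration; the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame.
df = pd.DataFrame({"position": [3, 3, 4, 2]})
prev = df["position"].shift()

# Same pattern as the question: np.nan ends up as the string 'nan'.
df["clear lap?"] = np.where(df["position"] == prev, "clear", np.nan)
df["overtaken"] = np.where(df["position"] > prev, "got overtaken", np.nan)
df["overtook"] = np.where(df["position"] < prev, "overtook", np.nan)

# Turn the accidental 'nan' strings back into real missing values.
df = df.replace("nan", np.nan)

# fillna now has real NaN values to work with; assign the result back
# instead of relying on inplace=True on a selected column.
df["clear lap?"] = df["clear lap?"].fillna(df["overtaken"]).fillna(df["overtook"])
print(df["clear lap?"])
```

Note that replace would also turn any legitimate 'nan' text in the data into a missing value, so this only suits columns where that string cannot occur on purpose.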
