I'm having difficulty with the following issue. I use np.where to compare two consecutive row values within a column (position) and assign the result to a new column (a null value is created if the condition is false). Afterwards, I am unable to use the fillna method to replace the null values with the values of the other newly created columns.
When I use df.isnull().sum() to check for null values, the results show that there are no null values in the newly created columns (even though I have used np.nan).
In summary, I want to merge the values of the 3 columns: clear lap?, overtaken, overtook.
df['clear lap?'] = np.where((df['position'] == df['position'].shift()), str("clear"), np.nan)
df['overtaken'] = np.where((df['position'] > df['position'].shift()), str("got overtaken"), np.nan)
df['overtook'] = np.where((df['position'] < df['position'].shift()), str("overtook"), np.nan)
df['clear lap?'].fillna(df['overtaken'], inplace=True)
df['clear lap?'].fillna(df['overtook'], inplace=True)
2 Answers
#1
1
Let's try an experiment.
>>> v = np.random.choice(2, 10)
>>> v
array([0, 0, 1, 1, 0, 0, 0, 1, 1, 0])
>>> np.where(v, 'overtook', np.nan)
array(['nan', 'nan', 'overtook', 'overtook', 'nan', 'nan', 'nan',
       'overtook', 'overtook', 'nan'],
      dtype='<U32')
Because np.where returns an array with a single, homogeneous dtype, the np.nan values are coerced to strings, so you get the string 'nan' instead of NaN. That is also why df.isnull().sum() reports no nulls and fillna has nothing to fill.
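To make the consequence concrete, here is a minimal sketch. It builds the string array by hand (an assumption standing in for the np.where result shown above) rather than calling np.where with mixed types, because I believe newer NumPy releases may refuse that string/float promotion with an error instead of coercing. The names as_text and as_missing are only for illustration.

```python
import numpy as np
import pandas as pd

# Built explicitly as a stand-in for the coerced np.where output:
# a fixed-width string array in which 'nan' is ordinary text.
as_text = pd.Series(np.array(["nan", "overtook", "nan"]))

# The same values, but with a real missing-value marker in an object array.
as_missing = pd.Series(np.array([np.nan, "overtook", np.nan], dtype=object))

text_nulls = int(as_text.isna().sum())        # the string 'nan' is not null
missing_nulls = int(as_missing.isna().sum())  # np.nan is null

print(text_nulls, missing_nulls)
```

The first count is zero, which matches what the question reports from df.isnull().sum().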
One workaround would be to perform the substitution on a pd.Series object, like this:
>>> s = pd.Series(v)
>>> m = s.gt(0)
>>> s[m] = 'overtook'
>>> s[~m] = np.nan
>>> s
0         NaN
1         NaN
2    overtook
3    overtook
4         NaN
5         NaN
6         NaN
7    overtook
8    overtook
9         NaN
dtype: object
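Applied to the question's columns, the same idea might look like the sketch below. It is not the code from the question: the sample position values and the status column name are made up for illustration, and Series.where is swapped in for np.where so that unmatched rows hold real NaN values that fillna can see.

```python
import pandas as pd

# Hypothetical sample data; only the 'position' column comes from the question.
df = pd.DataFrame({"position": [3, 3, 2, 4, 4, 1]})
prev = df["position"].shift()

# Series.where keeps the label where the condition holds and puts NaN elsewhere.
clear = pd.Series("clear", index=df.index).where(df["position"] == prev)
overtaken = pd.Series("got overtaken", index=df.index).where(df["position"] > prev)
overtook = pd.Series("overtook", index=df.index).where(df["position"] < prev)

# The three conditions are mutually exclusive, so chaining fillna merges them.
df["status"] = clear.fillna(overtaken).fillna(overtook)
print(df)
```

The first row stays NaN because there is no previous lap to compare against.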
#2
0
COLDSPEED already explained what is happening. I found a similar problem in another question: Numpy NaN returning as 'nan'.
As jezrael suggests in that linked question, try
df = df.replace('nan', np.nan)
to fix this if you still want to use np.where.
You could also use
df.isin(["nan", np.nan])
or
df['clear lap?'].isin(["nan", np.nan])
to check whether any "nan" strings were created by accident in a Series or your DataFrame.
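Putting the check and the fix together, a rough sketch could look like this. The frame below is hand-built with stray 'nan' strings to imitate the situation in the question, so its contents are an assumption and not the asker's data.

```python
import numpy as np
import pandas as pd

# Hand-built frame imitating columns that ended up with the string 'nan'.
df = pd.DataFrame({
    "clear lap?": ["nan", "clear", "nan"],
    "overtook": ["overtook", "nan", "nan"],
})

# Count the stray 'nan' strings per column before touching anything.
stray_counts = df.isin(["nan"]).sum()

# Turn the strings into real missing values, after which fillna behaves.
df = df.replace("nan", np.nan)
df["clear lap?"] = df["clear lap?"].fillna(df["overtook"])

print(stray_counts)
print(df)
```

Note that replace('nan', np.nan) would also wipe out any legitimate value that happens to be the text 'nan', so it is worth running the isin check first.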