How do I replace NaNs with the preceding values?

Time: 2021-09-25 19:32:41

Suppose I have a DataFrame with some NaNs:

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df
    0   1   2
0   1   2   3
1   4 NaN NaN
2 NaN NaN   9

What I need to do is replace every NaN with the nearest non-NaN value above it in the same column. The first row is assumed never to contain a NaN. For the example above, the result would be:

   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9

I can just loop through the whole DataFrame column-by-column, element-by-element and set the values directly, but is there an easy (optimally a loop-free) way of achieving this?

5 Solutions

#1


73  

You could use the fillna method on the DataFrame and specify the method as ffill (forward fill):

>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df.fillna(method='ffill')
   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9

This method...

propagate[s] last valid observation forward to next valid

To go the opposite way, there's also a bfill method.

This method doesn't modify the DataFrame in place. You'll need to rebind the returned DataFrame to a variable, or else specify inplace=True:

df.fillna(method='ffill', inplace=True)
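
As a small check of the point about rebinding, the sketch below fills the question's DataFrame and counts the NaNs left in the original and in the returned copy. It uses df.ffill() rather than fillna(method='ffill'), because recent pandas releases (2.1 and later, as far as I'm aware) emit a deprecation warning for the method= keyword. The variable name `filled` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# ffill() returns a new DataFrame; the original keeps its NaNs
# unless the result is rebound to a name.
filled = df.ffill()

print(df.isna().sum().sum())      # NaNs still present in the original
print(filled.isna().sum().sum())  # NaNs left after forward filling
print(filled)
```

Rebinding with df = df.ffill() gives the same end result as the inplace=True form shown above.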

#2


10  

You can use pandas.DataFrame.fillna with the method='ffill' option. 'ffill' stands for 'forward fill' and propagates the last valid observation forward. The alternative is 'bfill', which works the same way but backwards.

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
df = df.fillna(method='ffill')

print(df)
#   0  1  2
#0  1  2  3
#1  4  2  3
#2  4  2  9

There is also a direct synonym function for this, pandas.DataFrame.ffill, to make things simpler.
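
To make the forward/backward distinction concrete, here is a minimal sketch that calls the dedicated ffill and bfill methods on the same DataFrame. The limit argument is not mentioned in this answer; it is included only as an illustration of capping how far a value is carried, and the names `limited` and `backward` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Forward fill, but carry each value across at most one consecutive NaN.
# Column 1 has two NaNs in a row, so its last row stays NaN.
limited = df.ffill(limit=1)

# Backward fill pulls the next valid value upward instead.
# NaNs with no valid value below them stay NaN.
backward = df.bfill()

print(limited)
print(backward)
```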

#3


6  

The accepted answer is perfect. I had a related but slightly different situation where I had to forward-fill, but only within groups. In case someone has the same need, know that fillna works on a DataFrameGroupBy object.

>>> import numpy as np
>>> example = pd.DataFrame({'number':[0,1,2,np.nan,4,np.nan,6,7,8,9],'name':list('aaabbbcccc')})
>>> example
  name  number
0    a     0.0
1    a     1.0
2    a     2.0
3    b     NaN
4    b     4.0
5    b     NaN
6    c     6.0
7    c     7.0
8    c     8.0
9    c     9.0
>>> example.groupby('name')['number'].fillna(method='ffill') # fill in row 5 but not row 3
0    0.0
1    1.0
2    2.0
3    NaN
4    4.0
5    4.0
6    6.0
7    7.0
8    8.0
9    9.0
Name: number, dtype: float64
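
The same per-group fill can also be sketched with the groupby ffill method, which may be preferable on recent pandas versions where, as far as I know, fillna on groupby objects is deprecated. This is a self-contained variant of the example above; the name `filled` is illustrative.

```python
import numpy as np
import pandas as pd

example = pd.DataFrame({
    'number': [0, 1, 2, np.nan, 4, np.nan, 6, 7, 8, 9],
    'name': list('aaabbbcccc'),
})

# Forward fill within each 'name' group only. Row 3 is the first row of
# group 'b', so it has no earlier value in its group and stays NaN.
filled = example.groupby('name')['number'].ffill()

print(filled)
```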

#4


4  

One thing I noticed when trying this solution: if there is a NaN at the start or the end of the array, neither ffill nor bfill alone fills everything. ffill leaves a leading NaN untouched and bfill leaves a trailing one, so you need both.

In [224]: df = pd.DataFrame([None, 1, 2, 3, None, 4, 5, 6, None])

In [225]: df.ffill()
Out[225]:
     0
0  NaN
1  1.0
...
7  6.0
8  6.0

In [226]: df.bfill()
Out[226]:
     0
0  1.0
1  1.0
...
7  6.0
8  NaN

In [227]: df.bfill().ffill()
Out[227]:
     0
0  1.0
1  1.0
...
7  6.0
8  6.0
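
One detail worth checking when chaining the two calls is the order, since interior gaps are filled by whichever method runs first. The sketch below compares both orders on the same data; the names `forward_first` and `backward_first` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([None, 1, 2, 3, None, 4, 5, 6, None])

# ffill first: the interior gap at index 4 takes the value above it (3),
# then bfill only has the leading NaN left to fix.
forward_first = df.ffill().bfill()

# bfill first: the interior gap at index 4 takes the value below it (4),
# then ffill only has the trailing NaN left to fix.
backward_first = df.bfill().ffill()

print(forward_first.loc[4, 0], backward_first.loc[4, 0])
```

Both orders remove every NaN here, so the choice only matters for which neighbour an interior gap should inherit from.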

#5


0  

ffill now has its own method, pd.DataFrame.ffill:

df.ffill()

     0    1    2
0  1.0  2.0  3.0
1  4.0  2.0  3.0
2  4.0  2.0  9.0
