I use Python and have an array with the values 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, and np.nan as NoData.
I want to fill every NaN with a value. This value should be the most frequent of the surrounding values.
For example:
1 1 1 1 1
1 n 1 2 2
1 3 3 2 1
1 3 2 3 1
Here "n" stands for NaN. The majority of its neighbors have the value 1, so the NaN should be replaced by 1.
Note that a hole of NaN cells can be 1 to 5 cells in size. For example (maximum size of 5 NaN cells):
1 1 1 1 1
1 n n n 2
1 n n 2 1
1 3 2 3 1
Here the NaN hole has the following surrounding values:
surrounding_values = [1,1,1,1,1,2,1,2,3,2,3,1,1,1] -> Majority = 1
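The majority of such a list can be checked with collections.Counter. This is only a small sketch using the values listed above, not part of the original attempt:

```python
from collections import Counter

# Neighbors of the five-cell hole, as listed in the question
surrounding_values = [1, 1, 1, 1, 1, 2, 1, 2, 3, 2, 3, 1, 1, 1]

# most_common(1) returns a list holding one (value, count) pair
value, count = Counter(surrounding_values).most_common(1)[0]
print(value, count)
```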
I tried the following code:
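import numpy as np
# Note: Imputer was removed in scikit-learn 0.22; sklearn.impute.SimpleImputer replaced it
from sklearn.preprocessing import Imputer

array = np.array(.......)  # consisting of 1.0-6.0 & np.nan
imp = Imputer(strategy="most_frequent")
fill = imp.fit_transform(array)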
This works fairly well. However, it only works along one axis (0 = column, 1 = row). The default is 0 (column), so each NaN is filled with the most frequent value of its own column. For example:
Array
2 1 2 1 1
2 n 2 2 2
2 1 2 2 1
1 3 2 3 1
Filled Array
2 1 2 1 1
2 1 2 2 2
2 1 2 2 1
1 3 2 3 1
As you can see, although the majority of the surrounding values is 2, the majority within the column is 1, so the NaN becomes 1 instead of 2.
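The difference between the two votes can be reproduced with plain NumPy. The sketch below does not call scikit-learn; it only imitates a column-wise "most frequent" vote and compares it with a vote over the 3x3 window, using the array from the example:

```python
import numpy as np
from collections import Counter

arr = np.array([[2, 1, 2, 1, 1],
                [2, np.nan, 2, 2, 2],
                [2, 1, 2, 2, 1],
                [1, 3, 2, 3, 1]])

i, j = 1, 1  # position of the NaN

# Column-wise vote: most frequent non-NaN value in column j
column = arr[:, j]
column_mode = Counter(column[~np.isnan(column)]).most_common(1)[0][0]

# Neighborhood vote: most frequent non-NaN value in the 3x3 window around (i, j)
window = arr[i - 1:i + 2, j - 1:j + 2].ravel()
window_mode = Counter(window[~np.isnan(window)]).most_common(1)[0][0]

print(column_mode, window_mode)
```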
As a result, I need to find another method in Python. Any suggestions or ideas?
SUPPLEMENT:
Here is the result after I added the very helpful improvement by Martin Valgur.
Think of "0" as sea (blue) and of the other values (> 0) as land (red).
If a "small" sea is surrounded by land (again 1-5 px in size), it becomes land, as you can see in the result image. If the enclosed sea is larger than 5 px, or lies outside the land, it does not become land (this is not visible in the image because the case does not occur there).
If a 1 px NaN has more sea than land among its neighbors, it still becomes land (in this example the split is 50/50).
The following picture shows what I need. At the border between sea (value = 0) and land (value > 0), the NaN pixel needs to get the majority value of the neighboring land values.
That sounds difficult, and I hope I have explained it clearly.
3 solutions
#1
2
A possible solution using label() and binary_dilation() from scipy.ndimage:
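import numpy as np
from scipy.ndimage import label, binary_dilation
from collections import Counter

def impute(arr):
    imputed_array = np.copy(arr)
    # Label each connected group of NaN cells as one hole
    mask = np.isnan(arr)
    labels, count = label(mask)
    for idx in range(1, count + 1):
        hole = labels == idx
        # Cells bordering the hole: dilate it by one step, then drop the hole itself.
        # The default structuring element is cross-shaped, so diagonal cells are not included.
        surrounding_values = arr[binary_dilation(hole) & ~hole]
        most_frequent = Counter(surrounding_values).most_common(1)[0][0]
        imputed_array[hole] = most_frequent
    return imputed_array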
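The surrounding_values list in the question also counts diagonal neighbors, whereas label() and binary_dilation() default to a cross-shaped structuring element. A possible variant is to pass a full 3x3 structure to both calls. The function name impute_8_connected is made up for this sketch; it is an illustration, not the answer's code:

```python
import numpy as np
from collections import Counter
from scipy.ndimage import label, binary_dilation, generate_binary_structure

def impute_8_connected(arr):
    """Fill each NaN hole with the most frequent value among its 8-connected neighbors."""
    # 3x3 block of True: diagonal cells count as neighbors
    structure = generate_binary_structure(2, 2)
    imputed = np.copy(arr)
    labels, count = label(np.isnan(arr), structure=structure)
    for idx in range(1, count + 1):
        hole = labels == idx
        # Ring of cells around the hole, diagonals included
        ring = binary_dilation(hole, structure=structure) & ~hole
        imputed[hole] = Counter(arr[ring].tolist()).most_common(1)[0][0]
    return imputed

n = np.nan
example = np.array([[1, 1, 1, 1, 1],
                    [1, n, n, n, 2],
                    [1, n, n, 2, 1],
                    [1, 3, 2, 3, 1]], dtype=float)
filled = impute_8_connected(example)
print(filled)
```

Using the same structure for labeling and dilation keeps NaN cells out of the ring, because any NaN touching the hole diagonally is already part of it. On ties, Counter returns the value it encountered first.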
EDIT: Regarding your loosely-related follow-up question, you can extend the above code to achieve what you are after:
import numpy as np
from scipy.ndimage import label, binary_dilation, binary_closing

def fill_land(arr):
    output = np.copy(arr)

    # Fill NaN-s: a hole becomes land (1) if any neighbor is land, otherwise sea (0)
    mask = np.isnan(arr)
    labels, count = label(mask)
    for idx in range(1, count + 1):
        hole = labels == idx
        surrounding_values = arr[binary_dilation(hole) & ~hole]
        output[hole] = any(surrounding_values)

    # Fill lakes: sea pixels that a morphological closing of the land would cover
    land = output.astype(bool)
    lakes = binary_closing(land) & ~land
    labels, count = label(lakes)
    for idx in range(1, count + 1):
        lake = labels == idx
        # Only lakes of at most 5 px become land
        output[lake] = lake.sum() < 6
    return output
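The "small enclosed sea becomes land" rule can also be sketched with binary_fill_holes() from scipy.ndimage, which marks background regions that cannot reach the array border. This swaps in a different technique from the binary_closing() used above; the function name fill_small_lakes and the max_size parameter are assumptions made for this illustration:

```python
import numpy as np
from scipy.ndimage import label, binary_fill_holes

def fill_small_lakes(land, max_size=5):
    """Turn enclosed sea regions of at most max_size pixels into land."""
    land = np.asarray(land, dtype=bool)
    # Sea pixels that cannot reach the array border are enclosed lakes
    lakes = binary_fill_holes(land) & ~land
    labels, count = label(lakes)
    result = land.copy()
    for idx in range(1, count + 1):
        lake = labels == idx
        if lake.sum() <= max_size:
            result[lake] = True
    return result

land = np.ones((6, 9), dtype=bool)
land[:, 8] = False        # open sea along the right edge
land[1, 1] = False        # 1 px lake
land[3:5, 3:6] = False    # 6 px lake
filled = fill_small_lakes(land)
print(filled.astype(int))
```

In this toy grid the 1 px lake becomes land, while the 6 px lake and the open sea stay as they are.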
#2
1
I didn't find any library for this, so I wrote a function. If all the None values lie in the interior of the array (not on its border), you can use this:
import numpy as np
from collections import Counter

def getModulusSurround(data):
    # Drop the None placeholders, then return the most frequent remaining value
    tempdata = [x for x in data if x is not None]
    return Counter(tempdata).most_common(1)[0][0]

def main():
    array = [[1, 2, 2, 4, 5],
             [2, 3, 4, 5, 6],
             [3, 4, None, 6, 7],
             [1, 4, 2, 3, 4],
             [4, 6, 2, 2, 4]]
    array = np.array(array)  # object dtype because of the None entry
    rows, cols = array.shape
    for i in range(rows):
        for j in range(cols):
            if array[i, j] is None:
                # 3x3 window around (i, j); assumes the cell is not on the border
                temparray = array[i-1:i+2, j-1:j+2]
                array[i, j] = getModulusSurround(temparray.flatten())
    print(array)

main()
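The restriction to interior cells comes from the slice start i-1, which turns negative on the first row or column and then yields an empty window. One way around it, sketched here with a hypothetical helper named window_majority, is to clip the window to the array bounds:

```python
import numpy as np
from collections import Counter

def window_majority(array, i, j):
    """Most frequent non-None value in the 3x3 window around (i, j), clipped at the edges."""
    rows, cols = array.shape
    # max(..., 0) prevents a negative start index from wrapping around
    window = array[max(i - 1, 0):min(i + 2, rows), max(j - 1, 0):min(j + 2, cols)]
    values = [v for v in window.flatten() if v is not None]
    return Counter(values).most_common(1)[0][0] if values else None

grid = np.array([[None, 2, 2],
                 [2, 3, 4],
                 [3, 4, None]], dtype=object)
top_left = window_majority(grid, 0, 0)
bottom_right = window_majority(grid, 2, 2)
print(top_left, bottom_right)
```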
#3
0
After the incredible help of Martin Valgur, I have the result I need.
Therefore, I added the following lines to Martin's code:
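import numpy as np
from scipy.ndimage import label, binary_dilation
from scipy.stats import mode

def impute(arr):
    imputed_array = np.copy(arr)
    mask = np.isnan(arr)
    labels, count = label(mask)
    for idx in range(1, count + 1):
        hole = labels == idx
        surrounding_values = arr[binary_dilation(hole) & ~hole]
        # Added: ignore sea (0) so that only land values can win the vote.
        # A boolean mask avoids removing items from a list while iterating over it,
        # which skips elements.
        surrounding_values = surrounding_values[surrounding_values != 0]
        if surrounding_values.size == 0:
            continue  # hole surrounded only by sea: left as NaN
        # atleast_1d: newer SciPy returns a scalar mode for 1-D input, older versions an array
        imputed_array[hole] = np.atleast_1d(mode(surrounding_values).mode)[0]
    return imputed_array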
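The land-only vote can be isolated into a small helper. Note that removing items from a list while looping over it can skip elements, so the sketch below filters with a boolean mask and counts with np.unique() instead; the name land_majority is an assumption made for illustration:

```python
import numpy as np

def land_majority(values):
    """Most frequent non-zero value, or None if every value is zero."""
    values = np.asarray(values)
    land = values[values != 0]  # boolean mask instead of list.remove()
    if land.size == 0:
        return None
    uniques, counts = np.unique(land, return_counts=True)
    return uniques[np.argmax(counts)]

result = land_majority([0.0, 0.0, 0.0, 2.0, 3.0, 3.0])
print(result)
```

Here the sea value 0 is the most frequent entry overall, yet the land value 3.0 wins. On ties, np.argmax picks the smallest of the tied values, because np.unique returns them sorted.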