I have data points that represent coordinates for a 2D array (matrix). The points are regularly gridded, except that data points are missing from some grid positions.
For example, consider some XYZ data that fits on a regular 0.1 grid with shape (3, 4). There are gaps and missing points, so there are 5 points, and not 12:
import numpy as np
X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])
Y = np.array([1.0, 1.0, 1.1, 1.2, 1.2])
Z = np.array([3.3, 2.5, 3.6, 3.8, 1.8])
# Evaluate the regular grid dimension values
# The number of samples passed to linspace must be an integer, so cast the rounded value
Xr = np.linspace(X.min(), X.max(), int(np.round((X.max() - X.min()) / np.diff(np.unique(X)).min())) + 1)
Yr = np.linspace(Y.min(), Y.max(), int(np.round((Y.max() - Y.min()) / np.diff(np.unique(Y)).min())) + 1)
print('Xr={0}; Yr={1}'.format(Xr, Yr))
# Xr=[ 0.4 0.5 0.6 0.7]; Yr=[ 1. 1.1 1.2]
What I would like to see is shown in this image (backgrounds: black=base-0 index; grey=coordinate value; colour=matrix value; white=missing).
Here's what I have, which is intuitive with a for loop:
ar = np.ma.array(np.zeros((len(Yr), len(Xr)), dtype=Z.dtype), mask=True)
for x, y, z in zip(X, Y, Z):
    j = (np.abs(Xr - x)).argmin()
    i = (np.abs(Yr - y)).argmin()
    ar[i, j] = z
print(ar)
# [[3.3 2.5 -- --]
# [3.6 -- -- --]
# [3.8 -- -- 1.8]]
Is there a more NumPythonic way of vectorising the approach to return a 2D array ar? Or is the for loop necessary?
4 solutions
#1
7
You can do it on one line with np.histogram2d
data = np.histogram2d(Y, X, bins=[len(Yr),len(Xr)], weights=Z)
print(data[0])
[[ 3.3 2.5 0. 0. ]
[ 3.6 0. 0. 0. ]
[ 3.8 0. 0. 1.8]]
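Note that the result above has 0. in the empty cells rather than masked entries, so a genuine zero in Z could not be told apart from a gap. Here is a minimal sketch of one way around that, assuming the sample X, Y, Z from the question and the known (3, 4) shape: a second, unweighted histogram counts the points per cell, and cells with a count of zero are masked. The names sums, counts and grid are only for illustration.

```python
import numpy as np

X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])
Y = np.array([1.0, 1.0, 1.1, 1.2, 1.2])
Z = np.array([3.3, 2.5, 3.6, 3.8, 1.8])

shape = (3, 4)  # (rows along Y, columns along X), as in the question

# Weighted histogram: sum of Z values falling in each cell
sums, _, _ = np.histogram2d(Y, X, bins=shape, weights=Z)
# Unweighted histogram: number of points falling in each cell
counts, _, _ = np.histogram2d(Y, X, bins=shape)

# Mask every cell that received no point
grid = np.ma.masked_array(sums, mask=(counts == 0))
print(grid)
```

If several points land in the same cell, sums holds their total, so dividing by counts would give a per-cell mean instead.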
#2
2
You can use X and Y to create the X-Y coordinates on a 0.1 spaced grid extending from the min to max of X and the min to max of Y, and then insert the Z values into those specific positions. This avoids using linspace to get Xr and Yr, and as such should be quite efficient. Here's the implementation:
def indexing_based(X,Y,Z):
    # Convert X's and Y's to indices on a 0.1 spaced grid
    X_int = np.round((X*10)).astype(int)
    Y_int = np.round((Y*10)).astype(int)
    X_idx = X_int - X_int.min()
    Y_idx = Y_int - Y_int.min()
    # Setup output array and index it with X_idx & Y_idx to set those as Z
    out = np.zeros((Y_idx.max()+1,X_idx.max()+1))
    out[Y_idx,X_idx] = Z
    return out
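The function above fills the gaps with zeros, whereas the question asks for a masked array. A possible variant is sketched below. It assumes the grid spacing is known and passed in as step (a hypothetical parameter, 0.1 for the sample data), and the name to_masked_grid is made up for this example.

```python
import numpy as np

def to_masked_grid(X, Y, Z, step=0.1):
    """Place Z on a regular grid; cells without a data point stay masked.

    'step' is the assumed, known grid spacing (not derived from the data).
    """
    # Round before casting so values such as 2.9999999 become 3, not 2
    j = np.round((X - X.min()) / step).astype(int)  # column index from X
    i = np.round((Y - Y.min()) / step).astype(int)  # row index from Y
    # Start fully masked; assignment below unmasks the filled cells
    out = np.ma.masked_all((i.max() + 1, j.max() + 1), dtype=Z.dtype)
    out[i, j] = Z
    return out

X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])
Y = np.array([1.0, 1.0, 1.1, 1.2, 1.2])
Z = np.array([3.3, 2.5, 3.6, 3.8, 1.8])

ar = to_masked_grid(X, Y, Z)
print(ar)
```

For the sample data this should give the same masked layout as the for loop in the question, without iterating in Python.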
Runtime tests
This section compares the indexing-based approach against the other np.histogram2d based solution for performance:
In [132]: # Create unique couples X-Y (as needed to work with histogram2d)
...: data = np.random.randint(0,1000,(5000,2))
...: data1 = data[np.lexsort(data.T),:]
...: mask = ~np.all(np.diff(data1,axis=0)==0,axis=1)
...: data2 = data1[np.append([True],mask)]
...:
...: X = (data2[:,0]).astype(float)/10
...: Y = (data2[:,1]).astype(float)/10
...: Z = np.random.randint(0,1000,(X.size))
...:
In [133]: def histogram_based(X,Y,Z): # From other np.histogram2d based solution
...:     # Cast the sample count to int, as linspace requires an integer
...:     Xr = np.linspace(X.min(), X.max(), int(np.round((X.max() - X.min()) / np.diff(np.unique(X)).min())) + 1)
...:     Yr = np.linspace(Y.min(), Y.max(), int(np.round((Y.max() - Y.min()) / np.diff(np.unique(Y)).min())) + 1)
...:     data = np.histogram2d(Y, X, bins=[len(Yr),len(Xr)], weights=Z)
...:     return data[0]
...:
In [134]: %timeit histogram_based(X,Y,Z)
10 loops, best of 3: 22.8 ms per loop
In [135]: %timeit indexing_based(X,Y,Z)
100 loops, best of 3: 2.11 ms per loop
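The test data was made unique because the two approaches only agree when no coordinate pair repeats: histogram2d adds up the weights of points that share a cell, while plain index assignment keeps just one of them. If summing is what you want from the indexing approach, np.add.at can accumulate instead. A small sketch with made-up integer indices (rows, cols and vals are hypothetical data, not from the question):

```python
import numpy as np

# Hypothetical, already-computed grid indices; the cell (0, 0) appears twice
rows = np.array([0, 0, 1])
cols = np.array([0, 0, 2])
vals = np.array([1.0, 2.0, 5.0])

# Unbuffered in-place addition: repeated indices accumulate
summed = np.zeros((2, 3))
np.add.at(summed, (rows, cols), vals)
print(summed)
```

Here the cell (0, 0) ends up holding 3.0, which is what a weighted histogram would report for the same points.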
#3
1
You could use a scipy coo_matrix. It allows you to construct a sparse matrix from coordinates and data. See examples on the attached link.
http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.sparse.coo_matrix.html
Hope that helps.
#4
1
The sparse matrix is the first solution that came to mind, but since X and Y are floats, it's a little messy:
In [623]: from scipy import sparse
In [624]: I=((X-.4)*10).round().astype(int)
In [625]: J=((Y-1)*10).round().astype(int)
In [626]: I,J
Out[626]: (array([0, 1, 0, 0, 3]), array([0, 0, 1, 2, 2]))
In [627]: sparse.coo_matrix((Z,(J,I))).toarray()  # .A is a shorthand for toarray() in older SciPy
Out[627]:
array([[ 3.3, 2.5, 0. , 0. ],
[ 3.6, 0. , 0. , 0. ],
[ 3.8, 0. , 0. , 1.8]])
It still needs, one way or another, to match those coordinates with [0,1,2...] indexes. My quick cheat was to just scale the values linearly. Even so, I had to take care when converting the floats to ints.
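To show why that care is needed, here is a small sketch using the X values from the question. Casting straight to int truncates, and the scaled floats sit just below the integers they are meant to be, so rounding first is what gives the intended indices. The names scaled, truncated and rounded are only for illustration.

```python
import numpy as np

X = np.array([0.4, 0.5, 0.4, 0.4, 0.7])

# Scale the offsets to grid units; results are not exact integers in floating point
scaled = (X - 0.4) * 10

truncated = scaled.astype(int)        # drops the fractional part
rounded = scaled.round().astype(int)  # rounds to the nearest integer first

print(truncated)  # [0 0 0 0 2]  (wrong columns for 0.5 and 0.7)
print(rounded)    # [0 1 0 0 3]
```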
sparse.coo_matrix works because a natural way of defining a sparse matrix is with (i, j, data) tuples, which of course can be translated to I, J, Data lists or arrays.
I rather like the histogram solution, even though I haven't had occasion to use it.