I have a lot of data in a database in (x, y, value) triplet form. I would like to dynamically create a 2D numpy array from this data by setting value at the coordinates (x, y) of the array.
For instance, if I have:
(0,0,8)
(0,1,5)
(0,2,3)
(1,0,4)
(1,1,0)
(1,2,0)
(2,0,1)
(2,1,2)
(2,2,5)
The resulting array should be:
Array([[8,5,3],[4,0,0],[1,2,5]])
I'm new to numpy. Is there any method in numpy to do this? If not, what approach would you advise?
3 solutions
#1
3
Extending the answer from @MaxU, in case the coordinates are not ordered in a grid fashion (or in case some coordinates are missing), you can create your array as follows:
import numpy as np
a = np.array([(0,0,8),(0,1,5),(0,2,3),
(1,0,4),(1,1,0),(1,2,0),
(2,0,1),(2,1,2),(2,2,5)])
Here a represents your coordinates. It is an (N, 3) array, where N is the number of coordinates (it doesn't have to contain ALL the coordinates). The first column of a (a[:, 0]) contains the Y positions (row indices), while the second column (a[:, 1]) contains the X positions (column indices). Similarly, the last column (a[:, 2]) contains your values.
Then you can extract the maximum dimensions of your target array:
# Maximum Y and X coordinates
ymax = a[:, 0].max()
xmax = a[:, 1].max()
# Target array
target = np.zeros((ymax+1, xmax+1), a.dtype)
And finally, fill the array with data from your coordinates:
target[a[:, 0], a[:, 1]] = a[:, 2]
The line above sets the values in target at the a[:, 0] (all Y) and a[:, 1] (all X) locations to their corresponding a[:, 2] value (your value).
>>> target
array([[8, 5, 3],
[4, 0, 0],
[1, 2, 5]])
Additionally, if you have missing coordinates, and you want to replace those missing values by some number, you can initialize the array as:
default_value = -1
target = np.full((ymax+1, xmax+1), default_value, a.dtype)
This way, the coordinates not present in your list will be filled with -1 in the target array.
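The steps above can be collected into one small helper. This is only a sketch: the function name `triplets_to_grid` and its `fill_value` parameter are made up for illustration, and it assumes the coordinates are non-negative integers with the first column used as the row index.

```python
import numpy as np

def triplets_to_grid(triplets, fill_value=0):
    """Place (row, col, value) triplets into a dense 2D array (hypothetical helper)."""
    t = np.asarray(triplets)
    rows, cols, values = t[:, 0], t[:, 1], t[:, 2]
    # Size the grid from the largest row and column index seen
    grid = np.full((rows.max() + 1, cols.max() + 1), fill_value, dtype=t.dtype)
    # Integer-array indexing pairs rows[i] with cols[i]
    grid[rows, cols] = values
    return grid

# Unordered input with (1, 1) and (1, 2) missing
sparse = [(2, 2, 5), (0, 0, 8), (1, 0, 4), (0, 2, 3),
          (2, 0, 1), (0, 1, 5), (2, 1, 2)]
result = triplets_to_grid(sparse, fill_value=-1)
print(result)
```

The two missing cells come out as -1, and the order of the input triplets does not matter.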
#2
2
Is that what you want?
In [37]: a = np.array([(0,0,8)
....: ,(0,1,5)
....: ,(0,2,3)
....: ,(1,0,4)
....: ,(1,1,0)
....: ,(1,2,0)
....: ,(2,0,1)
....: ,(2,1,2)
....: ,(2,2,5)])
In [38]: a
Out[38]:
array([[0, 0, 8],
[0, 1, 5],
[0, 2, 3],
[1, 0, 4],
[1, 1, 0],
[1, 2, 0],
[2, 0, 1],
[2, 1, 2],
[2, 2, 5]])
In [39]: a[:, 2].reshape(3,len(a)//3)
Out[39]:
array([[8, 5, 3],
[4, 0, 0],
[1, 2, 5]])
Or, a bit more flexibly (after your comment):
In [48]: a[:, 2].reshape([int(len(a) ** .5)] * 2)
Out[48]:
array([[8, 5, 3],
[4, 0, 0],
[1, 2, 5]])
Explanation:
This gives you the 3rd column (value):
In [42]: a[:, 2]
Out[42]: array([8, 5, 3, 4, 0, 0, 1, 2, 5])
In [49]: [int(len(a) ** .5)]
Out[49]: [3]
In [50]: [int(len(a) ** .5)] * 2
Out[50]: [3, 3]
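Note that the reshape approach relies on the triplets already being sorted by row and then by column, and on the grid being complete. If only the ordering is in doubt, one possible sketch is to sort first with np.lexsort; the shuffled array below is invented for illustration, and a complete grid is still assumed.

```python
import numpy as np

# Same triplets as in the question, deliberately shuffled
a = np.array([(2, 1, 2), (0, 0, 8), (1, 2, 0),
              (0, 2, 3), (1, 0, 4), (2, 2, 5),
              (0, 1, 5), (1, 1, 0), (2, 0, 1)])

# np.lexsort treats the LAST key as the primary one:
# sort by column 0 (row index) first, then by column 1
order = np.lexsort((a[:, 1], a[:, 0]))
a_sorted = a[order]

n_rows = a_sorted[:, 0].max() + 1
# -1 lets NumPy infer the number of columns; reshape raises a
# ValueError if the number of values is not a multiple of n_rows
grid = a_sorted[:, 2].reshape(n_rows, -1)
print(grid)
```

Unlike the square-root version, this also handles non-square grids, because the row count is taken from the data itself.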
#3
2
Why not use sparse matrices? (That is pretty much the format of your triplets.)
First split the triplets into rows, columns, and data using numpy.hsplit(). (Use numpy.squeeze() to convert the resulting 2D arrays to 1D arrays.)
>>> # triplets is assumed to be the (N, 3) array of (row, col, value) entries
>>> row, col, data = [np.squeeze(splt) for splt
...                   in np.hsplit(triplets, triplets.shape[-1])]
Use the sparse matrix in coordinate format, and convert it to an array.
>>> from scipy.sparse import coo_matrix
>>> coo_matrix((data, (row, col))).toarray()
array([[8, 5, 3],
[4, 0, 0],
[1, 2, 5]])
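One detail worth knowing: when a coordinate appears more than once, coo_matrix sums the duplicate entries on conversion with toarray(), whereas a plain index assignment keeps only one of them. If SciPy is not available, roughly the same behaviour can be sketched with NumPy alone. The repeated (2, 2) entry below is invented for illustration, and `triplets` is assumed to be an (N, 3) integer array.

```python
import numpy as np

triplets = np.array([(0, 0, 8), (0, 1, 5), (0, 2, 3),
                     (1, 0, 4), (1, 1, 0), (1, 2, 0),
                     (2, 0, 1), (2, 1, 2), (2, 2, 5),
                     (2, 2, 10)])  # repeated coordinate, added for illustration

# Transposing yields one 1D array per column, so no hsplit/squeeze is needed
row, col, data = triplets.T

dense = np.zeros((row.max() + 1, col.max() + 1), dtype=data.dtype)
# np.add.at accumulates unbuffered, so repeated (row, col) pairs are summed
np.add.at(dense, (row, col), data)
print(dense)
```

Here the cell at (2, 2) ends up holding 5 + 10. Whether summing is the right treatment for duplicates depends on what the database rows mean.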