Background: The data I'm using is extracted from a netCDF4 object, which creates a numpy masked array at initialization but does not appear to support the numpy reshape() method. That makes it possible to reshape only after all the data has been copied, which is way too slow.
Question: How can I sub-sample a 1-D array that is basically a flattened 2-D array, without reshaping it?
import numpy as np

a1 = np.array([[1, 2, 3, 4],
               [11, 22, 33, 44],
               [111, 222, 333, 444],
               [1111, 2222, 3333, 4444],
               [11111, 22222, 33333, 44444]])
a2 = np.ravel(a1)
rows, cols = a1.shape
row1 = 1
row2 = 3
col1 = 1
col2 = 3
I would like to use a fast slicing method that doesn't require reshaping the 1-D array into a 2-D array.
Desired output:
np.ravel(a1[row1:row2, col1:col2])
>> array([ 22, 33, 222, 333])
I got as far as computing the start and end positions, but slicing between them selects ALL data between these points (i.e. the extra columns as well).
idx_start = (row1 * cols) + col1
idx_end = (row2 * cols) + col2
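A single contiguous range like the one above necessarily includes the unwanted columns that sit between the wanted ones. One possible workaround, sketched here on the toy array only (not on a netCDF4 variable), is to read the complete rows in one contiguous slice and then drop the unwanted columns with a modulo mask. The names `block` and `col_of` are introduced for illustration, and this still copies the full rows into memory.

```python
import numpy as np

a1 = np.array([[1, 2, 3, 4],
               [11, 22, 33, 44],
               [111, 222, 333, 444],
               [1111, 2222, 3333, 4444],
               [11111, 22222, 33333, 44444]])
a2 = np.ravel(a1)
rows, cols = a1.shape
row1, row2 = 1, 3
col1, col2 = 1, 3

# One contiguous read covering only the complete rows row1..row2-1.
block = a2[row1 * cols:row2 * cols]

# Column number of every element in the block (the block starts at column 0).
col_of = np.arange(block.size) % cols

# Keep only the elements whose column lies in [col1, col2).
subset = block[(col_of >= col1) & (col_of < col2)]
print(subset)  # [ 22  33 222 333]
```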
Update: I just tried Jaime's brilliant answer, but it appears that netCDF4 won't allow 2-D indices.
z = dataset.variables["z"][idx]
File "netCDF4.pyx", line 2613, in netCDF4.Variable.__getitem__ (netCDF4.c:29583)
File "/usr/local/lib/python2.7/dist-packages/netCDF4_utils.py", line 141, in _StartCountStride
raise IndexError("Index cannot be multidimensional.")
IndexError: Index cannot be multidimensional.
3 Answers
#1
1
You can get what you want with a combination of np.ogrid and np.ravel_multi_index:
>>> a1
array([    1,     2,     3,     4,    11,    22,    33,    44,   111,
         222,   333,   444,  1111,  2222,  3333,  4444, 11111, 22222,
       33333, 44444])
>>> idx = np.ravel_multi_index((np.ogrid[1:3,1:3]), (5, 4))
>>> a1[idx]
array([[ 22,  33],
       [222, 333]])
You could of course ravel this array to get a 1-D result if that's what you are after. Note also that this is a copy of your original data, not a view.
EDIT: You can keep the same general approach, replacing np.ogrid with np.mgrid and reshaping it to get a flat result:
>>> idx = np.ravel_multi_index((np.mgrid[1:3,1:3].reshape(2, -1)), (5, 4))
>>> a1[idx]
array([ 22, 33, 222, 333])
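The same 1-D index can also be built with plain broadcasting arithmetic, which may make the row-major layout easier to see. This is a minimal sketch on an in-memory NumPy array; whether a particular netCDF4 version accepts such an integer index array is not verified here. The names `row_offsets` and `col_offsets` are illustrative.

```python
import numpy as np

rows, cols = 5, 4
row1, row2 = 1, 3
col1, col2 = 1, 3
a2 = np.array([1, 2, 3, 4, 11, 22, 33, 44, 111, 222, 333, 444,
               1111, 2222, 3333, 4444, 11111, 22222, 33333, 44444])

# Flat offset of the first element of each wanted row, as a column vector.
row_offsets = np.arange(row1, row2)[:, None] * cols
# Wanted column numbers, as a row vector.
col_offsets = np.arange(col1, col2)

# Broadcasting gives a (2, 2) grid of flat positions; ravel makes it 1-D.
idx = (row_offsets + col_offsets).ravel()
print(idx)      # [ 5  6  9 10]
print(a2[idx])  # [ 22  33 222 333]
```

The resulting index is sorted and free of duplicates, because the rows are visited in order and the column ranges of different rows never overlap.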
#2
0
I came up with this. Although it doesn't copy ALL of the data, it still copies data that I don't want into memory. It can probably be improved, and I hope there is a better solution out there.
zi = 0
# Create zero array with the appropriate length for the data subset
z = np.zeros((col2 - col1) * (row2 - row1))
# Process each row from which data is being extracted (row1 up to row2 - 1)
for i in range(row1, row2):
    # Pull row i, then the desired elements of that row, into a buffer
    tmp = dataset.variables["z"][(i * cols):((i * cols) + cols)][col1:col2]
    # Add each item in the buffer sequentially to the data array
    for j in tmp:
        z[zi] = j
        # Keep a count of what index position the next data point goes to
        zi += 1
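A variation on the loop above slices only the wanted columns of each row and writes them into the output with one slice assignment per row. This is a sketch under the assumption that the variable supports ordinary 1-D slicing; a plain NumPy array named `flat` stands in for `dataset.variables["z"]`, so nothing here exercises netCDF4 itself.

```python
import numpy as np

# Stand-in for dataset.variables["z"]: a plain flat NumPy array (assumption).
flat = np.ravel(np.array([[1, 2, 3, 4],
                          [11, 22, 33, 44],
                          [111, 222, 333, 444],
                          [1111, 2222, 3333, 4444],
                          [11111, 22222, 33333, 44444]]))
cols = 4
row1, row2 = 1, 3
col1, col2 = 1, 3

width = col2 - col1
z = np.zeros((row2 - row1) * width, dtype=flat.dtype)

for k, r in enumerate(range(row1, row2)):
    # Flat position of element (r, col1); the wanted run is `width` long.
    start = r * cols + col1
    # Copy only the wanted columns of row r into the k-th chunk of z.
    z[k * width:(k + 1) * width] = flat[start:start + width]

print(z)  # [ 22  33 222 333]
```

Each iteration touches only `col2 - col1` elements, so no unwanted columns are copied, at the cost of one slice operation per row.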
#3
0
Here is a lean proposal:
a1 = np.array([[1, 2, 3, 4],
               [11, 22, 33, 44],
               [111, 222, 333, 444],
               [1111, 2222, 3333, 4444],
               [11111, 22222, 33333, 44444]])
row1 = 1; row2 = 3; ix = slice(row1, row2)
col1 = 1; col2 = 3; iy = slice(col1, col2)
n = (row2 - row1) * (col2 - col1)
print(a1[ix, iy]); print()
print(a1[ix, iy].reshape(1, n))

Output:

[[ 22  33]
 [222 333]]

[[ 22  33 222 333]]
Reshaping a NumPy array is not expensive, and slicing is fast.
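To make the cost claim concrete for in-memory NumPy arrays, the sketch below checks with np.shares_memory which operations return views and which copy. It says nothing about netCDF4 variables, which is the case the question is concerned with; the names `flat_view`, `sub`, and `flat_sub` are illustrative.

```python
import numpy as np

a1 = np.array([[1, 2, 3, 4],
               [11, 22, 33, 44],
               [111, 222, 333, 444],
               [1111, 2222, 3333, 4444],
               [11111, 22222, 33333, 44444]])

# Reshaping a contiguous array returns a view: no data is copied.
flat_view = a1.reshape(-1)
print(np.shares_memory(a1, flat_view))  # True

# Basic slicing also returns a view onto the original buffer.
sub = a1[1:3, 1:3]
print(np.shares_memory(a1, sub))        # True

# Flattening the non-contiguous sub-block has to copy,
# but only the four selected elements.
flat_sub = sub.reshape(-1)
print(np.shares_memory(a1, flat_sub))   # False
```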