Given an array:
In [122]: arr = np.array([[1, 3, 7], [4, 9, 8]]); arr
Out[122]:
array([[1, 3, 7],
       [4, 9, 8]])
And given its indices:
In [127]: np.indices(arr.shape)
Out[127]:
array([[[0, 0, 0],
        [1, 1, 1]],

       [[0, 1, 2],
        [0, 1, 2]]])
How would I be able to stack them neatly one against the other to form a new 2D array? This is what I'd like:
array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])
This is my current solution:
def foo(arr):
    return np.hstack((np.indices(arr.shape).reshape(2, arr.size).T, arr.reshape(-1, 1)))
It works, but is there a shorter or more elegant way to carry out this operation?
2 Answers
#1
Using array initialization followed by broadcasted assignment to fill in the indices and the array values in subsequent steps:
def indices_merged_arr(arr):
    m,n = arr.shape
    I,J = np.ogrid[:m,:n]
    out = np.empty((m,n,3), dtype=arr.dtype)
    out[...,0] = I
    out[...,1] = J
    out[...,2] = arr
    out.shape = (-1,3)
    return out
Note that we are avoiding the use of np.indices(arr.shape), which could have slowed things down.
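To illustrate the difference between the two index sources, here is a minimal sketch (the shape (2, 3) and the names dense, out are picked only for illustration). np.indices builds one full grid per axis, while np.ogrid returns "open" grids that are expanded by broadcasting only at assignment time:

```python
import numpy as np

m, n = 2, 3  # illustrative shape, matching the example array

# np.indices materializes a full (m, n) grid for every axis.
dense = np.indices((m, n))
print(dense.shape)        # (2, 2, 3)

# np.ogrid keeps a single non-singleton axis per grid.
I, J = np.ogrid[:m, :n]
print(I.shape, J.shape)   # (2, 1) (1, 3)

# Broadcasting stretches the open grids when they are assigned
# into the preallocated output, so no dense grids are built first.
out = np.empty((m, n, 2), dtype=np.intp)
out[..., 0] = I
out[..., 1] = J

# Both routes describe the same index values.
print(np.array_equal(np.moveaxis(out, -1, 0), dense))  # True
```

Whether this actually saves time depends on the array size and NumPy version, so it is worth measuring on your own data.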
Sample run -
In [10]: arr = np.array([[1, 3, 7], [4, 9, 8]])
In [11]: indices_merged_arr(arr)
Out[11]:
array([[0, 0, 1],
       [0, 1, 3],
       [0, 2, 7],
       [1, 0, 4],
       [1, 1, 9],
       [1, 2, 8]])
Performance
arr = np.random.randn(100000, 2)
%timeit df = pd.DataFrame(np.hstack((np.indices(arr.shape).reshape(2, arr.size).T,\
                                     arr.reshape(-1, 1))), columns=['x', 'y', 'value'])
100 loops, best of 3: 4.97 ms per loop
%timeit pd.DataFrame(indices_merged_arr_divakar(arr), columns=['x', 'y', 'value'])
100 loops, best of 3: 3.82 ms per loop
%timeit pd.DataFrame(indices_merged_arr_eric(arr), columns=['x', 'y', 'value'], dtype=np.float32)
100 loops, best of 3: 5.59 ms per loop
Note: Timings include the conversion to a pandas DataFrame, which is the eventual use case for this solution.
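Outside IPython, a comparison along these lines could be reproduced with the standard timeit module. The following is only a rough sketch: the function names stack_with_indices and stack_with_ogrid are made up here, the pandas conversion is left out, and the numbers will differ from machine to machine.

```python
import timeit

import numpy as np


def stack_with_indices(arr):
    # Approach from the question: dense index grids plus hstack.
    return np.hstack((np.indices(arr.shape).reshape(2, arr.size).T,
                      arr.reshape(-1, 1)))


def stack_with_ogrid(arr):
    # Preallocate the output, then fill it by broadcasted assignment.
    m, n = arr.shape
    I, J = np.ogrid[:m, :n]
    out = np.empty((m, n, 3), dtype=arr.dtype)
    out[..., 0] = I
    out[..., 1] = J
    out[..., 2] = arr
    return out.reshape(-1, 3)


arr = np.random.randn(100000, 2)

# Check that both functions agree before timing them.
same = np.array_equal(stack_with_indices(arr), stack_with_ogrid(arr))
print("results match:", same)

for fn in (stack_with_indices, stack_with_ogrid):
    # Best of 3 repeats, 20 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(arr), number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```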
#2
A more generic answer for N-dimensional arrays that handles other dtypes correctly:
def indices_merged_arr(arr):
    out = np.empty(arr.shape, dtype=[
        ('index', np.intp, arr.ndim),
        ('value', arr.dtype)
    ])
    out['value'] = arr
    for i, l in enumerate(arr.shape):
        shape = (1,)*i + (-1,) + (1,)*(arr.ndim-1-i)
        out['index'][..., i] = np.arange(l).reshape(shape)
    return out.ravel()
This returns a structured array with an index column and a value column, which can be of different types.
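As a usage sketch, the structured result can be read field by field. The function is repeated under the hypothetical name indices_merged_structured so the snippet runs on its own, with the subarray shape written as a tuple. The float input is an assumption chosen to show that the indices stay integers while the values keep their own dtype:

```python
import numpy as np


def indices_merged_structured(arr):
    # One record per element: an integer index vector plus the value.
    out = np.empty(arr.shape, dtype=[
        ('index', np.intp, (arr.ndim,)),
        ('value', arr.dtype),
    ])
    out['value'] = arr
    for i, l in enumerate(arr.shape):
        # Put arange(l) along axis i so it broadcasts over the other axes.
        shape = (1,) * i + (-1,) + (1,) * (arr.ndim - 1 - i)
        out['index'][..., i] = np.arange(l).reshape(shape)
    return out.ravel()


arr = np.array([[1.5, 3.5, 7.5], [4.5, 9.5, 8.5]])
res = indices_merged_structured(arr)

# The two fields keep separate dtypes.
print(res['index'].dtype, res['value'].dtype)

# Record 4 corresponds to position (1, 1) in the original array.
print(res['index'][4], res['value'][4])   # [1 1] 9.5
```

With a plain (N, 3) array of a single dtype, the indices would instead be cast to the value dtype, which is the case this structured layout avoids.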