Randomly shuffle the items in each row of a numpy array

Time: 2021-12-16 21:22:49

I have a numpy array like the following:

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])

I want to shuffle the items of each row separately, but I do not want the shuffle to be the same for every row (several examples I found just shuffle the column order).

For example, I want an output like the following:

output = np.array([[3, 2, 1],
                   [4, 6, 5],
                   [7, 3, 1]])

How can I shuffle each row independently in an efficient way? My actual NumPy array has over 100000 rows and 1000 columns.
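
The requirement can be stated as a check: a valid result keeps exactly the same items in every row, only reordered. A minimal sketch of such a check, assuming sorting each row is an acceptable way to compare the items (is_rowwise_shuffle is a hypothetical helper name, not a NumPy function):

```python
import numpy as np

def is_rowwise_shuffle(original, shuffled):
    """Hypothetical helper: True if each row of `shuffled` contains the same
    items as the corresponding row of `original`, in any order."""
    original = np.asarray(original)
    shuffled = np.asarray(shuffled)
    if original.shape != shuffled.shape:
        return False
    # Sorting each row discards the order and keeps only the items.
    return bool(np.array_equal(np.sort(original, axis=1),
                               np.sort(shuffled, axis=1)))

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])
output = np.array([[3, 2, 1],
                   [4, 6, 5],
                   [7, 3, 1]])
print(is_rowwise_shuffle(Xtrain, output))  # True
```

This only verifies that items stayed in their rows; it says nothing about how random the permutation was.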

5 solutions

#1


6  

If you only want to shuffle the columns (the same reordering for every row), you can perform the shuffle on the transpose of your matrix:

In [86]: np.random.shuffle(Xtrain.T)

In [87]: Xtrain
Out[87]: 
array([[2, 3, 1],
       [5, 6, 4],
       [7, 3, 1]])

Note that np.random.shuffle() on a 2D array shuffles the rows, not the items within each row; i.e. it changes the positions of the rows. Therefore, if you change the positions of the transposed matrix's rows, you are actually shuffling the columns of your original array.
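
To make the point above concrete, the sketch below shuffles the transpose and then recovers the single column permutation that explains the result, showing that every row was reordered identically. The fixed seed and the permutation-recovery loop are illustrative additions, not part of the answer.

```python
import numpy as np

np.random.seed(0)  # fixed seed only so the demo is reproducible

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])
original = Xtrain.copy()

# Xtrain.T is a view of Xtrain, so shuffling its rows in place
# reorders the columns of Xtrain itself.
np.random.shuffle(Xtrain.T)

# Find which original column landed in each position. One permutation
# explains every row, so all rows were reordered the same way.
perm = [next(j for j in range(original.shape[1])
             if np.array_equal(Xtrain[:, i], original[:, j]))
        for i in range(Xtrain.shape[1])]
print(perm)
print(np.array_equal(Xtrain, original[:, perm]))  # True
```

The lookup relies on the original columns being distinct, which holds for this Xtrain.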

If you want a completely independent shuffle for each row, you can create random column indices for each row and then build the final array with simple (fancy) indexing:

In [172]: def crazyshuffle(arr):
     ...:     x, y = arr.shape
     ...:     rows = np.indices((x,y))[0]
     ...:     cols = [np.random.permutation(y) for _ in range(x)]
     ...:     return arr[rows, cols]
     ...: 

Demo:

In [173]: crazyshuffle(Xtrain)
Out[173]: 
array([[1, 3, 2],
       [6, 5, 4],
       [7, 3, 1]])

In [174]: crazyshuffle(Xtrain)
Out[174]: 
array([[2, 3, 1],
       [4, 6, 5],
       [1, 3, 7]])
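
A possible variant of crazyshuffle, sketched under the assumption that you are on a NumPy version with the Generator API: it keeps the same idea (one random permutation of the column indices per row, then fancy indexing), but uses a broadcast row index of shape (x, 1) instead of the full (x, y) array from np.indices. The name crazyshuffle_broadcast and the rng parameter are illustrative, not from the answer.

```python
import numpy as np

def crazyshuffle_broadcast(arr, rng=None):
    """Shuffle each row of 2-D `arr` independently (hypothetical variant
    of crazyshuffle)."""
    rng = np.random.default_rng() if rng is None else rng
    x, y = arr.shape
    # One independent permutation of range(y) per row -> (x, y) index array.
    cols = np.array([rng.permutation(y) for _ in range(x)])
    # Row index as a column vector; broadcasting pairs it with each row of cols.
    rows = np.arange(x)[:, None]
    return arr[rows, cols]

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])
print(crazyshuffle_broadcast(Xtrain, np.random.default_rng(42)))
```

The Python-level loop over rows is still there, so for 100000 rows this mainly saves memory on the row index rather than time.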

#2


3  

This solution is not efficient by any means, but I had fun thinking about it, so I wrote it down. Basically, you ravel the array and create an array of row labels and an array of indices. You shuffle the index array and use it to index both the original array and the row-label array. Then you apply a stable argsort to the row labels to gather the data back into rows. Apply that index, reshape, and voilà: the data is shuffled independently within each row:

import numpy as np

r, c = 3, 4  # x.shape

x = np.arange(12) + 1  # Already raveled 
inds = np.arange(x.size)
rows = np.repeat(np.arange(r).reshape(-1, 1), c, axis=1).ravel()

np.random.shuffle(inds)
x = x[inds]
rows = rows[inds]

inds = np.argsort(rows, kind='mergesort')
x = x[inds].reshape(r, c)
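
The same ravel / shuffle / stable-argsort steps can be wrapped into a function that takes any 2-D array. This is a sketch; the function name and the rng parameter are hypothetical, and kind='stable' is used as the explicit way to request a stable sort (mergesort, as in the answer, is also stable).

```python
import numpy as np

def shuffle_rows_via_stable_sort(a, rng=None):
    """Shuffle each row of 2-D `a` independently using a global shuffle
    followed by a stable sort on row labels (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    r, c = a.shape
    flat = a.ravel()
    # Row label for each raveled element: 0,0,...,0,1,1,...,1,...
    rows = np.repeat(np.arange(r), c)
    # One random permutation of all element positions.
    inds = rng.permutation(flat.size)
    flat, rows = flat[inds], rows[inds]
    # Stable sort by row label: elements regroup into their rows while
    # keeping the random order they got from the global shuffle.
    order = np.argsort(rows, kind='stable')
    return flat[order].reshape(r, c)

x = np.arange(12).reshape(3, 4) + 1
print(shuffle_rows_via_stable_sort(x, np.random.default_rng(0)))
```

Stability is what makes this work: an unstable sort could reorder equal labels arbitrarily, so the within-row order would no longer come from the random shuffle.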

#3


2  

From: https://github.com/numpy/numpy/issues/5173

import numpy as np

def disarrange(a, axis=-1):
    """
    Shuffle `a` in-place along the given axis.

    Apply numpy.random.shuffle to the given axis of `a`.
    Each one-dimensional slice is shuffled independently.
    """
    b = a.swapaxes(axis, -1)
    # Shuffle `b` in-place along the last axis.  `b` is a view of `a`,
    # so `a` is shuffled in place, too.
    shp = b.shape[:-1]
    for ndx in np.ndindex(shp):
        np.random.shuffle(b[ndx])
    return
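
Assuming NumPy 1.20 or newer, the Generator API provides permuted, which shuffles each slice along the given axis independently, i.e. the operation disarrange implements with an explicit loop. A minimal sketch (the seed value is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed for reproducibility
Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])
original = Xtrain.copy()

# Each row (slice along axis=1) is shuffled independently;
# a new array is returned and Xtrain is left unchanged.
shuffled = rng.permuted(Xtrain, axis=1)
print(shuffled)

# Passing the input array as `out` performs the shuffle in place.
rng.permuted(Xtrain, axis=1, out=Xtrain)
print(Xtrain)
```

Unlike disarrange, the per-slice loop runs inside NumPy rather than in Python, which should matter for an array with 100000 rows.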

#4


2  

We can create a random 2-dimensional matrix, argsort each of its rows, and then use the resulting index matrix to reorder the target matrix.

target = np.random.randint(10, size=(5, 5))
# [[7 4 0 2 5]
# [5 6 4 8 7]
# [6 4 7 9 5]
# [8 6 6 2 8]
# [8 1 6 7 3]]

shuffle_helper = np.argsort(np.random.rand(5,5), axis=1)
# [[0 4 3 2 1]
# [4 2 1 3 0]
# [1 2 3 4 0]
# [1 2 4 3 0]
# [1 2 3 0 4]]

target[np.arange(shuffle_helper.shape[0])[:, None], shuffle_helper]
# array([[7, 5, 2, 0, 4],
#       [7, 4, 6, 8, 5],
#       [4, 7, 9, 5, 6],
#       [6, 6, 8, 2, 8],
#       [1, 6, 7, 8, 3]])

Explanation

  • We use np.random.rand and argsort to mimic the effect of shuffling.
  • np.random.rand supplies the randomness: one uniform random value per element.
  • Then we use argsort with axis=1 to rank the values within each row. This produces, for each row, a permutation of the column indices that can be used for reordering.
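
The steps above can be packaged as a function. This sketch assumes the Generator API (rng.random in place of np.random.rand) and uses np.take_along_axis, which selects the same elements as target[np.arange(n)[:, None], shuffle_helper]. The function name is hypothetical.

```python
import numpy as np

def shuffle_rows_argsort(a, rng=None):
    """Independently shuffle each row of 2-D `a` by argsorting random keys
    (hypothetical wrapper around the approach above)."""
    rng = np.random.default_rng() if rng is None else rng
    # One random key per element; argsort along axis=1 turns each row of
    # keys into a random permutation of that row's column indices.
    shuffle_helper = np.argsort(rng.random(a.shape), axis=1)
    # Reorder each row of `a` by its own permutation.
    return np.take_along_axis(a, shuffle_helper, axis=1)

target = np.random.default_rng(1).integers(10, size=(5, 5))
print(shuffle_rows_argsort(target, np.random.default_rng(2)))
```

Sorting costs O(n log n) per row instead of the O(n) of a direct shuffle, but the whole operation is vectorized with no Python loop.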

#5


1  

Let's say you have an array a with shape 100000 x 1000.

b = np.random.choice(100000 * 1000, (100000, 1000), replace=False)
ind = np.argsort(b, axis=1)
a_shuffled = a[np.arange(100000)[:,np.newaxis], ind]

I don't know whether this is faster than a loop, since it requires sorting, but with this solution you may come up with something better, for example using np.argpartition instead of np.argsort.
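
At 100000 x 1000, the key array alone holds 10^8 values (about 800 MB if stored as 64-bit integers), and the argsort result is just as large. One way to bound that memory, sketched here as an assumption rather than something the answer proposes, is to apply the same argsort idea a block of rows at a time. Random floats replace the distinct integers from np.random.choice; ties between floats are possible in principle but extremely unlikely. The function name and the default chunk size are arbitrary.

```python
import numpy as np

def shuffle_rows_chunked(a, chunk_rows=10_000, rng=None):
    """Shuffle each row of 2-D `a` independently, `chunk_rows` rows at a
    time, so temporary key/index arrays stay small (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty_like(a)
    for start in range(0, a.shape[0], chunk_rows):
        block = a[start:start + chunk_rows]
        # Random keys -> per-row permutation of column indices.
        ind = np.argsort(rng.random(block.shape), axis=1)
        out[start:start + chunk_rows] = np.take_along_axis(block, ind, axis=1)
    return out

a = np.arange(50).reshape(10, 5)
print(shuffle_rows_chunked(a, chunk_rows=3, rng=np.random.default_rng(0)))
```

Each row is shuffled independently regardless of the chunk size; the chunk size only trades peak memory against the number of loop iterations.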