Get a random set of rows from a 2D array

Time: 2022-09-26 21:31:54

I have a very large 2D array which looks something like this:


a=
[[a1, b1, c1],
 [a2, b2, c2],
 ...,
 [an, bn, cn]]

Using numpy, is there an easy way to get a new 2D array with e.g. 2 random rows from the initial array a (without replacement)?


e.g.


b=
[[a4,  b4,  c4],
 [a99, b99, c99]]

4 answers

#1



>>> import numpy as np
>>> A = np.random.randint(5, size=(10,3))
>>> A
array([[1, 3, 0],
       [3, 2, 0],
       [0, 2, 1],
       [1, 1, 4],
       [3, 2, 2],
       [0, 1, 0],
       [1, 3, 1],
       [0, 4, 1],
       [2, 4, 2],
       [3, 3, 1]])
>>> idx = np.random.randint(10, size=2)
>>> idx
array([7, 6])
>>> A[idx,:]
array([[0, 4, 1],
       [1, 3, 1]])

Putting it together for the general case (randint draws each index independently, so the same row can be picked twice):


A[np.random.randint(A.shape[0], size=2), :]
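A minimal sketch of the general case wrapped in a reusable function; the name sample_rows_with_replacement and the small test array are illustrative assumptions, not part of the answer. Because each index is drawn independently, asking for more rows than the array has forces at least one repeat:

```python
import numpy as np


def sample_rows_with_replacement(A, k):
    """Return k rows of A chosen independently, so rows may repeat."""
    idx = np.random.randint(A.shape[0], size=k)  # k indices in [0, A.shape[0])
    return A[idx, :]


A = np.arange(9).reshape(3, 3)  # 3 distinct rows
picked = sample_rows_with_replacement(A, 5)
# Only 3 distinct rows exist, so 5 independent draws must contain a repeat.
print(picked.shape)  # (5, 3)
```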

For sampling without replacement (NumPy 1.7.0+):


A[np.random.choice(A.shape[0], 2, replace=False), :]
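A small sketch of the same idea as a helper; the function name sample_rows is hypothetical. choice(n, k, replace=False) returns k different integers from range(n), so the selected rows are always distinct positions in A:

```python
import numpy as np


def sample_rows(A, k):
    """Return k distinct rows of A (sampling without replacement)."""
    # choice(n, k, replace=False) picks k different integers from range(n)
    idx = np.random.choice(A.shape[0], k, replace=False)
    return A[idx, :]


A = np.arange(30).reshape(10, 3)  # every row is different
b = sample_rows(A, 2)
print(b.shape)  # (2, 3)
```

With the newer Generator API (NumPy 1.17+), np.random.default_rng().choice(A, 2, replace=False, axis=0) samples the rows directly.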

I do not believe there is a good way to generate a random list without replacement before 1.7. Perhaps you could set up a small function that ensures the two values are not the same.
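One way to write that small function, sketched under the assumption that only np.random.randint is available: draw two indices and redraw the second until it differs from the first. The name two_distinct_rows is hypothetical, and it raises ValueError for arrays with fewer than two rows, since the loop could otherwise never finish:

```python
import numpy as np


def two_distinct_rows(A):
    """Pick two different rows of A using only np.random.randint."""
    n = A.shape[0]
    if n < 2:
        raise ValueError("need at least two rows")
    i = np.random.randint(n)
    j = np.random.randint(n)
    while j == i:  # redraw until the second index differs from the first
        j = np.random.randint(n)
    return A[[i, j], :]


A = np.arange(6).reshape(3, 2)  # 3 distinct rows
print(two_distinct_rows(A).shape)  # (2, 2)
```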


#2



This is an old post, but this is what works best for me:


A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)]

Change replace=False to replace=True to get the same thing, but with replacement (rows may then repeat).
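A short sketch of what the replace flag changes, reusing the answer's num_rows_2_sample name; the array size and sample size are arbitrary assumptions. With replacement you can ask for more rows than exist; without replacement NumPy refuses:

```python
import numpy as np

A = np.arange(12).reshape(4, 3)  # 4 distinct rows
num_rows_2_sample = 6

# With replacement, rows may repeat, so asking for more rows than A has is fine.
with_repl = A[np.random.choice(A.shape[0], num_rows_2_sample, replace=True)]
print(with_repl.shape)  # (6, 3)

# Without replacement, only 4 distinct rows are available, so this raises.
raised = False
try:
    A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)]
except ValueError:
    raised = True
```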


#3



Another option is to create a random mask if you just want to down-sample your data by a certain factor. Say I want to down-sample to 25% of my original data set, which is currently held in the array data_arr:


import numpy
# random boolean mask with one entry per row of data_arr:
# each entry is False with probability 0.75 and True with probability 0.25
mask = numpy.random.choice([False, True], len(data_arr), p=[0.75, 0.25])

Now data_arr[mask] returns roughly 25% of the rows, randomly sampled; the exact number varies from run to run because each row is kept or dropped independently.
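A runnable sketch of the mask approach; the 1000-row data_arr is a hypothetical stand-in for the real data set. It shows that the number of kept rows is random (binomially distributed around 250 here) and that boolean indexing keeps the surviving rows in their original order:

```python
import numpy as np

data_arr = np.arange(4000).reshape(1000, 4)  # hypothetical data set with 1000 rows

# Each row is kept independently with probability 0.25.
mask = np.random.choice([False, True], len(data_arr), p=[0.75, 0.25])
subset = data_arr[mask]

# The row count is random: close to, but usually not exactly, 250.
print(len(subset), mask.sum())
```

If the sample must contain exactly 25% of the rows, draw int(0.25 * len(data_arr)) indices with np.random.choice(..., replace=False) instead.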


#4



If you just need a random sample of the existing rows, the standard library's random module can do it:


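import random
# random.sample needs a sequence such as range(), so sample row indices, then index the array
rows = random.sample(range(len(old_array)), x)
new_array = old_array[rows]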

Here x must be an int giving the number of rows to pick at random; random.sample never picks the same position twice.
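A sketch contrasting the two input types, assuming old_list and old_array hold the same made-up rows: a plain Python list of rows can be passed to random.sample directly (the result is a list), while for a NumPy array it is safer to sample row positions and index the array, which keeps the result a 2D array:

```python
import random

import numpy as np

old_list = [[1, 3, 0], [3, 2, 0], [0, 2, 1], [1, 1, 4]]  # plain Python list of rows
x = 2

# A list is a sequence, so random.sample can draw rows from it directly;
# the result is a new list of x rows taken from x different positions.
sampled_list = random.sample(old_list, x)

# For a NumPy array, sample row positions and index the array with them.
old_array = np.array(old_list)
rows = random.sample(range(len(old_array)), x)
new_array = old_array[rows]
print(new_array.shape)  # (2, 3)
```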

