How to select a random sample from a dataset in MATLAB [duplicate]

Time: 2021-06-11 09:10:06

Possible Duplicate:
How do I randomly select k points from N points in MATLAB?

Let's say I have a dataset that includes 10,000 rows of data. What is the best way to create a subset that includes 1,000 randomly chosen rows?

4 answers

#1


32  

You can use randperm for this task:

Sampling without replacement:

nRows = 10000; % number of rows
nSample = 1000; % number of samples

rndIDX = randperm(nRows); 

newSample = data(rndIDX(1:nSample), :); 

Sampling with replacement:

nRows = 10000; % number of rows
nSample = 1000; % number of samples

rndIDX = randi(nRows, nSample, 1); 

newSample = data(rndIDX, :); 
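
To see the difference between the two variants above, here is a minimal self-contained sketch. The stand-in matrix data and the seed value are assumptions made only for illustration; rng is called just so the run is repeatable.

```matlab
rng(42);                       % assumed seed, only to make the run repeatable
data = rand(10000, 3);         % stand-in dataset: 10,000 rows, 3 columns (hypothetical)
nRows = size(data, 1);         % take the row count from the data itself
nSample = 1000;                % number of rows to draw

% Without replacement: shuffle all row indices and keep the first nSample
permIDX = randperm(nRows);
idxNoRep = permIDX(1:nSample);
sampleNoRep = data(idxNoRep, :);

% With replacement: each index is drawn independently, so repeats can occur
idxRep = randi(nRows, nSample, 1);
sampleRep = data(idxRep, :);

fprintf('distinct rows without replacement: %d\n', numel(unique(idxNoRep)));
fprintf('distinct rows with replacement:    %d\n', numel(unique(idxRep)));
```

The first count always equals nSample, while the second is usually smaller because some rows are drawn more than once.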

#2


6  

Use randperm in combination with the number of rows. If x is your matrix:

nrows = size(x,1);
nrand = 1000; % Choose 1000 rows
assert(nrand<=nrows, 'You cannot choose more rows than exist in the matrix');
rand_rows = randperm(nrows, nrand);
xx = x(rand_rows,:);  % Select the random rows from x
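
If you need this in more than one place, the same steps can be wrapped in a small function. This is only a sketch: the name sampleRows and the second output are assumptions added for illustration and are not part of the answer.

```matlab
function [xx, rand_rows] = sampleRows(x, nrand)
% sampleRows  Hypothetical helper (not from the answer): return nrand
% distinct, randomly chosen rows of x and the row indices that were used.
    nrows = size(x, 1);
    assert(nrand <= nrows, 'You cannot choose more rows than exist in the matrix');
    rand_rows = randperm(nrows, nrand);   % nrand unique integers from 1..nrows
    xx = x(rand_rows, :);                 % keep all columns of the selected rows
end
```

Saved as sampleRows.m, it could be called as [xx, idx] = sampleRows(x, 1000); returning the indices makes it possible to recover which rows were picked.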

#3


4  

If you have the Statistics Toolbox (R2012 or later), you can use datasample.

subset = datasample(data,1000)

subset will contain 1000 rows of data selected at random. Note that datasample samples with replacement by default, so the same row can appear more than once.

To sample without replacement, use:

subset = datasample(data,1000,'Replace',false)
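
If you also need to know which rows were drawn, datasample can return the selected indices as a second output. A minimal sketch, assuming the Statistics Toolbox is installed and using a stand-in matrix in place of your data:

```matlab
data = rand(10000, 3);                                      % stand-in dataset (hypothetical)
[subset, idx] = datasample(data, 1000, 'Replace', false);   % idx holds the chosen row numbers
sameRows = isequal(subset, data(idx, :));                   % checks that subset is exactly those rows
```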

If you have an older version of the toolbox, you can use randsample:

rndIdx = randsample(size(data,1),1000,true); % with replacement
subset = data(rndIdx, :);

rndIdx = randsample(size(data,1),1000,false); % without replacement
subset = data(rndIdx, :);

But using randsample is more or less the same as H.Muster's answer (which I have accepted as the best because it doesn't require any toolbox).

Note: For more info on the difference between sampling with replacement vs. sampling without replacement, see this page.

#4


1  

Not sure whether you have written any code so far. The following MathWorks link shows examples of random sampling; take a look at it for ideas.

Here is also a snippet that uses randsample from the Statistics Toolbox. It only illustrates the logic, so you may have to adjust it to your data.

Given a matrix m with N rows, pull a random sample of n rows from m:

Sample = m(randsample(1:N,n),:)

randsample(1:N,n)

The call above returns n integers drawn at random, without replacement, from 1 to N.
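
When the population is just 1 to N, randsample also accepts the scalar N directly, which avoids building the vector 1:N. A minimal sketch, assuming the Statistics Toolbox is installed; the stand-in matrix m and the sizes are illustrative only:

```matlab
N = 10000;                  % number of rows in m
n = 1000;                   % number of rows to draw
m = rand(N, 3);             % stand-in matrix (hypothetical)

idx = randsample(N, n);     % n distinct integers from 1..N, equivalent to randsample(1:N, n)
Sample = m(idx, :);         % the corresponding rows of m
```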
