I am trying to build some machine learning models, so I need a training set and a validation set.
Suppose I have N examples in a data frame and want to select x of them at random.
For example, with 100 examples I need 10 random numbers. Is there an efficient way to generate 10 random integers that I can use to extract the training data from my sample data?
I tried a while loop that replaces repeated numbers one at a time, but the running time is not ideal, so I am looking for a more efficient way to do it.
Can anyone help, please?
2 solutions
#1
20
sample does this:
> sample.int(100, 10)
[1] 58 83 54 68 53 4 71 11 75 90
This generates ten random integers from the range 1–100. By default sample.int draws without replacement, so the ten values are distinct, which is what you need for picking rows. If repeated values are acceptable, pass replace = TRUE, which samples with replacement:
> sample.int(20, 10, replace = TRUE)
[1] 10 2 11 13 9 9 3 13 3 17
More generally, sample draws n observations from a vector of arbitrary values.
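To connect this back to the question, here is a minimal sketch of using the sampled integers as row numbers. The data frame df and its columns are hypothetical stand-ins for the asker's data, and set.seed is only there to make the run repeatable.

```r
# Hypothetical example data: 100 rows (not from the original question)
set.seed(42)                      # only so the result is reproducible
df <- data.frame(id = 1:100, value = rnorm(100))

# 10 distinct row numbers out of 1..nrow(df); replace = FALSE is the default
idx <- sample.int(nrow(df), 10)

training   <- df[idx, ]           # the 10 sampled rows
validation <- df[-idx, ]          # the remaining 90 rows
```

Because the indices are drawn without replacement, no row appears twice in training and no row ends up in both sets.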
#2
0
If I understand correctly, you are trying to create a hold-out sample. This is usually done using probabilities. If you have n.rows rows and want a fraction training.fraction of them to be used for training, you can do something like this:
select.training <- runif(n=n.rows) < training.fraction
data.training <- my.data[select.training, ]
data.testing <- my.data[!select.training, ]
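One property of the probability-based split is worth seeing in a small run: each row is assigned independently, so the training set size is itself random and only approximately n.rows * training.fraction. The values 1000 and 0.7 below are assumed purely for illustration.

```r
set.seed(1)                       # only so the run is repeatable
n.rows <- 1000                    # assumed number of rows
training.fraction <- 0.7          # assumed share of rows used for training

# One uniform draw per row; TRUE marks a training row
select.training <- runif(n = n.rows) < training.fraction

sum(select.training)              # near 700, but usually not exactly 700
mean(select.training)             # realised training fraction, near 0.7
```

If the exact count matters, drawing a fixed number of indices with sample is the safer choice.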
If you want to specify the exact number of training cases, you can do something like this:
indices.training <- sample(x=seq_len(n.rows), size=training.size, replace=FALSE)  # replace=FALSE makes sure the indices are unique
data.training <- my.data[indices.training, ]
data.testing <- my.data[-indices.training, ]  # negative indices mean "take everything except these rows"
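The exact-size split can be wrapped in a small function. This is only a sketch: the name split.data and its arguments are hypothetical, and it uses a logical mask rather than negative indices, since my.data[-integer(0), ] returns zero rows instead of all rows when training.size is 0. The built-in iris data set (150 rows) stands in for my.data.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer)
split.data <- function(my.data, training.size) {
  n.rows <- nrow(my.data)
  stopifnot(training.size >= 0, training.size <= n.rows)
  # sample.int draws distinct row numbers because replace = FALSE is the default
  indices.training <- sample.int(n.rows, training.size)
  # Logical mask instead of negative indices, so training.size = 0 still works
  is.training <- seq_len(n.rows) %in% indices.training
  list(training = my.data[is.training, , drop = FALSE],
       testing  = my.data[!is.training, , drop = FALSE])
}

set.seed(123)                     # only so the run is repeatable
parts <- split.data(iris, training.size = 100)
nrow(parts$training)              # 100
nrow(parts$testing)               # 50
```

Note that the mask keeps the training rows in their original order rather than in the sampled order, which is usually fine for model fitting.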