
Time: 2021-07-11 01:00:05

I am trying to build some machine learning models, so I need a training set and a validation set.


Suppose I have N examples and want to select x of them at random from a data frame.


For example, suppose I have 100 examples and need 10 random numbers. Is there an efficient way to generate 10 random integers that I can use to extract the training data from my sample data?


I tried using a while loop that gradually replaces the repeated numbers, but the running time is far from ideal, so I am looking for a more efficient way to do it.


Can anyone help please?


2 solutions



sample does this:


> sample.int(100, 10)
 [1] 58 83 54 68 53  4 71 11 75 90

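will generate ten random numbers from the range 1–100. By default the draw is made without replacement, so the ten numbers are distinct, which is what you need for splitting off a training set. If you do want repeats to be possible, pass replace = TRUE, which samples with replacement: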


> sample.int(20, 10, replace = TRUE)
 [1] 10  2 11 13  9  9  3 13  3 17

More generally, sample samples n observations from a vector of arbitrary values.
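
As a small illustration of that last point, here is a sketch that draws from a character vector instead of a range of integers. The vector `labels` and the seed are made up for the example and are not part of the question.

```r
set.seed(42)  # arbitrary seed, only so the draw is reproducible

# hypothetical vector of arbitrary values
labels <- c("apple", "banana", "cherry", "date", "elderberry")

# draw 3 distinct values (replace = FALSE is the default)
picked <- sample(labels, 3)
print(picked)
```

One caveat: if the vector passed to sample has length one and is a number of at least 1, sample treats it as the range 1:x, so sample.int is the safer choice when you mean "indices from 1 to n".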




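If I understand correctly, you are trying to create a hold-out sample. This is usually done using probabilities: each row is assigned to the training set independently with some probability. So if you have n.rows samples and want a fraction of roughly training.fraction to be used for training (the actual size of the training set will vary a little from run to run), you may do something like this: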


select.training <- runif(n=n.rows) < training.fraction
data.training <- my.data[select.training, ]
data.testing <- my.data[!select.training, ]
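
To try the probabilistic split end to end, here is a minimal sketch on a toy data frame. The contents of `my.data`, the value 0.7 for `training.fraction`, and the seed are assumptions chosen only for illustration.

```r
set.seed(1)  # arbitrary seed for reproducibility

# hypothetical toy data: 100 rows
my.data <- data.frame(id = 1:100, value = rnorm(100))
n.rows <- nrow(my.data)
training.fraction <- 0.7  # assumed value for illustration

# each row lands in the training set with probability training.fraction
select.training <- runif(n = n.rows) < training.fraction
data.training <- my.data[select.training, ]
data.testing <- my.data[!select.training, ]

# the two parts are disjoint and together cover every row;
# the training size is close to, but not necessarily exactly, 70
cat("training rows:", nrow(data.training), "\n")
cat("testing rows: ", nrow(data.testing), "\n")
```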

If you want to specify an exact number of training cases, you may do something like:


# seq_len(n.rows) is 1..n.rows; replace = FALSE makes sure the indices are unique
indices.training <- sample(x = seq_len(n.rows), size = training.size, replace = FALSE)
data.training <- my.data[indices.training, ]
# negative indices mean "take everything except those rows"
data.testing <- my.data[-indices.training, ]
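
For the question's concrete case (100 examples, 10 of them for training), a runnable sketch might look like the following. The toy data frame, the seed, and the use of sample.int in place of sample over a sequence are assumptions made for the example.

```r
set.seed(1)  # arbitrary seed for reproducibility

# hypothetical toy data: 100 rows
my.data <- data.frame(id = 1:100, value = rnorm(100))
n.rows <- nrow(my.data)
training.size <- 10

# 10 distinct row indices drawn from 1..n.rows (no replacement by default)
indices.training <- sample.int(n.rows, training.size)

data.training <- my.data[indices.training, ]   # exactly 10 rows
data.testing <- my.data[-indices.training, ]   # the remaining 90 rows

cat("training rows:", nrow(data.training), "\n")
cat("testing rows: ", nrow(data.testing), "\n")
```

Be careful with the negative index if training.size can be 0: an empty index selects no rows, so data.testing would come back empty instead of containing everything. Indexing with setdiff(seq_len(n.rows), indices.training) avoids that edge case.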


