I have a set of files to which I need to apply the rpart algorithm. Some of these files take too long to compute. How can I skip such cases (e.g. cases that take more than an hour) and continue on to the next one?
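library(rpart)

for (i in num)
{
  print(i)
  infilename = filenames[i]
  tmpData = read.table(infilename, header = TRUE, sep = "\t")
  # fmla[[i]] extracts the i-th formula whether fmla is a list or a character vector
  retval = rpart(fmla[[i]], data = tmpData, method = "class")
  print(retval)
}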
Edit: Based on a suggestion from @Dwin, I am doing the following, but it does not work. Where am I going wrong?
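library(rpart)

for (i in num)
{
  print(i)
  infilename = filenames[i]
  tmpData = read.table(infilename, header = TRUE, sep = "\t")
  retVal = NULL            # note: R is case-sensitive, so this is not `retval`
  setTimeLimit(cpu = 10)   # note: without transient = TRUE this limit also applies to later top-level commands
  retval = try(rpart(fmla, dat = tmpData, method = "class"))   # note: uses all of `fmla`, not the i-th formula
  print(retval)
}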
Because you are just using regular R functions (and not coding this from scratch), you will need to come up with some way to estimate the conditions that lead to excessive run times. This might be a test that looks at the dimensions of a data frame and skips the next rpart computation if the product of dim(dfrm) (rows times columns) exceeds a certain threshold.
# With no else branch, retval is NULL when the data frame is too large
retval = if (prod(dim(tmpData)) < 1e6) {
  rpart(fmla[[i]], data = tmpData, method = "class")
}
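A minimal, self-contained sketch of the size check, using the kyphosis data that ships with rpart as a stand-in for one of your files. The helper name fit_if_small and its 1e6 default are illustrative assumptions, not part of rpart; rows times columns is only a rough proxy for run time, so the threshold would need tuning on your own data.

```r
library(rpart)

# Hypothetical helper (not part of rpart): fit the tree only when
# rows * columns is below `max_cells`; otherwise return NULL.
fit_if_small <- function(formula, data, max_cells = 1e6) {
  if (prod(dim(data)) < max_cells) {
    rpart(formula, data = data, method = "class")
  }
  # no else branch, so the function returns NULL for oversized data
}

# kyphosis (bundled with rpart) has 81 rows and 4 columns, i.e. 324 cells
small   <- fit_if_small(Kyphosis ~ Age + Number + Start, kyphosis)
skipped <- fit_if_small(Kyphosis ~ Age + Number + Start, kyphosis,
                        max_cells = 100)
```

Inside the loop, is.null(retval) then tells you which files were skipped.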
Notice that at the moment you are overwriting retval on every loop iteration rather than storing it in a durable object.
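One way to keep every result is to preallocate a list with one slot per input and assign into it. The sketch below assumes two in-memory data frames named a and b (subsets of kyphosis) in place of your files, so it runs on its own; in your loop, tmpData would come from read.table(filenames[i], ...) as before.

```r
library(rpart)

# Stand-ins for the question's files and formulas (assumption: in-memory
# data frames instead of read.table() so the sketch is self-contained).
datasets <- list(a = kyphosis, b = kyphosis[1:40, ])
formulas <- list(a = Kyphosis ~ Age + Number + Start,
                 b = Kyphosis ~ Age)

# Preallocate one named slot per input instead of overwriting `retval`
results <- vector("list", length(datasets))
names(results) <- names(datasets)

for (nm in names(datasets)) {
  tmpData <- datasets[[nm]]
  # Size check from the answer; skipped inputs keep their NULL slot
  if (prod(dim(tmpData)) < 1e6) {
    results[[nm]] <- rpart(formulas[[nm]], data = tmpData, method = "class")
  }
}
```

Because skipped inputs leave their slot as NULL, Filter(Negate(is.null), results) keeps only the fitted trees afterwards.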
You could also try the functions setTimeLimit and setSessionTimeLimit, but these throw an error when the limit is reached, so you may need to put your code inside a try function to recover gracefully:
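setTimeLimit(cpu = 2)   # applies to each subsequent top-level command
for (i in 4:8) {x <- 1:10^i; x <- x^3}
max(x)
#[1] 1e+24
# did not exceed the limits
x^(1/3)
#[1] 1 2 3 4 5 6 7Error: reached CPU time limit
# the limit was hit partway through printing the 1e8-element result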
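Building on the try suggestion, here is a sketch of a per-file limit. with_time_limit is a hypothetical helper, not a base R function: it sets an elapsed (wall-clock) limit with transient = TRUE, turns the resulting error into NULL with tryCatch, and clears the limit on exit so it does not leak into later commands (which also clears any limit you had set earlier). As the setTimeLimit documentation notes, limits are only checked where R can process a user interrupt, so if rpart's compiled code does not check for interrupts, a long fit may only be stopped once control returns to R.

```r
library(rpart)

# Hypothetical helper (not base R): evaluate `expr` with an elapsed-time
# limit of `seconds`; return NULL with a message if it times out or errors.
with_time_limit <- function(expr, seconds) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  # Clear the limit on exit so later commands are unaffected
  # (note: this also removes any limit that was set before the call).
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
  # `expr` is a lazy promise, so it is evaluated here, after the limit is set.
  # Any error is caught, not only the time-limit one.
  tryCatch(expr, error = function(e) {
    message("skipped: ", conditionMessage(e))
    NULL
  })
}

# In the question's loop this would be along the lines of
#   retval <- with_time_limit(rpart(...), seconds = 3600)
fit <- with_time_limit(
  rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class"),
  seconds = 3600
)
```

Swapping elapsed = seconds for cpu = seconds would limit CPU time instead, as in the demo above.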