Guidelines for testing statistical functions in R?

Date: 2021-04-26 17:02:51

Question: I am testing functions in a package that I am developing and would like to know if you can suggest some general guidelines for how to do this. The functions include a large range of statistical modeling, transformations, subsetting, and plotting. Is there a 'standard' or some sufficient test?

An example: the test that prompted me to ask this question.

The function dtheta:

dtheta <- function(x) {
  ## find the quantile of the mean
  q.mean <- mean(mean(x) >= x)
  ## find the quantiles of ucl and lcl (q.mean +/- 0.15)
  q.ucl  <- q.mean + 0.15
  q.lcl  <- q.mean - 0.15
  qs <- c(q.lcl, q.mean, q.ucl)
  ## find the lcl, mean, and ucl of the vector
  c(quantile(x,qs), var(x), sqrt(var(x))/mean(x))
}

Step 1: make test data:

set.seed(100) # per Dirk's recommendation
test <- rnorm(100000,10,1)

Step 2: compare the expected output from the function with the actual output from the function:

expected <- quantile(test, c(0.35, 0.5, 0.65))  # same order as dtheta(): lcl, mean, ucl
actual   <- dtheta(test)[1:3]
signif(expected,2) %in% signif(actual,2)

Step 3: maybe do another test

test2 <- runif(100000, 0, 100)
expected <- c(35, 50, 65)
actual   <- dtheta(test2)
expected %in% signif(actual,2)

Step 4: if TRUE, consider the function 'functional'

4 Answers

#1


6  

It depends on what exactly you want to test. Besides Dirk's recommendations, and svUnit or the RUnit package that VitoshKa mentioned, I'd like to add a few things:

  • Indeed, set the seed, but make sure you try the function with different seeds as well. Some functions fail only once every ten times you try. This becomes crucial especially when optimization is involved. replicate() is a nice function to use in this context.
  • Think carefully about the input you want to test. You should test a number of "odd" cases that don't really resemble the "perfect" dataset. I always test at least 10 (simulated) datasets of different sizes.
  • Fool-proof the function: I also throw in some data types that the function is not meant for. Wrong-type input is likely to happen at some point, and the last thing you want is a function returning a bogus result without a warning. If you use that function later on in some other code, debugging that code can (and will!) be hell. Been there, done that, bought the t-shirt...
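
The first two points could be sketched roughly as follows. This is only an illustration: dtheta() is condensed from the question, runs_cleanly() is a hypothetical helper name, and "clean" here just means "returned five finite numbers without an error", which is an assumption about what counts as success.

```r
# dtheta() condensed from the question so the sketch is self-contained
dtheta <- function(x) {
  q.mean <- mean(mean(x) >= x)
  qs <- c(q.mean - 0.15, q.mean, q.mean + 0.15)
  c(quantile(x, qs), var(x), sqrt(var(x)) / mean(x))
}

# Hypothetical helper: TRUE if dtheta() returns five finite numbers for x
runs_cleanly <- function(x) {
  out <- tryCatch(dtheta(x), error = function(e) NULL)
  !is.null(out) && length(out) == 5 && all(is.finite(out))
}

set.seed(1)                        # reproducible, yet every replicate draws new data
sizes <- c(10, 100, 1000, 10000)   # datasets of different sizes
ok <- sapply(sizes, function(n) {
  mean(replicate(20, runs_cleanly(rnorm(n, mean = 10, sd = 1))))
})
names(ok) <- sizes
print(ok)                          # share of clean runs per dataset size
```

Any share below 1 points at a dataset size where the function occasionally falls over, which is exactly the kind of failure a single fixed seed can hide.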

An example on extended testing of datasets: what would you like to see as output in these cases? Is this the result you'd expect? Not according to the test you did.

> test3 <- rep(12,100000) # data with only 1 value
> expected <- c(12, 12, 12)
> actual   <- dtheta(test3) 
Error in quantile.default(x, qs) : 'probs' outside [0,1]

> test4 <- rbinom(100000,30,0.5) # large dataset with a limited number of distinct values
> expected <- quantile(test4,c(0.35, 0.50, 0.65))
> actual   <- dtheta(test4)
> expected %in% signif(actual,2)
[1] FALSE  TRUE  TRUE

> test5 <- runif(100,0,100) # small dataset. 
> expected <- c(35, 50, 65)
> actual   <- dtheta(test5)
> expected %in% signif(actual,2)
[1] FALSE FALSE FALSE
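
To see where the error for test3 comes from, it helps to replay the first lines of dtheta() by hand. The clamping at the end is only one possible guard and is my assumption, not part of the original function; whether clamped probabilities are statistically what you want is a separate design decision.

```r
x <- rep(12, 100000)            # data with only one distinct value
q.mean <- mean(mean(x) >= x)    # every value equals the mean, so this is 1
qs <- c(q.mean - 0.15, q.mean, q.mean + 0.15)
print(qs)                       # the last probability lies above 1

# One possible guard (an assumption, not the original code): clamp to [0, 1]
qs_clamped <- pmin(pmax(qs, 0), 1)
print(unname(quantile(x, qs_clamped)))   # all three quantiles are 12
```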

Edit: corrected the code so that the tests make a bit more sense.

#2


6  

You need to write

  1. tests that show you get the right answer when you input sensible values

  2. tests that show your function fails correctly when you input nonsense

  3. tests for all boundary cases
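
A minimal base-R sketch of the three kinds of test, using the dtheta() function from the question (condensed here). The input 0:100 is my choice because its answer can be worked out by hand: 51 of the 101 values are at or below the mean, and with the default quantile type the p-th quantile of 0:100 is simply 100 * p.

```r
# dtheta() condensed from the question so the sketch is self-contained
dtheta <- function(x) {
  q.mean <- mean(mean(x) >= x)
  qs <- c(q.mean - 0.15, q.mean, q.mean + 0.15)
  c(quantile(x, qs), var(x), sqrt(var(x)) / mean(x))
}

# 1. Sensible input with a known answer, compared with a numerical tolerance
out <- dtheta(0:100)
stopifnot(
  isTRUE(all.equal(unname(out[1:3]), 100 * (51 / 101 + c(-0.15, 0, 0.15)))),
  isTRUE(all.equal(unname(out[4]), 858.5))   # sample variance of 0:100
)

# 2. Nonsense input: record whether the function signals any problem at all
signalled <- tryCatch({ dtheta("a"); FALSE },
                      warning = function(w) TRUE,
                      error   = function(e) TRUE)
stopifnot(signalled)

# 3. Boundary case: constant data pushes the upper probability above 1
boundary_fails <- inherits(try(dtheta(rep(12, 100)), silent = TRUE), "try-error")
print(boundary_fails)
```

Comparing with all.equal() rather than rounding with signif() avoids tests that pass or fail depending on where a rounding boundary happens to fall.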

There is a huge amount of literature on different strategies for testing software; Wikipedia's software testing page is as good a place as any to start.

Looking at your example:

What happens when you input a string/dataframe/list?
What about negative x or imaginary x?
How about vector/array x?
If only positive x is allowed, then what happens at x = 0?
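
One way to answer these questions systematically is to run the function over a list of awkward inputs and record what happens. The sketch below does that for dtheta() from the question (condensed); probe() and the particular inputs are hypothetical choices of mine, and nothing is asserted about which outcome is the "right" one.

```r
# dtheta() condensed from the question so the sketch is self-contained
dtheta <- function(x) {
  q.mean <- mean(mean(x) >= x)
  qs <- c(q.mean - 0.15, q.mean, q.mean + 0.15)
  c(quantile(x, qs), var(x), sqrt(var(x)) / mean(x))
}

# Hypothetical helper: report whether a call returns a value, warns, or errors
probe <- function(x) {
  tryCatch({
    dtheta(x)
    "value"
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error   = function(e) paste("error:", conditionMessage(e)))
}

inputs <- list(
  string    = "a",
  list      = list(1, 2, 3),
  dataframe = data.frame(a = 1:10),
  negative  = -(1:100),
  complex   = complex(real = 1:10, imaginary = 1),
  zero      = 0,
  empty     = numeric(0)
)
outcomes <- vapply(inputs, probe, character(1))
print(outcomes)
```

Every entry that says "value" deserves a second look: either the result is meaningful for that input, or the function is silently returning something bogus.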

Note that subfunctions (that are only called by your functions and never by the user) need less input checking because you have more control over what goes into the function.
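
As a rough sketch of that split, the user-facing function can do all the checking and hand clean data to an internal helper. The names dtheta_checked() and .dtheta_core() are hypothetical, and the particular checks are assumptions about what the function should accept.

```r
# Internal helper (hypothetical name): assumes x is a clean numeric vector,
# because only our own code calls it
.dtheta_core <- function(x) {
  q.mean <- mean(mean(x) >= x)
  qs <- c(q.mean - 0.15, q.mean, q.mean + 0.15)
  c(quantile(x, qs), var(x), sqrt(var(x)) / mean(x))
}

# User-facing wrapper (hypothetical name): all input checking lives here
dtheta_checked <- function(x) {
  if (!is.numeric(x)) stop("'x' must be a numeric vector")
  if (length(x) < 2L) stop("'x' must contain at least two values")
  if (anyNA(x)) stop("'x' must not contain missing values")
  .dtheta_core(x)
}

set.seed(42)
good <- dtheta_checked(rnorm(1000, mean = 10, sd = 1))
bad  <- tryCatch(dtheta_checked("a"), error = function(e) conditionMessage(e))
print(bad)
```

The tests for the wrapper then cover the nonsense inputs, while the tests for the helper can concentrate on numerical correctness.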

#3


5  

Nice question.

Besides generalities such as setting a seed, I would recommend that you look at some of the tests in the R sources. The tests/ directory in the source tree has a wealth of these; some of the base R packages (such as tools) also have a tests/ subdirectory.

#4


3  

It has already appeared as a comment, but I'll add it as a bona fide answer. R does have a few automated testing packages to help with this kind of thing, the main two being RUnit and testthat. I've briefly used RUnit, and recently started using testthat in more depth (though I can't really give any good advantages or disadvantages of one over the other!).

Automated testing allows you to set up these test cases, as well as others suggested above, such as:

  • Boundary Tests
  • Stress Tests (less need to test for accuracy, just throw data at it and see if it falls over)
  • Dealing with different input
  • Dealing with different underlying platforms / locales
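
For a flavour of what this looks like with testthat, here is a small sketch for the dtheta() function from the question (condensed). The test descriptions and the choice of 0:100 as input are mine; the requireNamespace() guard is only there so the script still runs where testthat is not installed.

```r
# dtheta() condensed from the question so the sketch is self-contained
dtheta <- function(x) {
  q.mean <- mean(mean(x) >= x)
  qs <- c(q.mean - 0.15, q.mean, q.mean + 0.15)
  c(quantile(x, qs), var(x), sqrt(var(x)) / mean(x))
}

if (requireNamespace("testthat", quietly = TRUE)) {
  library(testthat)

  test_that("dtheta returns the expected quantiles on a known vector", {
    out <- dtheta(0:100)
    expect_length(out, 5)
    # for 0:100 the p-th quantile is 100 * p, and 51 of 101 values are <= mean
    expect_equal(unname(out[1:3]), 100 * (51 / 101 + c(-0.15, 0, 0.15)))
  })

  test_that("dtheta fails on constant data (boundary case)", {
    # no message pattern is checked, since the wording may vary by locale
    expect_error(dtheta(rep(12, 100)))
  })
}
```

In a package these tests would normally live in files under tests/testthat/ and be run as part of the package check, rather than at the top level of a script.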
