Best practices for testing with random inputs

NOTE: I mention the next couple of paragraphs as background. If you just want a TL;DR, feel free to skip down to the numbered questions as they are only indirectly related to this info.

I'm currently writing a Python script that does some stuff with POSIX dates (among other things). Unit testing these seems a little bit difficult though, since there's such a wide range of dates and times that can be encountered.

Of course, it's impractical for me to try to test every single date/time combination possible, so I think I'm going to try a unit test that randomizes the inputs and then reports what the inputs were if the test failed. Statistically speaking, I figure that, assuming I run it enough times, I can achieve a bit more completeness of testing than I could if I tried to think of all potential problem areas (due to missing things) or tested all cases (due to sheer infeasibility).

So here are a few questions (mainly indirectly related to the above):

1. What types of code are good candidates for randomized testing? What types of code aren't?
2. How do I go about determining the number of times to run the code with randomized inputs? I ask this because I want to have a large enough sample to determine any bugs, but don't want to wait a week to get my results.
3. Are these kinds of tests well suited for unit tests, or is there another kind of test that it works well with?
4. Are there any other best practices for doing this kind of thing?

9 answers

#1

I agree with Federico - randomised testing is counterproductive. If a test won't reliably pass or fail, it's very hard to fix it and know it's fixed. (This is also a problem when you introduce an unreliable dependency, of course.)

Instead, however, you might like to make sure you've got good data coverage in other ways. For instance:

Make sure you have tests for the start, middle and end of every month of every year between 1900 and 2100 (if those are suitable for your code, of course).

Use a variety of cultures, or "all of them" if that's known.

Try "day 0" and "one day after the end of each month" etc.

In short, still try a lot of values, but do so programmatically and repeatably. You don't need every value you try to be a literal in a test - it's fine to loop round all known values for one axis of your testing, etc.

You'll never get complete coverage, but it will at least be repeatable.
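As a rough illustration of trying many values "programmatically and repeatably", here is a minimal Python sketch that walks the start, middle and end of every month from 1900 to 2100 and checks a round trip through POSIX seconds. The helpers `to_posix` and `from_posix` are hypothetical stand-ins for whatever date code is actually under test, and taking day 15 as the "middle" of a month is an assumption.

```python
import calendar
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_posix(dt):
    """Hypothetical code under test: naive UTC datetime -> POSIX seconds."""
    return calendar.timegm(dt.timetuple())


def from_posix(ts):
    """Hypothetical code under test: POSIX seconds -> naive UTC datetime."""
    return EPOCH + timedelta(seconds=ts)


def boundary_dates(first_year=1900, last_year=2100):
    """Yield the start, middle and end of every month in the year range."""
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            last_day = calendar.monthrange(year, month)[1]
            for day in (1, 15, last_day):
                yield datetime(year, month, day)


def test_round_trip():
    """Check every boundary date and return how many were checked."""
    checked = 0
    for dt in boundary_dates():
        assert from_posix(to_posix(dt)) == dt, f"round trip failed for {dt}"
        checked += 1
    return checked
```

Every run visits exactly the same dates, so a failure always reproduces, and the assertion message names the offending input.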

EDIT: I'm sure there are places where random tests are useful, although probably not for unit tests. However, in this case I'd like to suggest something: use one RNG to create a random but known seed, and then seed a new RNG with that value - and log it. That way if something interesting happens you will be able to reproduce it by starting an RNG with the logged seed.
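The seed-and-log idea could be sketched in Python roughly as follows. The function names, the `log` parameter and the timestamp range are assumptions made for illustration; only `random.Random` and `random.SystemRandom` come from the standard library.

```python
import random


def make_logged_rng(seed=None, log=print):
    """Return (rng, seed); draw and log a fresh seed unless one is supplied."""
    if seed is None:
        # One RNG (backed by OS entropy) picks the seed for the test RNG.
        seed = random.SystemRandom().randrange(2**32)
    log(f"random seed: {seed}")
    return random.Random(seed), seed


def random_timestamps(rng, count):
    """Draw `count` non-negative POSIX timestamps that fit in 32 bits."""
    return [rng.randrange(0, 2**32) for _ in range(count)]


rng, seed = make_logged_rng()
first_run = random_timestamps(rng, 5)

# Replaying with the logged seed reproduces exactly the same inputs.
replay_rng, _ = make_logged_rng(seed)
assert random_timestamps(replay_rng, 5) == first_run
```

When a run turns up something interesting, passing the logged value back in as `seed` regenerates the same sequence of inputs.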

#2

With respect to the 3rd question, in my opinion random tests are not well suited for unit testing. If applied to the same piece of code, a unit test should succeed always, or fail always (i.e., wrong behavior due to bugs should be reproducible). You could however use random techniques to generate a large data set, then use that data set within your unit tests; there's nothing wrong with it.

#3

Wow, great question! Some thoughts:

Random testing is always a good confidence building activity, though as you mentioned, it's best suited to certain types of code.

It's an excellent way to stress-test any code whose performance may be related to the number of times it's been executed, or to the sequence of inputs.

For fairly simple code, or code that expects a limited type of input, I'd prefer systematic tests that explicitly cover all of the likely cases, samples of each unlikely or pathological case, and all the boundary conditions.

#4

Q1) I found that distributed systems with lots of concurrency are good candidates for randomized testing. It is hard to create all possible scenarios for such applications, but random testing can expose problems that you never thought about.

Q2) I guess you could try to use statistics to build a confidence interval around having discovered all "bugs". But the practical answer is: run your randomized tests as many times as you can afford.
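One simple way to put numbers on this, offered as a sketch rather than a full statistical treatment: assume each run draws its input independently and that a given bug is triggered by a fixed fraction p of inputs. Then n runs all miss it with probability (1 - p)^n, which gives the two estimates below. The function names are made up for this example.

```python
import math


def runs_needed(failure_rate, confidence=0.95):
    """Runs needed to hit a bug at least once with the given confidence."""
    if not 0 < failure_rate < 1 or not 0 < confidence < 1:
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    # Solve (1 - failure_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


def max_failure_rate(clean_runs, confidence=0.95):
    """Upper bound on the failure rate after `clean_runs` passes in a row."""
    if clean_runs < 1:
        raise ValueError("clean_runs must be at least 1")
    return 1 - (1 - confidence) ** (1 / clean_runs)
```

Under these assumptions, a bug hit by one input in a thousand needs roughly 3,000 runs to show up with 95% confidence, and 1,000 clean runs only bound the failure rate at about 0.3%. This says nothing about bugs that the input distribution never reaches.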

Q3) I have found that randomized testing is useful, but only after you have written the normal battery of unit, integration and regression tests. You should integrate your randomized tests as part of the normal test suite, though probably as a small run. If nothing else, you avoid bit rot in the tests themselves, and get a modicum of coverage as the team runs the tests with different random inputs.

Q4) When writing randomized tests, make sure you save the random seed with the results of the tests. There is nothing more frustrating than finding that your random tests caught a bug, and not being able to run the test again with the same input. Make sure your test can also be executed with the saved seed.

#5

A few things:

With random testing, you can't really tell how good a piece of code is, but you can tell how bad it is.

Random testing is better suited for things that have random inputs -- a prime example is anything that's exposed to users. So, for example, something that randomly clicks & types all over your app (or OS) is a good test of general robustness.

Similarly, developers count as users. So something that randomly assembles a GUI from your framework is another good candidate.

Again, you're not going to find all the bugs this way -- what you're looking for is "if I do a million whacky things, do ANY of them result in system corruption?" If not, you can feel some level of confidence that your app/OS/SDK/whatever might hold up to a few days' exposure to users.

...But, more importantly, if your random-beater-upper test app can crash your app/OS/SDK in about 5 minutes, that's about how long you'll have until the first fire-drill if you try to ship that sucker.

Also note: REPRODUCIBILITY IS IMPORTANT IN TESTING! Hence, have your test-tool log the random-seed that it used, and have a parameter to start with the same seed. In addition, have it either start from a known "base state" (i.e., reinstall everything from an image on a server & start there) or some recreatable base-state (i.e., reinstall from that image, then alter it according to some random-seed that the test tool takes as a parameter.)

Of course, the developers will appreciate if the tool has nice things like "save state every 20,000 events" and "stop right before event #" and "step forward 1/10/100 events." This will greatly aid them in reproducing the problem, finding and fixing it.
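A minimal Python sketch of a replayable event generator with a seed parameter and a "stop right before event #" option might look like this. The event kinds, the screen size and the function names are all hypothetical, and saving or restoring application state is left out.

```python
import random

ACTIONS = ("click", "type", "scroll", "drag")  # hypothetical event kinds


def event_stream(seed, stop_before=None):
    """Yield (index, action, x, y) tuples, deterministically for a given seed."""
    rng = random.Random(seed)
    index = 0
    while stop_before is None or index < stop_before:
        # Coordinates assume a hypothetical 1920x1080 screen.
        yield index, rng.choice(ACTIONS), rng.randrange(1920), rng.randrange(1080)
        index += 1


def replay(seed, stop_before):
    """Re-create every event that precedes event number `stop_before`."""
    return list(event_stream(seed, stop_before))
```

Because the stream depends only on the seed, `replay(seed, n)` always yields the same first n events, so a developer can stop just before the event that caused a crash and step forward from there.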

As someone else pointed out, servers are another thing exposed to users. Get yourself a list of 1,000,000 URLs (grep from server logs), then feed them to your random number generator.

And remember: "system went 24 hours of random pounding without errors" does not mean it's ready to ship, it just means it's stable enough to start some serious testing. Before it can do that, QA should feel free to say "look, your POS can't even last 24 hours under life-like random user simulation -- you fix that, I'm going to spend some time writing better tools."

Oh yeah, one last thing: in addition to the "pound it as fast & hard as you can" tests, have the ability to do "exactly what a real user [who was perhaps deranged, or a baby pounding the keyboard/mouse] would do." That is, if you're doing random user-events, do them at the speed that a very fast typist or very fast mouse user could manage (with occasional delays, to simulate a SLOW person), in addition to "as fast as my program can spit out events." These are two *very different* types of tests, and will get very different reactions when bugs are found.

#6

To make tests reproducible, simply use a fixed seed start value. That ensures the same data is used whenever the test runs. Tests will reliably pass or fail.

Good / bad candidates? Randomized tests are good at finding edge cases (exceptions). A problem is to define the correct result of a randomized input.

Determining the number of times to run the code: Simply try it out; if it takes too long, reduce the iteration count. You may want to use a code coverage tool to find out what part of your application is actually tested.

Are these kinds of tests well suited for unit tests? Yes.

#7

This might be slightly off-topic, but if you're using .NET, there is Pex, which does something similar to randomized testing, but more intelligently: it attempts to generate "random" test cases that exercise all of the paths through your code.

#8

Here is my answer to a similar question, "Is it a bad practice to randomly-generate test data?" Other answers may be useful as well.

Random testing is a bad practice as long as you don't have a solution for the oracle problem, i.e., determining the expected outcome of your software given its input.

If you solved the oracle problem, you can get one step further than simple random input generation. You can choose input distributions such that specific parts of your software get exercised more than with simple random.

You then switch from random testing to statistical testing.

if (a > 0) {
    // Do Foo
} else if (b < 0) {
    // Do Bar
} else {
    // Do Foobar
}

If you select a and b randomly in int range, you exercise Foo 50% of the time, Bar 25% of the time and Foobar 25% of the time. It is likely that you will find more bugs in Foo than in Bar or Foobar.

If you select a such that it is negative 66.66% of the time, Bar and Foobar get exercised more than with your first distribution. Indeed the three branches get exercised each 33.33% of the time.
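These proportions can be checked empirically with a small simulation. The Python sketch below mirrors the three branches and compares a uniform draw of a with a skewed one. The 32-bit int range, the function names and the trial count are assumptions, and "negative" is read as "not positive" so that a = 0 lands in the else path as it does in the code above.

```python
import random

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # assumes a 32-bit signed int


def branch(a, b):
    """Mirror of the three-way branch being tested."""
    if a > 0:
        return "Foo"
    elif b < 0:
        return "Bar"
    return "Foobar"


def uniform_int(rng):
    """Draw uniformly from the whole int range."""
    return rng.randint(INT_MIN, INT_MAX)


def skewed_a(rng, p_non_positive=2 / 3):
    """Draw a so that it is non-positive with probability p_non_positive."""
    if rng.random() < p_non_positive:
        return rng.randint(INT_MIN, 0)
    return rng.randint(1, INT_MAX)


def branch_frequencies(draw_a, trials=100_000, seed=0):
    """Estimate how often each branch runs; b is always drawn uniformly."""
    rng = random.Random(seed)
    counts = {"Foo": 0, "Bar": 0, "Foobar": 0}
    for _ in range(trials):
        counts[branch(draw_a(rng), uniform_int(rng))] += 1
    return {name: count / trials for name, count in counts.items()}
```

Calling `branch_frequencies(uniform_int)` should give roughly the 50/25/25 split, and `branch_frequencies(skewed_a)` should come out close to one third for each branch.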

Of course, if your observed outcome is different than your expected outcome, you have to log everything that can be useful to reproduce the bug.

#9

Random testing has the huge advantage that individual tests can be generated for extremely low cost. This is true even if you only have a partial oracle (for example, does the software crash?)
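A crash-only oracle needs very little code. In the Python sketch below, `format_timestamp` is a hypothetical function under test, the input range is an arbitrary assumption, and the only check is that no call raises an exception.

```python
import random
from datetime import datetime, timedelta


def format_timestamp(ts):
    """Hypothetical function under test: POSIX seconds -> ISO 8601 string."""
    return (datetime(1970, 1, 1) + timedelta(seconds=ts)).isoformat()


def crash_test(func, trials, seed):
    """Return (input, exception) for the first crash, or None if all runs pass."""
    rng = random.Random(seed)
    for _ in range(trials):
        ts = rng.randint(-2**40, 2**40)
        try:
            func(ts)
        except Exception as exc:  # partial oracle: any exception is a failure
            return ts, exc
    return None
```

Even this weak check finds something here: most timestamps in that range fall outside the years `datetime` can represent, so the addition raises `OverflowError` and the function turns out to need explicit range handling.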

In a complex system, random testing will find bugs that are difficult to find by any other means. Think about what this means for security testing: even if you don't do random testing, the black hats will, and they will find bugs you missed.

A fascinating subfield of random testing is randomized differential testing, where two or more systems that are supposed to show the same behavior are stimulated with a common input. If their behavior differs, a bug (in one or both) has been found. This has been applied with great effect to testing of compilers, and invariably finds bugs in any compiler that has not been previously confronted with the technique. Even if you have only one compiler you can try it on different optimization settings to look for varying results, and of course crashes always mean bugs.
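The same idea scales down to small functions. As a sketch in Python, a hand-written leap-year check (hypothetical, standing in for the code under test) is compared against the standard library's `calendar.isleap` on the same random years. The year range and trial count are arbitrary choices.

```python
import calendar
import random


def is_leap_candidate(year):
    """Hand-written implementation under test (hypothetical)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def differential_test(candidate, reference, trials=10_000, seed=None):
    """Feed both implementations the same random years; return disagreements."""
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        year = rng.randint(1, 9999)
        if candidate(year) != reference(year):
            mismatches.append(year)
    return seed, mismatches


seed, mismatches = differential_test(is_leap_candidate, calendar.isleap)
assert not mismatches, f"seed {seed}: implementations disagree on {mismatches[:5]}"
```

No expected values have to be written by hand; the reference implementation acts as the oracle, and the returned seed makes any disagreement reproducible.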
