How do you test an interpreter or a compiler?

Date: 2021-11-14 20:45:47

I've been experimenting with creating an interpreter for Brainfuck, and while it was quite simple to build and get up and running, part of me wants to be able to run tests against it. I can't fathom how many tests one might have to write to cover all the possible instruction combinations and ensure that the implementation is correct.

Obviously, with Brainfuck, the instruction set is small, but I can't help but think that as more instructions are added, your test code would grow exponentially. More so than your typical tests at any rate.

Now, I'm about as newbie as you can get in terms of writing compilers and interpreters, so my assumptions could very well be way off base.

Basically, where do you even begin with testing on something like this?

6 answers

#1


Testing a compiler is a little different from testing some other kinds of apps, because it's OK for the compiler to produce different assembly-code versions of a program as long as they all do the right thing. However, if you're just testing an interpreter, it's pretty much the same as any other text-based application. Here is a Unix-centric view:

  1. You will want to build up a regression test suite. Each test should have
    • Source code you will interpret, say test001.bf

    • Standard input to the program you will interpret, say test001.0

    • What you expect the interpreter to produce on standard output, say test001.1

    • What you expect the interpreter to produce on standard error, say test001.2 (you care about standard error because you want to test your interpreter's error messages)

  2. You will need a "run test" script that does something like the following

    function fail {
      echo "Unexpected differences on $1:"
      diff "$2" "$3"
      exit 1
    }

    for testname
    do
      # mktemp is used here in place of the Debian-specific tempfile
      tmp1=$(mktemp)
      tmp2=$(mktemp)
      brainfuck "$testname.bf" < "$testname.0" > "$tmp1" 2> "$tmp2"
      # Run cmp directly; wrapping it in [ ] would not execute it
      cmp -s "$testname.1" "$tmp1" || fail "stdout" "$testname.1" "$tmp1"
      cmp -s "$testname.2" "$tmp2" || fail "stderr" "$testname.2" "$tmp2"
      rm -f "$tmp1" "$tmp2"
    done

  3. You will find it helpful to have a "create test" script that does something like

    brainfuck "$testname.bf" < "$testname.0" > "$testname.1" 2> "$testname.2"

    You run this only when you're totally confident that the interpreter works for that case.

  4. You keep your test suite under source control.

  5. It's convenient to embellish your test script so you can leave out files that are expected to be empty.

  6. Any time anything changes, you re-run all the tests. You probably also re-run them all nightly via a cron job.

  7. Finally, you want to add enough tests to get good test coverage of your compiler's source code. The quality of coverage tools varies widely, but GNU Gcov is an adequate coverage tool.
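
If a shell script is not convenient, the same golden-file idea can be sketched in Python. This is only an illustration under some assumptions: the names `read_or_empty`, `run_case` and `run_suite` are made up for this sketch, the interpreter is assumed to take the source file as its only argument, and a missing `.0`, `.1` or `.2` file is treated as "expected to be empty".

```python
import subprocess
import sys
from pathlib import Path


def read_or_empty(path: Path) -> bytes:
    """Return the file's bytes, or b"" when the file is absent."""
    return path.read_bytes() if path.exists() else b""


def run_case(interpreter: list[str], source: Path) -> list[str]:
    """Run one test; return the names of the streams that differed."""
    base = source.with_suffix("")  # tests/test001.bf -> tests/test001
    result = subprocess.run(
        interpreter + [str(source)],
        input=read_or_empty(Path(f"{base}.0")),
        capture_output=True,
        timeout=10,  # guards against programs that never terminate
    )
    failures = []
    if result.stdout != read_or_empty(Path(f"{base}.1")):
        failures.append("stdout")
    if result.stderr != read_or_empty(Path(f"{base}.2")):
        failures.append("stderr")
    return failures


def run_suite(interpreter: list[str], test_dir: Path) -> dict[str, list[str]]:
    """Run every *.bf file in test_dir; map failing tests to differing streams."""
    report = {}
    for source in sorted(test_dir.glob("*.bf")):
        failures = run_case(interpreter, source)
        if failures:
            report[source.name] = failures
    return report
```

Something like `run_suite(["./brainfuck"], Path("tests"))` would then return an empty dict when every test matches its expected files (the `./brainfuck` path is a placeholder for wherever your interpreter lives).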

Good luck with your interpreter! If you want to see a lovingly crafted but not very well documented testing infrastructure, go look at the test2 directory for the Quick C-- compiler.

#2


I don't think there's anything 'special' about testing a compiler; in a sense it's almost easier than testing some programs, since a compiler has such a basic high-level summary - you hand in source, it gives you back (possibly) compiled code and (possibly) a set of diagnostic messages.

Like any complex software entity, there will be many code paths, but since it's all very data-oriented (text in, text and bytes out) it's straightforward to author tests.

#3


I’ve written an article on compiler testing, the original conclusion of which (slightly toned down for publication) was: It’s morally wrong to reinvent the wheel. Unless you already know all about the preexisting solutions and have a very good reason for ignoring them, you should start by looking at the tools that already exist. The easiest place to start is Gnu C Torture, but bear in mind that it’s based on Deja Gnu, which has, shall we say, issues. (It took me six attempts even to get the maintainer to allow a critical bug report about the Hello World example onto the mailing list.)

I’ll immodestly suggest that you look at the following as a starting place for tools to investigate:

  1. Software: Practice and Experience, April 2007. (Payware, not available to the general public; a free preprint is at http://pobox.com/~flash/Practical_Testing_of_C99.pdf.)

  2. http://en.wikipedia.org/wiki/Compiler_correctness#Testing (Largely written by me.)

  3. Compiler testing bibliography (Please let me know of any updates I’ve missed.)

#4


In the case of brainfuck, I think testing it should be done with brainfuck scripts. I would test the following, though:

1: Are all the cells initialized to 0?

2: What happens when you decrement the data pointer when it's currently pointing to the first cell? Does it wrap? Does it point to invalid memory?

3: What happens when you increment the data pointer when it's pointing at the last cell? Does it wrap? Does it point to invalid memory?

4: Does output function correctly?

5: Does input function correctly?

6: Does the [ ] stuff work correctly?

7: What happens when you increment a byte more than 255 times? Does it wrap to 0 properly, or is it incorrectly treated as an integer or other value?
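
To make the checklist concrete, here is a rough sketch of a tiny interpreter in Python with a handful of assertions that correspond to the questions above. The function `run_bf` and its semantics are assumptions picked for illustration (8-bit wrapping cells, a fixed tape, an `IndexError` when the pointer leaves the tape, and a cell left unchanged at end of input); your implementation may legitimately answer some of these questions differently, and then the expected values change with it.

```python
def run_bf(program: str, stdin: bytes = b"", tape_len: int = 30000) -> bytes:
    """Interpret `program` and return its output bytes (assumed semantics above)."""
    # Pre-compute matching bracket positions so loops can jump directly.
    jumps, stack = {}, []
    for pos, op in enumerate(program):
        if op == "[":
            stack.append(pos)
        elif op == "]":
            if not stack:
                raise SyntaxError(f"unmatched ] at {pos}")
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start
    if stack:
        raise SyntaxError(f"unmatched [ at {stack[-1]}")

    tape = [0] * tape_len
    ptr = pc = in_pos = 0
    out = bytearray()
    while pc < len(program):
        op = program[pc]
        if op == ">":
            ptr += 1
        elif op == "<":
            ptr -= 1
        elif op == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif op == ".":
            out.append(tape[ptr])
        elif op == ",":
            if in_pos < len(stdin):  # at end of input the cell is left as-is
                tape[ptr] = stdin[in_pos]
                in_pos += 1
        elif op == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif op == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # go back to the start of the loop
        if not 0 <= ptr < tape_len:
            raise IndexError("data pointer left the tape")
        pc += 1
    return bytes(out)


# A few of the checks from the list, under the assumptions stated above.
assert run_bf(">>>.") == b"\x00"            # cells start at 0
assert run_bf("+" * 256 + ".") == b"\x00"   # 256 increments wrap to 0
assert run_bf(",.", b"A") == b"A"           # input and output round-trip
assert run_bf("++[>+++<-]>.") == b"\x06"    # a simple loop: 2 * 3
```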

More tests are possible too, but this is probably where I'd start. I wrote a BF compiler a few years ago, and that had a few extra tests. In particular, I tested the [ ] stuff heavily by putting a lot of code inside the block, since an early version of my code generator had issues there (on x86, using a jxx, I had issues when the block produced more than 128 bytes or so of code, resulting in invalid x86 asm).
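
A test like that does not have to be written by hand. As a sketch (the helper `big_loop_program` is a made-up name, and the expected byte assumes 8-bit wrapping cells), a short Python function can generate a program with an arbitrarily large loop body along with the output it should produce:

```python
def big_loop_program(pairs: int, iterations: int = 3) -> tuple[str, bytes]:
    """Build a program whose single loop body holds `pairs` copies of ">+<".

    Returns (program, expected_output). Each pass through the loop adds
    `pairs` to cell 1, so the final byte is pairs * iterations modulo 256.
    """
    if not 0 <= iterations <= 255:
        raise ValueError("iterations must fit in one 8-bit cell")
    body = ">+<" * pairs
    program = "+" * iterations + "[" + body + "-]" + ">."
    expected = bytes([(pairs * iterations) % 256])
    return program, expected


program, expected = big_loop_program(100)
# `program` has a 300-instruction loop body; a compiler that emits even one
# byte per instruction needs a jump longer than a short branch can encode.
```

Feeding `program` to the implementation under test and comparing its output with `expected` exercises long forward and backward jumps without anyone having to type the loop body.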

#5


You can test with some already written apps.

#6


The secret is to:

  • Separate the concerns

  • Observe the Law of Demeter

  • Inject your dependencies

Well, software that is hard to test is a sign that the developer wrote it like it's 1985. Sorry to say that, but utilizing the three principles I presented here, even line-numbered BASIC would be unit testable (it IS possible to inject dependencies into BASIC, because you can do "goto variable").
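
For a Brainfuck interpreter, "inject your dependencies" could look roughly like the sketch below: the interpreter never touches stdin or stdout itself and instead calls two functions handed to it. The name `execute` and the callable signatures are assumptions made for this example; it also assumes balanced brackets and does no bounds checking, to keep it short.

```python
from typing import Callable, Optional


def execute(
    program: str,
    read_byte: Callable[[], Optional[int]],
    write_byte: Callable[[int], None],
) -> None:
    """Run `program`, doing all I/O through the injected callables."""
    tape, ptr, pc = [0] * 30000, 0, 0
    while pc < len(program):
        op = program[pc]
        if op == ">":
            ptr += 1
        elif op == "<":
            ptr -= 1
        elif op == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif op == ".":
            write_byte(tape[ptr])
        elif op == ",":
            value = read_byte()
            if value is not None:  # None signals end of input
                tape[ptr] = value
        elif op in "[]" and (op == "[") == (tape[ptr] == 0):
            # "[" on a zero cell scans forward, "]" on a non-zero cell scans
            # backward, until the matching bracket is found.
            step = 1 if op == "[" else -1
            depth = 1
            while depth:
                pc += step
                if program[pc] == op:
                    depth += 1
                elif program[pc] in "[]":
                    depth -= 1
        pc += 1


# In a test, plain Python objects stand in for the real streams.
inputs = iter(b"hi")
collected = []
execute(",.,.", lambda: next(inputs, None), collected.append)
assert bytes(collected) == b"hi"
```

In the real program the same function would be given thin wrappers around `sys.stdin.buffer` and `sys.stdout.buffer`, so the test and production paths share all of the interpreter logic.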

