《Design by Contract for Embedded Software》 翻译

时间:2022-11-13 22:07:24

原文: Design by Contract for Embedded Software (state-machine.com)

Design by Contract is the single most effective programming technique for delivering high-quality code. Here you can learn what the Design by Contract programming philosophy is, what can it do for you, and why should all embedded software developers care.

契约式设计是交付高质量代码的一种有效的编程技术。在这里,你可以了解到什么是契约式设计的编程理念,它能为你做什么,以及为什么所有的嵌入式软件开发者都应该关注它。

Errors versus Exceptional Conditions

错误 VS 异常

While embedded systems come with their own set of complexities, they also offer many opportunities for simplifications compared to general-purpose computers. Dealing with errors and exceptional conditions provides perhaps the best case in point. Just think, how many times have you seen embedded software terribly convoluted by attempts to painstakingly propagate an error through many layers of code, just to end up doing something trivial with it, such as performing a system reset?

虽然嵌入式系统有其自身的复杂性,但与通用计算机相比,它们也提供了许多简化的机会。处理错误(error)和异常情况(exception)可能是最好的例子。试想一下,你有多少次看到嵌入式软件试图通过分析一层层的代码艰难的把一个层层传播下来的错误捕获,然而由于系统的复杂性,最终只能做一些微不足道的事情去应对,比如执行系统复位?

By error (known otherwise as a “bug”), I mean a persistent defect due to a design or implementation mistake (e.g., overrunning an array index or writing to a file before opening it). When your software has a bug, typically, you cannot reasonably “handle” the situation. You should rather concentrate on detecting (and ultimately fixing) the root cause of the problem. This situation is in contrast to the exceptional condition, which is a specific circumstance that can legitimately arise during the system lifetime but is relatively rare and lies off the main execution path of your software. In contrast to an error, you need to design and implement a recovery strategy that handles the exceptional condition.

所谓错误(以其他方式称为 "bug"),我指的是由于设计或实现上的错误(例如,数组越界或在打开文件之前写入文件)导致的持续缺陷。当你的软件有一个 bug 时,通常,你不能合理地 "处理 "这种情况。你应该专注于检测(并最终修复)问题的根源。这种情况与异常 (特殊情况)相反,异常是指在系统生命周期内可以合法地出现的特定情况,但相对罕见,并且不在你软件的主要执行路径上。与错误相比,你需要设计和实施一个处理异常的恢复策略。

As an example, consider dynamic memory allocation. In any type of system, memory allocation with malloc() (or the C++ new operator) can fail. In a general-purpose computer, a failed malloc() merely indicates that, at this instant the operating system cannot supply the requested memory. This can happen easily in a highly dynamic, general-purpose computing environment. When it happens, you have options to recover from the situation. One option might be for the application to free up some memory that it allocated and then retry the allocation. Another choice could be to prompt the user that the problem exists and encourage them to exit other applications so that the current application can gather more memory. Yet another option is to save data to the disk and exit. Whatever the choice, handling this situation requires some drastic actions, which are clearly off the mainstream behavior of your application. Nevertheless, you should design and implement such actions because in a desktop environment, a failed malloc() must be considered an exceptional condition.

我们以动态内存分配作为一个例子。在任何类型的系统中,用 malloc()(或 C++的 new 操作符)分配内存都可能失败。在通用计算机中,一个失败的 malloc() 仅仅表明,在这一时刻,操作系统不能提供所要求的内存。在一个高度动态的通用计算环境中,这种情况很容易发生。当它发生时,你可以选择从这种情况下恢复。一个选择可能是让应用程序释放它所分配的一些内存,然后重新尝试分配。另一个选择可能是提示用户问题的存在,并鼓励他们退出其他应用程序,以便当前的应用程序可以收集更多的内存。还有一个选择可能是将数据/日志/CoreDump保存到磁盘并退出。不管是什么选择,处理这种情况需要一些激进的措施,这显然是不符合应用程序的主流行为。然而,你应该设计并实现这样的动作或者措施,因为在桌面环境中,malloc () 失败必须被视为一种异常。

In a typical embedded system, on the other hand, the same failed malloc() probably should be flagged as a bug. That’s because embedded systems offer much fewer excuses to run out of memory, so when it happens, it’s typically an indication of a flaw. You cannot really recover from it. Exiting other applications is not an option. Neither is saving data to a disk and exit. Whichever way you look at it, it’s a bug no different really from overflowing the stack, dereferencing a NULL pointer, or overrunning an array index. Instead of bending over backwards in attempts to handle this condition in software (as you would on the desktop), you should concentrate first on finding the root cause and then fixing the problem. (I would first look for a memory leak, wouldn’t you?)

另一方面,在一个典型的嵌入式系统中,同样失败的 malloc() 可能应该被标记为一个错误。这是因为嵌入式系统提供的内存耗尽的借口要少得多,所以当它发生时,它通常是一个缺陷的迹象(bug)。你无法真正从中恢复。退出其他应用程序不是一种选择。将数据保存到磁盘并退出也不是一种选择。无论你从哪方面看,这都是一个错误,与堆栈溢出、访问空指针或数组越界没有什么区别。与其在代码中试图处理这种情况(就像在桌面上一样),你应该首先集中精力找到根本原因,然后解决问题。(如我首先会寻找内存泄漏,你会吗?)

The main point here is that many situations traditionally handled as exceptional conditions in general-purpose computing are in fact bugs in embedded systems. In other words, the specifics of embedded systems (computers dedicated to a single, well-defined purpose) allow you to considerably simplify the embedded software by flagging many situations as bugs (that you don’t need to handle) rather than exceptional conditions (that you do need to handle). The correct distinction between these two situations always depends on the context, so you should not blindly transfer the rules of thumb from other areas of programming to embedded real-time systems. Instead, I propose that you critically ask yourself the following two probing questions: “Can a given situation legitimately arise in this particular system?” and “If it happens, is there anything specific that needs to or can be done in the software?” If the answer to either of these questions is “yes,” then you should handle the situation as an exceptional condition; otherwise, you should treat the situation as a bug.

这里的主要观点是,许多在传统的通用计算中作为异常 (特殊情况) 处理的情况,在嵌入式系统中实际上表现为错误。换句话说,嵌入式系统(专门用于单一的、定义明确的用途的计算机)的特性允许你通过将许多情况标记为 bug(你不需要处理)而不是异常(你需要处理)来大大简化嵌入式软件。这两种情况的正确区分总是取决于上下文,所以你不应该盲目地将其他编程领域的经验法则转移到嵌入式实时系统中。相反,我建议你批判性地问自己以下两个探究性问题。"在这个特定的系统中,一个特定的情况会合法地出现吗?"和 "如果它发生了,在软件中是否有任何具体的需要或可以做的事情?" 如果这两个问题的答案都是 "是",那么你就应该把这种情况作为一种异常来处理;否则,你就应该把这种情况作为一个错误来处理。

The distinction between errors and exceptional conditions in any type of software (not just firmware) is important, because errors require the exact opposite programming strategy than exceptional conditions. The first priority in dealing with errors is to detect them as early as possible. Any attempt to handle a bug (as you would an exceptional condition) results in unnecessary complications of the code and either camouflages the bug or delays its manifestation. (In the worst case, it also introduces new bugs.) Either way, finding and fixing the bug will be harder.

在任何类型的软件(不仅仅是固件)中,区分错误和异常是很重要的,因为错误需要与异常完全相反的编程策略。处理错误的首要任务是尽可能早地发现它们。任何试图处理错误的行为(就像处理特殊情况一样)都会导致代码不必要的复杂化,要么掩盖错误,要么延迟其表现。(在最坏的情况下,它还会引入新的错误。) 无论怎样,发现和修复错误都会更难。

Design by Contract (DbC)

契约设计(DbC)

And here is where the Design by Contract (DbC) philosophy comes in. DbC, pioneered by Bertrand Meyer, views a software system as a set of components whose collaboration is based on precisely defined specifications of mutual obligations—the contracts.1 The central idea of this method is to inherently embed the contracts in the code and validate them automatically at run time. Doing so consistently has two major benefits: 1) It automatically helps detect bugs (as opposed to “handling” them), and 2) It is one of the best ways to document code.

而这正是契约设计(DbC)理念的体现。DbC 是由 Bertrand Meyer 开创的,他将软件系统视为一组组件,这些组件的协作是基于精确定义的相互义务的规范--合同 1。这样做有两个主要好处。1)它自动帮助检测错误(而不是 "处理 "它们),2)它是记录代码的最佳方式之一。

You can implement the most important aspects of DbC (the contracts) in C or C++ with assertions. The Standard C Library macro assert() is rarely applicable to embedded systems, however, because its default behavior (when the integer expression passed to the macro evaluates to 0) is to print an error message and exit. Neither of these actions makes much sense for most embedded systems, which rarely have a screen to print to and cannot really exit either (at least not in the same sense that a desktop application can). Therefore, in an embedded environment, you usually have to define your own assertions that suit your tools and allow you to customize the error response. I’d suggest, however, that you think twice before you go about “enhancing” assertions, because a large part of their power derives from their relative simplicity.

你可以用断言在 C 或 C++中实现 DbC 的最重要的方面(契约/合同)。然而,标准 C 库的宏 assert() 很少适用于嵌入式系统,因为它的默认行为(当传递给宏的整数表达式求值为 0 时)是打印一个错误信息并退出。这两种行为对大多数嵌入式系统来说都没有什么意义,它们很少有屏幕可以打印,也不能真正退出(至少不能像桌面程序那样退出)。因此,在嵌入式环境中,你通常必须定义你自己的断言,以适应你的工具并允许你自定义错误响应。然而,我建议你在 "加强 "断言之前三思而后行,因为断言的很大一部分力量来自于其相对的简单性。

Listing 1. Embedded systems-friendly assertions

#ifndef qassert_h
#define qassert_h

/** NASSERT macro disables all contract validations
 * (assertions, preconditions, postconditions, and invariants).
 */
#ifdef NASSERT /* NASSERT defined--DbC disabled */

#define DEFINE_THIS_FILE
#define ASSERT(ignore_)  ((void)0)
#define ALLEGE(test_)    ((void)(test_))

#else /* NASSERT not defined--DbC enabled */

#ifdef __cplusplus
extern "C"
{
#endif
   /* callback invoked in case of assertion failure */
   void onAssert__(char const *file, unsigned line);
#ifdef __cplusplus
}
#endif

#define DEFINE_THIS_FILE \
   static char const THIS_FILE__[] = __FILE__

#define ASSERT(test_) \
   ((test_) ? (void)0 : onAssert__(THIS_FILE__, __LINE__))

#define ALLEGE(test_)    ASSERT(test_)

#endif /* NASSERT */

#define REQUIRE(test_)   ASSERT(test_)
#define ENSURE(test_)    ASSERT(test_)
#define INVARIANT(test_) ASSERT(test_)

#endif /* qassert_h */

Listing 1 shows the simple embedded systems-friendly assertions that I’ve found adequate for a wide range of embedded projects. Listing 1 is similar to the standard <assert.h> (<cassert> in C++), except that the solution shown in Listing 1:

  • allows customizing the error response;
  • conserves memory by avoiding proliferation of multiple copies of the filename string;
  • provides additional macros for testing and documenting preconditions (REQUIRE), postconditions (ENSURE), and invariants (INVARIANT). (The names of the three last macros are a direct loan from Eiffel, the programming language that natively supports DbC.)

The all-purpose ASSERT() macro (lines 28-29 of Listing 1) is very similar to the standard assert(). If the argument passed to this macro evaluates to 0 (false), and if additionally the macro NASSERT is not defined, then ASSERT() will invoke a global callback onAssert__(). The function onAssert__() gives the clients the opportunity to customize the error response when the assertion fails. In embedded systems, onAssert__() typically first monopolizes the CPU (by disabling interrupts), then possibly attempts to put the system in a fail-safe mode, and eventually triggers a system reset. (Many embedded systems come out of reset in a fail-safe mode, so putting them in this mode before reset is often unnecessary.) If possible, the function should also leave a trail of bread crumbs from the cause, perhaps by storing the filename and line number in a nonvolatile memory. (The entry to onAssert__() is also an ideal place to set a breakpoint if you work with a debugger. TIP: Consult your debugger manual on how you can hard-code a permanent breakpoint in onAssert__().)

Compared to the standard assert(), the macro ASSERT() conserves memory (typically ROM) by passing THIS_FILE__ (Listing 1, line 26) as the first argument to onAssert__(), rather than the standard preprocessor macro __FILE__. This avoids proliferation of the multiple copies of the __FILE__ string but requires invoking macro DEFINE_THIS_FILE (line 25), preferably at the top of every C/C++ file.2

Defining the macro NASSERT (Listing 1, line 7) disables checking the assertions. When disabled, the assertion macros don’t generate any code (lines 10 and 35-37); in particular, they don’t test the expressions passed as arguments, so you should be careful to avoid any side effects (required for normal program operation) inside the expressions tested in assertions. The notable exception is the ALLEGE() macro (lines 11 and 31), which always tests the expression, although when assertions are disabled, it does not invoke the onAssert__() callback. ALLEGE() is useful in situations where avoiding side effects of the test would require introducing temporaries, which involves pushing additional registers onto the stack—something you often want to minimize in embedded systems.

以上比较简单,不翻译了

The DbC Philosophy

DbC的哲学/理念

The most important point to understand about software contracts (assertions in C/C++) is that they neither handle nor prevent errors, in the same way as contracts between people do not prevent fraud. For example, asserting successful memory allocation: ALLEGE((foo = new Foo) != NULL), might give you a warm and fuzzy feeling that you have handled or prevented a bug, when in fact, you haven’t. You did establish a contract, however, in which you spelled out that the inability to dynamically allocate object Foo at this spot in the code is an error. From that point on, the contract will be checked automatically and sure enough, the program will brutally abort if the contract fails. At first, you might think that this must be backwards. Contracts not only do nothing to handle (let alone fix) bugs, but they actually make things worse by turning every asserted condition, however benign, into a fatal error! However, recall from the previous discussion that the first priority when dealing with bugs is to detect them, not to handle them. To this end, a bug that causes a loud crash (and identifies exactly which contract was violated) is much easier to find than a subtle one that manifests itself intermittently millions of machine instructions downstream from the spot where you could have easily detected it.

关于软件契约(C/C++中的断言),最重要的一点是,它们既不能处理也不能防止错误,就像人与人之间的契约不能防止欺诈一样。例如,断言成功的内存分配。ALLEGE ((foo = new Foo) != NULL),可能会给你一种温暖和模糊的感觉,你已经处理或防止了一个错误,而事实上,你没有。然而,你确实建立了一个契约,在这个契约中,你阐明了在代码中的这个位置无法动态分配对象 Foo 是一个错误。从那时起,契约将被自动检查,当然,如果契约失败,程序将被粗暴地中止。起初,你可能认为这一定是倒退。契约不仅对处理(更不用说修复)错误毫无帮助,而且它们实际上使事情变得更糟,因为它们把每一个断言条件,无论多么良性,都变成了一个致命的错误然而,回顾前面的讨论,在处理 bug 时,首要任务是检测它们,而不是处理它们。为此,一个导致大声崩溃的 bug(并准确地识别出哪个契约被违反)要比一个微妙的 bug 更容易被发现,这个 bug 断断续续地表现在离你可以轻易发现它的地方几百万条机器指令的下游。

Assertions in software are in many respects like fuses in electrical circuits. Electrical engineers insert fuses in various places of their circuits to instill a controlled damage (burning a fuse) in case the circuit fails or is mishandled. Any nontrivial circuit, such as the electrical system of a car, has a multitude of differently rated fuses (a 20A fuse is appropriate for the headlights, but it’s way too big for the turn signals) to better help localize problems and to more effectively prevent expensive damage. On the other hand, a fuse can neither prevent nor fix a problem, so replacing a burned fuse doesn’t help until the root cause of the problem is removed. Just like with assertions, the main power of fuses derives from their simplicity.

软件中的断言在很多方面都像电路中的保险丝。电子工程师在电路的各个地方安装保险丝,以便在电路发生故障或处理不当的情况下灌输一种可控的损害(熔断保险丝)。任何非微不足道的电路,如汽车的电气系统,都有许多不同额定值的保险丝(20A 的保险丝适用于大灯,但对于转向灯来说就太大了),以更好地帮助定位问题,更有效地防止昂贵的损害。另一方面,保险丝既不能防止也不能解决问题,所以在问题的根源被消除之前,更换熔断的保险丝并没有帮助。就像断言一样,保险丝的主要力量来自于其简单性。

Due to the simplicity, however, assertions are sometimes viewed as a too primitive error-checking mechanism—something that’s perhaps good enough for smaller programs, but must be replaced with a “real” error handling in the industry-strength software. This view is inconsistent with the DbC philosophy, which regards contracts (assertions in C/C++) as the integral part of the software design. Contracts embody important design decisions, namely declaring certain situations as errors rather than exceptional conditions, and, therefore, embedding them in large-scale, industry-strength software is even more important than in quick-and-dirty solutions. Imagine building a large industrial electrical circuit (say, a power plant) without fuses.

然而,由于其简单性,断言有时被认为是一种过于原始的错误检查机制--对于较小的程序来说,这种机制也许足够好,但在具有工业强度的软件中,必须用 "真正的 "错误处理来代替。这种观点与 DbC 哲学不一致,DbC 哲学认为契约(C/C++中的断言)是软件设计的组成部分。契约体现了重要的设计决策,即把某些情况宣布为错误,而不是异常,因此,把它们嵌入到大规模的、具有工业强度的软件中,甚至比快速和肮脏的解决方案更重要。想象一下,在没有保险丝的情况下建造一个大型的工业电路(比如说,一个发电厂)。

Defensive or Preemptive Programming?

防御式还是进攻式编程?

The term “defensive programming” seems to have two complementary meanings. In the first meaning, the term is used to describe a programming style based on assertions, where you explicitly assert any assumptions that should hold true as long as the software operates correctly.3 In this sense, “defensive programming” is essentially synonymous with DbC.

术语 "防御性编程 "似乎有两个互补的含义。在第一种含义中,这个术语被用来描述一种基于断言的编程风格,在这种风格中,你明确断言任何假设,只要软件运行正常,这些假设就应该是真实的。在这个意义上,"防御性编程 "基本上是 DbC 的同义词。

In the other meaning, however, “defensive programming” denotes a programming style that aims at making operations more robust to errors, by accepting a wider range of inputs or allowing an order of operations not necessarily consistent with the object’s state. In this sense, “defensive programming” is complementary to DbC. For example, consider the following hypothetical output Port class:

然而,在另一种意义上,"防御性编程 "指的是一种编程风格,旨在通过接受更广泛的输入或允许不一定与对象的状态一致的操作顺序,使操作对错误更加稳健。在这个意义上,"防御性编程 "是对 DbC 的补充。例如,考虑下面这个假想的输出端口类:

class Port {
    bool open_;
public:
    Port() : open_(false) {}
    void open() {
        if (!open_) {
            // open the port ...
            open_ = true;
        }
    }
    void transmit(unsigned char const *buffer, unsigned nBytes) {
        if (open_ && buffer != NULL && nBytes > 0) {
            // transmit nBytes
            // from the buffer ...
        }
    }
    void close() {
        if (!open_) {
            open_ = false;
            // close the port ...
        }
    }
    // . . .
};

This class is programmed defensively (in the second meaning of the term), because it silently accepts invoking operations out of order (that is, transmit() before open()) with invalid parameters (e.g., transmit(NULL, 0)). This technique of making operations more robust to errors is often advertised as a better coding style, but unfortunately, it often hides bugs. Is it really a good program that calls port.transmit() before port.open()? Is it really OK to invoke transmit() with an uninitialized transmit buffer? I’d argue that a correctly designed and implemented code should not do such things, and when it happens it’s a sure indication of a larger problem. In comparison, the Port class coded according to the DbC philosophy would use preconditions:

这个类的编程是防御性的(在这个术语的第二个含义中),因为它默默地接受以无效的参数(例如,transmit (NULL, 0))不按顺序调用操作(即在 open () 之前调用 transmit ())。这种使操作对错误更健壮的技术经常被宣传为更好的编码风格,但不幸的是,它经常隐藏着错误。在 port. open () 之前调用 port. transmit () 真的是一个好程序吗?用一个未初始化的发送缓冲区调用发送 () 真的可以吗?我认为,一个正确设计和实现的代码不应该做这样的事情,当它发生时,肯定是一个更大的问题的迹象。相比之下,根据 DbC 理念编码的 Port 类会使用前置条件:

class Port {
    bool open_;
public:
    Port() : open_(false) {}
    void open() {
        REQUIRE(!open_);
        // open the port ...
        open = true;
    }
    void transmit(unsigned char const *buffer, unsigned nBytes) {
        REQUIRE(open_ && buffer != NULL && nBytes > 0);
        // transmit n-bytes
        // from the buffer ...
    }
    void close() {
        REQUIRE(open_);
        open_ = false;
        // close the port ...
    }
    // . . .
};

This implementation is intentionally less flexible, but unlike the defensive version, it’s hard to use this one incorrectly. Additionally (although difficult to convincingly demonstrate with a toy example like this), assertions tend to eliminate a lot of code that you would have to invent to handle the wider range of inputs allowed in the defensive code.

这个实现故意不那么灵活,但与防御性版本不同,很难错误地使用这个版本。此外(虽然很难用这样的玩具例子来令人信服地证明),断言倾向于消除很多你必须发明的代码,以处理防御性代码中允许的更大范围的输入。

But there is more, much more to DbC than just complementing defensive programming. The key to unveiling DbC’s full potential is to preemptively look for conditions to assert. The most effective assertions are discovered by asking two simple questions: “What are the implicit assumptions for this code and the client code to execute correctly?” and “How can I explicitly and most effectively test these assumptions?” By asking these questions for every piece of code you write, you’ll discover valuable conditions that you wouldn’t test otherwise. This way of thinking about assertions leads to a paradigm shift from “defensive” to “preemptive” programming, in which you preemptively look for situations that have even a potential of breeding bugs.

但是,DbC 不仅仅是补充防御性编程,还有更多,更多。揭示 DbC 全部潜力的关键是先发制人地寻找断言的条件。最有效的断言是通过问两个简单的问题发现的。"这段代码和客户代码正确执行的隐含假设是什么?"和 "我怎样才能明确和最有效地测试这些假设?" 通过对你写的每一段代码提出这些问题,你会发现有价值的条件,否则你就不会去测试。这种对断言的思考方式导致了从 "防御性 "到 "先发制人 "的编程模式的转变,在这种模式下,你会先发制人地寻找那些甚至有可能滋生错误的情况。

To this end, embedded systems are particularly suitable for implementing such a “preemptive doctrine.” Embedded CPUs are surrounded by specialized peripherals that just beg to be used for validating correct program execution. For example, a serial communication channel (say, a 16550-type UART) might set a bit in the Line Status Register when the input FIFO overflows. A typical serial driver ignores this bit (after all, the driver cannot recover bytes that already have been lost), but your driver can assert that the FIFO overrun bit is never set. By doing so, you’ll know when your hardware has lost bytes (perhaps due to an intermittent delay in servicing the UART), which is otherwise almost impossible to detect or reproduce at will. Needless to say, with this information you’ll not waste your time on debugging the protocol stack, but instead you’ll concentrate on finding and fixing a timing glitch. You can use timer/counters in a similar way to build real-time assertions that check if you miss your deadlines. In my company, we’ve used a special register of a GPS correlator chip to assert that every GPS channel is always serviced within its C/A code epoch (around every 1 ms)—yet another form of a real-time assertion. The main point of these examples is that the information available almost for free from the hardware wouldn’t be used if not for the “preemptive” assertions. Yet, the information is invaluable, because it’s often the only way to directly validate the time-domain performance of your code.

为此,嵌入式系统特别适合实施这种 "先发制人理论"。嵌入式 CPU 周围有专门的外设,这些外设正好可以用来验证程序的正确执行。例如,当输入 FIFO 溢出时,一个串行通信通道(例如 16550 型 UART)可能在线路状态寄存器中设置一个位。一个典型的串行驱动程序会忽略这个位(毕竟,驱动程序不能恢复已经丢失的字节),但是你的驱动程序可以断言 FIFO 溢出位从未被设置。通过这样做,你就可以知道你的硬件何时丢失了字节(也许是由于 UART 中断服务的间歇性延迟),否则几乎不可能随意检测或再现。不用说,有了这些信息,你就不会把时间浪费在调试协议栈上,而是集中精力寻找和修复一个定时故障。你可以以类似的方式使用定时器/计数器来建立实时断言,检查你是否错过了最后期限。在我的公司里,我们使用 GPS 相关芯片的一个特殊寄存器来断言每个 GPS 通道总是在其 C/A 代码纪元内得到服务(大约每 1ms)--这是实时断言的另一种形式。这些例子的主要观点是,如果不是 "先发制人 "的断言,几乎可以从硬件中免费获得的信息不会被使用。然而,这些信息是无价的,因为它往往是直接验证你的代码的时域性能的唯一方法。

Assertions and Testing

断言和测试

At GE Medical Systems, I once got involved in developing an automatic testing suite for diagnostics X-ray machines, which we called “cycler.” The cycler was essentially a random monkey program that emulated activating soft keys on our touch screen and depressing the foot switch to initiate X-ray exposures. The idea was to let the cycler exercise the system at night and on weekends. Indeed, the cycler helped us to catch quite a few problems, mostly those that left entries in the error log. However, because our software was programmed mostly defensively, in absence of errors in the log we didn’t know if the “cycler” run was truly successful, or perhaps, the code just wandered around all weekend long silently “taking care” of various problems.

在通用电气医疗系统公司,我曾经参与过为诊断用 X 光机开发一个自动测试套件,我们称之为 "循环器"。循环器基本上是一个随机的 Mock 程序,模拟激活我们触摸屏上的软键和按下脚踏开关来启动 X 射线曝光。我们的想法是在晚上和周末让循环器不间断地测试系统。事实上,循环器帮助我们发现了不少问题,主要是那些在错误日志中留下的条目。然而,由于我们的软件主要是防御性编程,在日志中没有错误的情况下,我们不知道 "循环器 "的运行是否真的成功,或者也许,代码只是在整个周末默默地 "照顾 "各种问题而徘徊。

In contrast, every successful test run of code peppered with assertions builds much more confidence in the software. I don’t know exactly what the critical density of assertions must be, but at some point the tests stop producing undefined behavior, segmentation faults, or system hangs—all bugs manifest themselves as assertion failures. This effect of DbC is truly amazing. The integrity checks embodied in assertions prevent the code from “wandering around” and even broken builds don’t crash-and-burn but rather end up hitting an assertion.

相比之下,每一次成功的代码测试运行都掺杂着断言,在软件中建立了更大的信心。我不知道断言的临界密度是多少,但在某些时候,测试不再产生未定义的行为、分段故障或系统挂起,所有的错误都表现为断言失败。DbC 的这种效果确实令人惊讶。体现在断言中的完整性检查防止了代码的 "四处游荡",即使是坏了的构建也不会崩溃和燃烧,而是最终击中断言。

Testing code developed according to DbC principles has an immense psychological impact on programmers. Because assertions escalate every asserted condition to a fatal error, all bugs require attention. DbC makes it so much harder to dismiss an intermittent bug as a “glitch”—after all, you have a record in the form of the filename and the line number where the assertion fired. Once you know where in the code to start your investigations, most bugs are more transparent.

测试根据 DbC 原则开发的代码对程序员有巨大的心理影响。因为断言将每一个断言条件升级为致命的错误,所有的错误都需要关注。DbC 使得将一个间歇性的错误当作 "小故障 "来处理变得非常困难--毕竟,你有一个文件名和断言发生的行号的记录。一旦你知道从代码的哪个地方开始调查,大多数错误就会更加透明。

DbC also encourages testing to be accommodated by the system architecture. In the embedded systems domain, the days of logic analyzers or in-circuit emulators having direct access to all of the CPU’s state information are long gone. Even if you had access to all the CPU’s address and data signals (which you typically don’t, because there are simply not enough pins to go around), the multistage pipelines and cache memories make it impossible to figure out what’s going on in there. The solution requires the testing instrumentation (assertions) integrated directly into the system’s firmware. You can no longer design a system without accounting for testing overhead right from the start. Assuming that all the CPU cycles, the RAM, and all the ROM will be devoted strictly to the job at hand simply won’t get the job done.

DbC 还鼓励测试被系统结构所容纳。在嵌入式系统领域,逻辑分析仪或在线仿真器可以直接访问 CPU 的所有状态信息的日子早已一去不复返。即使你能访问 CPU 的所有地址和数据信号(你通常不能,因为根本没有足够的引脚),多级流水线和高速缓冲存储器使你不可能弄清里面发生了什么。该解决方案要求将测试工具(断言)直接集成到系统的固件中。你不能再设计一个系统而不从一开始就考虑到测试的开销。假设所有的 CPU 周期、RAM 和所有的 ROM 都被严格地用于手头的工作,是无法完成工作的。

Assertions in Production Code

产品代码中的断言

The standard practice is to use assertions during development and testing, but to disable them in the final product by defining the NDEBUG macro. In Listing 1, I have replaced this macro with NASSERT, because many development environments define NDEBUG automatically when you switch to the production version, and I wanted to decouple the decision of disabling assertions from the version of software that you build. That’s because I truly believe that leaving assertions enabled, especially in the ship-version of the product, is a good idea.

标准做法是在开发和测试期间使用断言,但在最终产品中通过定义 NDEBUG 宏来禁用它们。在清单 1 中,我用 NASSERT 代替了这个宏,因为许多开发环境在你切换到生产版本时自动定义了 NDEBUG,我想把禁用断言的决定与你构建的软件版本脱钩。这是因为我真的相信,让断言处于启用状态,特别是在产品的出货版本中,是一个好主意。

The often-quoted opinion in this matter comes from C.A.R. Hoare, who considered disabling assertions in the final product like using a lifebelt during practice, but then not bothering with it for the real thing. I find the comparison of assertions to fuses more compelling. Would you design a prototype board with carefully rated fuses, but then replace them all with 0 W resistors (chunky pieces of wire) for a production run?

在这个问题上经常被引用的意见来自C.A.R.Hoare,他认为在最终产品中禁用断言就像在练习中使用救生圈,但在真正的比赛中却不屑于使用它。我发现把断言比作保险丝更有说服力。你会用精心设计的保险丝来设计一个原型板,但在生产过程中把它们全部换成 0W 的电阻(大块的电线)吗?

The question of shipping with assertions really boils down to two issues. First is the overhead that assertions add to your code. Obviously, if the overhead is too big, you have no choice. (But then I must ask how have you built and tested your firmware?) However, assertions should be considered an integral part of the firmware and properly sized hardware should accommodate them. As the price of hardware rapidly drops and its capabilities skyrocket, it just makes sense to trade a small fraction of the raw CPU horsepower and memory resources for better system integrity. In addition, as I mentioned earlier, assertions often pay for themselves by eliminating reams of defensive code.

使用断言的问题实际上可以归结为两个问题。首先是断言给你的代码带来的开销。很明显,如果开销太大,你就没有选择。(但是,我必须问,你是如何建立和测试你的固件的?)然而,断言应该被认为是固件的一个组成部分,适当大小的硬件应该容纳它们。随着硬件价格的迅速下降和能力的急剧上升,用一小部分原始 CPU 马力和内存资源来换取更好的系统完整性是有意义的。此外,正如我前面提到的,断言往往通过消除大量的防御性代码而得到回报。

The other issue is the correct system response when an assertion fires in the field. As it turns out, a simple system reset is for most embedded devices the least inconvenient action from the user’s perspective—certainly less inconvenient than locking up a device and denying service indefinitely. That’s exactly what happened the other day, when my wife’s cellular phone froze and the only way of bringing it back to life was to pull out the battery. (I don’t know how she does it, but since then she managed to hang her phone again more than once, along with our VCR and even the TV.) The question that comes to my mind is whether the firmware in those products used assertions (or whether the assertions have been enabled)—apparently not, because otherwise the firmware would have reset automatically.

另一个问题是当一个断言在现场发生时的正确系统响应。事实证明,对于大多数嵌入式设备来说,从用户的角度来看,简单的系统重置是最不方便的行动--当然比锁定设备和无限期拒绝服务要不方便。这正是前几天发生的事情,当时我妻子的手机冻结了,而使其恢复正常的唯一方法是拔出电池。(我不知道她是怎么做到的,但从那时起,她又设法不止一次地把手机挂起来,还有我们的录像机,甚至电视)。我想到的问题是,这些产品的固件是否使用了断言(或断言是否已被启用)--显然没有,因为否则固件会自动重置。

Further Reading

Assertions have been a recurring subject of many articles (and rightly so). For example, two articles from the C/C++ Users Journal“Generic: Assertions” and “Generic : Enforcements“, describe how you can unleash templates and exceptions to build truly smart assertions. More specifically to embedded systems, I greatly enjoyed Niall Murphy’s articles devoted to assertions, Assertiveness Training for Programmers and Assert Yourself. By the way, the analogy between assertions in software and fuses in electrical systems was Niall’s original idea, which came up when we talked about assertions at an Embedded Systems Conference some years ago in San Francisco.

The main goal of this article is to convince you that the DbC philosophy can fundamentally change the way you design, implement, test and deploy your software. A good starting point to learn more about DbC is the Eiffel Software website (among others). You can find there “Design by Contract: The Lessons of Ariane”, an interesting interpretation of the infamous Ariane 5 software failure.