Why don't C and C++ have built-in ways to check for integer overflow?

Time: 2020-12-18 20:06:01

Why do C and C++ not provide a set of implementation-provided operations that perform each of the basic integer operations with overflow checking (e.g. bool safeAdd(int *out, int a, int b))?

As I understand it, most instruction sets have ways to tell whether an operation overflowed (e.g. the x86 overflow and carry flags) and also define what happens in the case of signed integers.

As such, should compilers not be capable of doing a far better job, creating simpler and faster operations, than what is possible to code in C and C++?

6 answers

#1


10  

C and C++ follow a central tenet of "You don't pay for what you don't need". So the default arithmetic operations aren't going to stray from the underlying architecture's single instruction for arithmetic operations.

As to why there isn't a standard library function for adding two integers and detecting overflow, I can't say. First of all, it appears the language defines signed integer overflow as undefined behavior:

In the C programming language, signed integer overflow causes undefined behavior.

Considering that there are multiple ways to implement signed integers (one's complement, two's complement, etc.), and that these architectures were all prevalent when C was created, it's understandable why this is undefined. It would be hard to implement a "safe*" pure C function without lots of information about the underlying platform. It could be done on a CPU-by-CPU basis.
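
That said, for addition a check can be phrased against INT_MAX and INT_MIN from <limits.h>, which does not depend on the signed representation. A minimal sketch, reusing the safeAdd signature from the question (the body is only an illustration, not a standard function):

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper with the signature from the question.
   Returns true and stores a + b in *out when the sum fits in int;
   returns false and leaves *out untouched otherwise. */
bool safeAdd(int *out, int a, int b)
{
    /* Test before adding, so the overflowing addition is never executed. */
    if (b > 0 && a > INT_MAX - b) return false;  /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b) return false;  /* would go below INT_MIN */
    *out = a + b;
    return true;
}
```

The comparisons themselves cannot overflow: INT_MAX - b is only evaluated for positive b, and INT_MIN - b only for negative b.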

Still, that doesn't make it impossible. I'd definitely be interested if someone could find proposals to the C or C++ standards bodies for safer overflow helpers and see why they were rejected.

Regardless, there are many ways in practice to detect arithmetic overflows, and libraries to help.
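
One such way, sketched here under the assumption that a wider integer type is available, is to compute in the wider type and range-check the result. The name safeMul32 is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: multiply two 32-bit values in 64-bit arithmetic,
   where the product always fits, then check that it fits in 32 bits. */
bool safeMul32(int32_t *out, int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    if (wide > INT32_MAX || wide < INT32_MIN) return false;
    *out = (int32_t)wide;
    return true;
}
```

This trades a little width for simplicity; it does not help for the widest type the platform offers.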

#2


6  

Probably because there is no demand for it. Signed arithmetic overflow is undefined behavior, expressly to allow implementations to do such checks. If compiler vendors thought that doing them would sell more compilers, they would.

In practice, it would be very, very difficult for a compiler to do them more effectively than the programmer can. It's pretty much standard procedure to validate all numeric input against ranges for which you can prove that later operations cannot overflow. All good programmers do this as a matter of habit. So this means one quick if immediately after input, and no further checking.
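
A rough sketch of that habit, assuming an image-size calculation as the example. MAX_DIM and both function names are made up for illustration; the limit is picked so the later arithmetic provably fits in int32_t:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed application limit, chosen so that
   MAX_DIM * MAX_DIM * 4 = 400,000,000 fits in int32_t. */
#define MAX_DIM 10000

/* The one check, done right after input. */
bool validDimensions(int32_t width, int32_t height)
{
    return width > 0 && width <= MAX_DIM &&
           height > 0 && height <= MAX_DIM;
}

/* Needs no overflow check of its own.
   Precondition: validDimensions(width, height) returned true. */
int32_t rgbaBufferSize(int32_t width, int32_t height)
{
    return width * height * 4;
}
```

The weak point is the one the next paragraph describes: if the formula in rgbaBufferSize changes, MAX_DIM has to be revisited by hand.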

Still, programmers have been known to make mistakes, and it's easy to forget to correct the validation when you change the calculations later. I'd like to see such a feature in a compiler. But apparently, it won't help sell compilers, or at least the vendors believe that it won't, so we don't get it.

#3


6  

The question pops up regularly.

First, remember that C is defined to be portable and efficient. As such, it was designed to provide only operations that were supported by a lot of hardware (probably before x86 even saw the light of day).

Second, a number of compilers provide (or plan to provide) built-ins for such operations, so that users may use class types that use those built-ins under the hood. The quality of the built-ins' implementation matters less (though it does matter) than the fact that a compiler aware of their meaning may optimize the checks away when they are provably useless.

Finally, there are other ways to actually check programs. For example, static analysis, or special compilation modes combined with unit tests, may detect those flaws early and avoid the need (more or less completely) to embed those overflow checks in Release builds.

#4


5  

Because it is rarely ever needed. When would you actually need to detect integer overflow? In nearly all situations where you need to check some range, it is up to you to define the actual range, because this range depends entirely on the application and algorithm.

When do you really need to know whether a result has overflowed the range of int, rather than whether a result is inside the allowed domain for a particular algorithm or whether an index is inside the bounds of an array? It is you who gives your variables their semantics. The language specification only provides the overall ranges of types, and if you chose a type whose range doesn't fit your needs, then it's your fault.

Integer overflow is UB because you seldom really care about it. If my unsigned char runs out of range during operations, I have probably chosen the wrong type for accumulating 10 million numbers. But knowing about the overflow at runtime won't help me, since my design is broken anyway.

#5


4  

A better question might be: why is integer overflow undefined behavior? In practice, 99.9% of all CPUs use two's complement and a carry/overflow bit. So in the real world, at the assembler/opcode level, integer overflows are always well-defined. In fact, a whole lot of assembler, or hardware-related C, relies heavily on well-defined integer overflows (drivers for timer hardware in particular).
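
The timer case can be sketched as follows. In C the wraparound that such code may rely on is the unsigned one, which the standard does define; the counter width and the name ticksElapsed are assumptions for illustration:

```c
#include <stdint.h>

/* Ticks elapsed between two readings of a free-running 32-bit counter.
   The result is reduced modulo 2^32, so it is correct even if the
   counter rolled over once between the two readings. */
uint32_t ticksElapsed(uint32_t start, uint32_t now)
{
    return now - start;
}
```

For example, a start reading of 0xFFFFFFF0 and a later reading of 0x00000010 give 0x20 ticks, with no special case for the rollover.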

The original C language, before standardization, probably didn't consider things like this in detail. But when C got standardized by ANSI and ISO, they had to follow certain standardization rules. ISO standards aren't allowed to be biased towards a certain technology and thereby give a certain company advantages in competition.

So they had to consider that some CPUs may possibly implement obscure things like one's complement, "sign and magnitude" or "some implementation-defined manner". They had to allow signed zeroes, padding bits and other obscure signed integer mechanisms.

Because of this, the behavior of signed numbers turned wonderfully fuzzy. You can't tell what happens when a signed integer in C overflows, because signed integers may be expressed in two's complement, one's complement, or possibly some other implementation-defined madness. Therefore integer overflows are undefined behavior.

The sane solution to this problem wouldn't be to invent some safe range checks, but rather to state that all signed integers in the C language shall have two's complement format, end of story. Then a signed char would always range from -128 to 127, overflow would wrap to -128, and everything would be well-defined. But artificial standards bureaucracy prevents the standard from turning sane.

There are many issues like this in the C standard: alignment/padding, endianness, etc.

#6


4  

Why? Well, because they weren't in C when C++ started from it, and because since then nobody has proposed such functions and succeeded in convincing compiler makers and committee members that they are useful enough to be provided.

Note that compilers do provide such kinds of intrinsics, so it isn't that they are against them.

Note as well that there are proposals to standardize things like Fixed-Point Arithmetic and Unbounded-Precision Integer Types.

So it is probably just that there isn't enough interest.
