I am currently writing a C program that requires frequent comparisons of string lengths, so I wrote the following helper function:
#include <string.h>

int strlonger(char *s1, char *s2) {
    return strlen(s1) - strlen(s2) > 0;
}
I have noticed that the function returns true even when s1 is shorter than s2. Can someone please explain this strange behavior?
3 answers
#1
What you've come across is some peculiar behavior that arises in C when handling expressions that contain both signed and unsigned quantities.
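When an operation is performed where one operand is signed and the other is unsigned, C will implicitly convert the signed operand to unsigned and perform the operation as if both numbers were nonnegative. (Strictly speaking, this happens when the unsigned type is at least as wide as the signed one, as with int and unsigned int, or int and size_t on typical platforms.) This convention often leads to nonintuitive behavior for relational operators such as < and >.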
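To see why the width of the unsigned type matters, here is a minimal sketch (the function names int_vs_unsigned and int_vs_unsigned_short are hypothetical) that compares -1 with the value 1 held first in an unsigned int and then in an unsigned short. It assumes a typical platform where int is wider than short, so the unsigned short operand is promoted to int instead of forcing a conversion to unsigned.

```c
#include <stdbool.h>

/* int vs unsigned int (same width): -1 is converted to unsigned int,
   becomes UINT_MAX, so "1 > -1" evaluates to false. */
bool int_vs_unsigned(void) {
    int s = -1;
    unsigned int u = 1;
    return u > s;
}

/* int vs unsigned short: assuming int can represent every unsigned short
   value (true on typical platforms), u is promoted to int and the
   comparison is ordinary signed math, so "1 > -1" is true. */
bool int_vs_unsigned_short(void) {
    int s = -1;
    unsigned short u = 1;
    return u > s;
}
```

GCC and Clang can flag the first comparison with -Wsign-compare (enabled by -Wextra in C); the second one compiles silently because no signed-to-unsigned conversion takes place.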
Regarding your helper function, note that since strlen returns type size_t (an unsigned quantity), the difference and the comparison are both computed using unsigned arithmetic. When s1 is shorter than s2, the difference strlen(s1) - strlen(s2) should be negative, but instead becomes a large unsigned number, which is greater than 0. Thus,
return strlen(s1) - strlen(s2) > 0;
returns 1 even if s1 is shorter than s2. To fix your function, use this code instead:
return strlen(s1) > strlen(s2);
Welcome to the wonderful world of C! :)
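The failure and the fix can be checked side by side. This is a minimal sketch: strlonger_buggy is a hypothetical name for the original version, and both functions take const char * rather than char *, since neither modifies the strings.

```c
#include <string.h>

/* Original logic: the subtraction is done in size_t, so when s1 is
   shorter the result wraps around to a huge value and "> 0" is true. */
int strlonger_buggy(const char *s1, const char *s2) {
    return strlen(s1) - strlen(s2) > 0;
}

/* Fixed logic: compare the two lengths directly. There is no
   subtraction, so nothing can wrap around. */
int strlonger(const char *s1, const char *s2) {
    return strlen(s1) > strlen(s2);
}
```

For example, strlonger_buggy("a", "abc") returns 1, while strlonger("a", "abc") returns 0.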
Additional Examples
Since this question has recently received a lot of attention, I'd like to provide a few (simple) examples, just to ensure that I am getting the idea across. I will assume that we are working with a 32-bit machine (so unsigned int is 32 bits) using two's complement representation.
The important concept to understand when working with unsigned/signed variables in C is that if there is a mix of unsigned and signed quantities of the same width in a single expression, the signed values are implicitly converted to unsigned.
Example #1:
Consider the following expression:
-1 < 0U
Since the second operand is unsigned, the first one is implicitly converted to unsigned, and hence the expression is equivalent to the comparison
4294967295U < 0U
which of course is false. This is probably not the behavior you were expecting.
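Example #1 can be reproduced directly. The sketch below (hypothetical function names) returns the value of the comparison and the converted value of -1. Converting -1 to unsigned int yields UINT_MAX by the rules of unsigned conversion in C, so the second function does not depend on the 32-bit assumption; only the literal 4294967295 does.

```c
#include <limits.h>

/* -1 is converted to unsigned int (UINT_MAX) before comparing with 0U. */
int minus_one_less_than_zero_u(void) {
    return -1 < 0U;          /* evaluates to 0 (false) */
}

/* The value -1 takes on after conversion: always UINT_MAX. */
unsigned int minus_one_as_unsigned(void) {
    return (unsigned int)-1;
}
```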
Example #2:
Consider the following code that attempts to sum the elements of an array a, where the number of elements is given by parameter length:
int sum_array_elements(int a[], unsigned length) {
    int i;
    int result = 0;
    /* Deliberately buggy when length == 0: see the explanation below. */
    for (i = 0; i <= length-1; i++)
        result += a[i];
    return result;
}
This function is designed to demonstrate how easily bugs can arise due to implicit conversion from signed to unsigned. It seems quite natural to pass parameter length as unsigned; after all, who would ever want to use a negative length? The stopping criterion i <= length-1 also seems quite intuitive. However, when the function is called with length equal to 0, the combination of these two yields an unexpected outcome.
Since parameter length is unsigned, the computation 0-1 is performed using unsigned arithmetic, which is arithmetic modulo 2^32 on our assumed machine. The result is therefore UMax, the largest unsigned value (4294967295). The <= comparison is also performed as an unsigned comparison (i is converted to unsigned), and since every unsigned value is less than or equal to UMax, the comparison always holds. Thus, the code will attempt to access invalid elements of array a.
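The two parts of this failure, the wraparound of length - 1 and the always-true loop test, can be isolated in small functions. This is a sketch with hypothetical names (last_index_bound, loop_test) that mirror the expressions inside sum_array_elements.

```c
#include <limits.h>

/* length - 1 in unsigned arithmetic: wraps around to UINT_MAX when
   length == 0. */
unsigned int last_index_bound(unsigned int length) {
    return length - 1;
}

/* Mirrors the loop test "i <= length-1": i is converted to unsigned, so
   with length == 0 the test is true for every nonnegative i. */
int loop_test(int i, unsigned int length) {
    return i <= length - 1;
}
```

With length equal to 3 the test behaves as intended (loop_test(3, 3) is 0), which is why the bug is easy to miss in ordinary testing.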
The code can be fixed either by declaring length to be an int, or by changing the test of the for loop to i < length.
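Both fixes might look like the following minimal sketch. The first keeps the unsigned length and tests i < length; it also makes i unsigned (a choice of this sketch, not of the text above) so that the loop test compares two unsigned values. sum_array_elements_signed is a hypothetical name for the version that takes an int length.

```c
/* Fix 1: test i < length. No "length - 1" is computed, so nothing wraps;
   the loop body simply does not run when length == 0. */
int sum_array_elements(const int a[], unsigned length) {
    int result = 0;
    for (unsigned i = 0; i < length; i++)
        result += a[i];
    return result;
}

/* Fix 2: make length a signed int. With length == 0, length - 1 is -1,
   so the test 0 <= -1 fails immediately. */
int sum_array_elements_signed(const int a[], int length) {
    int result = 0;
    for (int i = 0; i <= length - 1; i++)
        result += a[i];
    return result;
}
```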
Conclusion: When Should You Use Unsigned?
I don't want to state anything too controversial here, but here are some of the rules I often adhere to when I write programs in C.
- DON'T use unsigned just because a number is nonnegative. It is easy to make mistakes, and these mistakes are sometimes incredibly subtle (as illustrated in Example #2).
- DO use unsigned when performing modular arithmetic.
- DO use unsigned when using bits to represent sets. This is often convenient because it allows you to perform logical right shifts without sign extension.
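As a sketch of the two DO cases, the code below uses uint32_t (from stdint.h) both for a set of small integers and for a simple string hash that relies on wraparound. The names bitset32, set_add, set_contains, set_union, set_intersect, top_bit_down and hash_string are all hypothetical, and the hash is only an illustration, not a recommended hash function.

```c
#include <stdint.h>

/* A set of the integers 0..31, one bit per member. k must be < 32. */
typedef uint32_t bitset32;

bitset32 set_add(bitset32 s, unsigned k)       { return s | (UINT32_C(1) << k); }
int      set_contains(bitset32 s, unsigned k)  { return (s >> k) & 1u; }
bitset32 set_union(bitset32 a, bitset32 b)     { return a | b; }
bitset32 set_intersect(bitset32 a, bitset32 b) { return a & b; }

/* Right-shifting an unsigned value is always a logical shift: zeros are
   shifted in and the top bit is not copied (no sign extension). */
uint32_t top_bit_down(void) { return UINT32_C(0x80000000) >> 31; }

/* Unsigned arithmetic wraps modulo 2^32 here by definition, so the
   multiply-and-add can never cause undefined overflow. */
uint32_t hash_string(const char *s) {
    uint32_t h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h;
}
```

For instance, set_union(set_add(0, 3), set_add(0, 5)) is 40 (bits 3 and 5 set), and top_bit_down() returns 1.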
Of course, there may be situations in which you decide to go against these "rules". But more often than not, following these suggestions will make your code easier to work with and less error-prone.
#2
strlen returns a size_t, which is a typedef for an unsigned type.
So,
(unsigned) 4 - (unsigned) 7 == (unsigned) -3
All unsigned values are greater than or equal to 0. Try converting the values returned by strlen to long int.
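This answer's suggestion might be written as below. The sketch assumes the string lengths fit in a long: that holds where long is 64 bits, but on platforms where long is 32 bits and size_t is 64 bits (64-bit Windows, for instance) a very long string would not fit. strlonger_long is a hypothetical name.

```c
#include <string.h>

/* Convert each length to long before subtracting, so the difference
   can be negative. Assumes both lengths fit in a long. */
int strlonger_long(const char *s1, const char *s2) {
    return (long)strlen(s1) - (long)strlen(s2) > 0;
}
```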
#3
Alex Lockwood's answer is the best solution (compact, clear semantics, etc.).
Sometimes it does make sense to explicitly convert to a signed counterpart of size_t, such as ptrdiff_t, e.g.
return (ptrdiff_t)strlen(s1) - (ptrdiff_t)strlen(s2) > 0;
If you do this, you'll want to be certain that the size_t value fits in a ptrdiff_t (which, being a signed type of the same width on typical platforms, has one fewer value bit).
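When you need the signed difference itself rather than a yes/no answer, a checked version of the ptrdiff_t conversion might look like this sketch. length_difference is a hypothetical helper: it refuses lengths above PTRDIFF_MAX, so both conversions are safe, and subtracting two nonnegative ptrdiff_t values cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores strlen(s1) - strlen(s2) in *diff as a signed value.
   Returns 1 on success, 0 if either length does not fit in ptrdiff_t. */
int length_difference(const char *s1, const char *s2, ptrdiff_t *diff) {
    size_t n1 = strlen(s1), n2 = strlen(s2);
    if (n1 > (size_t)PTRDIFF_MAX || n2 > (size_t)PTRDIFF_MAX)
        return 0;                         /* would not fit in ptrdiff_t */
    *diff = (ptrdiff_t)n1 - (ptrdiff_t)n2;
    return 1;
}
```

With this helper, "is s1 longer?" becomes a check that the call succeeded and that the stored difference is greater than 0.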