int vs size_t on 64bit

Time: 2022-09-01 10:09:42

Porting code from 32bit to 64bit. Lots of places with

int len = strlen(pstr);

These all generate warnings now because strlen() returns size_t which is 64bit and int is still 32bit. So I've been replacing them with

size_t len = strlen(pstr);

But I just realized that this is not safe, as size_t is unsigned and it can be treated as signed by the code (I actually ran into one case where it caused a problem, thank you, unit tests!).
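
To make the signedness problem concrete, here is a minimal sketch of one way code like this can break after the switch to size_t. It is only an illustration, not the case from my code base, and the function names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical example: "is a shorter than b?" written the way it might
// have been written when the lengths were still int.
inline bool shorter_buggy(const char* a, const char* b)
{
    assert(a != nullptr && b != nullptr);
    std::size_t la = std::strlen(a);
    std::size_t lb = std::strlen(b);
    // la - lb is computed in unsigned arithmetic, so it wraps around
    // instead of going negative and this test can never be true.
    return la - lb < 0;
}

// Same question asked without a subtraction, which is safe for size_t.
inline bool shorter_fixed(const char* a, const char* b)
{
    assert(a != nullptr && b != nullptr);
    return std::strlen(a) < std::strlen(b);
}
```

With int lengths the first version behaved as intended; with size_t it silently always returns false.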

Blindly casting strlen return to (int) feels dirty. Or maybe it shouldn't?
So the question is: is there an elegant solution for this? I probably have a thousand lines of code like that in the codebase; I can't manually check each one of them and the test coverage is currently somewhere between 0.01 and 0.001%.

6 answers

#1


5  

As a compromise, you could use ssize_t (if available). Fake it up if not, using long long, int_fast64_t, intmax_t, or have a platform porting header which lets a suitable type be specified for a platform. ssize_t is in POSIX not standard C or C++, but if you ever hit a platform which doesn't have a signed type of the same size as size_t then I sympathise.
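
A porting header along those lines could look roughly like the sketch below. It assumes a C++11 compiler and uses std::make_signed instead of picking from the list of candidate types above; the name signed_size is made up for the example. On a POSIX platform the typedef could simply name ssize_t instead.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical porting header: the one place that decides which signed
// type pairs up with size_t on the current platform.
typedef std::make_signed<std::size_t>::type signed_size;

// Fail the build early if the chosen type is not what the code expects.
static_assert(sizeof(signed_size) == sizeof(std::size_t),
              "signed_size must be as wide as size_t");
static_assert(std::is_signed<signed_size>::value,
              "signed_size must be a signed type");
```

The static_asserts turn a wrong choice on some future platform into a compile error rather than a runtime surprise.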

A cast to int is nearly safe (assuming 32 bit int on your 64 bit platform, which seems reasonable), because a string is unlikely to be more than 2^31 bytes long. A cast to a larger signed type is even safer. Customers who can afford 2^63 bytes of memory are what's known in the trade as "a good problem to have" ;-)

Of course, you could check it:

size_t ulen = strlen(pstr);
if (ulen > SSIZE_MAX) abort(); // preferably trace, log, return error, etc.
ssize_t len = (ssize_t) ulen;

Sure there's an overhead, but if you have 1000 instances then they can't all be performance-critical. For the ones which are (if any), you can do the work to investigate whether len being signed actually matters. If it doesn't, switch to size_t. If it does, rewrite or just take a risk on never meeting an object that absurdly huge. The original code would almost certainly have done the wrong thing anyway on the 32bit platform, if len had been negative as a result of strlen returning a value bigger than INT_MAX.
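
With a thousand call sites, the check is easier to apply if it lives in one helper. The following is a sketch under a few assumptions: it is C++, it uses std::ptrdiff_t rather than ssize_t so that it stays within the standard library, and it throws instead of calling abort(). The name checked_strlen is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <limits>
#include <stdexcept>

// Hypothetical helper: strlen with a range-checked conversion to a
// signed type, so callers can keep doing signed arithmetic on the result.
inline std::ptrdiff_t checked_strlen(const char* s)
{
    assert(s != nullptr);
    const std::size_t ulen = std::strlen(s);
    const std::size_t max_signed =
        static_cast<std::size_t>(std::numeric_limits<std::ptrdiff_t>::max());
    if (ulen > max_signed)
        throw std::length_error("string length does not fit in ptrdiff_t");
    return static_cast<std::ptrdiff_t>(ulen);
}
```

A call site then changes from `int len = strlen(pstr);` to `std::ptrdiff_t len = checked_strlen(pstr);`, which is a mechanical search-and-replace.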

#2


7  

Some time ago I posted a short note about this kind of issue on my blog, and the short answer is:

Always use proper C++ integer types

Long answer: When programming in C++, it’s a good idea to use the proper integer types for the particular context. A little bit of strictness always pays off. It’s not uncommon to see a tendency to ignore the integral types that standard containers define for themselves, namely size_type. It’s available for a number of standard containers like std::string or std::vector. Ignoring it can easily come back to bite you.

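Below is a simple example of an incorrectly chosen type used to hold the result of the std::string::find function. I’m quite sure many would expect there is nothing wrong with the unsigned int here. But actually this is just a bug. I run Linux on a 64-bit architecture, and when I compile this program as is, it works as expected. However, when I replace the string in the line marked [1] with "abc", it still runs but not as expected :-) The "found" message is printed although the string contains no delimiter, because npos is truncated when it is stored in the 32-bit pos and no longer compares equal to the 64-bit string::npos.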

#include <iostream>
#include <string>
using namespace std;
int main()
{
  string s = "a:b:c"; // "abc" [1]
  char delim = ':';
  unsigned int pos = s.find(delim);
  if(string::npos != pos)
  {
    cout << delim << " found in " << s << endl;
  }
}

The fix is very simple. Just replace unsigned int with std::string::size_type. The problem could have been avoided if whoever wrote this program had taken care to use the correct type. Not to mention that the program would then be portable straight away.
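
Here is a side-by-side sketch of the two variants, reduced to functions so the difference can be checked directly. The function names are made up for illustration; the explicit cast in the first one only spells out the narrowing that the original declaration does implicitly.

```cpp
#include <cassert>
#include <string>

// Variant from the example above: the result of find() is narrowed to
// unsigned int before it is compared against npos.
inline bool found_narrow(const std::string& s, char delim)
{
    unsigned int pos = static_cast<unsigned int>(s.find(delim));
    // pos is widened again for the comparison; a truncated npos no longer
    // equals npos wherever size_type is wider than unsigned int.
    return std::string::npos != pos;
}

// Fixed variant: keep the result in the type that find() returns.
inline bool found_exact(const std::string& s, char delim)
{
    const std::string::size_type pos = s.find(delim);
    return std::string::npos != pos;
}
```

On a platform where size_type and unsigned int have the same width both functions agree, which is why the bug stays hidden in 32-bit builds.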

I’ve seen this kind of issue many times, especially in code written by former C programmers who do not like to wear the muzzle of strictness that the C++ type system enforces and requires. The example above is a trivial one, but I believe it presents the root of the problem well.

I recommend the brilliant article 64-bit development written by Andrey Karpov, where you can find a lot more on the subject.

#3


5  

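Setting the compiler warnings to the maximum level should get you a nice report of every incorrect sign conversion. In gcc, '-Wall -Wextra' is a start, but note that neither of them turns on -Wconversion or -Wsign-conversion; those have to be added explicitly to get the implicit conversions reported.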
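
As a rough illustration of what the gcc conversion warnings pick up: -Wall and -Wextra do not by themselves enable -Wconversion or -Wsign-conversion, so the sketch below assumes they are passed explicitly. The file name and function names are made up, and the comments only say which option is expected to flag each line, not the exact message text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Build with, for example:
//   g++ -Wall -Wextra -Wconversion -Wsign-conversion -c example.cpp

inline int length_narrowed(const char* s)
{
    assert(s != nullptr);
    int len = std::strlen(s);   // size_t -> int: expected to be flagged by -Wconversion
    return len;
}

inline std::size_t widen_count(int count)
{
    std::size_t n = count;      // int -> size_t: expected to be flagged by -Wsign-conversion
    return n;
}

inline int length_explicit(const char* s)
{
    assert(s != nullptr);
    // The explicit cast silences the warning but performs no range check.
    return static_cast<int>(std::strlen(s));
}
```

The warnings give you the list of places to look at; they do not tell you which of them actually matter.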

You can also use a static code analyzer like cppcheck to see if everything is right.

#4


4  

You could use ssize_t (the signed variant of size_t).

#5


1  

You can treat size_t as signed safely in most cases. An unsigned size_t value will be interpreted as negative only when it (or an intermediate result in an expression) is 2^31 or bigger (for 32-bit) or 2^63 or bigger (for 64-bit).

UPDATE: Sorry, size_t will be unsafe in constructions like while ( (size_t)t >=0 ), because that condition is always true for an unsigned value. So the right answer is to use ssize_t.
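
If a loop like that has to stay on size_t, it can usually be restructured so the counter never needs to go below zero. A minimal sketch, with a made-up function name, using a reverse walk over a string as the example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical example: build the reverse of s with an unsigned index.
// A loop written as "for (size_t i = n - 1; i >= 0; --i)" would never
// end, since i >= 0 always holds for an unsigned type.
inline std::string reversed(const std::string& s)
{
    std::string out;
    out.reserve(s.size());
    // Test before using the index: i runs from s.size() down to 1 and
    // i - 1 is the element that gets read.
    for (std::size_t i = s.size(); i > 0; --i)
        out.push_back(s[i - 1]);
    assert(out.size() == s.size());
    return out;
}
```

This also handles the empty string, where n - 1 would already have wrapped around before the first iteration.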

#6


1  

If your compiler supports C++0x:

auto len = strlen(pstr);
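
Keep in mind what auto deduces here: the variable gets exactly the return type of strlen, so the conversion warning goes away but len is still unsigned. The sketch below (function name made up) just makes that visible at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical check: reports whether the type deduced by auto for a
// strlen result is unsigned.
inline bool auto_len_is_unsigned(const char* pstr)
{
    assert(pstr != nullptr);
    auto len = std::strlen(pstr);
    // auto takes the return type of strlen as is, i.e. size_t.
    static_assert(std::is_same<decltype(len), std::size_t>::value,
                  "auto deduces size_t from strlen");
    (void)len;  // only the deduced type matters for this check
    return std::is_unsigned<decltype(len)>::value;
}
```

So this is equivalent to writing size_t by hand, and code that relied on len being signed still needs to be reviewed.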
