I am porting some C code that uses a lot of bit manipulation into Java. The C code operates under the assumption that int is 32 bits wide and char is 8 bits wide. There are assertions in it that check whether those assumptions are valid.
I have already come to terms with the fact that I'll have to use long in place of unsigned int. But can I safely use byte as a replacement for unsigned char?
They merely represent bytes, but I have already run into this bizarre incident (data is an unsigned char * in C and a byte[] in Java):
/* C */
uInt32 c = (data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3];
/* Java */
long a = ((data[0] << 24) | (data[1] << 16) | (data[2] << 8) | data[3]) & 0xffffffff;
long b = ((data[0] & 0xff) << 24) | ((data[1] & 0xff) << 16) |
((data[2] & 0xff) << 8) | (data[3] & 0xff) & 0xffffffff;
You would think a left shift operation is safe. But due to the strange unary promotion rules in Java, a and b are not going to be the same if some of the bytes in data are "negative" (b gives the correct result).
What other "gotchas" should I be aware of? I really don't want to use short here.
2 Answers
#1
3
You can safely use a byte to represent a value between 0 and 255 if you make sure to bitwise-AND its value with 255 (or 0xFF) before using it in computations. This promotes it to an int, and ensures the promoted value is between 0 and 255.
Otherwise, integer promotion would result in an int value between -128 and 127, using sign extension. -127 as a byte (hex 0x81) would become -127 as an int (hex 0xFFFFFF81).
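To make the sign extension concrete, here is a minimal sketch comparing the implicit promotion with the masked one. The class and method names (SignExtensionDemo, promoteSigned, promoteUnsigned) are made up for illustration and are not part of the original code.

```java
public class SignExtensionDemo {
    // Hypothetical helper: promote a byte the way Java does implicitly,
    // i.e. a widening conversion that copies the sign bit into the upper 24 bits.
    static int promoteSigned(byte b) {
        return b;
    }

    // Hypothetical helper: promote a byte as if it were a C unsigned char,
    // keeping only the low 8 bits of the promoted int.
    static int promoteUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x81; // bit pattern 1000_0001, which is -127 as a signed byte

        System.out.println(promoteSigned(b));                        // -127
        System.out.println(Integer.toHexString(promoteSigned(b)));   // ffffff81
        System.out.println(promoteUnsigned(b));                      // 129
        System.out.println(Integer.toHexString(promoteUnsigned(b))); // 81
    }
}
```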
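So you can do this:
long a = (((data[0] & 255) << 24) | ((data[1] & 255) << 16) | ((data[2] & 255) << 8) | (data[3] & 255)) & 0xffffffffL;
Note that the final mask has to be a long literal (0xffffffffL). A plain 0xffffffff is the int value -1, so ANDing with it changes nothing, and the int result would then be sign-extended when it is widened to long. The first & 255 is unnecessary here, since the shift by 24 pushes the sign-extended bits out of the int and the final mask clears whatever the widening to long adds. But it's probably simplest to just always include it.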
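As a rough sketch of how this could be wrapped up for the port, assuming big-endian byte order as in the expression above: the names UInt32Reader, readUInt32BE and readWithoutMasks are hypothetical, and the second method exists only to show what goes wrong when the per-byte masks are left out.

```java
public class UInt32Reader {
    // Hypothetical helper: read four bytes starting at offset as a
    // big-endian unsigned 32-bit value, returned in a long.
    static long readUInt32BE(byte[] data, int offset) {
        int bits = ((data[offset] & 0xFF) << 24)
                 | ((data[offset + 1] & 0xFF) << 16)
                 | ((data[offset + 2] & 0xFF) << 8)
                 |  (data[offset + 3] & 0xFF);
        // The L suffix matters: 0xFFFFFFFF without it is the int -1,
        // so the mask would be a no-op and the widening would sign-extend.
        return bits & 0xFFFFFFFFL;
    }

    // For comparison only: the same expression without the per-byte masks.
    // Sign-extended lower bytes OR their 1-bits into the higher positions.
    static long readWithoutMasks(byte[] data, int offset) {
        int bits = (data[offset] << 24)
                 | (data[offset + 1] << 16)
                 | (data[offset + 2] << 8)
                 |  data[offset + 3];
        return bits & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0x81, 0x02, (byte) 0xF3, 0x04 };
        System.out.println(Long.toHexString(readUInt32BE(data, 0)));     // 8102f304
        System.out.println(Long.toHexString(readWithoutMasks(data, 0))); // fffff304
    }
}
```

On Java 8 and later, Integer.toUnsignedLong(bits) does the same job as the final & 0xFFFFFFFFL.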
#2
-1
... can I safely use byte as a replacement for unsigned char?
As you've discovered, not really... No.
According to Oracle Java documentation, byte is a signed integer type, and though it has 256 distinct values (due to the explicit range specification "It has a minimum value of -128 and a maximum value of 127 (inclusive)" from the documentation) there are values that an unsigned char from C can store that a byte from Java can't (and vice versa).
That explains the problem you've experienced. However, the extent of the problem hasn't been fully demonstrated on your 8-bit-byte implementation.
What other "gotchas" should I be aware of?
Whilst a byte in Java is required to support only values between (and including) -128 and 127, C's unsigned char has a maximum value (UCHAR_MAX) that depends upon the number of bits used to represent it (CHAR_BIT; at least 8). So when CHAR_BIT is greater than 8, there will be extra values beyond 255 that unsigned char can store.
In summary, in the world of Java a byte should really be called an octet (a group of eight bits), whereas in C a byte (char, signed char, unsigned char) is a group of at least (possibly more than) eight bits.
No. They are not equivalent. I don't think you'll find an equivalent type in Java, either; they're all rather fixed-width. You could safely use byte in Java as an equivalent for int8_t in C, however (except that int8_t isn't required to exist in C unless CHAR_BIT == 8).
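For the common case where CHAR_BIT == 8 on the C side, a small sketch of what the fixed width means in practice: every 8-bit pattern still fits in a Java byte, only the signed reading of it differs. The class and method names (ByteRoundTrip, store, load) are invented for this example, and Byte.toUnsignedInt assumes Java 8 or later.

```java
public class ByteRoundTrip {
    // Hypothetical helper: store a value in 0..255 into a Java byte.
    // The narrowing cast keeps the low 8 bits unchanged.
    static byte store(int unsignedValue) {
        return (byte) unsignedValue;
    }

    // Hypothetical helper: recover the 0..255 value.
    // Byte.toUnsignedInt (Java 8+) is equivalent to b & 0xFF.
    static int load(byte b) {
        return Byte.toUnsignedInt(b);
    }

    public static void main(String[] args) {
        System.out.println(Byte.SIZE);      // 8
        System.out.println(Byte.MIN_VALUE); // -128
        System.out.println(Byte.MAX_VALUE); // 127

        byte stored = store(200);
        System.out.println(stored);         // -56
        System.out.println(load(stored));   // 200
    }
}
```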
As for pitfalls, there are some in your C code too. Assuming data[0] is an unsigned char, data[0] << 24 is undefined behaviour on any system for which INT_MAX == 32767.