Why does a Boolean take up more memory than a char?

Date: 2023-01-05 23:07:54

Why does a Boolean consume 4 bytes and a char 2 bytes in the .NET Framework? A Boolean should take up 1 bit, or at least be smaller than a char.

9 answers

#1


50  

It is a question of memory alignment. 4-byte variables work faster than 2-byte ones. This is the reason why you should use int instead of byte or short for counters and the like.

You should use 2-byte variables only when memory is a bigger concern than speed. And this is the reason why char (which is Unicode in .NET) takes two bytes instead of four.

#2


15  

About boolean

Most other answers get it wrong: alignment and speed are why a programmer should stick to int for loop counters, not why the runtime would make a 1-byte type 4 bytes wide. That reasoning, in fact, applies to byte and short just as much as to bool.

In C# at least, bool (or System.Boolean) is a 1-byte-wide built-in structure that can be boxed automatically. Once boxed, you have an object (which needs at least two memory words to be represented, i.e. 8/16 bytes in 32-/64-bit environments respectively) with a field (at least one byte), plus one memory word for the reference that points to it, i.e. at least 13/25 bytes in total.

That's indeed the 1st Google entry on "C# primitive types". http://msdn.microsoft.com/en-us/library/ms228360(VS.80).aspx

The quoted link (http://geekswithblogs.net/cwilliams/archive/2005/09/18/54271.aspx) also states that a boolean, by the CLI standard, takes 1 byte.

Actually, however, the only place where this is visible is on arrays of booleans - n booleans would take n bytes. In the other cases, one boolean may take 4 bytes.

  • Inside a structure, most runtimes (also in Java) would align all fields to a 4-byte boundary for performance. The Monty JVM for embedded devices is wiser: I guess it reorders fields optimally.
  • On the local frame/operand stack for the interpreter, in most implementations, one stack entry is one memory word wide for performance (and maybe on .NET it must be 64 bits wide to support double and long, which on .NET use just 1 stack entry instead of 2 as in Java). A JIT compiler can instead use 1 byte for boolean locals while keeping other variables aligned by reordering them, without performance impact, if the additional overhead is worth it.
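
The distinction between the 1-byte type and the 4-byte figure can be checked directly. The following is a minimal sketch, not something from the original answers; the class name SizeDemo is made up for illustration. It assumes that the "4 bytes" in the question comes from Marshal.SizeOf, which reports the unmanaged (interop) size, where bool is marshaled as a 4-byte Win32 BOOL by default.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo class; the name is an assumption for illustration only.
static class SizeDemo
{
    static void Main()
    {
        // Managed size of the types themselves.
        Console.WriteLine(sizeof(bool));   // 1
        Console.WriteLine(sizeof(char));   // 2

        // Arrays store elements tightly: n bools occupy n bytes of element data.
        Console.WriteLine(Buffer.ByteLength(new bool[1000]));  // 1000
        Console.WriteLine(Buffer.ByteLength(new char[1000]));  // 2000

        // Marshal.SizeOf reports the size used when the value is passed to
        // unmanaged code, not the managed size. By default bool is marshaled
        // as a 4-byte Win32 BOOL.
        Console.WriteLine(Marshal.SizeOf<bool>());             // 4
    }
}
```

Which of these numbers matters depends on what is being measured: array storage, interop, or a single local or field, where alignment and padding decide the real cost.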

About char

A char is two bytes because, when support for internationalization is required, using two-byte characters internally is the safest bet. This is not related directly to choosing to support Unicode, but to the choice to stick to UTF-16 and to the Basic Multilingual Plane. In Java and C#, as long as you stay within the Basic Multilingual Plane, you can assume that one logical character fits into a variable of type char; characters outside it need a surrogate pair of two char values.

#3


8  

That's because in a 32-bit environment, the CPU can handle 32-bit values quicker than 8-bit or 16-bit values, so this is a speed/size tradeoff. If you have to save memory and you have a large quantity of bools, just use uints and save your booleans as the bits of 4 byte uints. Chars are 2 bytes wide since they store 16-bit Unicode characters.
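
The bit-packing idea suggested here could look roughly like the sketch below. PackedFlags is a hypothetical helper written for this illustration, not a .NET type; it stores 32 flags per uint.

```csharp
using System;

// Hypothetical helper (not part of .NET): packs 32 boolean flags into each uint.
sealed class PackedFlags
{
    private readonly uint[] _words;

    public int Count { get; }

    public PackedFlags(int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        Count = count;
        _words = new uint[(count + 31) / 32];   // round up to whole 32-bit words
    }

    public bool this[int index]
    {
        get
        {
            Check(index);
            // index >> 5 selects the word, index & 31 selects the bit inside it.
            return (_words[index >> 5] & (1u << (index & 31))) != 0;
        }
        set
        {
            Check(index);
            uint mask = 1u << (index & 31);
            if (value) _words[index >> 5] |= mask;
            else _words[index >> 5] &= ~mask;
        }
    }

    // Bytes used by the backing array's element data.
    public int StorageBytes => _words.Length * sizeof(uint);

    private void Check(int index)
    {
        if ((uint)index >= (uint)Count) throw new ArgumentOutOfRangeException(nameof(index));
    }
}

static class PackedFlagsDemo
{
    static void Main()
    {
        var flags = new PackedFlags(1000);
        flags[3] = true;
        flags[999] = true;
        flags[3] = false;

        Console.WriteLine(flags[3]);            // False
        Console.WriteLine(flags[999]);          // True
        Console.WriteLine(flags.StorageBytes);  // 128
    }
}
```

A bool[1000] would need 1000 bytes of element data for the same flags, so the saving is roughly a factor of eight, paid for with a shift and a mask on every access.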

#4


3  

Regardless of the minor difference in memory storage, using Boolean for true/false yes/no values is important for developers (including yourself, when you have to revisit the code a year later), because it more accurately reflects your intent. Making your code more understandable is much more important than saving two bytes.

Making your code more accurately reflect your intent also reduces the likelihood that some compiler optimisation will have a negative effect. This advice transcends platforms and compilers.

#5


2  

You should also use Boolean to help write maintainable code. If I'm glancing at code, seeing that something is a Boolean is worth more than any memory saved by making me figure out that you're using chars as Booleans.

#6


1  

I found this: "Actually, a Boolean is 4 bytes, not 2. The reason is that that's what the CLR supports for Boolean. I think that's what it does because 32 bit values are much more efficient to manipulate, so the time/space tradeoff is, in general, worth it. You should use the bit vector class (forget where it is) if you need to jam a bunch of bits together..."

It's written by Paul Wick at http://geekswithblogs.net/cwilliams/archive/2005/09/18/54271.aspx

#7


1  

First of all, IMHO, you should use a profiler to determine where you actually have a memory problem.

#8


1  

Memory is only a concern if you have a large array of bits, in which case you can use the System.Collections.BitArray class.
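
As a rough illustration of that suggestion, the sketch below stores 1000 flags in a System.Collections.BitArray and copies them into a byte array to show the packed size. The class name BitArrayDemo and the flag positions are arbitrary choices for the example.

```csharp
using System;
using System.Collections;

static class BitArrayDemo
{
    static void Main()
    {
        var flags = new BitArray(1000);   // all bits start out false
        flags[3] = true;
        flags.Set(999, true);

        Console.WriteLine(flags[3]);      // True
        Console.WriteLine(flags.Length);  // 1000

        // Copying into a byte array shows the packed size: 8 flags per byte.
        var packed = new byte[(flags.Length + 7) / 8];
        flags.CopyTo(packed, 0);
        Console.WriteLine(packed.Length); // 125
    }
}
```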

#9


0  

It's because Windows and .NET have used Unicode (UTF-16) as their internal character set since inception. UTF-16 uses 2 bytes per character, or a pair of 2-byte words where a single one is not enough, since it is a variable-width encoding.

"For characters in the Basic Multilingual Plane (BMP) the resulting encoding is a single 16-bit word. For characters in the other planes, the encoding will result in a pair of 16-bit words"
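
The quoted rule can be observed from C#, where a string is a sequence of 16-bit char values. This is a small sketch with arbitrarily chosen sample characters (U+00E9 inside the BMP, U+1F600 outside it); the class name Utf16Demo is made up for the example.

```csharp
using System;

static class Utf16Demo
{
    static void Main()
    {
        string bmp = "\u00E9";          // U+00E9, inside the Basic Multilingual Plane
        string astral = "\U0001F600";   // U+1F600, outside the BMP

        // Length counts 16-bit char values, not logical characters.
        Console.WriteLine(bmp.Length);     // 1
        Console.WriteLine(astral.Length);  // 2

        // The two char values of the second string form a surrogate pair
        // that together encode a single code point.
        Console.WriteLine(char.IsSurrogatePair(astral[0], astral[1]));     // True
        Console.WriteLine(char.ConvertToUtf32(astral, 0).ToString("X"));   // 1F600
    }
}
```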

My guess regarding Booleans is that they are four bytes because the default register is 32 bits wide, which would be the smallest size .NET could perform a logical operation on efficiently, short of using bitwise operations.
