
Time: 2023-01-05 23:07:54

Why does a Boolean consume 4 bytes and a char 2 bytes in the .NET Framework? A Boolean should take up 1 bit, or at least be smaller than a char.


9 answers



It is a question of memory alignment. 4-byte variables work faster than 2-byte ones. This is the reason why you should use int instead of byte or short for counters and the like.


You should use 2-byte variables only when memory is a bigger concern than speed. And this is the reason why char (which is Unicode in .NET) takes two bytes instead of four.




About boolean


Most other answers get it wrong: alignment and speed are why a programmer should stick to int for loop counters, not why the compiler would make a byte 4 bytes wide. All of that reasoning, in fact, applies to byte and short as well as to boolean.


In C# at least, bool (or System.Boolean) is a 1-byte wide builtin structure, which can be automatically boxed, so you have an object (which needs two memory words to be represented, at the very least, i.e. 8/16 bytes on 32/64 bits environments respectively) with a field (at least one byte) plus one memory word to point to it, i.e. in total at least 13/25 bytes.


That's indeed the 1st Google entry on "C# primitive types". http://msdn.microsoft.com/en-us/library/ms228360(VS.80).aspx


The quoted link (http://geekswithblogs.net/cwilliams/archive/2005/09/18/54271.aspx) also states that a boolean, by the CLI standard, takes 1 byte.
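A quick way to check the 1-byte claim, and to see one possible source of the 4-byte figure in the question, is to compare the managed size with the marshaled size. This is only a sketch: the class name PrimitiveSizes is made up for illustration, and the guess that the question's "4 bytes" came from Marshal.SizeOf is an assumption, not something the thread confirms.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper class, not part of any answer in this thread.
public static class PrimitiveSizes
{
    // Managed sizes: sizeof on built-in types is a compile-time constant.
    public static int ManagedBool() => sizeof(bool);   // 1
    public static int ManagedChar() => sizeof(char);   // 2

    // Marshaled size: what a bool occupies after conversion for unmanaged code.
    // By default a bool is marshaled as a 4-byte Win32 BOOL.
    public static int MarshaledBool() => Marshal.SizeOf<bool>();   // 4
}
```

So the same type can honestly be reported as 1 byte or 4 bytes, depending on which API is asked.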


Actually, however, the only place where this is visible is on arrays of booleans - n booleans would take n bytes. In the other cases, one boolean may take 4 bytes.


  • Inside a structure, most runtimes (Java's included) align fields to a 4-byte boundary for performance. The Monty JVM for embedded devices is wiser: I guess it reorders fields optimally.
  • On the interpreter's local frame/operand stack, most implementations make one stack entry one memory word wide for performance (on .NET it may have to be 64 bits wide to support double and long, which take just 1 stack entry on .NET instead of 2 as in Java). A JIT compiler can instead use 1 byte for boolean locals and keep the other variables aligned by reordering them, with no performance impact, if the additional overhead is worth it.
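The array and structure cases can be observed directly. The sketch below assumes .NET 5 or later (or the System.Runtime.CompilerServices.Unsafe package) for Unsafe.SizeOf; the types BoolThenInt and LayoutDemo are hypothetical names used only for this illustration.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical struct: a bool followed by an int.
[StructLayout(LayoutKind.Sequential)]
public struct BoolThenInt
{
    public bool Flag;   // 1 byte of data, then padding so Value is 4-byte aligned
    public int Value;   // 4 bytes
}

public static class LayoutDemo
{
    // Bytes of element storage in a bool[] of length n
    // (the array object's own header is not counted).
    public static int BoolArrayBytes(int n) => Buffer.ByteLength(new bool[n]);

    // Managed size of the struct, padding included.
    public static int StructBytes() => Unsafe.SizeOf<BoolThenInt>();
}
```

BoolArrayBytes(10) gives 10, matching the "n booleans take n bytes" rule for arrays, while StructBytes() gives 8 rather than 5: inside the struct the boolean effectively costs 4 bytes once padding is counted.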

About char


A char is two bytes because, when support for internationalization is required, using two-byte characters internally is the safest bet. This is not directly related to choosing to support Unicode, but to the choice to stick to UTF-16 and to the Basic Multilingual Plane. In Java and C#, you can assume that one logical character fits into a variable of type char as long as it lies in the Basic Multilingual Plane; characters outside it need a pair of chars (a surrogate pair).
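To see where the "one logical character per char" assumption holds and where it stops holding, here is a small sketch. CharDemo and its members are made-up names; the char and string calls are standard .NET APIs.

```csharp
using System;

// Hypothetical helper class for illustration.
public static class CharDemo
{
    // U+00E9 lies inside the Basic Multilingual Plane: one char.
    public const string BmpText = "\u00E9";

    // U+1F600 lies outside the BMP: stored as two chars (a surrogate pair).
    public const string AstralText = "\U0001F600";

    // Number of 16-bit chars, not the number of logical characters.
    public static int CharCount(string s) => s.Length;

    public static bool StartsWithSurrogatePair(string s) =>
        s.Length >= 2 && char.IsSurrogatePair(s[0], s[1]);

    // Decodes the code point that starts at index 0.
    public static int FirstCodePoint(string s) => char.ConvertToUtf32(s, 0);
}
```

CharCount(CharDemo.BmpText) is 1, while CharCount(CharDemo.AstralText) is 2 even though it is a single logical character.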




That's because in a 32-bit environment, the CPU can handle 32-bit values quicker than 8-bit or 16-bit values, so this is a speed/size tradeoff. If you have to save memory and you have a large quantity of bools, just use uints and save your booleans as the bits of 4 byte uints. Chars are 2 bytes wide since they store 16-bit Unicode characters.
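Storing booleans as the bits of a uint could look roughly like this. It is a minimal sketch rather than a recommended design: PackedFlags is a hypothetical type, and it holds at most 32 flags.

```csharp
using System;

// Hypothetical value type packing up to 32 booleans into one 4-byte uint.
public struct PackedFlags
{
    private uint _bits;

    public bool Get(int index)
    {
        CheckIndex(index);
        return (_bits & (1u << index)) != 0;
    }

    public void Set(int index, bool value)
    {
        CheckIndex(index);
        if (value)
            _bits |= 1u << index;      // turn the bit on
        else
            _bits &= ~(1u << index);   // turn the bit off
    }

    private static void CheckIndex(int index)
    {
        if (index < 0 || index > 31)
            throw new ArgumentOutOfRangeException(nameof(index));
    }
}
```

The price is the masking and shifting on every access, which is the speed/size tradeoff this answer describes.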




Regardless of the minor difference in memory storage, using Boolean for true/false yes/no values is important for developers (including yourself, when you have to revisit the code a year later), because it more accurately reflects your intent. Making your code more understandable is much more important than saving two bytes.


Making your code more accurately reflect your intent also reduces the likelihood that some compiler optimisation will have a negative effect. This advice transcends platforms and compilers.




You should also use boolean to help write maintainable code. When I'm glancing at code, seeing at once that something is a boolean is worth more than the memory saved, compared with having to figure out that you're using chars as booleans.




I found this: "Actually, a Boolean is 4 bytes, not 2. The reason is that that's what the CLR supports for Boolean. I think that's what it does because 32 bit values are much more efficient to manipulate, so the time/space tradeoff is, in general, worth it. You should use the bit vector class (forget where it is) if you need to jam a bunch of bits together..."


It's written by Paul Wick at http://geekswithblogs.net/cwilliams/archive/2005/09/18/54271.aspx




First of all, you should use a profiler to determine where you have a memory problem, IMHO.




Memory is only a concern if you have a large array of bits, in which case you can use the System.Collections.BitArray class.
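For reference, basic use of System.Collections.BitArray looks something like the following. The wrapper class BitArrayDemo and its method names are hypothetical; the BitArray constructor, indexer, and Length property are the real API.

```csharp
using System.Collections;

// Hypothetical helper class for illustration.
public static class BitArrayDemo
{
    // Creates a BitArray of the given length with the listed positions set to true.
    public static BitArray MakeFlags(int count, params int[] setIndices)
    {
        var bits = new BitArray(count);   // every bit starts out false
        foreach (int i in setIndices)
            bits[i] = true;
        return bits;
    }

    // Counts the bits that are set by enumerating the array.
    public static int CountSet(BitArray bits)
    {
        int n = 0;
        foreach (bool b in bits)
            if (b) n++;
        return n;
    }
}
```

For example, BitArrayDemo.MakeFlags(1000, 3, 999) holds 1000 flags with only positions 3 and 999 set.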




It's because Windows and .NET have used Unicode (UTF-16) as their internal character set since inception. UTF-16 is a variable-width encoding: it uses 2 bytes per character, or a pair of 2-byte words when a character requires it.


"For characters in the Basic Multilingual Plane (BMP) the resulting encoding is a single 16-bit word. For characters in the other planes, the encoding will result in a pair of 16-bit words"
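The rule in that quote can be written out by hand. The sketch below follows the standard UTF-16 surrogate arithmetic; Utf16Sketch.Encode is a hypothetical helper, and in real code char.ConvertFromUtf32 already does this job.

```csharp
using System;

// Hypothetical helper that encodes one Unicode code point as UTF-16 code units.
public static class Utf16Sketch
{
    public static char[] Encode(int codePoint)
    {
        // Reject values outside Unicode and the reserved surrogate range.
        if (codePoint < 0 || codePoint > 0x10FFFF ||
            (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        // BMP: a single 16-bit word.
        if (codePoint < 0x10000)
            return new[] { (char)codePoint };

        // Other planes: split the remaining 20 bits into two 10-bit halves.
        int offset = codePoint - 0x10000;
        char high = (char)(0xD800 + (offset >> 10));    // high (leading) surrogate
        char low = (char)(0xDC00 + (offset & 0x3FF));   // low (trailing) surrogate
        return new[] { high, low };
    }
}
```

Encode(0x41) yields one word, while Encode(0x1F600) yields the pair 0xD83D, 0xDE00.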


My guess regarding booleans is that they are four bytes because the default register is 32 bits wide, which would be the minimum size .NET could do a logical operation on efficiently, unless using bitwise operations.



