Why is a char 1 byte long in C? Why is it not 2 bytes or 4 bytes long?
What is the basic logic behind keeping it at 1 byte? I know that in Java a char is 2 bytes long. The same question applies there.
6 Answers
#1
21
char is 1 byte in C because the standard specifies it that way.
The most probable logic is this: the (binary) representation of a char (in the standard character set) can fit into 1 byte. At the time of C's early development, the most commonly available standards were ASCII and EBCDIC, which needed 7-bit and 8-bit encodings, respectively. So 1 byte was sufficient to represent the whole character set.
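As a minimal sketch of that point (assuming a typical ASCII platform): a char simply stores the character's numeric code in its single byte, and sizeof(char) is 1 by definition.

```c
#include <stdio.h>

int main(void) {
    /* Any character from the basic set fits in a single byte:
       a char holds the character's numeric code. */
    char c = 'A';
    printf("sizeof(char) = %zu\n", sizeof(char)); /* always 1 */
    printf("'%c' is stored as code %d\n", c, c);  /* 65 in ASCII */
    return 0;
}
```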
OTOH, by the time Java came into the picture, the concepts of extended character sets and Unicode were already present. So, to be future-proof and support extensibility, char was given 2 bytes, which can hold extended character set values.
#2
5
It is because the C language is 37 years old and there was no need to have more bytes for 1 char, as only the 128 ASCII characters were used (http://en.wikipedia.org/wiki/ASCII).
#3
5
Why would a char hold more than 1 byte? A char normally represents an ASCII character. Just have a look at an ASCII table: there are only 256 characters in the (extended) ASCII code. So you only need to represent the numbers from 0 to 255, which comes down to 8 bits = 1 byte.
Have a look at an ASCII Table, e.g. here: http://www.asciitable.com/
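A small sketch of that range, assuming the common 8-bit byte: UCHAR_MAX from limits.h confirms the 0 to 255 span.

```c
#include <stdio.h>
#include <limits.h>

int main(void) {
    /* With the common 8-bit byte, UCHAR_MAX is 255, so every
       (extended) ASCII code fits into one unsigned char. */
    printf("unsigned char holds 0 to %d\n", UCHAR_MAX);
    printf("'A' has ASCII code %d\n", 'A');
    return 0;
}
```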
That's for C. When Java was designed, its designers anticipated that in the future it would be enough for any character (including Unicode) to be held in 16 bits = 2 bytes.
#4
2
When C was developed (in the early 1970s; its developers published the first book on it in 1978), the two primary character encoding standards were ASCII and EBCDIC, which were 7-bit and 8-bit encodings for characters, respectively. And memory and disk space were both of greater concern at the time; C was popularized on machines with a 16-bit address space, and using more than one byte per character in strings would have been considered wasteful.
By the time Java came along (mid-1990s), some people with vision were able to perceive that a language could make use of an international standard for character encoding, and so Unicode was chosen for its definition. Memory and disk space were less of a problem by then.
#5
1
The C language standard defines a virtual machine where all objects occupy an integral number of abstract storage units made up of some fixed number of bits (specified by the CHAR_BIT macro in limits.h). Each storage unit must be uniquely addressable. A storage unit is defined as the amount of storage occupied by a single character from the basic character set¹. Thus, by definition, the size of the char type is 1.
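A short sketch of these two definitions side by side: sizeof(char) is always 1, while CHAR_BIT reports how many bits that one storage unit actually holds on the current platform.

```c
#include <stdio.h>
#include <limits.h>

int main(void) {
    /* By definition, sizeof(char) is 1 on every conforming
       implementation, whatever the underlying byte width. */
    printf("sizeof(char) = %zu\n", sizeof(char));

    /* CHAR_BIT is the number of bits in one storage unit:
       8 on most machines, but the standard only requires >= 8. */
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    return 0;
}
```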
Eventually, these abstract storage units have to be mapped onto physical hardware. Most common architectures use individually addressable 8-bit bytes, so char objects usually map to a single 8-bit byte.
Usually.
Historically, native byte sizes have been anywhere from 6 to 9 bits wide. In C, the char type must be at least 8 bits wide in order to represent all the characters in the basic character set, so to support a machine with 6-bit bytes, a compiler may have to map a char object onto two native machine bytes, with CHAR_BIT being 12. sizeof (char) is still 1, so types with size N will map to 2 * N native bytes.
¹ The basic character set consists of all 26 English letters in both upper- and lowercase, 10 digits, punctuation and other graphic characters, and control characters such as newlines, tabs, form feeds, etc., all of which fit comfortably into 8 bits.
#6
0
You don't need more than a byte to represent the whole ASCII table (128 characters).
But there are other C types which have more room to contain data, like the int type (typically 4 bytes) or the long double type (12 bytes on many systems).
All of these hold numerical values (even chars! Even if they're displayed as "letters", they're "numbers": you can compare them, add them...).
These are just different standard sizes, like cm and m for length.
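A minimal sketch of both points, assuming an ASCII execution character set; the exact type widths are implementation-defined, so the program simply prints whatever this platform uses.

```c
#include <stdio.h>

int main(void) {
    /* Wider types just occupy more bytes; sizes vary by platform. */
    printf("char: %zu, int: %zu, long double: %zu bytes\n",
           sizeof(char), sizeof(int), sizeof(long double));

    /* chars are numbers: they can be compared and added. */
    char c = 'a';
    printf("'a' < 'b' is %d\n", c < 'b'); /* 1: codes compare numerically */
    printf("'a' + 1 gives '%c'\n", c + 1); /* 'b' in ASCII */
    return 0;
}
```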