The array data type in C

Date: 2022-06-20 16:30:22

After reading some details about pointers and arrays in C, I got a little confused. On the one hand, an array can be seen as a data type. On the other hand, an array is an unmodifiable lvalue. I imagine that the compiler does something like replacing the array's identifier with a constant address plus an expression that calculates the position given by the index at runtime.

myArray[3] -(compiler)-> AE8349F + 3 * sizeof(<type>)

When people say that an array is a data type, what exactly does this mean? I hope you can help me clarify my confused understanding of what an array really is and how the compiler treats it.

2 answers

#1


16  

When people say that an array is a data type, what exactly does this mean?

A data type is a set of values with predefined characteristics, together with the operations allowed on them. Examples of data types are integer, floating-point number, character, string, and pointer.

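An array is a group of contiguous memory locations that all share the same name and the same element type. In C, an array type is derived from its element type and its length, so int[4] is a type in its own right, distinct from int[5] and from int *.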
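One way to see that an array is a type of its own is to ask sizeof about it. The sketch below is only an illustration; the function names are made up for this example and the array length 4 is arbitrary.

```c
#include <stddef.h>

/* Bytes occupied by a 4-element int array: the type int[4] carries its length. */
size_t array_bytes(void)
{
    int a[4];
    return sizeof a;                 /* 4 * sizeof(int) */
}

/* Element count recovered from the array type alone. */
size_t array_length(void)
{
    int a[4];
    return sizeof a / sizeof a[0];
}

/* Size of a pointer to the first element, which has a different type (int *). */
size_t pointer_bytes(void)
{
    int a[4];
    int *p = a;                      /* a is converted to &a[0] here */
    return sizeof p;
}
```

sizeof is one of the few contexts where the array is not converted to a pointer, which is why array_bytes and pointer_bytes can return different values.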


If you are wondering why an array is not modifiable, the best explanation I have ever read is this:

C didn't spring fully formed from the mind of Dennis Ritchie; it was derived from an earlier language known as B (which was derived from BCPL).[1] B was a "typeless" language; it didn't have different types for integers, floats, text, records, etc. Instead, everything was simply a fixed-length word or "cell" (essentially an unsigned integer). Memory was treated as a linear array of cells. When you allocated an array in B, such as

auto V[10];

the compiler allocated 11 cells; 10 contiguous cells for the array itself, plus a cell that was bound to V containing the location of the first cell:

    +----+
V:  |    | -----+
    +----+      |
     ...        |
    +----+      |
    |    | <----+
    +----+
    |    |
    +----+
    |    |      
    +----+
    |    |
    +----+
     ...

When Ritchie was adding struct types to C, he realized that this arrangement was causing him some problems. For example, he wanted to create a struct type to represent an entry in a file or directory table:

struct {
  int inumber;
  char name[14];
};

He wanted the structure to not just describe the entry in an abstract manner, but also to represent the bits in the actual file table entry, which didn't have an extra cell or word to store the location of the first element in the array. So he got rid of it - instead of setting aside a separate location to store the address of the first element, he wrote C such that the address of the first element would be computed when the array expression was evaluated.

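The effect of that design can still be observed today: the bytes of an array member sit directly inside the struct, with no extra cell holding an address. A minimal sketch, assuming a hypothetical tag dir_entry for the struct above (the original leaves it unnamed):

```c
#include <stddef.h>

/* dir_entry is a hypothetical name for the struct shown in the text. */
struct dir_entry {
    int  inumber;
    char name[14];
};

/* Returns 1 if name starts exactly offsetof(struct dir_entry, name)
   bytes into the object, i.e. the array is stored inline. */
int name_is_stored_inline(void)
{
    struct dir_entry e;
    return (char *)e.name == (char *)&e + offsetof(struct dir_entry, name);
}

/* The member occupies exactly 14 bytes: there is no hidden pointer cell. */
size_t name_bytes(void)
{
    struct dir_entry e;
    return sizeof e.name;
}
```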

This is why you can't do something like

int a[N], b[N];
a = b;

because both a and b evaluate to pointer values in that context; it's equivalent to writing 3 = 4. There's nothing in memory that actually stores the address of the first element in the array; the compiler simply computes it during the translation phase.

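Since a = b is rejected, copying has to be spelled out some other way. The sketch below shows two common options, memcpy and wrapping the array in a struct; N, int_box, and the function names are assumptions made up for this illustration.

```c
#include <string.h>

#define N 4   /* hypothetical array length */

struct int_box { int items[N]; };   /* hypothetical wrapper type */

/* Copies N ints from src to dst; both must point to at least N elements. */
void copy_array(int dst[N], const int src[N])
{
    memcpy(dst, src, N * sizeof dst[0]);
}

/* Structs are assignable, so an array wrapped in one can be copied with '='. */
struct int_box copy_box(struct int_box src)
{
    struct int_box dst;
    dst = src;
    return dst;
}

/* Copies b into a and returns the last element of a. */
int demo_copy(void)
{
    int a[N] = {0}, b[N] = {1, 2, 3, 4};
    /* a = b;   would not compile: a is not a modifiable lvalue */
    copy_array(a, b);
    return a[N - 1];
}
```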


1. This is all taken from the paper The Development of the C Language


For more detail you may like to read this answer.


EDIT: For more clarity, the difference between a modifiable l-value, a non-modifiable l-value, and an r-value is, in short:

  • A modifiable l-value is addressable (can be the operand of unary &) and assignable (can be the left operand of =).
  • A non-modifiable l-value is addressable, but not assignable.
  • An r-value is neither addressable nor assignable.
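The three categories above can be illustrated roughly as follows. The lines that would fail to compile are left as comments; the function name is invented for this sketch.

```c
/* Returns 1 when the checks described in the comments hold. */
int lvalue_demo(void)
{
    int n = 1;              /* modifiable lvalue */
    const int k = 2;        /* non-modifiable lvalue */
    int arr[3] = {0};       /* non-modifiable lvalue (array type) */

    int *pn = &n;           /* addressable */
    n = 5;                  /* assignable */

    const int *pk = &k;     /* addressable ... */
    /* k = 3; */            /* ... but not assignable: compile error */

    int (*pa)[3] = &arr;    /* addressable: pointer to the whole array */
    /* arr = arr; */        /* not assignable: compile error */

    /* &(n + 1); */         /* rvalue: not addressable */
    /* (n + 1) = 0; */      /* rvalue: not assignable */

    return *pn == 5 && *pk == 2 && (void *)pa == (void *)&arr[0];
}
```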

#2


0  

An array is a contiguous block of memory. This means it's laid out in memory sequentially. Let's say we define an array like:

int x[4];

Assume sizeof(int) == 4, i.e. an int occupies 32 bits.

This will be laid out in memory like this (picking an arbitrary starting address, say 0x00000001):

0x00000001 - 0x00000004
[element 0]
0x00000005 - 0x00000008
[element 1]
0x00000009 - 0x0000000C
[element 2]
0x0000000D - 0x00000010
[element 3]
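The layout above can be checked by measuring how far each element is from the start of the array. This is a small sketch with an invented helper name; the actual addresses will differ from the ones in the table, but the spacing should not.

```c
#include <stddef.h>

/* Byte distance between element i and element 0 of a 4-element int array.
   i must be in the range 0..3. */
size_t byte_offset_of_element(size_t i)
{
    int x[4];
    return (size_t)((char *)&x[i] - (char *)&x[0]);
}
```

On a platform where sizeof(int) == 4, the offsets for i = 0..3 are 0, 4, 8, and 12 bytes, which matches the four address ranges listed above.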

You're right that the compiler replaces the identifier with an address computation. An array is not itself a pointer, but in C and C++ the array name is converted ("decays") to a pointer to the first element in most expressions (a pointer to address 0x00000001 in our example); the exceptions include the operands of sizeof and unary &. By doing this:

std::cout << x[2];

you're telling the compiler to advance that address by 2 elements (2 * sizeof(int) bytes), which is pointer arithmetic. Let's say instead you use a variable to index:

int i = 2;
std::cout << x[i];

The compiler treats this as:

int i = 2;
std::cout << *(x + i);

Pointer arithmetic is scaled by the element size, so x + i already means "i elements past x"; in the generated code the index is multiplied by sizeof(int) and added to the base address of the array. In other words, the compiler takes the subscript operator [] and converts it to pointer addition followed by a dereference.
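To make the scaling concrete, the sketch below reads the same element twice: once through ordinary pointer arithmetic, and once by computing the byte address by hand. Both helper names are invented for this illustration.

```c
#include <stddef.h>

/* Reads x[i] the way the subscript operator is defined: *(x + i). */
int read_by_pointer(const int *x, size_t i)
{
    return *(x + i);
}

/* Reads x[i] by computing the byte address manually, which is roughly
   what the generated machine code does. */
int read_by_bytes(const int *x, size_t i)
{
    const char *base = (const char *)x;
    return *(const int *)(base + i * sizeof(int));
}
```

Note that multiplying by sizeof(int) is only correct on the char pointer; doing it on an int pointer would scale the index twice.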

If you really want to wrap your head around this, consider this code:

std::cout << 2[x];

This is completely valid. If you can figure out why, then you've got the concept down.

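For readers who want to check their reasoning: since x[i] is defined as *(x + i) and addition is commutative, 2[x] is *(2 + x), which is the same element. A minimal sketch in C, with an invented function name:

```c
/* Returns 1 if all four spellings name the same element. */
int subscript_is_commutative(void)
{
    int x[4] = {10, 20, 30, 40};
    return x[2] == *(x + 2)
        && *(x + 2) == *(2 + x)
        && *(2 + x) == 2[x];
}
```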
