Is T[x][y] guaranteed to have the same memory layout as T[x*y] in C?

Time: 2021-10-28 19:32:27

So far I thought it was, but after learning that the compiler may pad data to satisfy the architecture's alignment requirements, I'm in doubt. So I wonder whether a char[4][3] has the same memory layout as a char[12]. Can the compiler put padding after each char[3] part to align it, so that the whole array actually takes 16 bytes?

The background: a library function takes a bunch of fixed-length strings in a char* parameter, so it expects a contiguous buffer without padding, and the string length can be odd. So I thought I would declare a char[N_STRINGS][STRING_LENGTH] array, conveniently populate it, and pass it to the function by casting it to char*. So far it seems to work, but I'm not sure whether this solution is portable.

4 answers

#1


10  

An array of M elements of type A has all its elements in contiguous positions in memory, with no padding bytes at all. This does not depend on the nature of A.

Now, if A is the type "array of N elements of type T", then each element of the outer array again consists of N objects of type T in contiguous positions in memory. All these blocks of N objects of type T are, in turn, stored in contiguous positions.

The result is M*N elements of type T stored in contiguous positions in memory.

The element [i][j] of the array is stored at position i*N+j.

#2


8  

Let's consider

T array[size]; 
array[0]; // 1

Expression 1 is formally defined as follows:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))

per §6.5.2.1, paragraph 2, of the C standard draft N1570. When applied to multi-dimensional arrays («arrays whose elements are arrays»), paragraph 3 of the same section says:

If E is an n-dimensional array (n ≥ 2) with dimensions i × j × ... × k, then E (used as other than an lvalue) is converted to a pointer to an (n − 1)-dimensional array with dimensions j × . . . × k.

Therefore, given E declared as T array[i][j] and the expression S = array[x][y], E is first converted to a pointer to a one-dimensional array of j elements, namely T (*ptr)[j] = &array[0].

If the unary * operator is applied to this pointer explicitly, or implicitly as a result of subscripting, the result is the referenced (n − 1)-dimensional array, which itself is converted into a pointer if used as other than an lvalue.

This rule applies recursively. We may conclude that, for it to work, an n-dimensional array must be allocated contiguously.

It follows from this that arrays are stored in row-major order (last subscript varies fastest), at least in terms of logical layout.

Since char[12] has to be stored contiguously, and so does char[3][4], and since both have the same alignment, their memory layouts should be identical, even though they are technically different types.

#3


3  

What you're referring to as types are not types. The type T you mention in the title would be (in this case) a pointer to a char.

You're correct that when it comes to structs, alignment is a factor that can lead to padding being added, which may mean that your struct takes up more bytes than meets the eye.

Having said that, when you allocate an array, the array will be contiguous in memory. Remember that when you index into an array, array[3] is equivalent to *(array + 3).

For example, the following program should print out 12:

#include <stdio.h>

int main(void) {
    char array[4][3];
    printf("%zu\n", sizeof(array));   /* 4 rows * 3 chars = 12 */
    return 0;
}

#4


-4  

Strictly speaking a 2-D array is an array of pointers to 1-D arrays. In general you cannot assume more than that.

I would take the view that if you want a contiguous block of any type, you should declare a contiguous 1-D block rather than hope for any particular layout from the compiler or runtime.

Now, a compiler probably will allocate a contiguous block for a 2-D array when it knows the dimensions in advance (i.e. they're constant at compile time), but that is not the strict interpretation.

Remember int main( int argc, char **argv ) ;

That char **argv points to the first element of an array of pointers to char.

In more general programming you can, e.g., malloc() each row of a 2-D array separately, and swapping rows is then as simple as swapping the values of those pointers. For example:

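#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main( void ) {
    /* an array of 2 row pointers; each row is allocated separately */
    char **array = malloc( 2 * sizeof( char * ) ) ;
    if ( array == NULL ) return 1 ;

    array[0] = malloc( 24 ) ;   /* rows need not have the same length */
    array[1] = malloc( 11 ) ;
    if ( array[0] == NULL || array[1] == NULL ) {
        free( array[0] ) ;      /* free(NULL) is a no-op */
        free( array[1] ) ;
        free( array ) ;
        return 1 ;
    }

    strcpy( array[0], "first" ) ;
    strcpy( array[1], "second" ) ;

    printf( "%s\n%s\n", array[0], array[1] ) ;

    /* swap the rows: only the pointers move, not the characters */
    char *t = array[0] ;
    array[0] = array[1] ;
    array[1] = t ;

    printf( "%s\n%s\n", array[0], array[1] ) ;

    free( array[0] ) ;
    free( array[1] ) ;
    free( array ) ;
    return 0 ;
}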
