Is it UB to access an element one past the end of a row of a two-dimensional array?

Is the behavior of the following program undefined?

#include <stdio.h>

int main(void)
{
    int arr[2][3] = { { 1, 2, 3 },
                      { 4, 5, 6 }
    };

    int *ptr1 = &arr[0][0];      // pointer to first elem of { 1, 2, 3 }
    int *ptr3 = ptr1 + 2;        // pointer to last elem of { 1, 2, 3 }
    int *ptr3_plus_1 = ptr3 + 1; // pointer to one past last elem of { 1, 2, 3 }
    int *ptr4 = &arr[1][0];      // pointer to first elem of { 4, 5, 6 }
//    int *ptr3_plus_2 = ptr3 + 2; // not legal: even forming this pointer is UB

    /* It is legal to compare ptr3_plus_1 and ptr4 */
    if (ptr3_plus_1 == ptr4) {
        puts("ptr3_plus_1 == ptr4");

        /* ptr3_plus_1 is a valid address, but is it legal to dereference it? */
        printf("*ptr3_plus_1 = %d\n", *ptr3_plus_1);
    } else {
        puts("ptr3_plus_1 != ptr4");
    }

    return 0;
}

According to §6.5.6 ¶8:

Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object.... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

From this, it would appear that the behavior of the above program is undefined; ptr3_plus_1 points to an address one past the end of the array object from which it is derived, and dereferencing this address causes undefined behavior.
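For contrast, here is a minimal sketch of the use §6.5.6 ¶8 does sanction for a one-past-the-end pointer: it is formed and compared against, but never used as the operand of unary `*`. The function name `sum_row` is hypothetical.

```c
#include <stddef.h>

/* Sums the n elements starting at row. end = row + n points one past the
   last element: forming it and comparing against it is allowed, but it is
   never dereferenced. */
int sum_row(const int *row, size_t n)
{
    const int *end = row + n;   /* one past the end: valid to form */
    int sum = 0;
    for (const int *p = row; p != end; ++p)
        sum += *p;              /* p is always strictly before end here */
    return sum;
}
```

Called as `sum_row(arr[0], 3)`, the loop stops exactly at the pointer the question calls `ptr3_plus_1` without ever reading through it.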

Further, Annex J.2 suggests that this is undefined behavior:

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

There is some discussion of this issue in the Stack Overflow question "One-dimensional access to a multidimensional array: well-defined C?". The consensus there appears to be that accessing arbitrary elements of a two-dimensional array through one-dimensional subscripts is indeed undefined behavior.
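If flat, one-dimensional indexing over such an array is wanted, a sketch that stays within defined behavior splits the flat index into a row and a column, so each subscript applies only to the array it belongs to. This assumes rows of three `int`s as in the program above; `get_flat` and `COLS` are hypothetical names.

```c
#include <stddef.h>

#define COLS 3  /* row length, as in int arr[2][3] */

/* Returns element i (row-major order) of an array with COLS columns.
   The row subscript indexes the outer array and the column subscript
   indexes one row, so no pointer ever leaves the row it was derived from. */
int get_flat(int a[][COLS], size_t i)
{
    return a[i / COLS][i % COLS];
}
```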

The issue, as I see it, is that it is not even legal to form the pointer ptr3_plus_2, so it is not legal to access arbitrary two-dimensional array elements in this way. But it is legal to form the pointer ptr3_plus_1 using this pointer arithmetic. Further, it is legal to compare the two pointers ptr3_plus_1 and ptr4, according to §6.5.9 ¶6:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.

So, if both ptr3_plus_1 and ptr4 are valid pointers that compare equal and must point to the same address (the object pointed to by ptr4 must be adjacent in memory to the object pointed to by ptr3 anyway, since array storage must be contiguous), it would seem that *ptr3_plus_1 is as valid as *ptr4.
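The premises of this argument can be checked directly. The sketch below confirms at compile time that the rows are laid out back to back, then performs only the comparison that §6.5.9 ¶6 permits. It deliberately does not dereference anything, so it says nothing about the actual question. The helper `rows_adjacent` is hypothetical.

```c
#include <stdbool.h>

/* An array of N elements occupies exactly N times the element size, so
   there is no padding between rows or inside a row. */
_Static_assert(sizeof(int[2][3]) == 2 * sizeof(int[3]), "rows are contiguous");
_Static_assert(sizeof(int[3]) == 3 * sizeof(int), "no padding inside a row");

/* Compares one-past-the-end of row 0 with the start of row 1. Because row 1
   immediately follows row 0, §6.5.9 ¶6 says they compare equal. */
bool rows_adjacent(int a[][3])
{
    int *ptr3_plus_1 = a[0] + 3;   /* may be formed and compared, not read */
    int *ptr4 = &a[1][0];
    return ptr3_plus_1 == ptr4;
}
```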

Is this undefined behavior, as described in §6.5.6 ¶8 and Annex J.2, or is this an exceptional case?

To Clarify

It seems unambiguous that it is undefined behavior to attempt to access the element one past the end of the final row of a two-dimensional array. My interest is in the question of whether it is legal to access the first element of the intermediate rows by forming a new pointer using a pointer to an element from the previous row and pointer arithmetic. It seems to me that a different example in Annex J.2 could have made this more clear.
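By contrast, reaching the first element of an intermediate row is uncontroversial if the pointer arithmetic is done on the outer array rather than within a row. This is a minimal sketch, assuming rows of three `int`s as in the question's program; `first_of_next_row` is a hypothetical helper.

```c
#include <stddef.h>

/* Returns the first element of row r + 1. The arithmetic steps through the
   outer array (elements of type int[3]), so rows + r + 1 is an ordinary
   pointer to an element of that array, and the subscript [0] stays inside
   its own row. The caller must ensure row r + 1 exists. */
int first_of_next_row(int (*rows)[3], size_t r)
{
    int (*next)[3] = rows + r + 1;
    return (*next)[0];
}
```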

Is it possible to reconcile the clear statement in §6.5.6 ¶8 that an attempted dereference of a pointer to the location one past the end of an array leads to undefined behavior with the idea that the pointer past the end of the first row of a two-dimensional array of type T[][] is also a pointer of type T * that points to an object of type T, namely the first element of an array of type T[]?

2 answers

#1

So, if both ptr3_plus_1 and ptr4 are valid pointers that compare equal and that must point to the same address

They are.

it would seem that *ptr3_plus_1 is as valid as *ptr4.

It is not.

The pointers are equal, but not equivalent. A trivial, well-known example of the distinction between equality and equivalence is negative zero:

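#include <assert.h>
int main(void) {
    double a = 0.0, b = -0.0;
    assert(a == b);      /* +0.0 and -0.0 compare equal... */
    assert(1/a != 1/b);  /* ...but 1/a is +inf and 1/b is -inf (IEEE 754) */
}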

Now, to be fair, the two cases differ: positive and negative zero have different representations, whereas ptr3_plus_1 and ptr4 have the same representation on typical implementations. That is not guaranteed, however, and on implementations where they do have different representations, it should be clear that your code might fail.

Even on typical implementations, while there are good arguments to be made that the same representation implies equivalent values, the official interpretation, to the best of my knowledge, is that the standard does not guarantee this. Therefore programs cannot rely on it, and therefore implementations can assume programs do not do this and optimise accordingly.
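The difference between comparing values with `==` and comparing representations can be observed with `memcmp`, which compares bytes. This is a minimal sketch, assuming IEEE 754 doubles; the helper names are hypothetical. On a typical flat-address implementation `same_ptr_repr(arr[0] + 3, &arr[1][0])` is expected to be true. The standard does not promise that, and equal bytes would not make the dereference defined anyway.

```c
#include <stdbool.h>
#include <string.h>

/* Compares the object representations (the bytes) of two pointer values,
   as opposed to comparing the pointers themselves with ==. */
bool same_ptr_repr(int *p, int *q)
{
    return memcmp(&p, &q, sizeof p) == 0;
}

/* The same for doubles: 0.0 == -0.0 is true, but under IEEE 754 the two
   values differ in the sign bit, so their bytes are not identical. */
bool same_double_repr(double x, double y)
{
    return memcmp(&x, &y, sizeof x) == 0;
}
```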

#2

A debugging implementation might use "fat" pointers. For example, a pointer may be represented as a tuple (address, base, size) to detect out-of-bounds access. There is absolutely nothing wrong with, or contrary to the standard about, such a representation. So any pointer arithmetic that takes the pointer outside the range [base, base+size] fails, and any dereference outside [base, base+size) also fails.

Note that base and size are not the address and the size of the 2D array but rather of the array that the pointer points into (the row in this case).

It might sound trivial in this case, but when deciding whether a certain pointer construction is UB or not, it is useful to mentally run your example through this hypothetical implementation.
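This is a hypothetical model of how the question's program might look when run through such an implementation. It is not any real compiler's or sanitizer's representation: `fat_ptr`, `fat_make`, `fat_add`, `fat_load` and `fat_addr` are invented for illustration. The position is stored as an offset so the model itself never forms an out-of-range pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer: the array it was derived from plus a position. */
typedef struct {
    int *base;      /* first element of the array (here: one row) */
    size_t size;    /* number of elements in that array */
    ptrdiff_t off;  /* current position in elements, kept in [0, size] */
} fat_ptr;

fat_ptr fat_make(int *array, size_t n)
{
    fat_ptr p = { array, n, 0 };
    return p;
}

/* Pointer arithmetic: succeeds only if the result stays in [0, size]. */
bool fat_add(fat_ptr *p, ptrdiff_t k)
{
    ptrdiff_t r = p->off + k;
    if (r < 0 || r > (ptrdiff_t)p->size)
        return false;
    p->off = r;
    return true;
}

/* Dereference: succeeds only for positions in [0, size). */
bool fat_load(fat_ptr p, int *out)
{
    if (p.off < 0 || p.off >= (ptrdiff_t)p.size)
        return false;
    *out = p.base[p.off];
    return true;
}

/* The raw address, which is all that == compares. */
int *fat_addr(fat_ptr p)
{
    return p.base + p.off;
}
```

Starting from `fat_make(arr[0], 3)` and adding 2 and then 1 reproduces `ptr3_plus_1`. Both additions succeed, and `fat_addr` yields the same address as `fat_make(arr[1], 3)` (that is, `ptr4`). Yet `fat_load` on `ptr3_plus_1` fails, because its bounds are those of row 0.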