C: Is a struct or an array faster?

I want to implement what is, abstractly, a two-dimensional 4x4 matrix. All the code I write for matrix multiplication and so on will be entirely "unrolled"; that is, I will not use loops to read or write entries in the matrix.

My question is: In C, would it be faster to use a struct as such:

typedef struct {
    double e0, e1, e2, e3, e4, ..., e15;
} My4x4Matrix;

Or would this be faster:

typedef double My4x4Matrix[16];

Given that I will be accessing each matrix element individually as such:

My4x4Matrix a,b,c;
// (Some initialization of a and b.)
...
c.e0=a.e0+b.e0;
c.e1=a.e1+b.e1;
...

My4x4Matrix a,b,c;
// (Some initialization of a and b.)
...
c[0]=a[0]+b[0];
c[1]=a[1]+b[1];
...

Or are they exactly the same speed?

4 Answers

#1

Any decent compiler will generate the exact same code, byte-for-byte. However, using arrays allows you a lot more flexibility; when accessing the matrix elements, you can choose whether you want to access fixed locations or address positions with variables.

I also highly question your choice to "unwind" (unroll?) all the operations by hand. Any good compiler can fully unroll loops with a constant number of iterations for you, and can perhaps even generate SIMD code and/or optimally schedule the order of instructions. You'll have a hard time doing better by hand, and you'll end up with code that's hideous to read. The fact that you asked this question suggests to me that you're probably not sufficiently experienced to do better than even a naive optimizing compiler.
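To illustrate the point about loops with a constant number of iterations, here is a minimal sketch of the array variant written with loops. The function names mat4_add and mat4_mul and the row-major layout are assumptions made for this example, not something from the question. Whether a given compiler actually unrolls or vectorizes these loops depends on the compiler and its optimization settings.

```c
#include <stddef.h>

/* The array typedef from the question. */
typedef double My4x4Matrix[16];

/* Element-wise sum (hypothetical helper). The trip count is a
   compile-time constant, so an optimizing compiler is free to
   unroll or vectorize the loop. */
void mat4_add(My4x4Matrix out, const My4x4Matrix a, const My4x4Matrix b)
{
    for (size_t i = 0; i < 16; ++i)
        out[i] = a[i] + b[i];
}

/* Matrix product (hypothetical helper), assuming row-major storage:
   element (r, c) lives at index 4*r + c.
   `out` must not alias `a` or `b`. */
void mat4_mul(My4x4Matrix out, const My4x4Matrix a, const My4x4Matrix b)
{
    for (size_t r = 0; r < 4; ++r) {
        for (size_t c = 0; c < 4; ++c) {
            double sum = 0.0;
            for (size_t k = 0; k < 4; ++k)
                sum += a[4 * r + k] * b[4 * k + c];
            out[4 * r + c] = sum;
        }
    }
}
```

Compiling this with optimizations enabled and inspecting the generated assembly is one way to check what your compiler does with the loops.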

#2

Struct elements (fields) can only be accessed by the names explicitly written in the program's source, which means that every time you access a field, the actual field is selected and hardcoded at compile time. Doing the same thing with arrays means using explicit compile-time constant array indices (as in your example). In that case the performance of the two will be exactly the same and the generated code will be exactly the same (leaving "malicious" compilers out of consideration).

However, note that arrays provide you with an extra degree of freedom: if necessary, you can select array elements by a run-time index. This is something that's not possible with structs. Only you know whether it matters to you.
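A small sketch of what run-time indexing buys you, assuming a row-major layout (element (row, col) at index 4*row + col). The helper names mat4_at and mat4_trace are made up for this example. With sixteen named struct fields, the same lookup would need something like a switch over all the fields.

```c
#include <stddef.h>

/* The array typedef from the question. */
typedef double My4x4Matrix[16];

/* Read one element chosen at run time (hypothetical helper).
   Assumes row-major storage and row, col in the range 0..3. */
double mat4_at(const My4x4Matrix m, size_t row, size_t col)
{
    return m[4 * row + col];
}

/* Sum of the diagonal (hypothetical helper); the index is
   computed from the loop variable rather than written out. */
double mat4_trace(const My4x4Matrix m)
{
    double sum = 0.0;
    for (size_t i = 0; i < 4; ++i)
        sum += m[4 * i + i];
    return sum;
}
```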

On the other hand, note also that arrays in C are not copyable by assignment, which means that you'll be forced to use memcpy to copy your array-based My4x4Matrix. With the struct-based version, normal language-level copying (plain assignment) will work. With arrays this issue can be worked around by wrapping the actual array in a struct.
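The two copying styles can be sketched as follows. The type names RawMatrix and WrappedMatrix and the helper names are hypothetical, chosen only to keep the two variants apart in one file.

```c
#include <string.h>

/* Bare array type, as in the question. */
typedef double RawMatrix[16];

/* The same array wrapped in a struct, so that assignment works. */
typedef struct {
    double e[16];
} WrappedMatrix;

/* A bare array cannot be assigned, so copy it with memcpy.
   Inside the function `dst` is really a pointer, so the size is
   taken from the type, not from sizeof(dst). */
void copy_raw(RawMatrix dst, const RawMatrix src)
{
    memcpy(dst, src, sizeof(RawMatrix));
}

/* A struct is copied by plain assignment, array member included. */
void copy_wrapped(WrappedMatrix *dst, const WrappedMatrix *src)
{
    *dst = *src;
}
```

Elements of the wrapped version are still reachable by index, for example w.e[5], so the run-time indexing advantage of arrays is kept.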

#3

I guess both are the same speed. The difference between a struct and an array is just its meaning (in human terms). Both will be compiled down to accesses at memory addresses.

#4

I would say the best way is to create a test to try it yourself. Results may vary based on system environments and compilers.
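One possible shape for such a test, as a rough sketch only: the type names, function names, and the timing approach using clock() are all assumptions for this example, and a serious measurement would need more care (repeated runs, checking that the compiler has not optimized the work away, and so on).

```c
#include <time.h>

typedef struct {
    double e0, e1, e2, e3, e4, e5, e6, e7,
           e8, e9, e10, e11, e12, e13, e14, e15;
} StructMatrix;

typedef double ArrayMatrix[16];

/* Fully unrolled addition on named fields. */
void add_struct(StructMatrix *c, const StructMatrix *a, const StructMatrix *b)
{
    c->e0 = a->e0 + b->e0;     c->e1 = a->e1 + b->e1;
    c->e2 = a->e2 + b->e2;     c->e3 = a->e3 + b->e3;
    c->e4 = a->e4 + b->e4;     c->e5 = a->e5 + b->e5;
    c->e6 = a->e6 + b->e6;     c->e7 = a->e7 + b->e7;
    c->e8 = a->e8 + b->e8;     c->e9 = a->e9 + b->e9;
    c->e10 = a->e10 + b->e10;  c->e11 = a->e11 + b->e11;
    c->e12 = a->e12 + b->e12;  c->e13 = a->e13 + b->e13;
    c->e14 = a->e14 + b->e14;  c->e15 = a->e15 + b->e15;
}

/* Fully unrolled addition on constant array indices. */
void add_array(ArrayMatrix c, const ArrayMatrix a, const ArrayMatrix b)
{
    c[0] = a[0] + b[0];     c[1] = a[1] + b[1];
    c[2] = a[2] + b[2];     c[3] = a[3] + b[3];
    c[4] = a[4] + b[4];     c[5] = a[5] + b[5];
    c[6] = a[6] + b[6];     c[7] = a[7] + b[7];
    c[8] = a[8] + b[8];     c[9] = a[9] + b[9];
    c[10] = a[10] + b[10];  c[11] = a[11] + b[11];
    c[12] = a[12] + b[12];  c[13] = a[13] + b[13];
    c[14] = a[14] + b[14];  c[15] = a[15] + b[15];
}

/* CPU seconds spent on `iterations` struct additions. The volatile
   sink is there to discourage the compiler from deleting the loop. */
double time_struct(long iterations)
{
    StructMatrix a = {1.0}, b = {2.0}, c;
    volatile double sink = 0.0;
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        a.e0 = (double)i;
        add_struct(&c, &a, &b);
        sink += c.e0;
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* CPU seconds spent on `iterations` array additions. */
double time_array(long iterations)
{
    ArrayMatrix a = {1.0}, b = {2.0}, c;
    volatile double sink = 0.0;
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        a[0] = (double)i;
        add_array(c, a, b);
        sink += c[0];
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_struct and time_array with the same large iteration count and printing both results gives a first impression; try several optimization levels, since the numbers can change a lot between them.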
