Why are structs slower than floats?

Time: 2021-07-01 21:21:04

If I have an array of structs, MyStruct[]:

struct MyStruct 
{
    float x;
    float y;
}

It's slower than if I use a float[] where x is stored at index i and y at index i + 1 (so this array has twice as many elements as the struct array).

Time for comparing 10,000 items against each other (two nested for loops): struct 500 ms, array of plain floats 78 ms.

I thought that a struct is stored just like a float, int, etc. (on the heap).

5 solutions

#1


2  

Firstly, structs don't necessarily live on the heap. They can, and often do, live on the stack.
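To illustrate the storage point, here is a minimal sketch (my own example, not the asker's code; `StorageDemo` and `CopyIsIndependent` are made-up names). The elements of a struct array are stored inline in the array object, and reading an element into a local gives you an independent copy rather than a reference:

```csharp
using System;

struct TwoFloats
{
    public float x;
    public float y;
}

static class StorageDemo
{
    // Returns true if changing a local copy leaves the array element untouched.
    public static bool CopyIsIndependent()
    {
        var items = new TwoFloats[2];   // elements are zero-initialized values, not references
        TwoFloats copy = items[0];      // copies the value out of the array
        copy.x = 5f;                    // modifies only the local copy
        return items[0].x == 0f && copy.x == 5f;
    }

    static void Main()
    {
        Console.WriteLine(CopyIsIndependent()); // prints True
    }
}
```

So for element access in a loop, a TwoFloats[] behaves much like a float[] with the values laid out back to back.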

Regarding your performance measurements, I think you must have tested it incorrectly. Using this benchmarking code, I get almost the same performance for both types:

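// Assumed definition; the snippet as posted did not show the struct.
struct TwoFloats
{
    public float x;
    public float y;
}

TwoFloats[] a = new TwoFloats[10000];
float[] b = new float[20000];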

void test1()
{
    int count = 0;
    for (int i = 0; i < 10000; i += 1)
    {
        if (a[i].x < 10) count++;
    }
}

void test2()
{
    int count = 0;
    for (int i = 0; i < 20000; i += 2)
    {
        if (b[i] < 10) count++;
    }
}

Results:

Method   Iterations per second
test1                 55200000
test2                 54800000

#2


1  

You are doing something seriously wrong if you get times like that. Float comparisons are extremely fast; I clock them at 2 nanoseconds including the loop overhead. Crafting a test like this is tricky because the JIT compiler will optimize code away if you don't use the result of the comparison.
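As a rough sketch of that last point (the names `KeepAlive` and `CountGreaterOrEqual` are invented for illustration, and this is not a rigorous benchmark harness): have the measured loop produce a value, return it, and print it, so the JIT has no license to discard the comparisons. Calling the method once before timing keeps JIT compilation out of the measurement.

```csharp
using System;
using System.Diagnostics;

static class KeepAlive
{
    // Counts adjacent pairs where the first value is >= the second.
    // Returning the count makes the comparisons observable.
    public static int CountGreaterOrEqual(float[] data)
    {
        int count = 0;
        for (int ix = 0; ix + 1 < data.Length; ix += 2)
        {
            if (data[ix] >= data[ix + 1]) count++;
        }
        return count;
    }

    static void Main()
    {
        var data = new float[1000000];
        for (int ix = 0; ix < data.Length; ++ix) data[ix] = ix % 7;

        CountGreaterOrEqual(data);              // warm-up call, not timed

        var sw = Stopwatch.StartNew();
        int result = CountGreaterOrEqual(data); // timed call
        sw.Stop();

        // Printing the result is what actually "uses" it.
        Console.WriteLine("{0} ms, count = {1}", sw.ElapsedMilliseconds, result);
    }
}
```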

The struct version is slightly faster: 1.96 nanoseconds vs. 2.20 nanoseconds for the float[] on my laptop. That's the way it should be, since accessing the y member of the struct doesn't cost an extra array index operation.

Test code:

using System;
using System.Diagnostics;

class Program {
    static void Main(string[] args) {
        var test1 = new float[100000000];  // 100 million
        for (int ix = 0; ix < test1.Length; ++ix) test1[ix] = ix;
        var test2 = new Test[test1.Length / 2];
        for (int ix = 0; ix < test2.Length; ++ix) test2[ix].x = test2[ix].y = ix;
        for (int cnt = 0; cnt < 20; ++cnt) {
            var sw1 = Stopwatch.StartNew();
            bool dummy = false;
            for (int ix = 0; ix < test1.Length; ix += 2) {
                dummy ^= test1[ix] >= test1[ix + 1];
            }
            sw1.Stop();
            var sw2 = Stopwatch.StartNew();
            for (int ix = 0; ix < test2.Length; ++ix) {
                dummy ^= test2[ix].x >= test2[ix].y;
            }
            sw2.Stop();
            Console.Write("", dummy);
            Console.WriteLine("{0} {1}", sw1.ElapsedMilliseconds, sw2.ElapsedMilliseconds);
        }
        Console.ReadLine();
    }
    struct Test {
        public float x;
        public float y;
    }
}

#3


1  

I get results that seem to agree with you (and disagree with Mark). I'm curious whether I've made a mistake constructing this (admittedly crude) benchmark or whether there is another factor at play.

Compiled with MS C# targeting the .NET 3.5 framework in VS2008. Release mode, no debugger attached.

Here's my test code:

using System;
using System.Diagnostics;

class Program {
    static void Main(string[] args) {
        for (int i = 0; i < 10; i++) {
            RunBench();
        }

        Console.ReadKey();
    }

    static void RunBench() {
        Stopwatch sw = new Stopwatch();

        const int numPoints = 10000;
        const int numFloats = numPoints * 2;
        int numEqs = 0;
        float[] rawFloats = new float[numFloats];
        Vec2[] vecs = new Vec2[numPoints];

        Random rnd = new Random();
        for (int i = 0; i < numPoints; i++) {
            rawFloats[i * 2] = (float) rnd.NextDouble();
            rawFloats[i * 2 + 1] = (float)rnd.NextDouble();
            vecs[i] = new Vec2() { X = rawFloats[i * 2], Y = rawFloats[i * 2 + 1] };
        }

        sw.Start();
        for (int i = 0; i < numFloats; i += 2) {
            for (int j = 0; j < numFloats; j += 2) {
                if (i != j &&
                    Math.Abs(rawFloats[i] - rawFloats[j]) < 0.0001 &&
                    Math.Abs(rawFloats[i + 1] - rawFloats[j + 1]) < 0.0001) {
                    numEqs++;
                }
            }
        }
        sw.Stop();

        Console.WriteLine(sw.ElapsedMilliseconds.ToString() + " : numEqs = " + numEqs);

        numEqs = 0;
        sw.Reset();
        sw.Start();
        for (int i = 0; i < numPoints; i++) {
            for (int j = 0; j < numPoints; j++) {
                if (i != j &&
                    Math.Abs(vecs[i].X - vecs[j].X) < 0.0001 &&
                    Math.Abs(vecs[i].Y - vecs[j].Y) < 0.0001) {
                    numEqs++;
                }
            }
        }
        sw.Stop();

        Console.WriteLine(sw.ElapsedMilliseconds.ToString() + " : numEqs = " + numEqs);
    }
}

struct Vec2 {
    public float X;
    public float Y;
}

Edit: Ah! I wasn't iterating the right number of times. With the updated code, my timings look as I expected (each run prints the float[] time first, then the struct time):

269 : numEqs = 8
269 : numEqs = 8
270 : numEqs = 2
269 : numEqs = 2
268 : numEqs = 4
270 : numEqs = 4
269 : numEqs = 2
268 : numEqs = 2
270 : numEqs = 6
270 : numEqs = 6
269 : numEqs = 8
268 : numEqs = 8
268 : numEqs = 4
270 : numEqs = 4
269 : numEqs = 6
269 : numEqs = 6
268 : numEqs = 2
270 : numEqs = 2
268 : numEqs = 4
270 : numEqs = 4

#4


0  

The most likely reason is that the C# runtime optimizer does a better job when you work with floats than with full structs, probably because the optimizer maps x and y to registers or makes similar changes that it doesn't make for a full struct.

In your particular example there seems to be no fundamental reason why it couldn't do as good a job with structs (it's hard to be sure without seeing your actual benchmarking code), but it just doesn't. It would be interesting to compare the performance of the resulting code when compiled with another C# implementation (I'm thinking of Mono on Linux).

I tested Ron Warholic's benchmark with Mono, and the results are consistent with Mark's: the difference between the two types of access seems to be minimal (the float version is 1% faster). However, I should still do more testing, since it would not be surprising if library calls like Math.Abs take a large share of the time and hide a real difference.

After removing the calls to Math.Abs and just doing tests like rawFloats[i] < rawFloats[j], the struct version becomes marginally faster (about 5%) than the float array version.
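For clarity, the simplified comparison looks roughly like the sketch below. This is a reconstruction rather than the exact code I ran; `DirectCompare` and its method names are made up, and `Vec2` mirrors the struct from Ron Warholic's benchmark. Both methods return the count so the work can't be optimized away, and both must return the same number for the same data.

```csharp
using System;

struct Vec2
{
    public float X;
    public float Y;
}

static class DirectCompare
{
    // Interleaved float[] layout: x at even indices, y at odd indices.
    public static int CountLessFloats(float[] raw)
    {
        int n = 0;
        for (int i = 0; i < raw.Length; i += 2)
            for (int j = 0; j < raw.Length; j += 2)
                if (raw[i] < raw[j] && raw[i + 1] < raw[j + 1]) n++;
        return n;
    }

    // Same comparison on an array of structs.
    public static int CountLessStructs(Vec2[] vecs)
    {
        int n = 0;
        for (int i = 0; i < vecs.Length; i++)
            for (int j = 0; j < vecs.Length; j++)
                if (vecs[i].X < vecs[j].X && vecs[i].Y < vecs[j].Y) n++;
        return n;
    }

    static void Main()
    {
        var raw = new float[] { 0f, 0f, 1f, 1f, 2f, 2f };
        var vecs = new Vec2[3];
        for (int k = 0; k < vecs.Length; k++)
        {
            vecs[k].X = raw[2 * k];
            vecs[k].Y = raw[2 * k + 1];
        }
        Console.WriteLine(CountLessFloats(raw));   // prints 3
        Console.WriteLine(CountLessStructs(vecs)); // prints 3
    }
}
```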

#5


0  

The code below compares two ways of iterating. On my machine, Test1b takes almost twice as long as Test1a. I wonder if this relates to your issue.

using System.Diagnostics;

class Program
{
    struct TwoFloats
    {
        public float x;
        public float y;
    }

    static TwoFloats[] a = new TwoFloats[10000];

    static int Test1a()
    {
        int count = 0;
        for (int i = 0; i < 10000; i += 1)
        {
            if (a[i].x < a[i].y) count++;
        }
        return count;
    }

    static int Test1b()
    {
        int count = 0;
        foreach (TwoFloats t in a)
        {
            if (t.x < t.y) count++;
        }
        return count;
    }

    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        for (int j = 0; j < 5000; ++j)
        {
            Test1a();
        }
        sw.Stop();
        Trace.WriteLine(sw.ElapsedMilliseconds);
        sw.Reset();
        sw.Start();
        for (int j = 0; j < 5000; ++j)
        {
            Test1b();
        }
        sw.Stop();
        Trace.WriteLine(sw.ElapsedMilliseconds);
    }

}
