std::vector比普通数组慢得多吗?

时间:2021-12-01 04:06:28

I've always thought it's the general wisdom that std::vector is "implemented as an array," blah blah blah. Today I went down and tested it, and it seems to be not so:

我一直认为std::vector是“作为一个数组实现的”,等等。今天我去做了测试,结果似乎不是这样:

Here's some test results:

这里有一些测试结果:

UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds

That's about 3 - 4 times slower! Doesn't really justify for the "vector may be slower for a few nanosecs" comments.

大约慢了3 - 4倍!对于“一些纳米秒”的注释来说,“向量可能会更慢”这一说法并不是很合理。

And the code I used:

以及我使用的代码:

#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

class TestTimer
{
    public:
        TestTimer(const std::string & name) : name(name),
            start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
        {
        }

        ~TestTimer()
        {
            using namespace std;
            using namespace boost;

            posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
            posix_time::time_duration d = now - start;

            cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
                " seconds" << endl;
        }

    private:
        std::string name;
        boost::posix_time::ptime start;
};

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    unsigned char r, g, b;
};

void UseVector()
{
    TestTimer t("UseVector");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

void UseVectorPushBack()
{
    TestTimer t("UseVectorPushBack");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
            pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));
    }
}

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);

        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        free(pixels);
    }
}

int main()
{
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();
    UseVectorPushBack();

    return 0;
}

Am I doing it wrong or something? Or have I just busted this performance myth?

我做错了吗?还是我刚刚打破了这个表演神话?

I'm using Release mode in Visual Studio 2005.

我在Visual Studio 2005中使用发布模式。


In Visual C++, #define _SECURE_SCL 0 reduces UseVector by half (bringing it down to 4 seconds). This is really huge, IMO.

在Visual c++中,#define _SECURE_SCL 0将UseVector减少了一半(将其降低到4秒)。这真是太大了,在我看来。

19 个解决方案

#1


246  

Using the following:

使用以下:

g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseArray completed in 2.196 seconds
UseVector completed in 4.412 seconds
UseVectorPushBack completed in 8.017 seconds
The whole thing completed in 14.626 seconds

g++ o3的时间。cpp -我< MyBoost >。/。UseArray跑完了2.196秒,跑完了4.412秒,跑完了8.017秒,跑完了14.626秒

So array is twice as quick as vector.

所以数组的速度是矢量的两倍。

But after looking at the code in more detail this is expected; as you run across the vector twice and the array only once. Note: when you resize() the vector you are not only allocating the memory but also running through the vector and calling the constructor on each member.

但在更详细地看了代码之后,这是预期的;当你在向量上运行两次,而数组只运行一次。注意:当您调整()向量的大小时,您不仅要分配内存,而且还要遍历向量并调用每个成员的构造函数。

Re-Arranging the code slightly so that the vector only initializes each object once:

稍微重新排列代码,使矢量只初始化每个对象一次:

 std::vector<Pixel>  pixels(dimensions * dimensions, Pixel(255,0,0));

Now doing the same timing again:

现在再次做同样的时机:

g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseVector completed in 2.216 seconds

g++ o3的时间。cpp -我< MyBoost >。/。UseVector在2.216秒完成任务

The vector now performance only slightly worse than the array. IMO this difference is insignificant and could be caused by a whole bunch of things not associated with the test.

向量现在的性能只比数组差一点点。在我看来,这种差异是无关紧要的,它可能是由一系列与测试无关的事情造成的。

I would also take into account that you are not correctly initializing/Destroying the Pixel object in the UseArrray() method as neither constructor/destructor is not called (this may not be an issue for this simple class but anything slightly more complex (ie with pointers or members with pointers) will cause problems.

我还要考虑到,在UseArrray()方法中,您没有正确地初始化/销毁像素对象,因为没有调用构造函数/析构函数(对于这个简单的类来说,这可能不是问题,但是任何稍微复杂一点的东西(比如指针或指针成员)都会导致问题。

#2


52  

Great question. I came in here expecting to find some simple fix that would speed the vector tests right up. That didn't work out quite like I expected!

好问题。我来这里希望找到一些简单的方法来加速矢量测试。这和我预想的不一样!

Optimization helps, but it's not enough. With optimization on I'm still seeing a 2X performance difference between UseArray and UseVector. Interestingly, UseVector was significantly slower than UseVectorPushBack without optimization.

优化是有帮助的,但这还不够。随着优化,我仍然看到UseArray和UseVector之间的性能差异。有趣的是,在没有优化的情况下,UseVector比UseVectorPushBack要慢得多。

# g++ -Wall -Wextra -pedantic -o vector vector.cpp
# ./vector
UseArray completed in 20.68 seconds
UseVector completed in 120.509 seconds
UseVectorPushBack completed in 37.654 seconds
The whole thing completed in 178.845 seconds
# g++ -Wall -Wextra -pedantic -O3 -o vector vector.cpp
# ./vector
UseArray completed in 3.09 seconds
UseVector completed in 6.09 seconds
UseVectorPushBack completed in 9.847 seconds
The whole thing completed in 19.028 seconds

Idea #1 - Use new[] instead of malloc

I tried changing malloc() to new[] in UseArray so the objects would get constructed. And changing from individual field assignment to assigning a Pixel instance. Oh, and renaming the inner loop variable to j.

我尝试在UseArray中将malloc()更改为new[],以便构造对象。从单独的字段分配到分配一个像素实例。将内部循环变量重命名为j。

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {   
        int dimension = 999;

        // Same speed as malloc().
        Pixel * pixels = new Pixel[dimension * dimension];

        for(int j = 0 ; j < dimension * dimension; ++j)
            pixels[j] = Pixel(255, 0, 0);

        delete[] pixels;
    }
}

Surprisingly (to me), none of those changes made any difference whatsoever. Not even the change to new[] which will default construct all of the Pixels. It seems that gcc can optimize out the default constructor calls when using new[], but not when using vector.

令人惊讶的是(对我来说),这些改变都没有带来任何改变。甚至没有对new[]的更改,它将默认构造所有的像素。似乎gcc在使用new[]时可以优化默认的构造函数调用,而在使用vector时则不能。

Idea #2 - Remove repeated operator[] calls

I also attempted to get rid of the triple operator[] lookup and cache the reference to pixels[j]. That actually slowed UseVector down! Oops.

我还试图摆脱三重运算符[]查找并缓存对像素的引用[j]。这实际上减慢了UseVector的速度!哦。

for(int j = 0; j < dimension * dimension; ++j)
{
    // Slower than accessing pixels[j] three times.
    Pixel &pixel = pixels[j];
    pixel.r = 255;
    pixel.g = 0;
    pixel.b = 0;
}

# ./vector 
UseArray completed in 3.226 seconds
UseVector completed in 7.54 seconds
UseVectorPushBack completed in 9.859 seconds
The whole thing completed in 20.626 seconds

Idea #3 - Remove constructors

What about removing the constructors entirely? Then perhaps gcc can optimize out the construction of all of the objects when the vectors are created. What happens if we change Pixel to:

完全删除构造函数怎么样?然后,也许gcc可以在创建向量时优化所有对象的构造。如果我们将像素改为:

struct Pixel
{
    unsigned char r, g, b;
};

Result: about 10% faster. Still slower than an array. Hm.

结果:大约快10%。仍然比数组慢。嗯。

# ./vector 
UseArray completed in 3.239 seconds
UseVector completed in 5.567 seconds

Idea #4 - Use iterator instead of loop index

How about using a vector<Pixel>::iterator instead of a loop index?

使用向量 ::iterator代替循环索引怎么样?

for (std::vector<Pixel>::iterator j = pixels.begin(); j != pixels.end(); ++j)
{
    j->r = 255;
    j->g = 0;
    j->b = 0;
}

Result:

结果:

# ./vector 
UseArray completed in 3.264 seconds
UseVector completed in 5.443 seconds

Nope, no different. At least it's not slower. I thought this would have performance similar to #2 where I used a Pixel& reference.

不,没有什么不同。至少它不慢。我认为它的性能与我使用Pixel& reference时的性能类似。

Conclusion

Even if some smart cookie figures out how to make the vector loop as fast as the array one, this does not speak well of the default behavior of std::vector. So much for the compiler being smart enough to optimize out all the C++ness and make STL containers as fast as raw arrays.

即使一些智能cookie知道如何使向量循环和数组一样快,这也不能很好地说明std::vector的默认行为。编译器足够聪明,可以优化所有c++特性,使STL容器和原始数组一样快。

The bottom line is that the compiler is unable to optimize away the no-op default constructor calls when using std::vector. If you use plain new[] it optimizes them away just fine. But not with std::vector. Even if you can rewrite your code to eliminate the constructor calls that flies in face of the mantra around here: "The compiler is smarter than you. The STL is just as fast as plain C. Don't worry about it."

最重要的是,当使用std:::vector时,编译器无法优化无op默认构造函数调用。如果你使用普通的new[],它会很好地优化它们。但不是用std::向量。即使您可以重写代码,以消除构造函数调用,这也会违背这里的格言:“编译器比您更聪明。”STL和普通c一样快,别担心。

#3


33  

To be fair, you cannot compare a C++ implementation to a C implementation, as I would call your malloc version. malloc does not create objects - it only allocates raw memory. That you then treat that memory as objects without calling the constructor is poor C++ (possibly invalid - I'll leave that to the language lawyers).

公平地说,您不能将c++实现与C实现进行比较,我将其称为malloc版本。malloc不创建对象——它只分配原始内存。然后,您将该内存视为对象,而不调用构造函数,这是很糟糕的c++(可能无效——我将把它留给语言律师)。

That said, simply changing the malloc to new Pixel[dimensions*dimensions] and free to delete [] pixels does not make much difference with the simple implementation of Pixel that you have. Here's the results on my box (E6600, 64-bit):

也就是说,简单地将malloc更改为新的像素[维度*维度],并且可以*地删除[]像素,这与您所拥有的像素的简单实现没有太大的差别。这是我的结果(E6600, 64位):

UseArray completed in 0.269 seconds
UseVector completed in 1.665 seconds
UseVectorPushBack completed in 7.309 seconds
The whole thing completed in 9.244 seconds

But with a slight change, the tables turn:

但是稍微改变一下,情况就变了:

Pixel.h

struct Pixel
{
    Pixel();
    Pixel(unsigned char r, unsigned char g, unsigned char b);

    unsigned char r, g, b;
};

Pixel.cc

#include "Pixel.h"

Pixel::Pixel() {}
Pixel::Pixel(unsigned char r, unsigned char g, unsigned char b) 
  : r(r), g(g), b(b) {}

main.cc

#include "Pixel.h"
[rest of test harness without class Pixel]
[UseArray now uses new/delete not malloc/free]

Compiled this way:

编译:

$ g++ -O3 -c -o Pixel.o Pixel.cc
$ g++ -O3 -c -o main.o main.cc
$ g++ -o main main.o Pixel.o

we get very different results:

我们得到了非常不同的结果:

UseArray completed in 2.78 seconds
UseVector completed in 1.651 seconds
UseVectorPushBack completed in 7.826 seconds
The whole thing completed in 12.258 seconds

With a non-inlined constructor for Pixel, std::vector now beats a raw array.

有了Pixel的非内联构造函数,std::vector现在打败了原始数组。

It would appear that the complexity of allocation through std::vector and std:allocator is too much to be optimised as effectively as a simple new Pixel[n]. However, we can see that the problem is simply with the allocation not the vector access by tweaking a couple of the test functions to create the vector/array once by moving it outside the loop:

似乎通过std::vector和std:分配器分配的复杂性太大了,不能像一个简单的新像素那样有效地优化[n]。但是,我们可以看到,问题仅仅是通过调整两个测试函数来创建一个矢量/数组,而不是通过将它移动到循环之外来创建矢量/数组。

void UseVector()
{
    TestTimer t("UseVector");

    int dimension = 999;
    std::vector<Pixel> pixels;
    pixels.resize(dimension * dimension);

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

and

void UseArray()
{
    TestTimer t("UseArray");

    int dimension = 999;
    Pixel * pixels = new Pixel[dimension * dimension];

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
    delete [] pixels;
}

We get these results now:

我们现在得到的结果是:

UseArray completed in 0.254 seconds
UseVector completed in 0.249 seconds
UseVectorPushBack completed in 7.298 seconds
The whole thing completed in 7.802 seconds

What we can learn from this is that std::vector is comparable to a raw array for access, but if you need to create and delete the vector/array many times, creating a complex object will be more time consuming that creating a simple array when the element's constructor is not inlined. I don't think that this is very surprising.

我们可以从中学到的是,std::vector可以与原始数组进行访问,但是如果您需要多次创建和删除vector/array,那么在元素的构造函数没有内联时,创建一个复杂的对象将花费更多的时间来创建一个简单的数组。我不认为这很令人惊讶。

#4


33  

This is an old but popular question.

这是一个古老而流行的问题。

At this point, many programmers will be working in C++11. And in C++11 the OP's code as written runs equally fast for UseArray or UseVector.

此时,许多程序员将使用c++ 11。在c++ 11中,对于UseArray或UseVector, OP的代码写得一样快。

UseVector completed in 3.74482 seconds
UseArray completed in 3.70414 seconds

The fundamental problem was that while your Pixel structure was uninitialized, std::vector<T>::resize( size_t, T const&=T() ) takes a default constructed Pixel and copies it. The compiler did not notice it was being asked to copy uninitialized data, so it actually performed the copy.

基本的问题是,当像素结构未初始化时,std::vector ::resize(size_t, T const&=T())获取一个默认构造的像素并复制它。编译器没有注意到它被要求复制未初始化的数据,所以它实际上执行了复制。

In C++11, std::vector<T>::resize has two overloads. The first is std::vector<T>::resize(size_t), the other is std::vector<T>::resize(size_t, T const&). This means when you invoke resize without a second argument, it simply default constructs, and the compiler is smart enough to realize that default construction does nothing, so it skips the pass over the buffer.

在c++ 11中,std::vector ::resize有两个重载。第一个是std:, vector ::resize(size_t),另一个是std::vector ::resize(size_t, T const&)。这意味着,当您在没有第二个参数的情况下调用resize时,它只是默认构造,而且编译器足够聪明,能够意识到默认构造不做任何事情,所以它跳过了缓冲区的传递。

(The two overloads where added to handle movable, constructable and non-copyable types -- the performance improvement when working on uninitialized data is a bonus).

(添加到处理可移动、可构造和不可复制类型的两个重载——处理未初始化数据时的性能改进是一个额外的好处)。

The push_back solution also does fencepost checking, which slows it down, so it remains slower than the malloc version.

push_back解决方案还进行了fencepost检查,这降低了它的速度,因此它仍然比malloc版本慢。

live example (I also replaced the timer with chrono::high_resolution_clock).

live example(我也用chrono::high_resolution_clock替换了计时器)。

Note that if you have a structure that usually requires initialization, but you want to handle it after growing your buffer, you can do this with a custom std::vector allocator. If you want to then move it into a more normal std::vector, I believe careful use of allocator_traits and overriding of == might pull that off, but am unsure.

注意,如果您有一个结构,通常需要初始化,但是您希望在扩展缓冲区之后处理它,那么可以使用自定义的std::vector分配器来实现这一点。如果您想要将它移动到更普通的std::vector中,我认为仔细使用allocator_traits和重写== =可能会成功,但我不确定。

#5


26  

Try with this:

试试这个:

void UseVectorCtor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));
    }
}

I get almost exactly the same performance as with array.

我得到的性能与数组几乎完全相同。

The thing about vector is that it's a much more general tool than an array. And that means you have to consider how you use it. It can be used in a lot of different ways, providing functionality that an array doesn't even have. And if you use it "wrong" for your purpose, you incur a lot of overhead, but if you use it correctly, it is usually basically a zero-overhead data structure. In this case, the problem is that you separately initialized the vector (causing all elements to have their default ctor called), and then overwriting each element individually with the correct value. That is much harder for the compiler to optimize away than when you do the same thing with an array. Which is why the vector provides a constructor which lets you do exactly that: initialize N elements with value X.

关于向量,它是一个比数组更通用的工具。这意味着你必须考虑如何使用它。它可以以许多不同的方式使用,提供数组甚至没有的功能。如果您的目的是“错误的”,那么您就会产生大量的开销,但是如果您正确地使用它,它通常基本上是一个零开销的数据结构。在这种情况下,问题在于分别初始化向量(使所有元素都调用它们的默认ctor),然后用正确的值分别覆盖每个元素。对编译器来说,这比对数组做同样的事情要难得多。这就是为什么vector提供了一个构造函数,它可以让您精确地实现:用值X初始化N个元素。

And when you use that, the vector is just as fast as an array.

当你使用它时,这个向量就像一个数组一样快。

So no, you haven't busted the performance myth. But you have shown that it's only true if you use the vector optimally, which is a pretty good point too. :)

所以,不,你没有打破性能神话。但是你已经证明了只有当你用最优向量时它才成立,这也是一个很好的点。:)

On the bright side, it's really the simplest usage that turns out to be fastest. If you contrast my code snippet (a single line) with John Kugelman's answer, containing heaps and heaps of tweaks and optimizations, which still don't quite eliminate the performance difference, it's pretty clear that vector is pretty cleverly designed after all. You don't have to jump through hoops to get speed equal to an array. On the contrary, you have to use the simplest possible solution.

从好的方面来看,它是最简单的用法,结果是最快的。如果您将我的代码片段(一行)与John Kugelman的答案进行对比,其中包含大量的微调和优化,但仍然不能完全消除性能差异,那么很明显,vector毕竟设计得相当巧妙。你不需要跳圈就能得到速度等于一个数组。相反,你必须使用最简单的可能的解决方案。

#6


21  

It was hardly a fair comparison when I first looked at your code; I definitely thought you weren't comparing apples with apples. So I thought, let's get constructors and destructors being called on all tests; and then compare.

当我第一次看到你的代码时,这很难说是一个公平的比较;我肯定你不是在拿苹果和苹果做比较。所以我想,让我们在所有测试中调用构造函数和析构函数;然后进行比较。

const size_t dimension = 1000;

void UseArray() {
    TestTimer t("UseArray");
    for(size_t j = 0; j < dimension; ++j) {
        Pixel* pixels = new Pixel[dimension * dimension];
        for(size_t i = 0 ; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
        delete[] pixels;
    }
}

void UseVector() {
    TestTimer t("UseVector");
    for(size_t j = 0; j < dimension; ++j) {
        std::vector<Pixel> pixels(dimension * dimension);
        for(size_t i = 0; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
    }
}

int main() {
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();

    return 0;
}

My thoughts were, that with this setup, they should be exactly the same. It turns out, I was wrong.

我的想法是,在这个设置下,它们应该是完全相同的。结果证明,我错了。

UseArray completed in 3.06 seconds
UseVector completed in 4.087 seconds
The whole thing completed in 10.14 seconds

So why did this 30% performance loss even occur? The STL has everything in headers, so it should have been possible for the compiler to understand everything that was required.

为什么30%的性能损失会发生呢?STL包含了头文件中的所有内容,因此编译器应该能够理解所需的所有内容。

My thoughts were that it is in how the loop initialises all values to the default constructor. So I performed a test:

我的想法是,循环是如何将所有值初始化为默认构造函数的。所以我做了一个测试:

class Tester {
public:
    static int count;
    static int count2;
    Tester() { count++; }
    Tester(const Tester&) { count2++; }
};
int Tester::count = 0;
int Tester::count2 = 0;

int main() {
    std::vector<Tester> myvec(300);
    printf("Default Constructed: %i\nCopy Constructed: %i\n", Tester::count, Tester::count2);

    return 0;
}

The results were as I suspected:

结果正如我猜想的那样:

Default Constructed: 1
Copy Constructed: 300

This is clearly the source of the slowdown, the fact that the vector uses the copy constructor to initialise the elements from a default constructed object.

这显然是放缓的根源,事实上向量使用复制构造函数从默认构造的对象初始化元素。

This means, that the following pseudo-operation order is happening during construction of the vector:

这意味着,在向量的构建过程中会发生以下伪操作顺序:

Pixel pixel;
for (auto i = 0; i < N; ++i) vector[i] = pixel;

Which, due to the implicit copy constructor made by the compiler, is expanded to the following:

由于编译器所作的隐式复制构造函数,将其扩展为:

Pixel pixel;
for (auto i = 0; i < N; ++i) {
    vector[i].r = pixel.r;
    vector[i].g = pixel.g;
    vector[i].b = pixel.b;
}

So the default Pixel remains un-initialised, while the rest are initialised with the default Pixel's un-initialised values.

因此,默认像素仍然是未初始化的,而其余像素则用默认像素的未初始化值初始化。

Compared to the alternative situation with New[]/Delete[]:

与新[]/删除[]的替代情况相比:

int main() {
    Tester* myvec = new Tester[300];

    printf("Default Constructed: %i\nCopy Constructed:%i\n", Tester::count, Tester::count2);

    delete[] myvec;

    return 0;
}

Default Constructed: 300
Copy Constructed: 0

They are all left to their un-initialised values, and without the double iteration over the sequence.

它们都留给未初始化的值,并且不需要对序列进行重复迭代。

Armed with this information, how can we test it? Let's try over-writing the implicit copy constructor.

有了这些信息,我们如何进行测试?让我们尝试重写隐式复制构造函数。

Pixel(const Pixel&) {}

And the results?

和结果?

UseArray completed in 2.617 seconds
UseVector completed in 2.682 seconds
The whole thing completed in 5.301 seconds

So in summary, if you're making hundreds of vectors very often: re-think your algorithm.

总结一下,如果你经常画几百个向量:重新考虑你的算法。

In any case, the STL implementation isn't slower for some unknown reason, it just does exactly what you ask; hoping you know better.

无论如何,STL实现并不会因为某些未知的原因而变慢,它只是按照您的要求执行;希望你知道更好。

#7


7  

Try disabling checked iterators and building in release mode. You shouldn't see much of a performance difference.

尝试禁用检查迭代器并在发布模式中构建。您不应该看到性能差异太大。

#8


4  

GNU's STL (and others), given vector<T>(n), default constructs a prototypal object T() - the compiler will optimise away the empty constructor - but then a copy of whatever garbage happened to be in the memory addresses now reserved for the object is taken by the STL's __uninitialized_fill_n_aux, which loops populating copies of that object as the default values in the vector. So, "my" STL is not looping constructing, but constructing then loop/copying. It's counter intuitive, but I should have remembered as I commented on a recent * question about this very point: the construct/copy can be more efficient for reference counted objects etc..

GNU的STL(和其他人),给出向量< T >(n),默认构造原型的对象T()——编译器会优化掉空构造函数,然后复制任何垃圾碰巧在对象的内存地址现在保留了STL的__uninitialized_fill_n_aux循环填充该对象的副本为默认值的向量。因此,“my”STL不是循环构造,而是构造循环/复制。这是违反直觉的,但我应该记得最近关于这一点的*问题的评论:构造/复制对于引用计数对象等来说更有效。

So:

所以:

vector<T> x(n);

or

vector<T> x;
x.resize(n);

is - on many STL implementations - something like:

在许多STL实现中,is是:

T temp;
for (int i = 0; i < n; ++i)
    x[i] = temp;

The issue being that the current generation of compiler optimisers don't seem to work from the insight that temp is uninitialised garbage, and fail to optimise out the loop and default copy constructor invocations. You could credibly argue that compilers absolutely shouldn't optimise this away, as a programmer writing the above has a reasonable expectation that all the objects will be identical after the loop, even if garbage (usual caveats about 'identical'/operator== vs memcmp/operator= etc apply). The compiler can't be expected to have any extra insight into the larger context of std::vector<> or the later usage of the data that would suggest this optimisation safe.

问题是,当前的编译器优化器似乎不能工作,因为temp是未初始化的垃圾,并且未能优化循环和默认的复制构造函数调用。您可以令人相信地认为编译器绝对不应该优化这一点,因为程序员编写上面的代码有一个合理的期望,即在循环之后所有对象都是相同的,即使是垃圾(关于“相同的”/操作符==和memcmp/运算符=等)。不能期望编译器对std::vector<>或稍后会建议这种优化安全的数据的使用情况有任何额外的了解。

This can be contrasted with the more obvious, direct implementation:

这可以与更明显、更直接的执行作比较:

for (int i = 0; i < n; ++i)
    x[i] = T();

Which we can expect a compiler to optimise out.

我们可以期望编译器进行优化。

To be a bit more explicit about the justification for this aspect of vector's behaviour, consider:

为了更明确地说明向量行为这一方面的理由,请考虑:

std::vector<big_reference_counted_object> x(10000);

Clearly it's a major difference if we make 10000 independent objects versus 10000 referencing the same data. There's a reasonable argument that the advantage of protecting casual C++ users from accidentally doing something so expensive outweights the very small real-world cost of hard-to-optimise copy construction.

显然,如果我们让10000个独立的对象与10000个引用相同数据的对象相比,这是一个很大的不同。有一种合理的观点认为,保护临时的c++用户不被意外地做这么昂贵的事情的好处远远超过了难以优化的拷贝结构的实际成本。

ORIGINAL ANSWER (for reference / making sense of the comments): No chance. vector is as fast as an array, at least if you reserve space sensibly. ...

原始回答(供参考/理解评论):不可能。向量就像数组一样快,至少如果你合理地保留空间的话。

#9


3  

Martin York's answer bothers me because it seems like an attempt to brush the initialisation problem under the carpet. But he is right to identify redundant default construction as the source of performance problems.

马丁·约克(Martin York)的回答让我感到困扰,因为这似乎是在掩盖初始化问题。但是,他认为冗余的默认构造是性能问题的根源是正确的。

[EDIT: Martin's answer no longer suggests changing the default constructor.]

[编辑:马丁的答案不再建议修改默认构造函数。]

For the immediate problem at hand, you could certainly call the 2-parameter version of the vector<Pixel> ctor instead:

对于眼前的问题,您当然可以将向量的2参数版本称为 ctor:

std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));

That works if you want to initialise with a constant value, which is a common case. But the more general problem is: How can you efficiently initialise with something more complicated than a constant value?

如果你想要初始化一个常数值,这是很常见的情况。但更普遍的问题是:如何有效地初始化一些比常量值更复杂的东西呢?

For this you can use a back_insert_iterator, which is an iterator adaptor. Here's an example with a vector of ints, although the general idea works just as well for Pixels:

为此,您可以使用back_insert_iterator,它是一个迭代器适配器。这里有一个ints向量的例子,尽管对于像素来说一般的思想同样适用:

#include <iterator>
// Simple functor return a list of squares: 1, 4, 9, 16...
struct squares {
    squares() { i = 0; }
    int operator()() const { ++i; return i * i; }

private:
    int i;
};

...

std::vector<int> v;
v.reserve(someSize);     // To make insertions efficient
std::generate_n(std::back_inserter(v), someSize, squares());

Alternatively you could use copy() or transform() instead of generate_n().

您也可以使用copy()或transform()代替generate_n()。

The downside is that the logic to construct the initial values needs to be moved into a separate class, which is less convenient than having it in-place (although lambdas in C++1x make this much nicer). Also I expect this will still not be as fast as a malloc()-based non-STL version, but I expect it will be close, since it only does one construction for each element.

缺点是构造初始值的逻辑需要转移到一个单独的类中,这比将其放置在适当的位置要不那么方便(尽管c++ 1x中的lambdas使其变得更好)。此外,我预计这仍然不会像基于malloc()的非stl版本那样快,但我预计它会很接近,因为它只对每个元素执行一个构造。

#10


2  

The vector ones are additionally calling Pixel constructors.

矢量图是另外的调用像素构造函数。

Each is causing almost a million ctor runs that you're timing.

每一种都导致了大约一百万次的ctor运行。

edit: then there's the outer 1...1000 loop, so make that a billion ctor calls!

编辑:然后是外部1……1000个循环,让十亿的ctor调用!

edit 2: it'd be interesting to see the disassembly for the UseArray case. An optimizer could optimize the whole thing away, since it has no effect other than burning CPU.

编辑2:看看UseArray案例的拆解会很有趣。优化器可以优化整个东西,因为它除了消耗CPU之外没有其他效果。

#11


1  

Here's how the push_back method in vector works:

向量中的push_back方法是这样工作的:

  1. The vector allocates X amount of space when it is initialized.
  2. 向量在初始化时分配X个空间量。
  3. As stated below it checks if there is room in the current underlying array for the item.
  4. 如下面所述,它检查当前底层数组中是否有空间用于项。
  5. It makes a copy of the item in the push_back call.
  6. 它在push_back调用中复制项目。

After calling push_back X items:

调用push_back X后:

  1. The vector reallocates kX amount of space into a 2nd array.
  2. 向量将kX的空间重新分配到第二个数组中。
  3. It Copies the entries of the first array onto the second.
  4. 它将第一个数组的条目复制到第二个数组中。
  5. Discards the first array.
  6. 丢弃第一个数组。
  7. Now uses the second array as storage until it reaches kX entries.
  8. 现在使用第二个数组作为存储,直到它到达kX条目。

Repeat. If you're not reserving space its definitely going to be slower. More than that, if it's expensive to copy the item then 'push_back' like that is going to eat you alive.

重复。如果你不预留空间,它肯定会更慢。更重要的是,如果复制这个东西很昂贵,那么像这样的“push_back”会把你活活吃掉。

As to the vector versus array thing, I'm going to have to agree with the other people. Run in release, turn optimizations on, and put in a few more flags so that the friendly people at Microsoft don't #@%$^ it up for ya.

对于向量与数组的关系,我必须同意其他人的观点。在发布运行,打开优化,把更多的旗帜,这样友好的人在微软不# @ % $ ^丫。

One more thing, if you don't need to resize, use Boost.Array.

还有一件事,如果不需要调整大小,可以使用boot . array。

#12


1  

Some profiler data (pixel is aligned to 32 bits):

一些分析器数据(像素对齐到32位):

g++ -msse3 -O3 -ftree-vectorize -g test.cpp -DNDEBUG && ./a.out
UseVector completed in 3.123 seconds
UseArray completed in 1.847 seconds
UseVectorPushBack completed in 9.186 seconds
The whole thing completed in 14.159 seconds

Blah

废话

andrey@nv:~$ opannotate --source libcchem/src/a.out  | grep "Total samples for file" -A3
Overflow stats not available
 * Total samples for file : "/usr/include/c++/4.4/ext/new_allocator.h"
 *
 * 141008 52.5367
 */
--
 * Total samples for file : "/home/andrey/libcchem/src/test.cpp"
 *
 *  61556 22.9345
 */
--
 * Total samples for file : "/usr/include/c++/4.4/bits/stl_vector.h"
 *
 *  41956 15.6320
 */
--
 * Total samples for file : "/usr/include/c++/4.4/bits/stl_uninitialized.h"
 *
 *  20956  7.8078
 */
--
 * Total samples for file : "/usr/include/c++/4.4/bits/stl_construct.h"
 *
 *   2923  1.0891
 */

In allocator:

在分配器:

               :      // _GLIBCXX_RESOLVE_LIB_DEFECTS
               :      // 402. wrong new expression in [some_] allocator::construct
               :      void
               :      construct(pointer __p, const _Tp& __val)
141008 52.5367 :      { ::new((void *)__p) _Tp(__val); }

vector:

向量:

               :void UseVector()
               :{ /* UseVector() total:  60121 22.3999 */
...
               :
               :
 10790  4.0201 :        for (int i = 0; i < dimension * dimension; ++i) {
               :
   495  0.1844 :            pixels[i].r = 255;
               :
 12618  4.7012 :            pixels[i].g = 0;
               :
  2253  0.8394 :            pixels[i].b = 0;
               :
               :        }

array

数组

               :void UseArray()
               :{ /* UseArray() total:  35191 13.1114 */
               :
...
               :
   136  0.0507 :        for (int i = 0; i < dimension * dimension; ++i) {
               :
  9897  3.6874 :            pixels[i].r = 255;
               :
  3511  1.3081 :            pixels[i].g = 0;
               :
 21647  8.0652 :            pixels[i].b = 0;

Most of the overhead is in the copy constructor. For example,

大部分开销都在复制构造函数中。例如,

    std::vector < Pixel > pixels;//(dimension * dimension, Pixel());

    pixels.reserve(dimension * dimension);

    for (int i = 0; i < dimension * dimension; ++i) {

        pixels[i].r = 255;

        pixels[i].g = 0;

        pixels[i].b = 0;
    }

It has the same performance as an array.

它的性能与数组相同。

#13


1  

My laptop is Lenova G770 (4 GB RAM).

我的笔记本是Lenova G770 (4gb内存)。

The OS is Windows 7 64-bit (the one with laptop)

操作系统是Windows 7 64位(使用笔记本电脑的操作系统)

Compiler is MinGW 4.6.1.

编译器MinGW 4.6.1。

The IDE is Code::Blocks.

IDE代码::块。

I test the source codes of the first post.

我测试第一篇文章的源代码。

The results

O2 optimization

O2优化

UseArray completed in 2.841 seconds

UseArray只用了2.841秒就完成了

UseVector completed in 2.548 seconds

UseVector在2.548秒完成任务

UseVectorPushBack completed in 11.95 seconds

UseVectorPushBack在11.95秒内完成。

The whole thing completed in 17.342 seconds

整个过程只用了17.342秒就完成了

system pause

系统暂停

O3 optimization

O3优化

UseArray completed in 1.452 seconds

UseArray在1.452秒内完成

UseVector completed in 2.514 seconds

UseVector在2.514秒完成任务

UseVectorPushBack completed in 12.967 seconds

UseVectorPushBack以12.967秒完成

The whole thing completed in 16.937 seconds

整个过程用时16.937秒

It looks like the performance of vector is worse under O3 optimization.

在O3优化下,向量的性能似乎更差。

If you change the loop to

如果将循环改为

    pixels[i].r = i;
    pixels[i].g = i;
    pixels[i].b = i;

The speed of array and vector under O2 and O3 are almost the same.

在O2和O3下,阵列和矢量的速度几乎相同。

#14


1  

A better benchmark (I think...), compiler due to optimizations can change code, becouse results of allocated vectors/arrays are not used anywhere. Results:

一个更好的基准(我认为…),由于优化的编译器可以更改代码,因为分配的向量/数组的结果在任何地方都不使用。结果:

$ g++ test.cpp -o test -O3 -march=native
$ ./test 
UseArray inner completed in 0.652 seconds
UseArray completed in 0.773 seconds
UseVector inner completed in 0.638 seconds
UseVector completed in 0.757 seconds
UseVectorPushBack inner completed in 6.732 seconds
UseVectorPush completed in 6.856 seconds
The whole thing completed in 8.387 seconds

Compiler:

编译器:

gcc version 6.2.0 20161019 (Debian 6.2.0-9)

CPU:

CPU:

model name  : Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz

And the code:

和代码:

#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

class TestTimer
{
    public:
        TestTimer(const std::string & name) : name(name),
            start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
        {
        }

        ~TestTimer()
        {
            using namespace std;
            using namespace boost;

            posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
            posix_time::time_duration d = now - start;

            cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
                " seconds" << endl;
        }

    private:
        std::string name;
        boost::posix_time::ptime start;
};

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    unsigned char r, g, b;
};

void UseVector(std::vector<std::vector<Pixel> >& results)
{
    TestTimer t("UseVector inner");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel>& pixels = results.at(i);
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

void UseVectorPushBack(std::vector<std::vector<Pixel> >& results)
{
    TestTimer t("UseVectorPushBack inner");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel>& pixels = results.at(i);
            pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));
    }
}

void UseArray(Pixel** results)
{
    TestTimer t("UseArray inner");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);

        results[i] = pixels;

        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        // free(pixels);
    }
}

void UseArray()
{
    TestTimer t("UseArray");
    Pixel** array = (Pixel**)malloc(sizeof(Pixel*)* 1000);
    UseArray(array);
    for(int i=0;i<1000;++i)
        free(array[i]);
    free(array);
}

void UseVector()
{
    TestTimer t("UseVector");
    {
        std::vector<std::vector<Pixel> > vector(1000, std::vector<Pixel>());
        UseVector(vector);
    }
}

void UseVectorPushBack()
{
    TestTimer t("UseVectorPush");
    {
        std::vector<std::vector<Pixel> > vector(1000, std::vector<Pixel>());
        UseVectorPushBack(vector);
    }
}


int main()
{
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();
    UseVectorPushBack();

    return 0;
}

#15


0  

With the right options, vectors and arrays can generate identical asm. In these cases, they are of course the same speed, because you get the same executable file either way.

通过正确的选项,向量和数组可以生成相同的asm。在这些情况下,它们当然是相同的速度,因为无论哪种方式都可以得到相同的可执行文件。

#16


0  

By the way the slow down your seeing in classes using vector also occurs with standard types like int. Heres a multithreaded code:

顺便说一下,在使用vector的类中,你看到的东西变慢了,这种情况也发生在标准类型中,比如int。

#include <iostream>
#include <cstdio>
#include <map>
#include <string>
#include <typeinfo>
#include <vector>
#include <pthread.h>
#include <sstream>
#include <fstream>
using namespace std;

//pthread_mutex_t map_mutex=PTHREAD_MUTEX_INITIALIZER;

long long num=500000000;
int procs=1;

struct iterate
{
    int id;
    int num;
    void * member;
    iterate(int a, int b, void *c) : id(a), num(b), member(c) {}
};

//fill out viterate and piterate
void * viterate(void * input)
{
    printf("am in viterate\n");
    iterate * info=static_cast<iterate *> (input);
    // reproduce member type
    vector<int> test= *static_cast<vector<int>*> (info->member);
    for (int i=info->id; i<test.size(); i+=info->num)
    {
        //printf("am in viterate loop\n");
        test[i];
    }
    pthread_exit(NULL);
}

void * piterate(void * input)
{
    printf("am in piterate\n");
    iterate * info=static_cast<iterate *> (input);;
    int * test=static_cast<int *> (info->member);
    for (int i=info->id; i<num; i+=info->num) {
        //printf("am in piterate loop\n");
        test[i];
    }
    pthread_exit(NULL);
}

int main()
{
    cout<<"producing vector of size "<<num<<endl;
    vector<int> vtest(num);
    cout<<"produced  a vector of size "<<vtest.size()<<endl;
    pthread_t thread[procs];

    iterate** it=new iterate*[procs];
    int ans;
    void *status;

    cout<<"begining to thread through the vector\n";
    for (int i=0; i<procs; i++) {
        it[i]=new iterate(i, procs, (void *) &vtest);
    //  ans=pthread_create(&thread[i],NULL,viterate, (void *) it[i]);
    }
    for (int i=0; i<procs; i++) {
        pthread_join(thread[i], &status);
    }
    cout<<"end of threading through the vector";
    //reuse the iterate structures

    cout<<"producing a pointer with size "<<num<<endl;
    int * pint=new int[num];
    cout<<"produced a pointer with size "<<num<<endl;

    cout<<"begining to thread through the pointer\n";
    for (int i=0; i<procs; i++) {
        it[i]->member=&pint;
        ans=pthread_create(&thread[i], NULL, piterate, (void*) it[i]);
    }
    for (int i=0; i<procs; i++) {
        pthread_join(thread[i], &status);
    }
    cout<<"end of threading through the pointer\n";

    //delete structure array for iterate
    for (int i=0; i<procs; i++) {
        delete it[i];
    }
    delete [] it;

    //delete pointer
    delete [] pint;

    cout<<"end of the program"<<endl;
    return 0;
}

The behavior from the code shows the instantiation of vector is the longest part of the code. Once you get through that bottle neck. The rest of the code runs extremely fast. This is true no matter how many threads you are running on.

代码中的行为表明向量的实例化是代码中最长的部分。一旦你穿过那个瓶颈。其余的代码运行得非常快。无论您正在运行多少线程,这都是正确的。

By the way ignore the absolutely insane number of includes. I have been using this code to test things for a project so the number of includes keep growing.

顺便说一下,忽略绝对疯狂的包含数。我一直在使用这段代码来测试一个项目的内容,所以包含的数量在不断增加。

#17


0  

I just want to mention that vector (and smart_ptr) is just a thin layer add on top of raw arrays (and raw pointers). And actually the access time of an vector in continuous memory is faster than array. The following code shows the result of initialize and access vector and array.

我只想提一下,向量(和smart_ptr)只是在原始数组(和原始指针)之上添加的一个薄层。实际上,一个向量在连续存储器中的访问时间比数组要快。下面的代码显示了初始化和访问向量和数组的结果。

#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>
#include <vector>
#define SIZE 20000
int main() {
    srand (time(NULL));
    vector<vector<int>> vector2d;
    vector2d.reserve(SIZE);
    int index(0);
    boost::posix_time::ptime start_total = boost::posix_time::microsec_clock::local_time();
    //  timer start - build + access
    for (int i = 0; i < SIZE; i++) {
        vector2d.push_back(vector<int>(SIZE));
    }
    boost::posix_time::ptime start_access = boost::posix_time::microsec_clock::local_time();
    //  timer start - access
    for (int i = 0; i < SIZE; i++) {
        index = rand()%SIZE;
        for (int j = 0; j < SIZE; j++) {

            vector2d[index][index]++;
        }
    }
    boost::posix_time::ptime end = boost::posix_time::microsec_clock::local_time();
    boost::posix_time::time_duration msdiff = end - start_total;
    cout << "Vector total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
    msdiff = end - start_acess;
    cout << "Vector access time: " << msdiff.total_milliseconds() << "milliseconds.\n"; 


    int index(0);
    int** raw2d = nullptr;
    raw2d = new int*[SIZE];
    start_total = boost::posix_time::microsec_clock::local_time();
    //  timer start - build + access
    for (int i = 0; i < SIZE; i++) {
        raw2d[i] = new int[SIZE];
    }
    start_access = boost::posix_time::microsec_clock::local_time();
    //  timer start - access
    for (int i = 0; i < SIZE; i++) {
        index = rand()%SIZE;
        for (int j = 0; j < SIZE; j++) {

            raw2d[index][index]++;
        }
    }
    end = boost::posix_time::microsec_clock::local_time();
    msdiff = end - start_total;
    cout << "Array total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
    msdiff = end - start_acess;
    cout << "Array access time: " << msdiff.total_milliseconds() << "milliseconds.\n"; 
    for (int i = 0; i < SIZE; i++) {
        delete [] raw2d[i];
    }
    return 0;
}

The output is:

的输出是:

    Vector total time: 925milliseconds.
    Vector access time: 4milliseconds.
    Array total time: 30milliseconds.
    Array access time: 21milliseconds.

So the speed will be almost the same if you use it properly. (as others mentioned using reserve() or resize()).

所以如果使用得当,速度几乎是一样的。(正如其他人提到的使用reserve()或resize()))。

#18


0  

Well, because vector::resize() does much more processing than plain memory allocation (by malloc).

因为vector::resize()比普通内存分配(malloc)要做更多的处理。

Try to put a breakpoint in your copy constructor (define it so that you can breakpoint!) and there goes the additional processing time.

尝试在您的复制构造函数中添加一个断点(定义它以使您可以断点!),并且还有额外的处理时间。

#19


0  

I Have to say I'm not an expert in C++. But to add some experiments results:

我不得不说我不是c++方面的专家。但加入一些实验结果:

compile: gcc-6.2.0/bin/g++ -O3 -std=c++14 vector.cpp

编译:gcc-6.2.0/bin/g+ -O3 -std=c+ 14 vector.cpp

machine:

机:

Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz 

OS:

操作系统:

2.6.32-642.13.1.el6.x86_64

Output:

输出:

UseArray completed in 0.167821 seconds
UseVector completed in 0.134402 seconds
UseConstructor completed in 0.134806 seconds
UseFillConstructor completed in 1.00279 seconds
UseVectorPushBack completed in 6.6887 seconds
The whole thing completed in 8.12888 seconds

Here the only thing I feel strange is that "UseFillConstructor" performance compared with "UseConstructor".

这里唯一让我感到奇怪的是“UseFillConstructor”与“UseConstructor”的性能相比。

The code:

代码:

void UseConstructor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension*dimension);
        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}


void UseFillConstructor()
{
    TestTimer t("UseFillConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension*dimension, Pixel(255,0,0));
    }
}

So the additional "value" provided slows down performance quite a lot, which I think is due to multiple call to copy constructor. But...

因此,提供的附加“值”大大降低了性能,我认为这是由于对copy构造函数的多次调用。但是…

Compile:

编译:

gcc-6.2.0/bin/g++ -std=c++14 -O vector.cpp

Output:

输出:

UseArray completed in 1.02464 seconds
UseVector completed in 1.31056 seconds
UseConstructor completed in 1.47413 seconds
UseFillConstructor completed in 1.01555 seconds
UseVectorPushBack completed in 6.9597 seconds
The whole thing completed in 11.7851 seconds

So in this case, gcc optimization is very important but it can't help you much when a value is provided as default. This, is against my tuition actually. Hopefully it helps new programmer when choose which vector initialization format.

因此,在这种情况下,gcc优化是非常重要的,但是当一个值作为默认值提供时,它并不能提供太多帮助。这个,其实是反对我的学费。希望它能帮助新程序员选择哪种矢量初始化格式。

#1


246  

Using the following:

使用以下:

g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseArray completed in 2.196 seconds
UseVector completed in 4.412 seconds
UseVectorPushBack completed in 8.017 seconds
The whole thing completed in 14.626 seconds

g++ o3的时间。cpp -我< MyBoost >。/。UseArray跑完了2.196秒,跑完了4.412秒,跑完了8.017秒,跑完了14.626秒

So array is twice as quick as vector.

所以数组的速度是矢量的两倍。

But after looking at the code in more detail this is expected; as you run across the vector twice and the array only once. Note: when you resize() the vector you are not only allocating the memory but also running through the vector and calling the constructor on each member.

但在更详细地看了代码之后,这是预期的;当你在向量上运行两次,而数组只运行一次。注意:当您调整()向量的大小时,您不仅要分配内存,而且还要遍历向量并调用每个成员的构造函数。

Re-Arranging the code slightly so that the vector only initializes each object once:

稍微重新排列代码,使矢量只初始化每个对象一次:

 std::vector<Pixel>  pixels(dimensions * dimensions, Pixel(255,0,0));

Now doing the same timing again:

现在再次做同样的时机:

g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseVector completed in 2.216 seconds

g++ o3的时间。cpp -我< MyBoost >。/。UseVector在2.216秒完成任务

The vector now performance only slightly worse than the array. IMO this difference is insignificant and could be caused by a whole bunch of things not associated with the test.

向量现在的性能只比数组差一点点。在我看来,这种差异是无关紧要的,它可能是由一系列与测试无关的事情造成的。

I would also take into account that you are not correctly initializing/Destroying the Pixel object in the UseArrray() method as neither constructor/destructor is not called (this may not be an issue for this simple class but anything slightly more complex (ie with pointers or members with pointers) will cause problems.

我还要考虑到,在UseArrray()方法中,您没有正确地初始化/销毁像素对象,因为没有调用构造函数/析构函数(对于这个简单的类来说,这可能不是问题,但是任何稍微复杂一点的东西(比如指针或指针成员)都会导致问题。

#2


52  

Great question. I came in here expecting to find some simple fix that would speed the vector tests right up. That didn't work out quite like I expected!

好问题。我来这里希望找到一些简单的方法来加速矢量测试。这和我预想的不一样!

Optimization helps, but it's not enough. With optimization on I'm still seeing a 2X performance difference between UseArray and UseVector. Interestingly, UseVector was significantly slower than UseVectorPushBack without optimization.

优化是有帮助的,但这还不够。随着优化,我仍然看到UseArray和UseVector之间的性能差异。有趣的是,在没有优化的情况下,UseVector比UseVectorPushBack要慢得多。

# g++ -Wall -Wextra -pedantic -o vector vector.cpp
# ./vector
UseArray completed in 20.68 seconds
UseVector completed in 120.509 seconds
UseVectorPushBack completed in 37.654 seconds
The whole thing completed in 178.845 seconds
# g++ -Wall -Wextra -pedantic -O3 -o vector vector.cpp
# ./vector
UseArray completed in 3.09 seconds
UseVector completed in 6.09 seconds
UseVectorPushBack completed in 9.847 seconds
The whole thing completed in 19.028 seconds

Idea #1 - Use new[] instead of malloc

I tried changing malloc() to new[] in UseArray so the objects would get constructed. And changing from individual field assignment to assigning a Pixel instance. Oh, and renaming the inner loop variable to j.

我尝试在UseArray中将malloc()更改为new[],以便构造对象。从单独的字段分配到分配一个像素实例。将内部循环变量重命名为j。

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {   
        int dimension = 999;

        // Same speed as malloc().
        Pixel * pixels = new Pixel[dimension * dimension];

        for(int j = 0 ; j < dimension * dimension; ++j)
            pixels[j] = Pixel(255, 0, 0);

        delete[] pixels;
    }
}

Surprisingly (to me), none of those changes made any difference whatsoever. Not even the change to new[] which will default construct all of the Pixels. It seems that gcc can optimize out the default constructor calls when using new[], but not when using vector.

令人惊讶的是(对我来说),这些改变都没有带来任何改变。甚至没有对new[]的更改,它将默认构造所有的像素。似乎gcc在使用new[]时可以优化默认的构造函数调用,而在使用vector时则不能。

Idea #2 - Remove repeated operator[] calls

I also attempted to get rid of the triple operator[] lookup and cache the reference to pixels[j]. That actually slowed UseVector down! Oops.

我还试图摆脱三重运算符[]查找并缓存对像素的引用[j]。这实际上减慢了UseVector的速度!哦。

for(int j = 0; j < dimension * dimension; ++j)
{
    // Slower than accessing pixels[j] three times.
    Pixel &pixel = pixels[j];
    pixel.r = 255;
    pixel.g = 0;
    pixel.b = 0;
}

# ./vector 
UseArray completed in 3.226 seconds
UseVector completed in 7.54 seconds
UseVectorPushBack completed in 9.859 seconds
The whole thing completed in 20.626 seconds

Idea #3 - Remove constructors

What about removing the constructors entirely? Then perhaps gcc can optimize out the construction of all of the objects when the vectors are created. What happens if we change Pixel to:

完全删除构造函数怎么样?然后,也许gcc可以在创建向量时优化所有对象的构造。如果我们将像素改为:

struct Pixel
{
    unsigned char r, g, b;
};

Result: about 10% faster. Still slower than an array. Hm.

结果:大约快10%。仍然比数组慢。嗯。

# ./vector 
UseArray completed in 3.239 seconds
UseVector completed in 5.567 seconds

Idea #4 - Use iterator instead of loop index

How about using a vector<Pixel>::iterator instead of a loop index?

使用向量 ::iterator代替循环索引怎么样?

for (std::vector<Pixel>::iterator j = pixels.begin(); j != pixels.end(); ++j)
{
    j->r = 255;
    j->g = 0;
    j->b = 0;
}

Result:

结果:

# ./vector 
UseArray completed in 3.264 seconds
UseVector completed in 5.443 seconds

Nope, no different. At least it's not slower. I thought this would have performance similar to #2 where I used a Pixel& reference.

不,没有什么不同。至少它不慢。我认为它的性能与我使用Pixel& reference时的性能类似。

Conclusion

Even if some smart cookie figures out how to make the vector loop as fast as the array one, this does not speak well of the default behavior of std::vector. So much for the compiler being smart enough to optimize out all the C++ness and make STL containers as fast as raw arrays.

即使一些智能cookie知道如何使向量循环和数组一样快,这也不能很好地说明std::vector的默认行为。编译器足够聪明,可以优化所有c++特性,使STL容器和原始数组一样快。

The bottom line is that the compiler is unable to optimize away the no-op default constructor calls when using std::vector. If you use plain new[] it optimizes them away just fine. But not with std::vector. Even if you can rewrite your code to eliminate the constructor calls that flies in face of the mantra around here: "The compiler is smarter than you. The STL is just as fast as plain C. Don't worry about it."

最重要的是,当使用std:::vector时,编译器无法优化无op默认构造函数调用。如果你使用普通的new[],它会很好地优化它们。但不是用std::向量。即使您可以重写代码,以消除构造函数调用,这也会违背这里的格言:“编译器比您更聪明。”STL和普通c一样快,别担心。

#3


33  

To be fair, you cannot compare a C++ implementation to a C implementation, as I would call your malloc version. malloc does not create objects - it only allocates raw memory. That you then treat that memory as objects without calling the constructor is poor C++ (possibly invalid - I'll leave that to the language lawyers).

公平地说,您不能将c++实现与C实现进行比较,我将其称为malloc版本。malloc不创建对象——它只分配原始内存。然后,您将该内存视为对象,而不调用构造函数,这是很糟糕的c++(可能无效——我将把它留给语言律师)。

That said, simply changing the malloc to new Pixel[dimensions*dimensions] and free to delete [] pixels does not make much difference with the simple implementation of Pixel that you have. Here's the results on my box (E6600, 64-bit):

也就是说,简单地将malloc更改为新的像素[维度*维度],并且可以*地删除[]像素,这与您所拥有的像素的简单实现没有太大的差别。这是我的结果(E6600, 64位):

UseArray completed in 0.269 seconds
UseVector completed in 1.665 seconds
UseVectorPushBack completed in 7.309 seconds
The whole thing completed in 9.244 seconds

But with a slight change, the tables turn:

但是稍微改变一下,情况就变了:

Pixel.h

struct Pixel
{
    Pixel();
    Pixel(unsigned char r, unsigned char g, unsigned char b);

    unsigned char r, g, b;
};

Pixel.cc

#include "Pixel.h"

Pixel::Pixel() {}
Pixel::Pixel(unsigned char r, unsigned char g, unsigned char b) 
  : r(r), g(g), b(b) {}

main.cc

#include "Pixel.h"
[rest of test harness without class Pixel]
[UseArray now uses new/delete not malloc/free]

Compiled this way:

编译:

$ g++ -O3 -c -o Pixel.o Pixel.cc
$ g++ -O3 -c -o main.o main.cc
$ g++ -o main main.o Pixel.o

we get very different results:

我们得到了非常不同的结果:

UseArray completed in 2.78 seconds
UseVector completed in 1.651 seconds
UseVectorPushBack completed in 7.826 seconds
The whole thing completed in 12.258 seconds

With a non-inlined constructor for Pixel, std::vector now beats a raw array.

有了Pixel的非内联构造函数,std::vector现在打败了原始数组。

It would appear that the complexity of allocation through std::vector and std:allocator is too much to be optimised as effectively as a simple new Pixel[n]. However, we can see that the problem is simply with the allocation not the vector access by tweaking a couple of the test functions to create the vector/array once by moving it outside the loop:

似乎通过std::vector和std:分配器分配的复杂性太大了,不能像一个简单的新像素那样有效地优化[n]。但是,我们可以看到,问题仅仅是通过调整两个测试函数来创建一个矢量/数组,而不是通过将它移动到循环之外来创建矢量/数组。

void UseVector()
{
    TestTimer t("UseVector");

    int dimension = 999;
    std::vector<Pixel> pixels;
    pixels.resize(dimension * dimension);

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

and

void UseArray()
{
    TestTimer t("UseArray");

    int dimension = 999;
    Pixel * pixels = new Pixel[dimension * dimension];

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
    delete [] pixels;
}

We get these results now:

我们现在得到的结果是:

UseArray completed in 0.254 seconds
UseVector completed in 0.249 seconds
UseVectorPushBack completed in 7.298 seconds
The whole thing completed in 7.802 seconds

What we can learn from this is that std::vector is comparable to a raw array for access, but if you need to create and delete the vector/array many times, creating a complex object will be more time consuming that creating a simple array when the element's constructor is not inlined. I don't think that this is very surprising.

我们可以从中学到的是,std::vector可以与原始数组进行访问,但是如果您需要多次创建和删除vector/array,那么在元素的构造函数没有内联时,创建一个复杂的对象将花费更多的时间来创建一个简单的数组。我不认为这很令人惊讶。

#4


33  

This is an old but popular question.

这是一个古老而流行的问题。

At this point, many programmers will be working in C++11. And in C++11 the OP's code as written runs equally fast for UseArray or UseVector.

此时,许多程序员将使用c++ 11。在c++ 11中,对于UseArray或UseVector, OP的代码写得一样快。

UseVector completed in 3.74482 seconds
UseArray completed in 3.70414 seconds

The fundamental problem was that while your Pixel structure was uninitialized, std::vector<T>::resize( size_t, T const&=T() ) takes a default constructed Pixel and copies it. The compiler did not notice it was being asked to copy uninitialized data, so it actually performed the copy.

基本的问题是,当像素结构未初始化时,std::vector ::resize(size_t, T const&=T())获取一个默认构造的像素并复制它。编译器没有注意到它被要求复制未初始化的数据,所以它实际上执行了复制。

In C++11, std::vector<T>::resize has two overloads. The first is std::vector<T>::resize(size_t), the other is std::vector<T>::resize(size_t, T const&). This means when you invoke resize without a second argument, it simply default constructs, and the compiler is smart enough to realize that default construction does nothing, so it skips the pass over the buffer.

在c++ 11中,std::vector ::resize有两个重载。第一个是std:, vector ::resize(size_t),另一个是std::vector ::resize(size_t, T const&)。这意味着,当您在没有第二个参数的情况下调用resize时,它只是默认构造,而且编译器足够聪明,能够意识到默认构造不做任何事情,所以它跳过了缓冲区的传递。

(The two overloads where added to handle movable, constructable and non-copyable types -- the performance improvement when working on uninitialized data is a bonus).

(添加到处理可移动、可构造和不可复制类型的两个重载——处理未初始化数据时的性能改进是一个额外的好处)。

The push_back solution also does fencepost checking, which slows it down, so it remains slower than the malloc version.

push_back解决方案还进行了fencepost检查,这降低了它的速度,因此它仍然比malloc版本慢。

live example (I also replaced the timer with chrono::high_resolution_clock).

live example(我也用chrono::high_resolution_clock替换了计时器)。

Note that if you have a structure that usually requires initialization, but you want to handle it after growing your buffer, you can do this with a custom std::vector allocator. If you want to then move it into a more normal std::vector, I believe careful use of allocator_traits and overriding of == might pull that off, but am unsure.

注意,如果您有一个结构,通常需要初始化,但是您希望在扩展缓冲区之后处理它,那么可以使用自定义的std::vector分配器来实现这一点。如果您想要将它移动到更普通的std::vector中,我认为仔细使用allocator_traits和重写== =可能会成功,但我不确定。

#5


26  

Try with this:

试试这个:

void UseVectorCtor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));
    }
}

I get almost exactly the same performance as with array.

我得到的性能与数组几乎完全相同。

The thing about vector is that it's a much more general tool than an array. And that means you have to consider how you use it. It can be used in a lot of different ways, providing functionality that an array doesn't even have. And if you use it "wrong" for your purpose, you incur a lot of overhead, but if you use it correctly, it is usually basically a zero-overhead data structure. In this case, the problem is that you separately initialized the vector (causing all elements to have their default ctor called), and then overwriting each element individually with the correct value. That is much harder for the compiler to optimize away than when you do the same thing with an array. Which is why the vector provides a constructor which lets you do exactly that: initialize N elements with value X.

关于向量,它是一个比数组更通用的工具。这意味着你必须考虑如何使用它。它可以以许多不同的方式使用,提供数组甚至没有的功能。如果您的目的是“错误的”,那么您就会产生大量的开销,但是如果您正确地使用它,它通常基本上是一个零开销的数据结构。在这种情况下,问题在于分别初始化向量(使所有元素都调用它们的默认ctor),然后用正确的值分别覆盖每个元素。对编译器来说,这比对数组做同样的事情要难得多。这就是为什么vector提供了一个构造函数,它可以让您精确地实现:用值X初始化N个元素。

And when you use that, the vector is just as fast as an array.

当你使用它时,这个向量就像一个数组一样快。

So no, you haven't busted the performance myth. But you have shown that it's only true if you use the vector optimally, which is a pretty good point too. :)

所以,不,你没有打破性能神话。但是你已经证明了只有当你用最优向量时它才成立,这也是一个很好的点。:)

On the bright side, it's really the simplest usage that turns out to be fastest. If you contrast my code snippet (a single line) with John Kugelman's answer, containing heaps and heaps of tweaks and optimizations, which still don't quite eliminate the performance difference, it's pretty clear that vector is pretty cleverly designed after all. You don't have to jump through hoops to get speed equal to an array. On the contrary, you have to use the simplest possible solution.

从好的方面来看,它是最简单的用法,结果是最快的。如果您将我的代码片段(一行)与John Kugelman的答案进行对比,其中包含大量的微调和优化,但仍然不能完全消除性能差异,那么很明显,vector毕竟设计得相当巧妙。你不需要跳圈就能得到速度等于一个数组。相反,你必须使用最简单的可能的解决方案。

#6


21  

It was hardly a fair comparison when I first looked at your code; I definitely thought you weren't comparing apples with apples. So I thought, let's get constructors and destructors being called on all tests; and then compare.

当我第一次看到你的代码时,这很难说是一个公平的比较;我肯定你不是在拿苹果和苹果做比较。所以我想,让我们在所有测试中调用构造函数和析构函数;然后进行比较。

const size_t dimension = 1000;

void UseArray() {
    TestTimer t("UseArray");
    for(size_t j = 0; j < dimension; ++j) {
        Pixel* pixels = new Pixel[dimension * dimension];
        for(size_t i = 0 ; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
        delete[] pixels;
    }
}

void UseVector() {
    TestTimer t("UseVector");
    for(size_t j = 0; j < dimension; ++j) {
        std::vector<Pixel> pixels(dimension * dimension);
        for(size_t i = 0; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
    }
}

int main() {
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();

    return 0;
}

My thoughts were, that with this setup, they should be exactly the same. It turns out, I was wrong.

我的想法是,在这个设置下,它们应该是完全相同的。结果证明,我错了。

UseArray completed in 3.06 seconds
UseVector completed in 4.087 seconds
The whole thing completed in 10.14 seconds

So why did this 30% performance loss even occur? The STL has everything in headers, so it should have been possible for the compiler to understand everything that was required.

为什么30%的性能损失会发生呢?STL包含了头文件中的所有内容,因此编译器应该能够理解所需的所有内容。

My thoughts were that it is in how the loop initialises all values to the default constructor. So I performed a test:

我的想法是,循环是如何将所有值初始化为默认构造函数的。所以我做了一个测试:

class Tester {
public:
    static int count;
    static int count2;
    Tester() { count++; }
    Tester(const Tester&) { count2++; }
};
int Tester::count = 0;
int Tester::count2 = 0;

int main() {
    std::vector<Tester> myvec(300);
    printf("Default Constructed: %i\nCopy Constructed: %i\n", Tester::count, Tester::count2);

    return 0;
}

The results were as I suspected:

结果正如我猜想的那样:

Default Constructed: 1
Copy Constructed: 300

This is clearly the source of the slowdown, the fact that the vector uses the copy constructor to initialise the elements from a default constructed object.

这显然是放缓的根源,事实上向量使用复制构造函数从默认构造的对象初始化元素。

This means, that the following pseudo-operation order is happening during construction of the vector:

这意味着,在向量的构建过程中会发生以下伪操作顺序:

Pixel pixel;
for (auto i = 0; i < N; ++i) vector[i] = pixel;

Which, due to the implicit copy constructor made by the compiler, is expanded to the following:

由于编译器所作的隐式复制构造函数,将其扩展为:

Pixel pixel;
for (auto i = 0; i < N; ++i) {
    vector[i].r = pixel.r;
    vector[i].g = pixel.g;
    vector[i].b = pixel.b;
}

So the default Pixel remains un-initialised, while the rest are initialised with the default Pixel's un-initialised values.

因此,默认像素仍然是未初始化的,而其余像素则用默认像素的未初始化值初始化。

Compared to the alternative situation with New[]/Delete[]:

与新[]/删除[]的替代情况相比:

int main() {
    Tester* myvec = new Tester[300];

    printf("Default Constructed: %i\nCopy Constructed:%i\n", Tester::count, Tester::count2);

    delete[] myvec;

    return 0;
}

Default Constructed: 300
Copy Constructed: 0

They are all left to their un-initialised values, and without the double iteration over the sequence.

它们都留给未初始化的值,并且不需要对序列进行重复迭代。

Armed with this information, how can we test it? Let's try over-writing the implicit copy constructor.

有了这些信息,我们如何进行测试?让我们尝试重写隐式复制构造函数。

Pixel(const Pixel&) {}

And the results?

和结果?

UseArray completed in 2.617 seconds
UseVector completed in 2.682 seconds
The whole thing completed in 5.301 seconds

So in summary, if you're making hundreds of vectors very often: re-think your algorithm.

总结一下,如果你经常画几百个向量:重新考虑你的算法。

In any case, the STL implementation isn't slower for some unknown reason, it just does exactly what you ask; hoping you know better.

无论如何,STL实现并不会因为某些未知的原因而变慢,它只是按照您的要求执行;希望你知道更好。

#7


7  

Try disabling checked iterators and building in release mode. You shouldn't see much of a performance difference.

尝试禁用检查迭代器并在发布模式中构建。您不应该看到性能差异太大。

#8


4  

GNU's STL (and others), given vector<T>(n), default constructs a prototypal object T() - the compiler will optimise away the empty constructor - but then a copy of whatever garbage happened to be in the memory addresses now reserved for the object is taken by the STL's __uninitialized_fill_n_aux, which loops populating copies of that object as the default values in the vector. So, "my" STL is not looping constructing, but constructing then loop/copying. It's counter intuitive, but I should have remembered as I commented on a recent * question about this very point: the construct/copy can be more efficient for reference counted objects etc..

GNU的STL(和其他人),给出向量< T >(n),默认构造原型的对象T()——编译器会优化掉空构造函数,然后复制任何垃圾碰巧在对象的内存地址现在保留了STL的__uninitialized_fill_n_aux循环填充该对象的副本为默认值的向量。因此,“my”STL不是循环构造,而是构造循环/复制。这是违反直觉的,但我应该记得最近关于这一点的*问题的评论:构造/复制对于引用计数对象等来说更有效。

So:

所以:

vector<T> x(n);

or

vector<T> x;
x.resize(n);

is - on many STL implementations - something like:

在许多STL实现中,is是:

T temp;
for (int i = 0; i < n; ++i)
    x[i] = temp;

The issue being that the current generation of compiler optimisers don't seem to work from the insight that temp is uninitialised garbage, and fail to optimise out the loop and default copy constructor invocations. You could credibly argue that compilers absolutely shouldn't optimise this away, as a programmer writing the above has a reasonable expectation that all the objects will be identical after the loop, even if garbage (usual caveats about 'identical'/operator== vs memcmp/operator= etc apply). The compiler can't be expected to have any extra insight into the larger context of std::vector<> or the later usage of the data that would suggest this optimisation safe.

问题是,当前的编译器优化器似乎不能工作,因为temp是未初始化的垃圾,并且未能优化循环和默认的复制构造函数调用。您可以令人相信地认为编译器绝对不应该优化这一点,因为程序员编写上面的代码有一个合理的期望,即在循环之后所有对象都是相同的,即使是垃圾(关于“相同的”/操作符==和memcmp/运算符=等)。不能期望编译器对std::vector<>或稍后会建议这种优化安全的数据的使用情况有任何额外的了解。

This can be contrasted with the more obvious, direct implementation:

这可以与更明显、更直接的执行作比较:

for (int i = 0; i < n; ++i)
    x[i] = T();

Which we can expect a compiler to optimise out.

我们可以期望编译器进行优化。

To be a bit more explicit about the justification for this aspect of vector's behaviour, consider:

为了更明确地说明向量行为这一方面的理由,请考虑:

std::vector<big_reference_counted_object> x(10000);

Clearly it's a major difference if we make 10000 independent objects versus 10000 referencing the same data. There's a reasonable argument that the advantage of protecting casual C++ users from accidentally doing something so expensive outweights the very small real-world cost of hard-to-optimise copy construction.

显然,如果我们让10000个独立的对象与10000个引用相同数据的对象相比,这是一个很大的不同。有一种合理的观点认为,保护临时的c++用户不被意外地做这么昂贵的事情的好处远远超过了难以优化的拷贝结构的实际成本。

ORIGINAL ANSWER (for reference / making sense of the comments): No chance. vector is as fast as an array, at least if you reserve space sensibly. ...

原始回答(供参考/理解评论):不可能。向量就像数组一样快,至少如果你合理地保留空间的话。

#9


3  

Martin York's answer bothers me because it seems like an attempt to brush the initialisation problem under the carpet. But he is right to identify redundant default construction as the source of performance problems.

马丁·约克(Martin York)的回答让我感到困扰,因为这似乎是在掩盖初始化问题。但是,他认为冗余的默认构造是性能问题的根源是正确的。

[EDIT: Martin's answer no longer suggests changing the default constructor.]

[编辑:马丁的答案不再建议修改默认构造函数。]

For the immediate problem at hand, you could certainly call the 2-parameter version of the vector<Pixel> ctor instead:

对于眼前的问题,您当然可以将向量的2参数版本称为 ctor:

std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));

That works if you want to initialise with a constant value, which is a common case. But the more general problem is: How can you efficiently initialise with something more complicated than a constant value?

如果你想要初始化一个常数值,这是很常见的情况。但更普遍的问题是:如何有效地初始化一些比常量值更复杂的东西呢?

For this you can use a back_insert_iterator, which is an iterator adaptor. Here's an example with a vector of ints, although the general idea works just as well for Pixels:

为此,您可以使用back_insert_iterator,它是一个迭代器适配器。这里有一个ints向量的例子,尽管对于像素来说一般的思想同样适用:

#include <iterator>
// Simple functor return a list of squares: 1, 4, 9, 16...
struct squares {
    squares() { i = 0; }
    int operator()() const { ++i; return i * i; }

private:
    int i;
};

...

std::vector<int> v;
v.reserve(someSize);     // To make insertions efficient
std::generate_n(std::back_inserter(v), someSize, squares());

Alternatively you could use copy() or transform() instead of generate_n().

您也可以使用copy()或transform()代替generate_n()。

The downside is that the logic to construct the initial values needs to be moved into a separate class, which is less convenient than having it in-place (although lambdas in C++1x make this much nicer). Also I expect this will still not be as fast as a malloc()-based non-STL version, but I expect it will be close, since it only does one construction for each element.

缺点是构造初始值的逻辑需要转移到一个单独的类中,这比将其放置在适当的位置要不那么方便(尽管c++ 1x中的lambdas使其变得更好)。此外,我预计这仍然不会像基于malloc()的非stl版本那样快,但我预计它会很接近,因为它只对每个元素执行一个构造。

#10


2  

The vector ones are additionally calling Pixel constructors.

矢量图是另外的调用像素构造函数。

Each is causing almost a million ctor runs that you're timing.

每一种都导致了大约一百万次的ctor运行。

edit: then there's the outer 1...1000 loop, so make that a billion ctor calls!

编辑:然后是外部1……1000个循环,让十亿的ctor调用!

edit 2: it'd be interesting to see the disassembly for the UseArray case. An optimizer could optimize the whole thing away, since it has no effect other than burning CPU.

编辑2:看看UseArray案例的拆解会很有趣。优化器可以优化整个东西,因为它除了消耗CPU之外没有其他效果。

#11


1  

Here's how the push_back method in vector works:

向量中的push_back方法是这样工作的:

  1. The vector allocates X amount of space when it is initialized.
  2. 向量在初始化时分配X个空间量。
  3. As stated below it checks if there is room in the current underlying array for the item.
  4. 如下面所述,它检查当前底层数组中是否有空间用于项。
  5. It makes a copy of the item in the push_back call.
  6. 它在push_back调用中复制项目。

After calling push_back X items:

调用push_back X后:

  1. The vector reallocates kX amount of space into a 2nd array.
  2. 向量将kX的空间重新分配到第二个数组中。
  3. It Copies the entries of the first array onto the second.
  4. 它将第一个数组的条目复制到第二个数组中。
  5. Discards the first array.
  6. 丢弃第一个数组。
  7. Now uses the second array as storage until it reaches kX entries.
  8. 现在使用第二个数组作为存储,直到它到达kX条目。

Repeat. If you're not reserving space its definitely going to be slower. More than that, if it's expensive to copy the item then 'push_back' like that is going to eat you alive.

重复。如果你不预留空间,它肯定会更慢。更重要的是,如果复制这个东西很昂贵,那么像这样的“push_back”会把你活活吃掉。

As to the vector versus array thing, I'm going to have to agree with the other people. Run in release, turn optimizations on, and put in a few more flags so that the friendly people at Microsoft don't #@%$^ it up for ya.

对于向量与数组的关系,我必须同意其他人的观点。在发布运行,打开优化,把更多的旗帜,这样友好的人在微软不# @ % $ ^丫。

One more thing, if you don't need to resize, use Boost.Array.

还有一件事,如果不需要调整大小,可以使用boot . array。

#12


1  

Some profiler data (pixel is aligned to 32 bits):

一些分析器数据(像素对齐到32位):

g++ -msse3 -O3 -ftree-vectorize -g test.cpp -DNDEBUG && ./a.out
UseVector completed in 3.123 seconds
UseArray completed in 1.847 seconds
UseVectorPushBack completed in 9.186 seconds
The whole thing completed in 14.159 seconds

Blah

废话

andrey@nv:~$ opannotate --source libcchem/src/a.out  | grep "Total samples for file" -A3
Overflow stats not available
 * Total samples for file : "/usr/include/c++/4.4/ext/new_allocator.h"
 *
 * 141008 52.5367
 */
--
 * Total samples for file : "/home/andrey/libcchem/src/test.cpp"
 *
 *  61556 22.9345
 */
--
 * Total samples for file : "/usr/include/c++/4.4/bits/stl_vector.h"
 *
 *  41956 15.6320
 */
--
 * Total samples for file : "/usr/include/c++/4.4/bits/stl_uninitialized.h"
 *
 *  20956  7.8078
 */
--
 * Total samples for file : "/usr/include/c++/4.4/bits/stl_construct.h"
 *
 *   2923  1.0891
 */

In allocator:

在分配器:

               :      // _GLIBCXX_RESOLVE_LIB_DEFECTS
               :      // 402. wrong new expression in [some_] allocator::construct
               :      void
               :      construct(pointer __p, const _Tp& __val)
141008 52.5367 :      { ::new((void *)__p) _Tp(__val); }

vector:

向量:

               :void UseVector()
               :{ /* UseVector() total:  60121 22.3999 */
...
               :
               :
 10790  4.0201 :        for (int i = 0; i < dimension * dimension; ++i) {
               :
   495  0.1844 :            pixels[i].r = 255;
               :
 12618  4.7012 :            pixels[i].g = 0;
               :
  2253  0.8394 :            pixels[i].b = 0;
               :
               :        }

array

数组

               :void UseArray()
               :{ /* UseArray() total:  35191 13.1114 */
               :
...
               :
   136  0.0507 :        for (int i = 0; i < dimension * dimension; ++i) {
               :
  9897  3.6874 :            pixels[i].r = 255;
               :
  3511  1.3081 :            pixels[i].g = 0;
               :
 21647  8.0652 :            pixels[i].b = 0;

Most of the overhead is in the copy constructor. For example,

大部分开销都在复制构造函数中。例如,

    std::vector < Pixel > pixels;//(dimension * dimension, Pixel());

    pixels.reserve(dimension * dimension);

    for (int i = 0; i < dimension * dimension; ++i) {

        pixels[i].r = 255;

        pixels[i].g = 0;

        pixels[i].b = 0;
    }

It has the same performance as an array.

它的性能与数组相同。

#13


1  

My laptop is Lenova G770 (4 GB RAM).

我的笔记本是Lenova G770 (4gb内存)。

The OS is Windows 7 64-bit (the one with laptop)

操作系统是Windows 7 64位(使用笔记本电脑的操作系统)

Compiler is MinGW 4.6.1.

编译器MinGW 4.6.1。

The IDE is Code::Blocks.

IDE代码::块。

I test the source codes of the first post.

我测试第一篇文章的源代码。

The results

O2 optimization

O2优化

UseArray completed in 2.841 seconds

UseArray只用了2.841秒就完成了

UseVector completed in 2.548 seconds

UseVector在2.548秒完成任务

UseVectorPushBack completed in 11.95 seconds

UseVectorPushBack在11.95秒内完成。

The whole thing completed in 17.342 seconds

整个过程只用了17.342秒就完成了

system pause

系统暂停

O3 optimization

O3优化

UseArray completed in 1.452 seconds

UseArray在1.452秒内完成

UseVector completed in 2.514 seconds

UseVector在2.514秒完成任务

UseVectorPushBack completed in 12.967 seconds

UseVectorPushBack以12.967秒完成

The whole thing completed in 16.937 seconds

整个过程用时16.937秒

It looks like the performance of vector is worse under O3 optimization.

在O3优化下,向量的性能似乎更差。

If you change the loop to

如果将循环改为

    pixels[i].r = i;
    pixels[i].g = i;
    pixels[i].b = i;

The speed of array and vector under O2 and O3 are almost the same.

在O2和O3下,阵列和矢量的速度几乎相同。

#14


1  

A better benchmark (I think...), compiler due to optimizations can change code, becouse results of allocated vectors/arrays are not used anywhere. Results:

一个更好的基准(我认为…),由于优化的编译器可以更改代码,因为分配的向量/数组的结果在任何地方都不使用。结果:

$ g++ test.cpp -o test -O3 -march=native
$ ./test 
UseArray inner completed in 0.652 seconds
UseArray completed in 0.773 seconds
UseVector inner completed in 0.638 seconds
UseVector completed in 0.757 seconds
UseVectorPushBack inner completed in 6.732 seconds
UseVectorPush completed in 6.856 seconds
The whole thing completed in 8.387 seconds

Compiler:

编译器:

gcc version 6.2.0 20161019 (Debian 6.2.0-9)

CPU:

CPU:

model name  : Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz

And the code:

和代码:

#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

class TestTimer
{
    public:
        TestTimer(const std::string & name) : name(name),
            start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
        {
        }

        ~TestTimer()
        {
            using namespace std;
            using namespace boost;

            posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
            posix_time::time_duration d = now - start;

            cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
                " seconds" << endl;
        }

    private:
        std::string name;
        boost::posix_time::ptime start;
};

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    unsigned char r, g, b;
};

void UseVector(std::vector<std::vector<Pixel> >& results)
{
    TestTimer t("UseVector inner");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel>& pixels = results.at(i);
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

void UseVectorPushBack(std::vector<std::vector<Pixel> >& results)
{
    TestTimer t("UseVectorPushBack inner");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel>& pixels = results.at(i);
            pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));
    }
}

void UseArray(Pixel** results)
{
    TestTimer t("UseArray inner");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);

        results[i] = pixels;

        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        // free(pixels);
    }
}

void UseArray()
{
    TestTimer t("UseArray");
    Pixel** array = (Pixel**)malloc(sizeof(Pixel*)* 1000);
    UseArray(array);
    for(int i=0;i<1000;++i)
        free(array[i]);
    free(array);
}

void UseVector()
{
    TestTimer t("UseVector");
    {
        std::vector<std::vector<Pixel> > vector(1000, std::vector<Pixel>());
        UseVector(vector);
    }
}

void UseVectorPushBack()
{
    TestTimer t("UseVectorPush");
    {
        std::vector<std::vector<Pixel> > vector(1000, std::vector<Pixel>());
        UseVectorPushBack(vector);
    }
}


int main()
{
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();
    UseVectorPushBack();

    return 0;
}

#15


0  

With the right options, vectors and arrays can generate identical asm. In these cases, they are of course the same speed, because you get the same executable file either way.

通过正确的选项,向量和数组可以生成相同的asm。在这些情况下,它们当然是相同的速度,因为无论哪种方式都可以得到相同的可执行文件。

#16


0  

By the way the slow down your seeing in classes using vector also occurs with standard types like int. Heres a multithreaded code:

顺便说一下,在使用vector的类中,你看到的东西变慢了,这种情况也发生在标准类型中,比如int。

#include <iostream>
#include <cstdio>
#include <map>
#include <string>
#include <typeinfo>
#include <vector>
#include <pthread.h>
#include <sstream>
#include <fstream>
using namespace std;

//pthread_mutex_t map_mutex=PTHREAD_MUTEX_INITIALIZER;

long long num=500000000;
int procs=1;

struct iterate
{
    int id;
    int num;
    void * member;
    iterate(int a, int b, void *c) : id(a), num(b), member(c) {}
};

//fill out viterate and piterate
void * viterate(void * input)
{
    printf("am in viterate\n");
    iterate * info=static_cast<iterate *> (input);
    // reproduce member type
    vector<int> test= *static_cast<vector<int>*> (info->member);
    for (int i=info->id; i<test.size(); i+=info->num)
    {
        //printf("am in viterate loop\n");
        test[i];
    }
    pthread_exit(NULL);
}

void * piterate(void * input)
{
    printf("am in piterate\n");
    iterate * info=static_cast<iterate *> (input);;
    int * test=static_cast<int *> (info->member);
    for (int i=info->id; i<num; i+=info->num) {
        //printf("am in piterate loop\n");
        test[i];
    }
    pthread_exit(NULL);
}

int main()
{
    cout<<"producing vector of size "<<num<<endl;
    vector<int> vtest(num);
    cout<<"produced  a vector of size "<<vtest.size()<<endl;
    pthread_t thread[procs];

    iterate** it=new iterate*[procs];
    int ans;
    void *status;

    cout<<"begining to thread through the vector\n";
    for (int i=0; i<procs; i++) {
        it[i]=new iterate(i, procs, (void *) &vtest);
    //  ans=pthread_create(&thread[i],NULL,viterate, (void *) it[i]);
    }
    for (int i=0; i<procs; i++) {
        pthread_join(thread[i], &status);
    }
    cout<<"end of threading through the vector";
    //reuse the iterate structures

    cout<<"producing a pointer with size "<<num<<endl;
    int * pint=new int[num];
    cout<<"produced a pointer with size "<<num<<endl;

    cout<<"begining to thread through the pointer\n";
    for (int i=0; i<procs; i++) {
        it[i]->member=&pint;
        ans=pthread_create(&thread[i], NULL, piterate, (void*) it[i]);
    }
    for (int i=0; i<procs; i++) {
        pthread_join(thread[i], &status);
    }
    cout<<"end of threading through the pointer\n";

    //delete structure array for iterate
    for (int i=0; i<procs; i++) {
        delete it[i];
    }
    delete [] it;

    //delete pointer
    delete [] pint;

    cout<<"end of the program"<<endl;
    return 0;
}

The behavior from the code shows the instantiation of vector is the longest part of the code. Once you get through that bottle neck. The rest of the code runs extremely fast. This is true no matter how many threads you are running on.

代码中的行为表明向量的实例化是代码中最长的部分。一旦你穿过那个瓶颈。其余的代码运行得非常快。无论您正在运行多少线程,这都是正确的。

By the way ignore the absolutely insane number of includes. I have been using this code to test things for a project so the number of includes keep growing.

顺便说一下,忽略绝对疯狂的包含数。我一直在使用这段代码来测试一个项目的内容,所以包含的数量在不断增加。

#17


0  

I just want to mention that vector (and smart_ptr) is just a thin layer add on top of raw arrays (and raw pointers). And actually the access time of an vector in continuous memory is faster than array. The following code shows the result of initialize and access vector and array.

我只想提一下,向量(和smart_ptr)只是在原始数组(和原始指针)之上添加的一个薄层。实际上,一个向量在连续存储器中的访问时间比数组要快。下面的代码显示了初始化和访问向量和数组的结果。

#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>
#include <vector>
#define SIZE 20000
int main() {
    srand (time(NULL));
    vector<vector<int>> vector2d;
    vector2d.reserve(SIZE);
    int index(0);
    boost::posix_time::ptime start_total = boost::posix_time::microsec_clock::local_time();
    //  timer start - build + access
    for (int i = 0; i < SIZE; i++) {
        vector2d.push_back(vector<int>(SIZE));
    }
    boost::posix_time::ptime start_access = boost::posix_time::microsec_clock::local_time();
    //  timer start - access
    for (int i = 0; i < SIZE; i++) {
        index = rand()%SIZE;
        for (int j = 0; j < SIZE; j++) {

            vector2d[index][index]++;
        }
    }
    boost::posix_time::ptime end = boost::posix_time::microsec_clock::local_time();
    boost::posix_time::time_duration msdiff = end - start_total;
    cout << "Vector total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
    msdiff = end - start_acess;
    cout << "Vector access time: " << msdiff.total_milliseconds() << "milliseconds.\n"; 


    int index(0);
    int** raw2d = nullptr;
    raw2d = new int*[SIZE];
    start_total = boost::posix_time::microsec_clock::local_time();
    //  timer start - build + access
    for (int i = 0; i < SIZE; i++) {
        raw2d[i] = new int[SIZE];
    }
    start_access = boost::posix_time::microsec_clock::local_time();
    //  timer start - access
    for (int i = 0; i < SIZE; i++) {
        index = rand()%SIZE;
        for (int j = 0; j < SIZE; j++) {

            raw2d[index][index]++;
        }
    }
    end = boost::posix_time::microsec_clock::local_time();
    msdiff = end - start_total;
    cout << "Array total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
    msdiff = end - start_acess;
    cout << "Array access time: " << msdiff.total_milliseconds() << "milliseconds.\n"; 
    for (int i = 0; i < SIZE; i++) {
        delete [] raw2d[i];
    }
    return 0;
}

The output is:

的输出是:

    Vector total time: 925milliseconds.
    Vector access time: 4milliseconds.
    Array total time: 30milliseconds.
    Array access time: 21milliseconds.

So the speed will be almost the same if you use it properly. (as others mentioned using reserve() or resize()).

所以如果使用得当,速度几乎是一样的。(正如其他人提到的使用reserve()或resize()))。

#18


0  

Well, because vector::resize() does much more processing than plain memory allocation (by malloc).

因为vector::resize()比普通内存分配(malloc)要做更多的处理。

Try to put a breakpoint in your copy constructor (define it so that you can breakpoint!) and there goes the additional processing time.

尝试在您的复制构造函数中添加一个断点(定义它以使您可以断点!),并且还有额外的处理时间。

#19


0  

I Have to say I'm not an expert in C++. But to add some experiments results:

我不得不说我不是c++方面的专家。但加入一些实验结果:

compile: gcc-6.2.0/bin/g++ -O3 -std=c++14 vector.cpp

编译:gcc-6.2.0/bin/g+ -O3 -std=c+ 14 vector.cpp

machine:

机:

Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz 

OS:

操作系统:

2.6.32-642.13.1.el6.x86_64

Output:

输出:

UseArray completed in 0.167821 seconds
UseVector completed in 0.134402 seconds
UseConstructor completed in 0.134806 seconds
UseFillConstructor completed in 1.00279 seconds
UseVectorPushBack completed in 6.6887 seconds
The whole thing completed in 8.12888 seconds

Here the only thing I feel strange is that "UseFillConstructor" performance compared with "UseConstructor".

这里唯一让我感到奇怪的是“UseFillConstructor”与“UseConstructor”的性能相比。

The code:

代码:

void UseConstructor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension*dimension);
        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}


void UseFillConstructor()
{
    TestTimer t("UseFillConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension*dimension, Pixel(255,0,0));
    }
}

So the additional "value" provided slows down performance quite a lot, which I think is due to multiple call to copy constructor. But...

因此,提供的附加“值”大大降低了性能,我认为这是由于对copy构造函数的多次调用。但是…

Compile:

编译:

gcc-6.2.0/bin/g++ -std=c++14 -O vector.cpp

Output:

输出:

UseArray completed in 1.02464 seconds
UseVector completed in 1.31056 seconds
UseConstructor completed in 1.47413 seconds
UseFillConstructor completed in 1.01555 seconds
UseVectorPushBack completed in 6.9597 seconds
The whole thing completed in 11.7851 seconds

So in this case, gcc optimization is very important but it can't help you much when a value is provided as default. This, is against my tuition actually. Hopefully it helps new programmer when choose which vector initialization format.

因此,在这种情况下,gcc优化是非常重要的,但是当一个值作为默认值提供时,它并不能提供太多帮助。这个,其实是反对我的学费。希望它能帮助新程序员选择哪种矢量初始化格式。