What performance differences are known when using -std=gnu++11?

I've been working on a Genetic Algorithm which I'd previously been compiling using g++ 4.8.1 with the arguments

CCFLAGS=-c -Wall  -Ofast -fopenmp -mfpmath=sse -march=native -std=gnu++11

I wasn't using many C++11 features and I have a reasonable profiling system, so I replaced literally 3-4 lines of code and compiled without -std=gnu++11:

CCFLAGS=-c -Wall  -Ofast -fopenmp -mfpmath=sse -march=native

When I ran my profiler again, I noticed that I could see ~5% performance improvement almost everywhere, except for my sort function, which was now taking about twice as long. (It's an overloaded operator< on the object)

My questions are:

What performance differences are known between the two versions, and is C++11 expected to be faster with newer compilers?

I also suspect that my use of -Ofast is playing a role. Am I right in that assumption?

UPDATE:

As suggested in the comments, I ran the tests again with and without -march=native:

// Fast sort, slightly slower in other tests
CCFLAGS=-c -Wall  -Ofast -fopenmp -mfpmath=sse -march=native -std=gnu++11  

// Fast sort, slower in other tests
CCFLAGS=-c -Wall  -Ofast -fopenmp -mfpmath=sse -std=gnu++11  

// Slow sort, slower in other tests
CCFLAGS=-c -Wall  -Ofast -fopenmp -mfpmath=sse                     

// Slow sort, fastest in other tests
CCFLAGS=-c -Wall  -Ofast -fopenmp -mfpmath=sse  -march=native

The conclusion seems to be the same: -std=gnu++11 speeds up the sort drastically, with a slight penalty almost everywhere else. -march=native speeds up the program whenever it is used.

Given that sort is only called once per generation, I'll take the speed benefit of not compiling with -std=gnu++11, but I'm still very interested in what is causing these results.

I'm using the std::sort provided by #include <algorithm>.

4 Answers

#1

I am not certain why using --std=gnu++11 would make parts of the code slower. I do not use that personally (instead, I use --std=c++11). Perhaps the extra GNU features are slowing something down? More likely, the optimization hasn't caught up with the new language features yet.

As for why the sort part is faster, I have a plausible explanation:

You've enabled move semantics. Even if you don't explicitly write move constructors and move assignment operators yourself, the compiler will generate them if your classes are reasonably constructed. The "sort" algorithm probably takes advantage of them.

Admittedly, the class you've listed doesn't seem to have much storage. It also does not have a "swap" method, though, so without C++11 move semantics the sort routine must do more work. You might look at this question and answers for more about sort and move semantics and interactions with compiler options.
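
To make the move-semantics explanation concrete, here is a minimal sketch that counts how often std::sort copies versus moves its elements when built as C++11 or later. `Tracked`, `SortCounts` and `count_sort_operations` are hypothetical names invented for this illustration; they are not from the question's code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical element type that records how it is copied and moved.
struct Tracked {
    double score;
    static std::size_t copies;
    static std::size_t moves;

    explicit Tracked(double s) : score(s) {}
    Tracked(const Tracked& o) : score(o.score) { ++copies; }
    Tracked(Tracked&& o) noexcept : score(o.score) { ++moves; }
    Tracked& operator=(const Tracked& o) { score = o.score; ++copies; return *this; }
    Tracked& operator=(Tracked&& o) noexcept { score = o.score; ++moves; return *this; }

    bool operator<(const Tracked& rhs) const { return score < rhs.score; }
};
std::size_t Tracked::copies = 0;
std::size_t Tracked::moves = 0;

struct SortCounts {
    std::size_t copies;
    std::size_t moves;
    bool sorted;
};

// Sort n elements in a fixed, unsorted order and report what std::sort did.
SortCounts count_sort_operations(std::size_t n) {
    std::vector<Tracked> v;
    v.reserve(n);  // no reallocation, so filling the vector adds no copies or moves
    for (std::size_t i = 0; i < n; ++i)
        v.emplace_back(static_cast<double>((i * 7919) % n));

    Tracked::copies = 0;
    Tracked::moves = 0;
    std::sort(v.begin(), v.end());
    return SortCounts{Tracked::copies, Tracked::moves,
                      std::is_sorted(v.begin(), v.end())};
}
```

Calling `count_sort_operations(1000)` from a small `main` and printing the two counters shows whether the sort relied on moves. Note that for a type holding only a `bool` and a `double`, a move is the same operation as a copy, so this mechanism alone may not account for the measured difference.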

#2

There has been a lot of interest in why the sort method had such a performance drop.

I'm more interested in why the remaining code saw a good improvement, but to help the conversation, below is the only part of my code which was quicker under -std=gnu++11.

It's just a comparison of a double member on the objects stored in a vector.

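#include <algorithm>
#include <vector>
#include <pthread.h>

// Defined first so the classes below can store it by value
class PopulationMember {
public:
    bool hasChanged;
    double score;

    // Order members by score so std::sort can use operator<
    inline bool operator< (const PopulationMember& rhs) const {
        return this->score < rhs.score;
    }
};

class TvectorPM {
public:
    pthread_mutex_t lock;  // must be set up with pthread_mutex_init before use
    std::vector<PopulationMember> v;
    void add(PopulationMember p);
};

// Append a member while holding the lock
void TvectorPM::add(PopulationMember p) {
    pthread_mutex_lock(&lock);
    v.push_back(p);
    pthread_mutex_unlock(&lock);
}

class PopulationManager {
public:
    TvectorPM populationlist;
    void sortByScore();
};

void PopulationManager::sortByScore() {
    // Have overloaded operator< to make this fast
    std::sort(populationlist.v.begin(), populationlist.v.end());
}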

#3

I believe this boils down to the features GNU adds (see the documentation on GNU extensions).

Those extensions might optimize some functionality in a reasonable manner and add overhead to other parts, as the performance depends on the shape of the code.

Unfortunately I'm unable to provide specifics.
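
One thing that can be checked directly is which dialect each build actually used. The sketch below relies on two predefined macros: `__cplusplus`, which is `201103L` or greater for C++11 and later, and `__STRICT_ANSI__`, which GCC defines for the strict `-std=c++NN` modes but not for the `-std=gnu++NN` modes. The function name `dialect_description` is made up for this example.

```cpp
#include <string>

// Describe the language dialect this translation unit was compiled with.
std::string dialect_description() {
#if __cplusplus >= 201103L
    std::string d = "C++11 or later";
#else
    std::string d = "pre-C++11";
#endif
#ifdef __STRICT_ANSI__
    d += ", strict ISO mode";          // e.g. -std=c++11
#else
    d += ", GNU extensions enabled";   // e.g. -std=gnu++11 or the compiler default
#endif
    return d;
}
```

Printing this string from each build confirms which of the flag combinations was really in effect. It does not say which extension, if any, changes the generated code.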

#4

C++11 differs from the older versions in a number of aspects. Many enhancements have been made to the core language, and some additional features have been added. You can visit this webpage and look at the items with the C++11 tag.
Some of the minor, yet heavily used, features:

1. initializer lists for `vector`s
2. range-based `for` loops
3. the `auto` keyword, for deducing types in declarations
4. the 'uniform initialization syntax', in its full glory

The -std=c++11 flag must also be used to be able to enjoy any of the above features.

As for the performance issues, it may have been just a coincidence. To be sure, build and run the tests multiple times.
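
To separate a real difference from noise, one option is to time the sort in isolation over several repeats and keep the best run. The sketch below sticks to C++98-compatible code so the same file can be built both with and without -std=gnu++11. `Member` mirrors the shape of the class in the question; `make_population` and `best_sort_seconds` are hypothetical helpers, not part of the original program.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <ctime>
#include <vector>

// Same shape as the class in the question: a flag and a score.
struct Member {
    bool hasChanged;
    double score;
    bool operator<(const Member& rhs) const { return score < rhs.score; }
};

// Build n members with pseudo-random scores from a fixed seed.
std::vector<Member> make_population(std::size_t n, unsigned seed) {
    std::srand(seed);
    std::vector<Member> v(n);
    for (std::size_t i = 0; i < n; ++i) {
        v[i].hasChanged = false;
        v[i].score = static_cast<double>(std::rand()) / RAND_MAX;
    }
    return v;
}

// Sort a fresh copy of the same data `repeats` times; return the best CPU time in seconds.
double best_sort_seconds(std::size_t n, int repeats) {
    const std::vector<Member> original = make_population(n, 42u);
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        std::vector<Member> v = original;  // copy made outside the timed region
        std::clock_t start = std::clock();
        std::sort(v.begin(), v.end());
        std::clock_t stop = std::clock();
        double secs = static_cast<double>(stop - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best) best = secs;
    }
    return best;
}
```

Building this twice with otherwise identical flags and comparing `best_sort_seconds` for a population of realistic size gives a cleaner signal than a single profiler run. `std::clock` measures CPU time at coarse resolution, so small inputs may report zero.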
