How to write fast (low-level) code?

I would like to learn more about low level code optimization, and how to take advantage of the underlying machine architecture. I am looking for good pointers on where to read about this topic.

More details:

I am interested in optimization in the context of scientific computing (which is mostly, but not only, number crunching) in low-level languages such as C/C++. I am particularly interested in optimization methods that are not obvious unless one has a good understanding of how the machine works (which I don't have yet).

For example, it's clear that a better algorithm is faster, without knowing anything about the machine it's run on. It's not at all obvious that it matters if one loops through the columns or the rows of a matrix first. (It's better to loop through the matrix so that elements that are stored at adjacent locations are read successively.)
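
To make the loop-order point concrete, here is a minimal sketch, assuming a matrix stored row-major in a single contiguous buffer (the layout C and C++ use for built-in 2D arrays). The names `Matrix`, `sum_row_major` and `sum_column_major` are made up for this illustration. Both functions return the same sum; only the order in which memory is touched differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix: element (r, c) lives at index r * cols + c.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    double& at(std::size_t r, std::size_t c) {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
    double at(std::size_t r, std::size_t c) const {
        assert(r < rows && c < cols);
        return data[r * cols + c];
    }
};

// Inner loop walks along a row: consecutive iterations read adjacent addresses.
double sum_row_major(const Matrix& m) {
    double s = 0.0;
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            s += m.at(r, c);
    return s;
}

// Inner loop walks down a column: consecutive iterations are
// cols * sizeof(double) bytes apart, so each read may hit a new cache line.
double sum_column_major(const Matrix& m) {
    double s = 0.0;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            s += m.at(r, c);
    return s;
}
```

On a matrix much larger than the cache, the first version would typically be expected to run faster, but the size of the gap depends on the machine and compiler, so it is worth timing both on your own hardware.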

Basic advice on the topic or pointers to articles are most welcome.

Answers

Got answers with lots of great pointers, a lot more than I'll ever have time to read. Here's a list of all of them:

The software optimization cookbook from Intel (book)
What every programmer should know about memory (pdf book)
Write Great Code, Volume 2: Thinking Low-Level, Writing High-Level (book)
Software optimization resources by Agner Fog (five detailed pdf manuals)

I'll need a bit of time to skim them and decide which one to use (I don't have time for all of them).

8 answers

#1

Drepper's What Every Programmer Should Know About Memory [pdf] is a good reference to one aspect of low-level optimisation.

#2

For Intel architectures this is priceless: The Software Optimization Cookbook, Second Edition

#3

It's been a few years since I read it, but Write Great Code, Volume 2: Thinking Low-Level, Writing High-Level by Randall Hyde was quite good. It gives good examples of how C/C++ code translates into assembly, e.g. what really happens when you have a big switch statement.

Also, altdevblogaday.com is focused on game development, but the programming articles might give you some ideas.

#4

An interesting book about bit manipulation and smart ways of doing low-level things is Hacker's Delight.

This is definitely worth a read for everyone interested in low-level coding.

#5

Check out: http://www.agner.org/optimize/

#6

C and C++ are the languages usually used for this because of their speed (ignoring Fortran, as you didn't mention it). One thing you can take advantage of (and which the icc compiler uses heavily) is the SSE instruction sets, which speed up a lot of floating-point number crunching. Another possibility is to use CUDA (for Nvidia) or the Stream API (for ATI) to do very fast floating-point operations on the graphics card while leaving the CPU free to do the rest of the work.
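
As a rough illustration of what using SSE by hand looks like, here is a minimal sketch that adds two float arrays four elements at a time with the SSE intrinsics from `<xmmintrin.h>`. The function names `add_arrays` and `add` and the macro `SKETCH_HAVE_SSE` are made up for this example, and the code falls back to a plain scalar loop when SSE is not available at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Use SSE only when the compiler reports support for it.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
#if SKETCH_HAVE_SSE
    // Process four floats per iteration; the "u" variants accept unaligned pointers.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop handles the remaining elements (or everything without SSE).
    for (; i < n; ++i) out[i] = a[i] + b[i];
}

// Convenience wrapper for std::vector inputs of equal length.
std::vector<float> add(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    add_arrays(a.data(), b.data(), out.data(), a.size());
    return out;
}
```

Optimizing compilers can often vectorize a loop this simple on their own, so hand-written intrinsics are mainly worth trying where the compiler's output turns out to be the bottleneck.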

#7

Another approach to this is hands-on comparison. You can get a library like Blitz++ (http://www.oonumerics.org/blitz/) which - I've been told - implements aggressive optimisations for numeric/scientific computing, then write some simple programs doing operations of interest to you (e.g. matrix multiplications). As you use Blitz++ to perform them, write your own class that does the same, and if Blitz++ proves faster, start investigating its implementation until you realise why. (If yours is significantly faster you can tell the Blitz++ developers!)
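
A comparison like this needs some way to time the two versions. Below is a minimal sketch of a timing helper plus a naive matrix multiply to serve as the "own class" side of the experiment. The names `best_time_ms` and `multiply_naive` are made up for this example, and no Blitz++ calls are shown, since the library's API is not covered here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Run f() `repeats` times and return the best wall-clock time in milliseconds.
// Taking the minimum reduces noise from other activity on the machine.
template <typename F>
double best_time_ms(F&& f, int repeats = 5) {
    assert(repeats > 0);
    double best = -1.0;
    for (int i = 0; i < repeats; ++i) {
        auto start = std::chrono::steady_clock::now();
        f();
        auto stop = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(stop - start).count();
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}

// Naive n x n matrix multiply on row-major buffers: returns a * b.
std::vector<double> multiply_naive(const std::vector<double>& a,
                                   const std::vector<double>& b,
                                   std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}
```

You would wrap each implementation in a lambda, pass it to `best_time_ms`, and compare the numbers for several matrix sizes, making sure the results are actually used so the compiler cannot optimise the work away.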

You should end up learning about a lot of things, for example:

memory cache access patterns
expression templates (some of the top Google search results on expression templates are poor - the key property you want to find discussion of is that they can encode many successive steps in a chain of operations so that they are all applied during a single loop over a data set)
some CPU-specific instructions (though I haven't checked whether they've used such non-portable techniques)...
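
To show the expression-template idea mentioned in the list above, here is a minimal sketch, not taken from Blitz++; the types `Expr`, `Sum` and `Vec` are made up for this illustration. Writing `a + b + c` builds a small tree of lightweight objects, and the element-wise additions only happen inside the single loop in `Vec::operator=`, with no temporary vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base so that operator+ below only matches our own expression types.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node representing "l + r"; nothing is computed until an element is requested.
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& l;
    const R& r;
    Sum(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from any expression evaluates the whole chain in one loop.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// Builds an expression node instead of computing a temporary vector.
// The nodes hold references, so use them within a single statement
// (out = a + b + c;) rather than storing them in a variable.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

With this in place, `out = a + b + c;` makes one pass over the data, whereas a naive `operator+` that returns a new vector would allocate and fill an intermediate vector for each `+`.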

#8

I learned a lot from the book Inner Loops. It's ancient now, in computer terms, but it's very well written and Rick Booth is so enthusiastic about his subject I would still say it's worth looking at to see the kind of mindset you need to make a CPU fly.
