Performance of C++ vs. virtual machine languages in high-frequency finance

I thought the C/C++ vs C#/Java performance question was well trodden, meaning that I'd read enough evidence to suggest that the VM languages are not necessarily any slower than the "close-to-silicon" languages, mostly because the JIT compiler can do optimizations that statically compiled languages cannot.

However, I recently received a CV from a guy who claims that Java-based high frequency trading is always beaten by C++, and that he'd been in a situation where this was the case.

A quick browse on job sites indeed shows that HFT applicants need knowledge of C++, and a look at Wilmott forum shows all the practitioners talking about C++.

Is there any particular reason why this is the case? I would have thought that with modern financial business being somewhat complex, a VM language with type safety, managed memory, and a rich library would be preferred. Productivity is higher that way. Plus, JIT compilers are getting better and better. They can do optimizations as the program is running, so you'd think they'd use that run-time info to beat the performance of the unmanaged program.

Perhaps these guys are writing the critical bits in C++ and calling them from a managed environment (P/Invoke etc.)? Is that possible?

Finally, does anyone have experience with the central question in this, which is why in this domain unmanaged code is without doubt preferred over managed?

As far as I can tell, the HFT guys need to react as fast as possible to incoming market data, but this is not necessarily a hard realtime requirement. You're worse off if you're slow, that's for sure, but you don't need to guarantee a certain speed on each response, you just need a fast average.

EDIT

Right, a couple of good answers thus far, but pretty general (well-trodden ground). Let me specify what kind of program HFT guys would be running.

The main criterion is responsiveness. When an order hits the market, you want to be the first to be able to react to it. If you're late, someone else might take it before you, but each firm has a slightly different strategy, so you might be OK if one iteration is a bit slow.

The program runs all day long, with almost no user intervention. Whatever function is handling each new piece of market data is run dozens (even hundreds) of times a second.

These firms generally have no limit as to how expensive the hardware is.

15 answers

#1

Firstly, 1 ms is an eternity in HFT. If you think it is not then it would be good to do a bit more reading about the domain. (It is like being 100 miles away from the exchange.) Throughput and latency are deeply intertwined as the formulae in any elementary queuing theory textbook will tell you. The same formulae will show jitter values (frequently dominated by the standard deviation of CPU queue delay if the network fabric is right and you have not configured quite enough cores).
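
The queueing point can be made concrete with the simplest textbook model. The sketch below assumes an M/M/1 queue (Poisson arrivals, exponentially distributed service times, a single server), which is an idealization and not a model of any real feed handler; it only illustrates how mean latency grows as throughput approaches capacity. The function names are hypothetical.

```cpp
#include <cassert>

// M/M/1 queue (Poisson arrivals, exponential service times, one server).
// lambda: arrival rate, mu: service rate, both in messages per second.

// Utilization: the fraction of time the server is busy.
double mm1_utilization(double lambda, double mu) {
    assert(mu > 0.0 && lambda >= 0.0 && lambda < mu);  // queue must be stable
    return lambda / mu;
}

// Mean time a message spends in the system (waiting + service), in seconds:
// W = 1 / (mu - lambda). It grows without bound as lambda approaches mu.
double mm1_mean_latency(double lambda, double mu) {
    assert(mu > 0.0 && lambda >= 0.0 && lambda < mu);
    return 1.0 / (mu - lambda);
}

// Mean number of messages in the system: L = rho / (1 - rho).
double mm1_mean_in_system(double lambda, double mu) {
    const double rho = mm1_utilization(lambda, mu);
    return rho / (1.0 - rho);
}
```

For a consumer that can process 1,000,000 messages/s, this model gives a mean latency of 2 µs at 500,000 messages/s (50% load) but 10 µs at 900,000 messages/s (90% load), on identical hardware.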

One of the problems with HFT arbitrage is that once you decide to capture a spread, there are two legs (or more) to realize the profit. If you fail to hit all legs you can be left with a position that you really don't want (and a subsequent loss) - after all you were arbitraging not investing.

You don't want positions unless your strategy is predicting the (VERY near term!!!) future (and this, believe it or not, is done VERY successfully). If you are 1 ms away from the exchange, then some significant fraction of your orders won't be executed and what you wanted will be picked off. Most likely the ones that have executed one leg will end up losers or at least not profitable.

Whatever your strategy is, for argument's sake let us say it ends up with a 55%/45% win/loss ratio. Even a small change in the win/loss ratio can make a big change in profitability.
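
A back-of-the-envelope sketch of why the ratio matters, assuming (for simplicity) that every win and every loss has the same size: the expected profit per trade is p·W - (1 - p)·L for win probability p, average win W and average loss L. The function name is hypothetical.

```cpp
#include <cassert>

// Expected profit per trade for a strategy that wins with probability p.
// avg_win and avg_loss are the average sizes of a winning and a losing trade
// (both given as positive numbers, in the same currency units).
double expected_pnl_per_trade(double p, double avg_win, double avg_loss) {
    assert(p >= 0.0 && p <= 1.0 && avg_win >= 0.0 && avg_loss >= 0.0);
    return p * avg_win - (1.0 - p) * avg_loss;
}
```

With equal win and loss sizes, 55%/45% gives an expected 0.10 units per trade, while 53%/47% gives 0.06: a two-point drop in the win ratio removes 40% of the expected profit.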

Re: "run dozens (even hundreds)": this seems off by orders of magnitude. Even 20000 ticks a second seems low, though that might be the average for the entire day for the instrument set he is looking at.

There is high variability in the rates seen in any given second. I will give an example. In some of my testing I look at 7 OTC stocks (CSCO, GOOG, MSFT, EBAY, AAPL, INTC, DELL). In the middle of the day, the per-second rates for this stream can range from 0 mps (very, very rare) to almost 2000 quotes and trades in a peak second. (See why I think the 20000 above is low.)

I build infrastructure and measurement software for this domain, and the numbers we talk about are hundreds of thousands and millions per second. I have C++ producer/consumer infrastructure libraries that can push almost 5,000,000 (5 million) messages/second between producer and consumer (32-bit, 2.4 GHz cores). These are 64-byte messages with new, construct, enqueue and synchronize on the producer side, and synchronize, dequeue, touch every byte, run virtual destructor and free on the consumer side. Now admittedly that is a simple benchmark with no socket I/O (and socket I/O can be ugly), as there would be at the endpoints of the pipeline stages. It is ALL custom: synchronization classes that only synchronize when empty, custom allocators, custom lock-free queues and lists, occasional STL (with custom allocators), but more often custom intrusive collections (of which I have a significant library). More than once I have given a vendor in this arena a fourfold (or greater) increase in throughput without increased batching at the socket endpoints.
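
The author's libraries aren't shown. As a much simpler illustration of one building block named above, here is a sketch of a bounded single-producer/single-consumer lock-free ring buffer using std::atomic acquire/release ordering. It omits the "only synchronize when empty" signaling, batching and custom allocation the author describes, and the class name is hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Bounded single-producer/single-consumer (SPSC) lock-free ring buffer.
// Exactly one thread may call try_push and exactly one thread may call try_pop.
// Capacity must be a power of two so that wrapping an index is a cheap mask.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer side. Returns false if the ring is full.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        assert(head - tail <= Capacity);                    // invariant
        if (head - tail == Capacity) return false;          // full
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);   // publish the slot
        return true;
    }

    // Consumer side. Returns false if the ring is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        assert(head - tail <= Capacity);                    // invariant
        if (head == tail) return false;                     // empty
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);   // hand the slot back
        return true;
    }

private:
    std::array<T, Capacity> slots_{};
    // Producer and consumer counters live on separate cache lines (64 bytes
    // is an assumption about the target CPU) to avoid false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};  // written only by the producer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written only by the consumer
};
```

Each counter is written by only one thread, so no compare-and-swap is needed; the acquire/release pairs are what make the slot contents visible to the other side.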

I have OrderBook and OrderBook::Universe classes that take less than 2 µs for a new, insert, find, partial fill, find, second fill, erase, delete sequence when averaged over 22000 instruments. The benchmark iterates over all 22000 instruments serially between the insert, the first fill and the last fill, so there are no cheap caching tricks involved. Operations on the same book are separated by accesses to 22000 different books. These are very much NOT the caching characteristics of real data. Real data is much more localized in time, and consecutive trades frequently hit the same book.
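
The author's OrderBook classes are custom and not shown. Purely to illustrate the operation sequence in that benchmark (insert, find, partial fill, second fill, erase), here is a hypothetical, much simpler book that aggregates quantity per price level in std::map. A node-based std::map is exactly the kind of cache-unfriendly structure the author avoids, so this is a readability sketch, not a performance one.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>

// Hypothetical, simplified order book that aggregates quantity per price level.
// Prices are integer ticks to avoid floating-point comparisons.
class SimpleBook {
public:
    void add_bid(std::int64_t price, std::int64_t qty) { bids_[price] += qty; }
    void add_ask(std::int64_t price, std::int64_t qty) { asks_[price] += qty; }

    // Remove qty from a level (fill or cancel); the level is erased at zero.
    void reduce_bid(std::int64_t price, std::int64_t qty) { reduce(bids_, price, qty); }
    void reduce_ask(std::int64_t price, std::int64_t qty) { reduce(asks_, price, qty); }

    std::int64_t best_bid() const { assert(!bids_.empty()); return bids_.begin()->first; }
    std::int64_t best_bid_qty() const { assert(!bids_.empty()); return bids_.begin()->second; }
    std::int64_t best_ask() const { assert(!asks_.empty()); return asks_.begin()->first; }

private:
    template <typename Side>
    static void reduce(Side& side, std::int64_t price, std::int64_t qty) {
        auto it = side.find(price);
        if (it == side.end()) return;  // unknown level: ignored in this sketch
        assert(it->second >= qty);
        it->second -= qty;
        if (it->second == 0) side.erase(it);
    }

    // Bids sorted high-to-low and asks low-to-high, so begin() is the best price.
    std::map<std::int64_t, std::int64_t, std::greater<std::int64_t>> bids_;
    std::map<std::int64_t, std::int64_t> asks_;
};
```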

All of this work involves careful consideration of the constants and caching characteristics in the algorithmic costs of the collections used. (Sometimes it seems that the K's in K·O(n), K·O(n log n), etc. are dismissed a bit too glibly.)
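
A worked comparison makes the point about constants concrete. Assume (hypothetically) an algorithm A whose cost is K_A n log2 n and an asymptotically better algorithm B whose cost is K_B n, with B's larger constant coming from, say, cache misses:

```latex
\begin{aligned}
T_A(n) &= K_A\, n \log_2 n, \qquad T_B(n) = K_B\, n \\
T_B(n) < T_A(n) &\iff K_B < K_A \log_2 n \iff n > 2^{K_B/K_A} \\
\frac{K_B}{K_A} = 20 &\implies n > 2^{20} \approx 1.05 \times 10^{6}
\end{aligned}
```

So with a 20× constant ratio, B only pays off past roughly a million elements. For the 22000-instrument benchmark above, log2 22000 ≈ 14.4, so any constant ratio above about 14.4 favors A.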

I work on the market data infrastructure side of things. It is inconceivable to even think of using Java or a managed environment for this work. And when you can get this kind of performance with C++ (and I think it is quite hard to get million+ mps performance in a managed environment), I can't imagine any of the significant investment banks or hedge funds (for whom a $250,000 salary for a top-notch C++ programmer is nothing) not going with C++.

Is anybody out there really getting 2,000,000+ mps performance out of a managed environment? I know a few people in this arena and no one has ever bragged about it to me. And I think 2 million in a managed environment would earn some bragging rights.

I know of one major player's FIX order decoder doing 12,000,000 field decodes/sec (3 GHz CPU). It is C++, and the guy who wrote it all but challenged anybody to come up with something in a managed environment that is even half that speed.
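
FIX messages are sequences of tag=value fields separated by the SOH character (0x01). The decoder mentioned above isn't public; as a hedged illustration of why such decoders can be fast in C++, here is a minimal sketch that parses fields in place, without copying or allocating. The names are hypothetical, and a production decoder would do much more validation.

```cpp
#include <cassert>
#include <cstddef>

// One decoded FIX field: the integer tag plus a view (pointer + length) into
// the caller's buffer, so decoding does no copying or heap allocation.
struct FixField {
    int tag;
    const char* value;
    std::size_t len;
};

// Decodes "tag=value<SOH>" fields from buf[0, n) and calls on_field for each.
// Returns the number of fields decoded, or -1 on malformed input.
// Sketch only: no BodyLength (9) or CheckSum (10) validation, no repeating groups.
template <typename Callback>
int decode_fix(const char* buf, std::size_t n, Callback on_field) {
    assert(buf != nullptr || n == 0);
    const char kSoh = '\x01';
    int count = 0;
    std::size_t i = 0;
    while (i < n) {
        int tag = 0;
        std::size_t digits = 0;
        while (i < n && buf[i] >= '0' && buf[i] <= '9') {  // parse the numeric tag
            tag = tag * 10 + (buf[i] - '0');
            ++i;
            ++digits;
        }
        if (digits == 0 || i >= n || buf[i] != '=') return -1;
        ++i;                                               // skip '='
        const std::size_t start = i;
        while (i < n && buf[i] != kSoh) ++i;               // scan to the delimiter
        if (i >= n) return -1;                             // last field not terminated
        on_field(FixField{tag, buf + start, i - start});
        ++i;                                               // skip SOH
        ++count;
    }
    return count;
}
```

Passing the callback as a template parameter lets the compiler inline it into the scanning loop, which is part of why hand-written C++ decoders can avoid per-field overhead.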

Technologically it is an interesting area with lots of fun performance challenges. Consider the options market when the price of the underlying security changes: there might be, say, 6 outstanding price points with 3 or 4 different expiration dates. Now for each trade there were probably 10-20 quotes. Those quotes can trigger price changes in the options. So for each trade there might be 100 or 200 changes in options quotes. It is just a ton of data; not a Large Hadron Collider collision-detector-like amount of data, but still a bit of a challenge. It is a bit different from dealing with keystrokes.

Even the debate about FPGAs goes on. Many people take the position that a well-coded parser running on 3 GHz commodity hardware can beat a 500 MHz FPGA. But even if they are a tiny bit slower (I'm not saying they are), FPGA-based systems can tend to have tighter delay distributions. (Read "tend": this is not a blanket statement.) Of course, if you have a great C++ parser that you push through Cfront and then push that through the FPGA image generator... but that's another debate.

#2

A lot of it comes down to a simple difference between fact and theory. People have advanced theories to explain why Java should be (or at least might be) faster than C++. Most of the arguments have little to do with Java or C++ per se, but with dynamic versus static compilation, with Java and C++ really being little more than examples of the two (though, of course, it's possible to compile Java statically, or C++ dynamically). Most of these people have benchmarks to "prove" their claim. When those benchmarks are examined in any detail, it quickly becomes obvious that in quite a few cases, they took rather extreme measures to get the results they wanted (e.g., quite a number enabled optimization when compiling the Java, but specifically disabled optimization when compiling the C++).

Compare this to the Computer Language Benchmarks Game, where pretty much anybody can submit an entry, so all of the code tends to be optimized to a reasonable degree (and, in a few cases, even an unreasonable degree). It seems pretty clear that a fair number of people treat this as essentially a competition, with advocates of each language doing their best to "prove" that their preferred language is best. Since anybody can submit an implementation of any problem, a particularly poor submission has little effect on overall results. In this situation, C and C++ emerge as clear leaders.

Worse, if anything these results probably show Java in a better light than is entirely accurate. In particular, somebody who uses C or C++ and really cares about performance can (and often will) use Intel's compiler instead of g++. This will typically give at least a 20% improvement in speed compared to g++.

Edit (in response to a couple of points raised by jalf, but really too long to fit reasonably in comments):

Pointers being an optimizer writer's nightmare. This is really overstating things (quite) a bit. Pointers lead to the possibility of aliasing, which prevents certain optimizations under certain circumstances. That said, inlining prevents the ill effects much of the time (i.e., the compiler can detect whether there's aliasing rather than always generating code under the assumption that there could be). Even when the code does have to assume aliasing, caching minimizes the performance hit from doing so (i.e., data in L1 cache is only minutely slower than data in a register). Preventing aliasing would help performance in C++, but not nearly as much as you might think.
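
A minimal sketch of the aliasing problem being discussed (the function names are hypothetical): the compiler may only load *factor once if it knows that writes to out cannot change it.

```cpp
#include <cassert>
#include <cstddef>

// Multiplies every element of out by *factor. Because factor may point into
// out, the compiler must reload *factor on every iteration unless it can
// prove (for example after inlining) that the two do not alias.
void scale_all(int* out, const int* factor, std::size_t n) {
    assert(out != nullptr && factor != nullptr);
    for (std::size_t i = 0; i < n; ++i) out[i] *= *factor;
}

// What the optimizer would like to emit: load *factor once, outside the loop.
// This is only equivalent when factor does not point into out[0, n).
void scale_all_hoisted(int* out, const int* factor, std::size_t n) {
    assert(out != nullptr && factor != nullptr);
    const int f = *factor;
    for (std::size_t i = 0; i < n; ++i) out[i] *= f;
}
```

If the caller passes factor == &out[0], the two functions produce different results (the first sees the updated value), which is exactly why the compiler cannot apply the hoisting on its own. Non-standard qualifiers such as __restrict, supported by GCC, Clang and MSVC, let the programmer promise that no such aliasing occurs.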

Allocation being a lot faster with a garbage collector. It's certainly true that the default allocator in many C++ implementations is slower than what most (current) garbage collected allocators provide. This is balanced (to at least a degree) by the fact that allocations in C++ tend to be on the stack, which is also fast, whereas in a GC language nearly all allocations are usually on the heap. Worse, in a managed language you usually allocate space for each object individually whereas in C++ you're normally allocating space for all the objects in a scope together.

It's also true that C++ directly supports replacing allocators both globally and on a class-by-class basis, so when/if allocation speed really is a problem it's usually fairly easy to fix.
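
A minimal sketch of the class-by-class replacement mentioned above: a hypothetical Message class whose operator new/delete recycle blocks through a free list. It is deliberately simplified (single-threaded, and memory is never returned to the global heap).

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical message type with class-specific allocation: freed blocks are
// pushed onto a free list and reused by the next allocation instead of going
// back to the global heap. Not thread-safe; blocks are never released.
class Message {
public:
    explicit Message(long id) : id_(id) {}
    long id() const { return id_; }

    static void* operator new(std::size_t size) {
        if (size == sizeof(Message) && free_list_ != nullptr) {
            FreeNode* node = free_list_;   // pop a recycled block
            free_list_ = node->next;
            return node;
        }
        return ::operator new(size);       // derived classes or empty list
    }

    // Sized form, so blocks of a different size (derived classes) bypass the list.
    static void operator delete(void* p, std::size_t size) noexcept {
        static_assert(sizeof(Message) >= sizeof(FreeNode), "block too small for a list node");
        if (p == nullptr) return;
        if (size != sizeof(Message)) {
            ::operator delete(p);
            return;
        }
        free_list_ = ::new (p) FreeNode{free_list_};  // reuse the block as a list node
    }

private:
    struct FreeNode { FreeNode* next; };
    static FreeNode* free_list_;
    long id_;
};

Message::FreeNode* Message::free_list_ = nullptr;
```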

Ultimately, jalf is right: both of these points undoubtedly do favor "managed" implementations. The degree of that improvement should be kept in perspective, though: they're not enough to let dynamically compiled implementations run faster on much code, not even on benchmarks designed from the beginning to favor them as much as possible.

Edit2: I see Jon Harrop has attempted to insert his two (billionths of a) cent's worth. For those who don't know him, Jon's been a notorious troll and spammer for years, and seems to be looking for new ground into which to sow weeds. I'd try to reply to his comment in detail, but (as is typical for him) it consists solely of unqualified, unsupported generalizations containing so little actual content that a meaningful reply is impossible. About all that can be done is to give onlookers fair warning that he's become well known for being dishonest, self-serving, and best ignored.

#3

A JIT compiler could theoretically perform a lot of optimizations, yes, but how long are you willing to wait? A C++ app can take hours to compile because it happens offline, and the user isn't sitting there tapping his fingers and waiting.

A JIT compiler has to finish within a couple of milliseconds. So which do you think can get away with the most complex optimizations?

The garbage collector is a factor too. Not because it is slower than manual memory management per se (I believe its amortized cost is pretty good, definitely comparable to manual memory handling), but it's less predictable. It can introduce a stall at pretty much any point, which might not be acceptable in systems that are required to be extremely responsive.

And of course, the languages lend themselves to different optimizations. C++ allows you to write very tight code, with virtually no memory overhead, and where a lot of high level operations are basically free (say, class construction).

In C# on the other hand, you waste a good chunk of memory. And simply instantiating a class carries a good chunk of overhead, as the base Object has to be initialized, even if your actual class is empty.

C++ allows the compiler to strip away unused code aggressively. In C#, most of it has to be there so it can be found with reflection.

On the other hand, C# doesn't have pointers, which are an optimizing compiler's nightmare. And memory allocations in a managed language are far cheaper than in C++.

There are advantages either way, so it is naive to expect that you can get a simple "one or the other" answer. Depending on the exact source code, the compiler, the OS, the hardware it's running on, one or the other may be faster. And depending on your needs, raw performance might not be the #1 goal. Perhaps you're more interested in responsiveness, in avoiding unpredictable stalls.

In general, your typical C++ code will perform similarly to equivalent C# code. Sometimes faster, sometimes slower, but probably not a dramatic difference either way.

But again, it depends on the exact circumstances. And it depends on how much time you're willing to spend on optimization. If you're willing to spend as much time as it takes, C++ code can usually achieve better performance than C#. It just takes a lot of work.

And the other reason, of course, is that most companies who use C++ already have a large C++ code base which they don't particularly want to ditch. They need that to keep working, even if they gradually migrate (some) new components to a managed language.

#4

These firms generally have no limit as to how expensive the hardware is.

If they also don't care how expensive the software is, then I'd think that of course C++ can be faster: for example, the programmer might use custom-allocated or pre-allocated memory; and/or they can run code in the kernel (avoiding ring transitions), or on a real-time OS, and/or have it closely coupled to the network protocol stack.

#5

There are reasons to use C++ other than performance. There is a HUGE existing library of C and C++ code. Rewriting all of that in other languages would not be practical. For things like P/Invoke to work correctly, the target code has to be designed to be called from elsewhere. At the very least, you'd have to write some sort of wrapper that exposes a pure C API, because you can't P/Invoke into C++ classes.
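
A minimal sketch of such a wrapper, assuming a hypothetical PriceCalculator class: the C++ object is hidden behind an opaque handle and a few extern "C" functions, which is the shape P/Invoke can call.

```cpp
#include <cassert>
#include <new>

// An ordinary C++ class: P/Invoke cannot construct it or call its members directly.
class PriceCalculator {
public:
    explicit PriceCalculator(double spread) : spread_(spread) {}
    double ask_for(double mid) const { return mid + spread_ / 2.0; }

private:
    double spread_;
};

// Flat C API around the class (names are hypothetical). The managed side only
// sees an opaque pointer-sized handle, and no C++ exception may cross the boundary.
// On Windows these would also need __declspec(dllexport) when built as a DLL.
extern "C" {

void* pricecalc_create(double spread) {
    return new (std::nothrow) PriceCalculator(spread);  // nullptr on failure
}

double pricecalc_ask_for(const void* handle, double mid) {
    assert(handle != nullptr);
    return static_cast<const PriceCalculator*>(handle)->ask_for(mid);
}

void pricecalc_destroy(void* handle) {
    delete static_cast<PriceCalculator*>(handle);  // deleting nullptr is a no-op
}

}  // extern "C"
```

On the .NET side these would be declared with [DllImport] as static extern methods that take and return IntPtr for the handle. Every call still pays the managed/unmanaged transition cost, so such an API is best kept coarse-grained.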

Finally, P/Invoke is a very expensive operation.

JIT compilers are getting better and better. They can do optimizations as the program is running

Yes, they can do this. But you forget that any C++ compiler is able to do the same optimizations. Sure, compile time will be worse, but the very fact that such optimizations have to be done at runtime is overhead. There are cases where managed languages can beat C++ at certain tasks, but this is usually because of their memory models and not the result of runtime optimizations. Strictly speaking, you could of course have such a memory model in C++ (such as C#'s handling of strings), but few C++ programmers spend as much time optimizing their code as JIT developers do.

There are some performance issues that are an inherent downside of managed languages, namely disk I/O. It's a one-time cost, but depending on the application it can be significant. Even with the best optimizers, you still need to load 30 MB+ of JIT compiler from disk when the program starts, whereas it's rare for a C++ binary to approach that size.

#6

The simple fact is that C++ is designed for speed. C#/Java aren't.

Compare the innumerable inheritance hierarchies endemic to those languages (such as IEnumerable) with the zero overhead of std::sort or std::for_each, which are generic. C++'s raw execution speed isn't necessarily any faster, but the programmer can design fast or zero-overhead systems. Even for things like buffer overruns, those languages don't let you turn the checks off. In C++, you have control. Fundamentally, C++ is a fast language: you don't pay for what you don't use. In contrast, in C#, if you use, say, stackalloc, you can't NOT do buffer overrun checking. You can't allocate classes on the stack, or contiguously.
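
The std::sort point is often illustrated by comparing it with C's qsort. A sketch (function names hypothetical; any actual speedup depends on the compiler and the data):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// qsort receives the comparison as a function pointer and calls it through
// that pointer for every comparison, which compilers usually cannot inline.
int compare_doubles(const void* a, const void* b) {
    const double x = *static_cast<const double*>(a);
    const double y = *static_cast<const double*>(b);
    return (x > y) - (x < y);  // -1, 0 or 1 without converting x - y to int
}

void sort_with_qsort(std::vector<double>& v) {
    if (!v.empty()) std::qsort(v.data(), v.size(), sizeof(double), compare_doubles);
}

// std::sort is a template: the lambda's type is part of the instantiation,
// so the comparison is typically inlined into the sorting loop.
void sort_with_std_sort(std::vector<double>& v) {
    std::sort(v.begin(), v.end(), [](double x, double y) { return x < y; });
    assert(std::is_sorted(v.begin(), v.end()));
}
```

Both produce the same order; the difference is only in how much the compiler can see at the call site.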

There's also the whole compile-time thing, where C++ apps can take much longer, both to compile, and to develop.

#7

This might be kinda off topic, but I watched a video a couple of weeks ago which might appear to be of interest to you : http://ocaml.janestreet.com/?q=node/61

It comes from a trading company that decided to use OCaml as its main language for trading, and I think their motivations should be enlightening to you (basically, they valued speed of course, but also strong typing and a functional style for quicker increments as well as easier comprehension).

#8

Most of our code ends up having to be run on a Grid of 1000's of machines.

I think this environment changes the argument. If the difference between C++ and C# execution speed is, for example, 25%, then other factors come into play. When this runs on a grid, how it is coded may make little difference: once the work is spread across machines, the speed gap may not be an issue, or it can be solved by allocating or purchasing a few more machines. The most important issue and cost may become 'time to market', where C# may prove the winner and the faster option.
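
The machine-count arithmetic behind that argument, assuming perfectly parallel, CPU-bound work, where N is the number of machines needed and t is the run time of a job (the figures are illustrative):

```latex
\begin{aligned}
N_{\text{C\#}} &= N_{\text{C++}} \cdot \frac{t_{\text{C\#}}}{t_{\text{C++}}} \\
t_{\text{C\#}} = 1.25\, t_{\text{C++}},\; N_{\text{C++}} = 1000 &\implies N_{\text{C\#}} = 1000 \times 1.25 = 1250 \\
t_{\text{C++}} = 0.75\, t_{\text{C\#}},\; N_{\text{C++}} = 1000 &\implies N_{\text{C\#}} = \frac{1000}{0.75} \approx 1333
\end{aligned}
```

Note that "C# is 25% slower" and "C++ is 25% faster" are not the same gap: the second case needs about a third more machines, not a quarter.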

Which is faster, C++ or C#?

C#, by six months...

#9

It's not only a matter of programming language; the hardware and operating system are relevant too. You will get the best overall performance with a real-time operating system, a real-time programming language and efficient (!) programming.

So you have quite a few possibilities in choosing an operating system, and a few in choosing the language: there's C, Realtime Java, Realtime Fortran and a few others.

Or maybe you will have the best results in programming an FPGA/Processor to eliminate the cost of an operating system.

The biggest choice you have to make is how many possible performance optimizations you are willing to give up in favor of a language that eases development and runs more stably, because you will introduce fewer bugs, which results in higher availability of the system. This shouldn't be overlooked. You gain nothing by developing an application that performs 5% faster than any other application if it crashes every so often due to some minor, hard-to-find bugs.

#10

In HFT, latency is a bigger issue than throughput. Given the inherent parallelism in the data source, you can always throw more cores at the problem, but you can't make up for response time with more hardware. Whether the language is compiled beforehand, or Just-In-Time, garbage collection can destroy your latency. There exist realtime JVMs with guaranteed garbage collection latency. It's a fairly new technology, a pain to tune, and ridiculously expensive, but if you have the resources, it can be done. It'll probably become much more mainstream in coming years, as the early adopters fund the R&D that's going on now.

#11

One of the most interesting things about C++ is that its performance numbers are not better, but more reliable.

It's not necessarily faster than Java/C#/..., but it is consistent across runs.

Like in networking, sometimes the throughput isn't as important as a stable latency.
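
One way to see the point about stable latency: two hypothetical latency distributions with the same mean can have very different tails. The sketch below computes the mean and a nearest-rank percentile; the sample values in the usage note are invented for illustration, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of latency samples (any unit, e.g. microseconds).
double mean_latency(const std::vector<double>& samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// Nearest-rank percentile, p in (0, 100]: the smallest sample such that at
// least p% of all samples are less than or equal to it.
double percentile(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    const double n = static_cast<double>(samples.size());
    std::size_t rank = static_cast<std::size_t>(std::ceil(p * n / 100.0));
    if (rank < 1) rank = 1;
    return samples[rank - 1];
}
```

For example, 100 samples of 10 and a stream of 98 samples of 5 plus 2 samples of 255 both average 10, but the second has a 99th percentile of 255: the kind of tail that matters when you need to be first to react.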

#12

A huge reason to prefer C++ (or lower-level languages) in this case, other than what has already been said, is that there are some adaptability benefits to being low level.

If hardware technology changes, you can always drop into an __asm { } block and actually use it before languages/compilers catch up.

For example, Java (at the time of writing) offered no explicit SIMD support.

#13

Virtual Execution Engines (JVM or CLR of .Net) do not permit structuring the work in time-efficient way, as process instances cannot run on as many threads as might be needed.

In contrast, plain C++ enables execution of parallel algorithms and construction of objects outside the time-critical execution paths. That’s pretty much everything – simple and elegant. Plus, with C++ you pay only for what you use.

#14

The elephant in the room here is the fact that C++ is faster than Java.

We all know it. But we also know that if we state it plainly, as I just did, we can't pretend to engage in a meaningful debate about this undebatable topic. How much faster is C++ than Java for your application? That has the ring of a debatable topic, but, alas, it will always be hypothetical unless you implement your application in both languages, at which point there will be no room for debate.

Let's go back to your first design meeting: The hard requirement for your project is high performance. Everyone in the room will think "C++" and a handful of other compiled languages. The guy in the room who suggests Java or C# will have to justify it with evidence (i.e., a prototype), not with hypotheticals, not with claims made by the vendors, not with statements on programmer gossip sites, and certainly not with "hello world" benchmarks.

As it stands now, you have to move forward with what you know, not with what is hypothetically possible.

#15

Nikie wrote: “Could you explain what you can do with C++ threads and not with e.g. .NET threads?”

Threading with .Net could perform virtually everything C++ threading can, except:

Efficient execution of COM-encapsulated binary code, for example sensitive algorithms that might have to be kept secret from application developers. (Might be relevant in HFT)
Creation of lean threads that do not exhaust system resources with chunky building blocks – wrapped OS APIs and synchronization & signaling OS primitives. (Extremely relevant with parallel algorithms for time-optimization of performance in HFT)
Scaling up the throughput of a business process application 10 or more times on the same hardware and with the same latency. (Not relevant in HFT)
Scaling up 100 and more times the number of concurrently handled user interactions per unit of hardware. (Not relevant in HFT)

Using more CPU cores cannot fully compensate for the system resources exhausted by the building blocks of .NET, since more CPU cores inevitably bring memory contention.

#1