Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

Date: 2021-05-23 03:47:31

I found this article here:

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

http://www.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf

In the conclusion section, it reads:

Comparing runtime, space consumption, and virtual memory footprints over a range of benchmarks, we show that the runtime performance of the best-performing garbage collector is competitive with explicit memory management when given enough memory. In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection’s performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. Garbage collection also is more susceptible to paging when physical memory is scarce. In such conditions, all of the garbage collectors we examine here suffer order-of-magnitude performance penalties relative to explicit memory management.

So, if my understanding is correct: if I have an app written in native C++ requiring 100 MB of memory, to achieve the same performance with a "managed" (i.e. garbage collector based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB? (And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?)
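
To make that reading concrete, here is a minimal sketch of the arithmetic. The lookup table only restates the three figures quoted from the conclusion; the names `PAPER_SLOWDOWN` and `naive_estimate` are made up for illustration, and treating "matches or slightly exceeds" as a slowdown of zero is my simplification. Whether these averages carry over to a real application is exactly what I am asking.

```python
# Average figures quoted in the paper's conclusion:
# heap size as a multiple of the memory the explicitly managed
# program needs -> average runtime slowdown of the garbage collector.
# The 5x entry is simplified to 0.0 ("matches or slightly exceeds").
PAPER_SLOWDOWN = {5: 0.00, 3: 0.17, 2: 0.70}


def naive_estimate(native_mb, heap_multiple):
    """Return (heap size in MB, slowdown) under the naive reading above."""
    heap_mb = native_mb * heap_multiple
    slowdown = PAPER_SLOWDOWN[heap_multiple]
    return heap_mb, slowdown


for multiple in (5, 3, 2):
    heap_mb, slowdown = naive_estimate(100, multiple)
    print(f"{multiple}x heap: {heap_mb} MB, about {slowdown:.0%} slower")
```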

Do you know whether current garbage collectors (i.e. those in the latest Java VMs and .NET 4.0) suffer from the problems described in the article? Has the performance of modern garbage collectors improved?

Thanks.

5 answers

#1


7  

You seem to be asking two things:

  • have GCs improved since that research was performed, and

  • can I use the conclusions of the paper as a formula to predict required memory?

The answer to the first is that there have been no major breakthroughs in GC algorithms that would invalidate the general conclusions:

  • GC'ed memory management still requires significantly (e.g. 3 to 5 times) more virtual memory.

  • If you try to constrain the heap size, GC performance drops significantly.

  • If real memory is restricted, GC'ed memory management performs substantially worse because of paging overheads.

However, the conclusions cannot really be used as a formula:

  • The original study was done with JikesRVM rather than a Sun JVM.

  • The Sun JVM's garbage collectors have improved in the ~5 years since the study.

  • The study does not seem to take into account that Java data structures take more space than equivalent C++ data structures for reasons that are not GC related.

On the last point, I have seen a presentation about Java memory overheads. For instance, it found that the minimum representation size of a Java String is something like 48 bytes. (A String consists of two primitive objects: one an Object with 4 word-sized fields, and the other an array with a minimum of 1 word of content. Each primitive object also has 3 or 4 words of overhead.) Java collection data structures similarly use far more memory than people realize.
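
The "something like 48 bytes" figure can be roughly reproduced from those word counts. The sketch below assumes 4-byte words (a 32-bit JVM), which is my assumption and not something the presentation is quoted as saying; `string_footprint_bytes` is a made-up helper name.

```python
def string_footprint_bytes(word_bytes, header_words):
    """Estimate the minimum String size from the word counts given above."""
    string_object_words = 4 + header_words  # 4 word-sized fields plus the header
    char_array_words = 1 + header_words     # at least 1 word of content plus the header
    return (string_object_words + char_array_words) * word_bytes


# Assumption: 4-byte words. The header is "3 or 4 words", so try both.
low = string_footprint_bytes(word_bytes=4, header_words=3)
high = string_footprint_bytes(word_bytes=4, header_words=4)
print(low, high)
```

With those assumptions the estimate brackets the quoted figure: 48 bytes falls between the 3-word and 4-word header cases.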

These overheads are not GC-related per se. Rather they are direct and indirect consequences of design decisions in the Java language, JVM and class libraries. For example:

  • Each Java primitive object header¹ reserves one word for the object's "identity hashcode" value, and one or more words for representing the object lock.

  • The representation of a String has to use a separate "array of characters" because of JVM limitations. Two of the three other fields are an attempt to make the substring operation less memory intensive.

  • The Java collection types use a lot of memory because collection elements cannot be directly chained. So, for example, the overhead of a (hypothetical) singly linked list collection class in Java would be 6 words per list element. By contrast, an optimal C/C++ linked list (i.e. with each element having a "next" pointer) has an overhead of one word per list element.
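
To put the linked-list point in numbers, here is a small sketch that scales the per-element overheads from the last bullet to a large list. The 4-byte word size and the element count are assumptions chosen for illustration, and `list_overhead_bytes` is a made-up helper.

```python
def list_overhead_bytes(elements, overhead_words_per_element, word_bytes=4):
    """Bookkeeping bytes for a list, excluding the payload itself."""
    return elements * overhead_words_per_element * word_bytes


n = 1_000_000  # hypothetical list length
java_style = list_overhead_bytes(n, 6)  # 6 words per element, as estimated above
c_style = list_overhead_bytes(n, 1)     # one "next" pointer per element
print(java_style, c_style, java_style // c_style)
```

Under these assumptions a million-element list carries about 24 MB of overhead in the Java-style layout versus about 4 MB with an embedded "next" pointer, before any GC headroom is added.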


1 - In fact, the overheads are less than this on average. The JVM only "inflates" a lock following use & contention, and similar tricks are used for the identity hashcode. The fixed overhead is only a few bits. However, these bits add up to a measurably larger object header ... which is the real point here.

#2


9  

if I have an app written in native C++ requiring 100 MB of memory, to achieve the same performance with a "managed" (i.e. garbage collector based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB? (And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?)

Only if the app is bottlenecked on allocating and deallocating memory. Note that the paper talks exclusively about the performance of the garbage collector itself.

#3


4  

Michael Borgwardt is more or less right: the paper's figures apply only to the extent that the application is bottlenecked on allocating memory. This follows from Amdahl's law.
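
As a rough sketch of the Amdahl's law argument: if only a fraction of the runtime goes to memory management, only that fraction gets slower. The 10% fraction below is a made-up number, and applying the paper's 70% figure to just the memory-management portion follows the reading in the previous answer; I have not checked it against the paper.

```python
def overall_slowdown(memory_fraction, memory_slowdown):
    """Amdahl-style estimate of total runtime relative to the original.

    memory_fraction: share of the original runtime spent on memory management.
    memory_slowdown: factor by which that share gets slower (1.7 = 70% slower).
    """
    return (1.0 - memory_fraction) + memory_fraction * memory_slowdown


# Hypothetical app spending 10% of its time allocating and freeing memory.
print(overall_slowdown(0.10, 1.70))
# An app that does nothing but allocate sees the full penalty.
print(overall_slowdown(1.00, 1.70))
```

With these inputs the whole program is about 7% slower, not 70% slower.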

However, I have used C++, Java, and VB .NET. In C++ there are powerful techniques for allocating memory on the stack instead of the heap. Stack allocation is easily hundreds of times faster than heap allocation. I would say that using these techniques could remove maybe one allocation in eight, and using writable strings one allocation in four.

It's no joke when people claim highly optimized C++ code can trounce the best possible Java code. It's the flat out truth.

Microsoft claims the overhead in using any of the .NET family of languages over C++ is about two to one. I believe that number is just about right for most things.

HOWEVER, managed environments carry a particular benefit: when dealing with inferior programmers, you don't have to worry about one module trashing another module's memory, with the resulting crash blamed on the wrong developer and the bug difficult to find.

#4


3  

At least as I read it, your real question is whether there have been significant developments in garbage collection or manual memory management since that paper was published that would invalidate its results. The answer to that is somewhat mixed. On one hand, the vendors who provide garbage collectors do tune them, so their performance tends to improve over time. On the other hand, there hasn't been anything like a major breakthrough, such as a major new garbage collection algorithm.

Manual heap managers generally improve over time as well. I doubt most are tuned with quite the regularity of garbage collectors, but in the course of 5 years, probably most have had at least a bit of work done.

In short, both have undoubtedly improved at least a little, but in neither case have there been major new algorithms that change the fundamental landscape. It's doubtful that current implementations will give a difference of exactly 17% as quoted in the article, but there's a pretty good chance that if you repeated the tests today, you'd still get a difference somewhere around 15-20% or so. The differences between then and now are probably smaller than the differences between some of the different algorithms they tested at that time.

#5


3  

I am not sure how relevant your question still is today. A performance-critical application shouldn't spend a significant portion of its time doing object creation (as the micro-benchmark is very likely to do), and performance on modern systems is more likely to be determined by how well the application fits into the CPU's cache than by how much main memory it uses.

BTW: There are lots of tricks you can do in C++ to support this that are not available in Java.

If you are worried about the cost of GC or object creation, you can take steps to minimise how many objects you create. This is generally a good idea where performance is critical in any language.
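
As one illustration of the idea, shown in Python rather than Java or C++, the sketch below reads a stream either by creating a fresh object per chunk or by reusing one preallocated buffer. The function names and the chunk size are made up for the example, and this is only one of many ways to cut down on object creation.

```python
import io


def checksum_fresh(stream, chunk_size=4096):
    """Sum all bytes, allocating a new bytes object for every chunk."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)  # new object on each iteration
        if not chunk:
            break
        total += sum(chunk)
    return total


def checksum_reused(stream, chunk_size=4096):
    """Sum all bytes, refilling one preallocated buffer for every chunk."""
    buf = bytearray(chunk_size)  # allocated once, reused for every read
    view = memoryview(buf)       # lets us slice without copying the data
    total = 0
    while True:
        n = stream.readinto(buf)  # overwrite the existing buffer in place
        if not n:
            break
        total += sum(view[:n])
    return total


data = bytes(range(256)) * 40
print(checksum_fresh(io.BytesIO(data)) == checksum_reused(io.BytesIO(data)))
```

Both functions compute the same result; the second simply hands the memory manager far fewer large objects to deal with.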

The cost of main memory isn't as much of an issue as it used to be. A machine with 48 GB is relatively cheap these days. An 8-core server with 48 GB of main memory can be leased for £9/day. Try hiring a developer for £9/day. ;) However, what is still relatively expensive is CPU cache memory. It is fairly hard to find a system with more than 16 MB of CPU cache, compared with 48,000 MB of main memory. A system performs much better when an application fits in its CPU cache, and this is the amount of memory to consider if performance is critical.
