
Time: 2022-04-07 17:20:47

The default JVM parameters are not optimal for running large applications. Any insights from people who have tuned the JVM on a real application would be helpful. We are running the application on a 32-bit Windows machine, where the client JVM is used by default. We have added -server and changed NewRatio to 1:3 (a larger young generation).


Any other parameters/tuning which you have tried and found useful?


[Update] The specific type of application I'm talking about is a server application that is rarely shut down and takes at least -Xmx1024m. Also assume that the application has already been profiled. I'm looking for general guidelines in terms of JVM performance only.


7 solutions



There is a great deal of that information around.


First, profile the code before tuning the JVM.


Second, read the JVM documentation carefully; there are a lot of "urban legends" around. For example, the -server flag only helps if the JVM stays resident and runs for some time; -server "turns up" the JIT/HotSpot, which needs many passes through the same code path before it kicks in. On the other hand, -server slows initial execution of the JVM, as there's more setup time.

There are several good books and websites around. See, for example, http://www.javaperformancetuning.com/






I worked at a Java shop and spent entire months running performance tests on distributed systems, the main apps being in Java. Some of them involved products developed and sold by Sun themselves (then Oracle).


I will go over the lessons I learned, some history of the JVM, some discussion of the internals, a couple of parameters explained and finally some tuning. I'm trying to keep it to the point so you can apply it in practice.


Things are changing fast in the Java world, so part of this might already be outdated, since it has been a year since I did all that. (Is Java 10 out already?)

Good Practices

What you SHOULD do: benchmark, Benchmark, BENCHMARK!

When you really need to know about performance, you need to perform real benchmarks, specific to your workload. There are no alternatives.


Also, you should monitor the JVM. Enable monitoring. Good applications usually provide a monitoring web page and/or an API. Otherwise there is the common Java tooling (JVisualVM, JMX, hprof, and some JVM flags).


Be aware that there is usually no performance to gain by tuning the JVM. It's more a matter of "to crash or not to crash, finding the transition point". It's about knowing that when you give a certain amount of resources to your application, you can consistently expect a certain level of performance in return. Knowledge is power.


Performance is mostly dictated by your application. If you want it faster, you gotta write better code.


What you WILL do most of the time: Live with reliable sensible defaults

We don't get time to optimize and tune every single application out there. Most of the time we'll simply live with sensible defaults.


The first thing to do when configuring a new application is to read the documentation. Most serious applications come with a guide for performance tuning, including advice on JVM settings.


Then you can configure the application: JAVA_OPTS: -server -Xms???g -Xmx???g

  • -server: enable full optimizations (this flag is automatic on most JVMs nowadays)

  • -Xms -Xmx: set the minimum and maximum heap (always use the same value for both; that's about the only optimization to do).

Well done, you know about all the optimization parameters there are to know about the JVM, congratulations! That was simple :D
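
To check that the flags from JAVA_OPTS really reached the JVM, something like the following can help. It is only a minimal sketch: the class name JvmStartupReport and its helper methods are made up for illustration, while the calls themselves come from the standard java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper class; the name is an assumption, not part of any library.
class JvmStartupReport {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Returns the flags the JVM was actually started with (e.g. -Xms, -Xmx). */
    static List<String> startupFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("JVM flags: " + startupFlags());
        // The VM name usually tells you whether the client or server VM is running.
        System.out.println("VM name:   " + System.getProperty("java.vm.name"));
        System.out.println("Max heap:  " + toMiB(Runtime.getRuntime().maxMemory()) + " MiB");
    }
}
```

Run it with the same JAVA_OPTS as the real application and compare the printed flags with what you intended to set.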


What you SHALL NOT do, EVER:

Please do NOT copy random strings you found on the internet, especially when they take multiple lines like this:


-server  -Xms1g -Xmx1g  -XX:PermSize=1g -XX:MaxPermSize=256m  -Xmn256m -Xss64k  -XX:SurvivorRatio=30  -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled  -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=10  -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark  -XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -Dsun.net.inetaddr.ttl=5  -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=`date`.hprof   -Dcom.sun.management.jmxremote.port=5616 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -server -Xms2g -Xmx2g -XX:MaxPermSize=256m -XX:NewRatio=1 -XX:+UseConcMarkSweepGC

For instance, this thing found on the first page of Google is plain terrible. There are arguments specified multiple times with conflicting values. Some are just forcing the JVM defaults (possibly the defaults from 2 JVM versions ago). A few are obsolete and simply ignored. And finally, at least one parameter is so invalid that it will consistently crash the JVM at startup by its mere existence.


Actual tuning

How do you choose the memory size:

Read the guide for your application; it should give some indication. Monitor production and adjust afterwards. Perform some benchmarks if you need accuracy.


Important Note: The java process will take up to the max heap PLUS roughly 10%. That extra overhead is the heap management, which is not included in the heap itself.

All the memory is usually preallocated by the process on startup. The operating system may show the process using the max heap ALL THE TIME, but that does not reflect what the application really uses. You need to use Java monitoring tools to see what is really being used.
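
As a rough illustration of the difference between what the process has reserved and what is really in use, the standard MemoryMXBean reports both. This is a sketch only; the class name HeapUsageReport and the describe helper are assumptions made for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, for illustration only.
class HeapUsageReport {

    /** Snapshot of the heap as seen from inside the JVM. */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Formats used / committed / max in MiB. A max of -1 (undefined) prints as 0. */
    static String describe(MemoryUsage usage) {
        long mib = 1024L * 1024L;
        return String.format("used=%d MiB, committed=%d MiB, max=%d MiB",
                usage.getUsed() / mib, usage.getCommitted() / mib, usage.getMax() / mib);
    }

    public static void main(String[] args) {
        // "committed" is what the JVM has claimed from the OS; "used" is live data plus garbage.
        System.out.println(describe(heap()));
    }
}
```

With -Xms equal to -Xmx, committed should sit close to max from the start, while used moves up and down with garbage collection.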


Finding the right size:


  • If it crashes with OutOfMemoryError, it ain't enough memory

  • If it doesn't crash with OutOfMemoryError, it's too much memory

  • If it's too much memory BUT the hardware has it and/or it is already paid for, it's the perfect number, job done!

JVM6 is bronze, JVM7 is gold, JVM8 is platinum...

The JVM is forever improving. Garbage Collection is a very complex thing and there are a lot of very smart people working on it. It has seen tremendous improvements in the past decade and it will continue to do so.


For information: there are at least 4 garbage collectors available in Oracle Java 7-8 (HotSpot) and OpenJDK 7-8. (Other JVMs may be entirely different, e.g. Android, IBM, embedded):

  • SerialGC
  • ParallelGC
  • ConcurrentMarkSweepGC
  • G1GC
  • (plus variants and settings)

[From Java 7 onward, the Oracle and OpenJDK code are partially shared. The GC should be (mostly) the same on both platforms.]

JVM >= 7 has many optimizations and picks decent defaults. They vary a bit by platform. It balances multiple things, for instance deciding whether to enable multicore optimizations depending on whether the CPU has multiple cores. You should let it do so. Do not change or force GC settings.

It's okay to let the computer make decisions for you (that's what computers are for). It's better to have JVM settings that are 95%-optimal all the time than to force an "always 8 core aggressive collection for lower pause times" setting on all the boxes, half of them being t2.small in the end.
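
If you are curious which collectors the JVM picked on a given box, it can be asked directly. A minimal sketch, assuming nothing beyond the standard java.lang.management API; the class name ActiveCollectors is made up for this example, and the names printed depend on the JVM vendor and version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class ActiveCollectors {

    /** Names of the collectors the running JVM selected (vendor and version specific). */
    static List<String> names() {
        List<String> result = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            result.add(gc.getName());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("Cores seen by the JVM: " + Runtime.getRuntime().availableProcessors());
        System.out.println("Collectors in use:     " + names());
    }
}
```

Running it unchanged on a small box and on a large one is a quick way to see the defaults adapting to the hardware.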


Exception: When the application comes with a performance guide and specific tuning in place. It's perfectly okay to leave the provided settings as is.


Tip: Moving to a newer JVM to benefit from the latest improvements can sometimes provide a good boost without much effort.


Special Case: -XX:+UseCompressedOops

The JVM has a special setting that forces it to use 32-bit indexes internally (read: pointer-like references). Because objects are aligned on 8-byte boundaries, this allows addressing 4 294 967 296 slots * 8 bytes => 32 GB of memory. (NOT to be confused with the 4GB address space of REAL 32-bit pointers.)
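
The arithmetic behind the 32 GB figure can be spelled out in a few lines. This is just a worked calculation, assuming the usual 8-byte object alignment; the class and method names are invented for the example.

```java
// Hypothetical helper class, for illustration only.
class CompressedOopsLimit {

    /** Bytes reachable when a reference is an index of referenceBits bits into aligned slots. */
    static long addressableBytes(int referenceBits, int alignmentBytes) {
        return (1L << referenceBits) * alignmentBytes;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        // 2^32 slots of 8 bytes each
        System.out.println(addressableBytes(32, 8) / gib + " GiB with 32-bit indexes"); // prints "32 GiB with 32-bit indexes"
        // A plain 32-bit byte address only reaches 2^32 bytes
        System.out.println(addressableBytes(32, 1) / gib + " GiB with real 32-bit pointers"); // prints "4 GiB with real 32-bit pointers"
    }
}
```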


It reduces the overall memory consumption with a potential positive impact on all caching levels.


Real life example: the ElasticSearch documentation states that a running node with a 32GB heap and 32-bit (compressed) pointers may be equivalent to a node with a 40GB heap and 64-bit pointers in terms of actual data kept in memory.

A note on history: the flag was known to be unstable in the pre-Java-7 era (maybe even pre-Java-6). It has been working perfectly in newer JVMs for a while.


Java HotSpot™ Virtual Machine Performance Enhancements

[...] In Java SE 7, use of compressed oops is the default for 64-bit JVM processes when -Xmx isn't specified and for values of -Xmx less than 32 gigabytes. For JDK 6 before the 6u23 release, use the -XX:+UseCompressedOops flag with the java command to enable the feature.

See: once again the JVM is light years ahead of manual tuning. Still, it's interesting to know about it =)
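
To see what the JVM decided on its own, the value of a flag can be read at runtime. The sketch below assumes a HotSpot-based JVM, since com.sun.management.HotSpotDiagnosticMXBean is HotSpot specific; the class name VmFlagCheck and the "unavailable" fallback are assumptions made for this example.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; assumes a HotSpot-based JVM.
class VmFlagCheck {

    /** Returns the current value of a -XX flag, or "unavailable" if this JVM does not expose it. */
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown flag (e.g. UseCompressedOops on a 32-bit JVM) or bean not available.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + flagValue("UseCompressedOops"));
    }
}
```

Start it with and without an explicit -Xmx above 32g to watch the default change.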


Special Case: -XX:+UseNUMA

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Source: Wikipedia


Modern systems have extremely complex memory architectures with multiple layers of memory and caches, both private and shared, across cores and CPUs.


Quite obviously, accessing data in the L2 cache of the current processor is A LOT faster than having to go all the way to a memory stick attached to another socket.


I believe that all multi-socket systems sold today are NUMA by design, while consumer systems are NOT. Check whether your server supports NUMA with the command numactl --show on Linux.

The NUMA-aware flag tells the JVM to optimize memory allocations for the underlying hardware topology.


The performance boost can be substantial (i.e. two digits: +XX%). In fact, someone switching from a "NOT-NUMA 10CPU 100GB" machine to a "NUMA 40CPU 400GB" machine might experience a [dramatic] loss in performance if they don't know about the flag.

Note: There are discussions to detect NUMA and set the flag automatically in the JVM http://openjdk.java.net/jeps/163


Bonus: All applications intended to run on big fat hardware (i.e. NUMA) need to be optimized for it. This is not specific to Java applications.


Toward the future: -XX:+UseG1GC

The latest improvement in Garbage Collection is the G1 collector (read: Garbage First).


It is intended for systems with many cores and lots of memory: at the absolute minimum, 4 cores + 6 GB of memory. It is targeted toward databases and memory-intensive applications using 10 times that and beyond.

Short version: at these sizes, traditional GCs face too much data to process at once and pauses get out of hand. G1 splits the heap into many small sections that can be managed independently and in parallel while the application is running.

The first version was available in 2013. It is mature enough for production now. At the time of writing it was not expected to become the default anytime soon (it has since become the default collector, starting with Java 9). It is worth a try for large applications.


Do not touch: Generation Sizes (NewGen, PermGen...)

The GC splits the memory into multiple sections. (Not getting into details; you can google "Java GC Generations".)

The last time, I spent a week trying 20 different combinations of generation flags on an app taking 10000 hits/s. I got a magnificent boost ranging from -1% to +1%.

Java GC generations are an interesting topic to read papers on or to write one about. They are not a thing to tune unless you're part of the 1% who can devote substantial time to negligible gains among the 1% of people who really need optimizations.


Hope this can help you. Have fun with the JVM.


Java is the best language and the best platform in the world! Go spread the love :D




Look here (or do a google search for hotspot tuning) http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html


You definitely want to profile your app before you try to tune the VM. NetBeans has a nice profiler built in that will let you see all sorts of things.

I once had someone tell me that the GC was broken for their app - I looked at the code and found that they never closed any of their database query results so they were retaining massive amounts of byte arrays. Once we closed the results the time went from over 20 mins and a GB of memory to about 2 mins and a very small amount of memory. They were able to remove the JVM tuning parameters and things were happy.
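
The kind of fix described above usually comes down to closing every result as soon as you are done with it. Here is a minimal sketch using try-with-resources; FakeResult is a made-up stand-in for a real JDBC ResultSet so that the example runs without a database.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class; FakeResult stands in for a JDBC ResultSet.
class QueryResultCleanup {

    /** Counts results that are currently open, to make leaks visible. */
    static final AtomicInteger OPEN = new AtomicInteger();

    static final class FakeResult implements AutoCloseable {
        private byte[] buffer = new byte[1024]; // the memory a real result would hold on to

        FakeResult() {
            OPEN.incrementAndGet();
        }

        int size() {
            return buffer.length;
        }

        @Override
        public void close() {
            buffer = null; // release the buffer
            OPEN.decrementAndGet();
        }
    }

    /** Runs count "queries"; try-with-resources closes each result even if an exception is thrown. */
    static int runQueries(int count) {
        int totalBytes = 0;
        for (int i = 0; i < count; i++) {
            try (FakeResult result = new FakeResult()) {
                totalBytes += result.size();
            }
        }
        return totalBytes;
    }

    public static void main(String[] args) {
        System.out.println("bytes read: " + runQueries(100));
        System.out.println("results still open: " + OPEN.get());
    }
}
```

With real JDBC code the same pattern applies to Connection, Statement and ResultSet, which all implement AutoCloseable.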




I suggest you profile your application with CPU sampling and object allocation monitoring turned on at the same time. You will find you get very different results which can be helpful in tuning your code. Also try using the built-in hprof profiler; it can give very different results as well.


In general, profiling your application makes much more of a difference than JVM args.



The absolute best way to answer this is to perform controlled testing on the application in as close to a 'production' environment as you can create. It's quite possible that the use of -server, a reasonable starting heap size and the relatively smart behavior of recent JVMs will behave as well as or better than the vast majority of settings one would normally try.


There is one specific exception to this broad generalization: in the case that you are running in a web container, there is a really high chance that you will want to increase the permanent generation settings.




With Java on a 32-bit Windows machine, your choices are limited. In my experience, the following parameter settings will impact application performance:


  1. memory sizes
  2. choice of GC collectors
  3. parameters related to GC collectors



This will be highly dependent on your application and the vendor and version of the JVM. You need to be clear about what you consider to be a performance problem. Are you concerned with certain critical sections of code? Have you profiled the app yet? Is the JVM spending too much time garbage collecting?

I would probably start with the -verbose:gc JVM option to watch how garbage collection is working. Many times, the simplest fix is to just increase the max heap size with -Xmx. If you learn to interpret the -verbose:gc output, it will tell you nearly all you need to know about tuning the JVM as a whole. But doing this alone will not magically make badly tuned code go faster. Most of the JVM tuning options are designed to improve the performance of the garbage collector and/or memory sizes.
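
For a quick answer to "is the JVM spending too much time garbage collecting?" without parsing logs, the GC beans expose accumulated collection time. This is a rough sketch, not a replacement for reading the -verbose:gc output; the class name GcOverhead and the idea of dividing by uptime are assumptions made for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, for illustration only.
class GcOverhead {

    /** Total approximate time spent in all collectors so far, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Fraction of the elapsed time that went to garbage collection. */
    static double fraction(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : (double) gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time: %d ms of %d ms uptime (%.2f%%)%n",
                totalGcMillis(), uptime, 100.0 * fraction(totalGcMillis(), uptime));
    }
}
```

Called periodically from inside a long-running server, a steadily rising percentage is a hint that the heap is too small or that the code allocates too much.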


For profiling, I like yourkit.com



