We noticed occasional full GCs with the G1 garbage collector, preceded by concurrent-mark overflow. Once a concurrent-mark-reset-for-overflow occurs, the overflow recurs in the following concurrent mark phases. Eventually this leads to a full GC, since concurrent marking no longer seems to be working.
We have four machines running the same Apache Storm based application with the same data traffic. Only one of the machines shows this behavior, about once a week.
Is this related to the bug 'G1 does not expand marking stack when mark stack overflow happens during concurrent marking'? https://bugs.openjdk.java.net/browse/JDK-8065402
Following the suggestion on that page, we doubled the concurrent mark threads from 4 to 8 and our heap size from 8 GB to 16 GB. However, the full GC still happens; the only difference is that it occurs later.
Any other suggestions?
Here's the GC log:
Java HotSpot(TM) 64-Bit Server VM (25.65-b01) for linux-amd64 JRE(1.8.0_65b17),
built on Oct 6 2015 17:16:12 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 529167668k(69283408k free), swap 33554424k(33552380k free)
CommandLine flags: -XX:ConcGCThreads=8 -XX:G1ReservePercent=20 -XX:GCLogFileSize=104857600
-XX:InitialHeapSize=17179869184 -XX:InitiatingHeapOccupancyPercent=45 -XX:MaxGCPauseMillis=100
-XX:MaxHeapSize=17179869184 -XX:NumberOfGCLogFiles=10 -XX:ParallelGCThreads=30
-XX:+PrintAdaptiveSizePolicy -XX:PrintFLSStatistics=2 -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation
...
...
2016-04-13T22:06:37.254-0400: 19839.175: [GC concurrent-root-region-scan-start]
2016-04-13T22:06:37.313-0400: 19839.234: [GC concurrent-root-region-scan-end, 0.0592966 secs]
2016-04-13T22:06:37.313-0400: 19839.234: [GC concurrent-mark-start]
2016-04-13T22:06:38.569-0400: 19840.490: [GC concurrent-mark-reset-for-overflow]
...
2016-04-13T22:06:42.810-0400: 19844.731: [GC concurrent-mark-reset-for-overflow]
...
2016-04-13T22:11:19.253-0400: 20121.175: [GC concurrent-mark-reset-for-overflow]
...
...
...
2016-04-14T01:58:17.254-0400: 33739.176: [GC concurrent-mark-reset-for-overflow]
...
2016-04-14T01:58:36.957-0400: 33758.878: [Full GC (Allocation Failure)
1 Answer
#1
From oracle g1_gc blog:
GC concurrent-mark-reset-for-overflow: This indicates that the global marking stack had become full and there was an overflow of the stack. Concurrent marking detected this overflow and had to reset the data structures to start the marking again.
So increasing -XX:MarkStackSize is one quick win.
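To confirm which mark stack settings a running JVM actually uses, the flag values can be read from inside the process. The following is a minimal sketch, assuming a HotSpot JVM where `HotSpotDiagnosticMXBean` is available; the class name `MarkStackCheck` and the helper `flagValue` are hypothetical names chosen for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class MarkStackCheck {
    // Returns the current value of a HotSpot VM flag as a string,
    // or "unavailable" if this JVM does not expose the flag.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown flag, or not a HotSpot JVM.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("MarkStackSize    = " + flagValue("MarkStackSize"));
        System.out.println("MarkStackSizeMax = " + flagValue("MarkStackSizeMax"));
    }
}
```

Running it with something like `java -XX:+UseG1GC -XX:MarkStackSize=32m MarkStackCheck` should show whether the override took effect; the 32m value is only a placeholder, not a recommendation.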
A few observations on your VM parameters:
- The G1 GC is an adaptive garbage collector with defaults that enable it to work efficiently without modification. Have a quick look at the Oracle documentation page on G1GC.
- Key parameters to set: -XX:MaxGCPauseMillis, -XX:G1HeapRegionSize, -XX:ParallelGCThreads=n, -XX:ConcGCThreads=n. Leave everything else at default values.
- If your heap size is 16 GB, the ideal region size should be 8 MB. Make sure that you maintain 2048 regions (16 GB / 8 MB = 2048).
- Revisit your pause time goal, -XX:MaxGCPauseMillis. If the current goal is unrealistic for a 16 GB heap (your flags set 100 ms; the G1 default is 200 ms), set a more suitable value.
- The official documentation page recommends how to set -XX:ParallelGCThreads=n and -XX:ConcGCThreads=n depending on the number of cores in your machine.
  -XX:ParallelGCThreads=n: Sets the number of STW worker threads. Set n to the number of logical processors; n is the same as the number of logical processors up to a value of 8.
  -XX:ConcGCThreads=n: Sets the number of parallel marking threads. Set n to approximately 1/4 of the number of parallel garbage collection threads (ParallelGCThreads).
- Revisit the -XX:InitialHeapSize=17179869184 -XX:InitiatingHeapOccupancyPercent=45 -XX:G1ReservePercent=20 parameters. Leave them at default values unless you have a pressing need to change them.
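The region and thread-count guidance in the list above comes down to simple arithmetic, which can be sanity-checked with a short sketch. The class `G1SizingSketch` and its method names are hypothetical. The region-size rule (aim for about 2048 regions, round down to a power of two, clamp to 1 MB..32 MB) and the 5/8 scaling beyond eight processors are assumptions about HotSpot's usual defaults, not something stated in the list itself.

```java
class G1SizingSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // Number of regions for a given heap size and region size.
    static long regionCount(long heapBytes, long regionBytes) {
        return heapBytes / regionBytes;
    }

    // Assumed heuristic: target about 2048 regions, round down to a
    // power of two, and keep the result between 1 MB and 32 MB.
    static long suggestedRegionSize(long heapBytes) {
        long target = Math.max(heapBytes / 2048, MB);
        long powerOfTwo = Long.highestOneBit(target);
        return Math.min(powerOfTwo, 32 * MB);
    }

    // STW worker threads: one per logical processor up to 8; beyond that,
    // add about 5/8 of the remaining processors (assumption).
    static int parallelGcThreads(int logicalProcessors) {
        if (logicalProcessors <= 8) {
            return logicalProcessors;
        }
        return 8 + (logicalProcessors - 8) * 5 / 8;
    }

    // Concurrent marking threads: roughly a quarter of the STW workers,
    // never fewer than one.
    static int concGcThreads(int parallelGcThreads) {
        return Math.max((parallelGcThreads + 2) / 4, 1);
    }

    public static void main(String[] args) {
        long heap = 16 * GB;
        long region = suggestedRegionSize(heap);
        System.out.println("Region size (MB): " + region / MB);
        System.out.println("Region count:     " + regionCount(heap, region));

        int cpus = Runtime.getRuntime().availableProcessors();
        int parallel = parallelGcThreads(cpus);
        System.out.println("ParallelGCThreads for " + cpus + " CPUs: " + parallel);
        System.out.println("ConcGCThreads:    " + concGcThreads(parallel));
    }
}
```

With the flags from the question, `concGcThreads(30)` evaluates to 8, which matches the -XX:ConcGCThreads=8 already in use, so the thread counts are in line with the roughly 1/4 ratio.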
Visit this page for a better understanding of G1GC logs.
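When reading a long G1 log, it can help to count the overflow resets and check whether a full GC followed. Below is a rough sketch that matches the log markers shown in the question; the class name `OverflowScan` and its helpers are hypothetical, and the sample lines are copied from the log excerpt in the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class OverflowScan {
    // Counts log lines that report a concurrent-mark overflow reset.
    static long countOverflowResets(List<String> lines) {
        return lines.stream()
                    .filter(l -> l.contains("[GC concurrent-mark-reset-for-overflow]"))
                    .count();
    }

    // True if any log line reports a full GC.
    static boolean hasFullGc(List<String> lines) {
        return lines.stream().anyMatch(l -> l.contains("[Full GC"));
    }

    public static void main(String[] args) throws IOException {
        // Use a GC log file if a path is given, otherwise a few sample lines.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]))
            : Arrays.asList(
                "2016-04-13T22:06:37.313-0400: 19839.234: [GC concurrent-mark-start]",
                "2016-04-13T22:06:38.569-0400: 19840.490: [GC concurrent-mark-reset-for-overflow]",
                "2016-04-13T22:06:42.810-0400: 19844.731: [GC concurrent-mark-reset-for-overflow]",
                "2016-04-14T01:58:36.957-0400: 33758.878: [Full GC (Allocation Failure)");

        System.out.println("Overflow resets: " + countOverflowResets(lines));
        System.out.println("Full GC seen:    " + hasFullGc(lines));
    }
}
```

A steadily growing reset count between one concurrent-mark-start and the next full GC would be consistent with marking never completing, which is the pattern described in the question.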