一次OOM分析的过程

时间:2021-05-31 20:55:07

1. 现象 Java heap space

 通过系统日志查询如下:

 

2013-04-11 16:53:54,569 [ActiveMQ Transport: tcp:///202.111.10.183:8006] WARN  org.apache.activemq.transport.failover.FailoverTransport - Transport (/202.111.10.183:8006) failed to tcp://202.111.10.183:8006 , attempting to automatically reconnect due to: java.io.EOFException
2013-04-11 16:54:39,565 [ActiveMQ Task-39] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully reconnected to tcp://202.111.10.183:8006
Exception in thread "taskExecutor-400" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-401" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-402" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-403" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-404" java.lang.OutOfMemoryError: Java heap space
2013-04-11 17:07:35,627 [ActiveMQ Session Task-9506] WARN org.springframework.jms.listener.SimpleMessageListenerContainer - Execution of JMS message listener failed, and no ErrorHandler has been set.
java.lang.OutOfMemoryError: Java heap space
2013-04-11 17:07:45,947 [ActiveMQ Transport: tcp:///202.111.10.183:8006] WARN org.apache.activemq.transport.failover.FailoverTransport - Transport (/202.111.10.183:8006) failed to tcp://202.111.10.183:8006 , attempting to automatically reconnect due to: java.io.EOFException
2013-04-11 17:08:53,055 [ActiveMQ Task-42] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully reconnected to tcp://202.111.10.183:8006
2013-04-11 17:14:30,307 [ActiveMQ Session Task-9509] WARN org.springframework.jms.listener.SimpleMessageListenerContainer - Execution of JMS message listener failed, and no ErrorHandler has been set.
java.lang.OutOfMemoryError: Java heap space
2013-04-11 17:14:40,623 [ActiveMQ Transport: tcp:///202.111.10.183:8006] WARN org.apache.activemq.transport.failover.FailoverTransport - Transport (/202.111.10.183:8006) failed to tcp://202.111.10.183:8006 , attempting to automatically reconnect due to: java.io.EOFException
2013-04-11 17:14:50,943 [ActiveMQ Task-44] INFO org.apache.activemq.transport.failover.FailoverTransport - Successfully reconnected to tcp://202.111.10.183:8006
Exception in thread "taskExecutor-407" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-408" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-409" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-411" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-344" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-412" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-413" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-415" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-345" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-414" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-418" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-417" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-416" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-420" java.lang.OutOfMemoryError: Java heap space
Exception in thread "taskExecutor-419" java.lang.OutOfMemoryError: Java heap space

2. 查询出pid 为25435(ps -ef | grep XXXX)


3. 使用jinfo 查看当前JVM参数信息

Attaching to process ID 25435, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.1-b02
Java System Properties:

java.runtime.name = Java(TM) SE Runtime Environment
sun.boot.library.path = /home/kx_monitor/java/jdk1.6.0_26/jre/lib/amd64
java.vm.version = 20.1-b02
java.vm.vendor = Sun Microsystems Inc.
java.vendor.url = http://java.sun.com/
path.separator = :
java.vm.name = Java HotSpot(TM) 64-Bit Server VM
file.encoding.pkg = sun.io
sun.java.launcher = SUN_STANDARD
user.country = CN
sun.os.patch.level = unknown
java.vm.specification.name = Java Virtual Machine Specification
user.dir = /home/kx_monitor/monitor2/collector
java.runtime.version = 1.6.0_26-b03
java.awt.graphicsenv = sun.awt.X11GraphicsEnvironment
java.endorsed.dirs = /home/kx_monitor/java/jdk1.6.0_26/jre/lib/endorsed
os.arch = amd64
java.io.tmpdir = /tmp
line.separator =

java.vm.specification.vendor = Sun Microsystems Inc.
os.name = Linux
sun.jnu.encoding = GBK
java.library.path = /home/kx_monitor/java/jdk1.6.0_26/jre/lib/amd64/server:/home/kx_monitor/java/jdk1.6.0_26/jre/lib/amd64:/home/kx_monitor/java/jdk1.6.0_26/jre/../lib/amd64:/home/kx_monitor/monitor2/collector::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
java.specification.name = Java Platform API Specification
java.class.version = 50.0
sun.management.compiler = HotSpot 64-Bit Tiered Compilers
os.version = 2.6.32-220.23.1.el6.x86_64
user.home = /home/kx_monitor
user.timezone = Asia/Shanghai
java.awt.printerjob = sun.print.PSPrinterJob
file.encoding = GBK
java.specification.version = 1.6
java.class.path = .:./dependency/collector.jar
user.name = kx_monitor
java.vm.specification.version = 1.0
sun.java.command = cn.east.products.monitor.collector.Collector
java.home = /home/kx_monitor/java/jdk1.6.0_26/jre
sun.arch.data.model = 64
user.language = zh
java.specification.vendor = Sun Microsystems Inc.
java.vm.info = mixed mode
java.version = 1.6.0_26
java.ext.dirs = ./dependency
sun.boot.class.path = /home/kx_monitor/java/jdk1.6.0_26/jre/lib/resources.jar:/home/kx_monitor/java/jdk1.6.0_26/jre/lib/rt.jar:/home/kx_monitor/java/jdk1.6.0_26/jre/lib/sunrsasign.jar:/home/kx_monitor/java/jdk1.6.0_26/jre/lib/jsse.jar:/home/kx_monitor/java/jdk1.6.0_26/jre/lib/jce.jar:/home/kx_monitor/java/jdk1.6.0_26/jre/lib/charsets.jar:/home/kx_monitor/java/jdk1.6.0_26/jre/lib/modules/jdk.boot.jar:/home/kx_monitor/java/jdk1.6.0_26/jre/classes
java.vendor = Sun Microsystems Inc.
file.separator = /
java.vendor.url.bug = http://java.sun.com/cgi-bin/bugreport.cgi
sun.io.unicode.encoding = UnicodeLittle
sun.cpu.endian = little
sun.cpu.isalist =

VM Flags:

-Xms512m -Xmx2048m -XX:PermSize=128M -XX:MaxPermSize=512m -Djava.ext.dirs=./dependency


4. 通过jstat 工具检查GC统计参数


[kx_monitor@python bin]$ jstat -gcutil 25435 
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 0.00 99.45 100.00 20.86 3472 421.481 136192 375746.901 376168.382

[kx_monitor@python bin]$ jstat -gccapacity 25435
NGCMN NGCMX NGC S0C S1C EC OGCMN OGCMX OGC OC PGCMN PGCMX PGC PC YGC FGC
174720.0 699008.0 425920.0 131072.0 134976.0 152512.0 349568.0 1398144.0 1398144.0 1398144.0 131072.0 524288.0 131072.0 131072.0 3472 136608
[kx_monitor@python bin]$ jstat -gcnewcapacity 25435
NGCMN NGCMX NGC S0CMX S0C S1CMX S1C ECMX EC YGC FGC
174720.0 699008.0 425920.0 131072.0 232960.0 232960.0 134976.0 698880.0 152512.0 3472 136778
[kx_monitor@python bin]$ jstat -gcoldcapacity 25435
OGCMN OGCMX OGC OC YGC FGC FGCT GCT
349568.0 1398144.0 1398144.0 1398144.0 3472 136778 377303.741 377725.222
[kx_monitor@python bin]$ jstat -gcpermcapacity 25435
PGCMN PGCMX PGC PC YGC FGC FGCT GCT
131072.0 524288.0 131072.0 131072.0 3472 136780 377309.000 377730.480

[kx_monitor@python bin]$ jstat -gcnew 25435
S0C S1C S0U S1U TT MTT DSS EC EU YGC YGCT
131072.0 134976.0 0.0 0.0 15 15 134976.0 152512.0 152147.3 3472 421.481
[kx_monitor@python bin]$ jstat -gcold 25435
PC PU OC OU YGC FGC FGCT GCT
131072.0 27345.2 1398144.0 1398144.0 3472 136786 377324.741 377746.221


5. 使用jmap 检查


[kx_monitor@python bin]$ jmap -heap 25435
Attaching to process ID 25435, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.1-b02

using thread-local object allocation.
Parallel GC with 18 thread(s)

Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 2147483648 (2048.0MB)
NewSize = 1310720 (1.25MB)
MaxNewSize = 17592186044415 MB
OldSize = 5439488 (5.1875MB)
NewRatio = 2
SurvivorRatio = 8
PermSize = 134217728 (128.0MB)
MaxPermSize = 536870912 (512.0MB)

Heap Usage:
PS Young Generation
Eden Space:
capacity = 156172288 (148.9375MB)
used = 155907264 (148.68475341796875MB)
free = 265024 (0.25274658203125MB)
99.83030023866975% used
From Space:
capacity = 134217728 (128.0MB)
used = 0 (0.0MB)
free = 134217728 (128.0MB)
0.0% used
To Space:
capacity = 138215424 (131.8125MB)
used = 0 (0.0MB)
free = 138215424 (131.8125MB)
0.0% used
PS Old Generation
capacity = 1431699456 (1365.375MB)
used = 1431699416 (1365.3749618530273MB)
free = 40 (3.814697265625E-5MB)
99.99999720611754% used
PS Perm Generation
capacity = 134217728 (128.0MB)
used = 28001536 (26.704345703125MB)
free = 106216192 (101.295654296875MB)
20.862770080566406% used
[kx_monitor@python bin]$ jmap -histo 25435 | grep cn.east
num #instances #bytes class name
----------------------------------------------
4: 527169 71694984 cn.east.products.monitor.domain.snapshot.HttpSnapshot
5: 459066 40397808 cn.east.products.monitor.domain.snapshot.DnsSnapshot
6: 123820 24764000 cn.east.products.monitor.domain.snapshot.SnapshotResult
8: 90122 10814640 cn.east.products.monitor.domain.snapshot.PingSnapshot
33: 631 55528 cn.east.products.monitor.domain.snapshot.TcpSnapshot
38: 512 45056 cn.east.products.monitor.domain.snapshot.FtpSnapshot
329: 4 352 cn.east.products.monitor.domain.snapshot.PageGuardSnapshot
330: 11 352 cn.east.products.monitor.domain.support.MonitorItemType
390: 3 192 [Lcn.knet.products.monitor.domain.support.MonitorItemType;
461: 1 120 cn.east.products.monitor.collector.service.SnapshotResultGenerator$Cleanner
507: 3 96 cn.east.products.monitor.domain.support.Status
723: 1 40 cn.east.products.monitor.collector.service.SnapshotResultGenerator
782: 1 32 cn.east.products.monitor.collector.mq.listener.SnapshotMessageListener
783: 1 32 cn.east.products.monitor.collector.quartz.PeriodScanner
825: 2 32 cn.east.dev.modules.utils.JsonMapper
826: 1 32 [Lcn.knet.products.monitor.domain.support.Status;
884: 1 24 cn.east.products.monitor.collector.repository.TemporarySnapshotResultRepository
983: 1 24 cn.east.products.monitor.collector.service.SnapshotResultPoster
1007: 1 24 cn.east.products.monitor.collector.repository.SnapshotRepository
1014: 1 24 cn.east.products.monitor.collector.repository.SnapshotResultRepository
1079: 1 16 cn.east.products.monitor.collector.mq.converter.SnapshotMessageConverter
1338: 1 16 cn.east.products.monitor.collector.Collector
1348: 1 16 cn.east.dev.modules.redis.JedisHelper
1403: 1 16 cn.east.products.monitor.collector.Collector$ShutdownJob



Eden Space:
   capacity = 156172288 (148.9375MB)
   used     = 155907264 (148.68475341796875MB)
   free     = 265024 (0.25274658203125MB)
   99.83030023866975% used
PS Old Generation
   capacity = 1431699456 (1365.375MB)
   used     = 1431699416 (1365.3749618530273MB)
   free     = 40 (3.814697265625E-5MB)
   99.99999720611754% used


   4:        527169       71694984  cn.east.products.monitor.domain.snapshot.HttpSnapshot
   5:        459066       40397808  cn.east.products.monitor.domain.snapshot.DnsSnapshot
   6:        123820       24764000  cn.east.products.monitor.domain.snapshot.SnapshotResult
   8:         90122       10814640  cn.east.products.monitor.domain.snapshot.PingSnapshot

 
通过红色的标记看出结果:


  HttpSnapshot
  DnsSnapshot
  SnapshotResult 
  实例的数量为527169,459066,123820;
  所占用的空间为:71694984,40397808,24764000;为什么会创建这么多对象呢,真是业务量大吗?


解决办法:
1. 如果真是业务量大,通过-xms -xmx 来指定VM的大小
2. 分析检查XXXXSnapshot 数据量大的原因。