记一次CountDownLatch引发的问题

时间:2022-12-25 19:15:59

背景:使用countDownLatch,10个线程计算结果,最后通过主线程来汇总结果;

有个spring scheduler的程序,运行一段时间后,不运行了,查看日志发现CountDownLatch无法归零,导致整个主线程Hang在那里,此时如果调用其外部接口还能响应:

怀疑是不是死锁了? 查一下jstack发现没有出现死锁:

记一次CountDownLatch引发的问题记一次CountDownLatch引发的问题
"Attach Listener" daemon prio=10 tid=0x00007fc1b4001000 nid=0x162e4 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Abandoned connection cleanup thread" daemon prio=10 tid=0x00007fc1801e5800 nid=0x1e632 in Object.wait() [0x00007fc1ec79a000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    - locked <0x0000000703f97058> (a java.lang.ref.ReferenceQueue$Lock)
    at com.mysql.jdbc.AbandonedConnectionCleanupThread.run(AbandonedConnectionCleanupThread.java:43)

"Tomcat JDBC Pool Cleaner[1485406391:1509516002697]" daemon prio=10 tid=0x00007fc1801b8000 nid=0x1e631 in Object.wait() [0x00007fc1ec89b000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.util.TimerThread.mainLoop(Timer.java:552)
    - locked <0x000000070406b8d8> (a java.util.TaskQueue)
    at java.util.TimerThread.run(Timer.java:505)

"http-nio-8119-Acceptor-0" daemon prio=10 tid=0x00007fc19ec27800 nid=0x2d11 runnable [0x00007fc1ba16b000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    - locked <0x000000070408f7d8> (a java.lang.Object)
    at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:686)
    at java.lang.Thread.run(Thread.java:745)

"http-nio-8119-ClientPoller-1" daemon prio=10 tid=0x00007fc19f9a2800 nid=0x2d10 runnable [0x00007fc1cc1af000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x000000070408fbe0> (a sun.nio.ch.Util$2)
    - locked <0x000000070408fbf0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000070408fb98> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1049)
    at java.lang.Thread.run(Thread.java:745)

"http-nio-8119-ClientPoller-0" daemon prio=10 tid=0x00007fc19d143000 nid=0x2d0f runnable [0x00007fc1cc2b0000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x000000070416a268> (a sun.nio.ch.Util$2)
    - locked <0x000000070416a278> (a java.util.Collections$UnmodifiableSet)
    - locked <0x000000070416a220> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1049)
    at java.lang.Thread.run(Thread.java:745)

"NioBlockingSelector.BlockPoller-1" daemon prio=10 tid=0x00007fc19e89b000 nid=0x2d0e runnable [0x00007fc1cc3b1000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x0000000703d3c3a0> (a sun.nio.ch.Util$2)
    - locked <0x0000000703d3c390> (a java.util.Collections$UnmodifiableSet)
    - locked <0x0000000703d3c258> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:342)

"pool-1-thread-1" prio=10 tid=0x00007fc1a0749800 nid=0x2d0b waiting on condition [0x00007fc1cc4b1000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007047385d8> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
    at com.didi.bigdataqa.tools.HiveBaseTool.doHealthyScore(HiveBaseTool.java:139)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65)
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
    at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:81)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"pool-2-thread-1" daemon prio=10 tid=0x00007fc1a0381000 nid=0x2ceb waiting on condition [0x00007fc1cc5b3000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007037d9338> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

"cluster-ClusterId{value='59f823ec4d62e12bc9d4a556', description='null'}-localhost:27017" daemon prio=10 tid=0x00007fc1a1734800 nid=0x2cea waiting on condition [0x00007fc1ec497000]
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007037c1d30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
    at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForSignalOrTimeout(DefaultServerMonitor.java:224)
    at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.waitForNext(DefaultServerMonitor.java:205)
    at com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:153)
    - locked <0x00000007037c19e8> (a com.mongodb.connection.DefaultServerMonitor$ServerMonitorRunnable)
    at java.lang.Thread.run(Thread.java:745)

"commons-pool-EvictionTimer" daemon prio=10 tid=0x00007fc19ecc5800 nid=0x2cbb in Object.wait() [0x00007fc1ec396000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.util.TimerThread.mainLoop(Timer.java:552)
    - locked <0x0000000703b48458> (a java.util.TaskQueue)
    at java.util.TimerThread.run(Timer.java:505)

"container-0" prio=10 tid=0x00007fc19dfd7800 nid=0x2c95 waiting on condition [0x00007fc1ec598000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.catalina.core.StandardServer.await(StandardServer.java:407)
    at org.springframework.boot.context.embedded.tomcat.TomcatEmbeddedServletContainer$1.run(TomcatEmbeddedServletContainer.java:139)

"ContainerBackgroundProcessor[StandardEngine[Tomcat].StandardHost[localhost].StandardContext[/healthy_score]]" daemon prio=10 tid=0x00007fc199247800 nid=0x2c94 waiting on condition [0x00007fc1ec699000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1345)
    at java.lang.Thread.run(Thread.java:745)

"DestroyJavaVM" prio=10 tid=0x00007fc208009000 nid=0x2bca waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Service Thread" daemon prio=10 tid=0x00007fc208244800 nid=0x2bdc runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007fc208242000 nid=0x2bdb waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007fc20823f000 nid=0x2bda waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007fc20823d000 nid=0x2bd9 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007fc208210000 nid=0x2bd7 in Object.wait() [0x00007fc1edb29000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
    - locked <0x000000070001e8f0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler" daemon prio=10 tid=0x00007fc20820e000 nid=0x2bd6 in Object.wait() [0x00007fc1edc2a000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:503)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
    - locked <0x00000006fff96908> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x00007fc208209800 nid=0x2bd5 runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x00007fc20801f000 nid=0x2bcb runnable 

"GC task thread#1 (ParallelGC)" prio=10 tid=0x00007fc208021000 nid=0x2bcc runnable 

"GC task thread#2 (ParallelGC)" prio=10 tid=0x00007fc208023000 nid=0x2bcd runnable 

"GC task thread#3 (ParallelGC)" prio=10 tid=0x00007fc208025000 nid=0x2bce runnable 

"GC task thread#4 (ParallelGC)" prio=10 tid=0x00007fc208026800 nid=0x2bcf runnable 

"GC task thread#5 (ParallelGC)" prio=10 tid=0x00007fc208028800 nid=0x2bd0 runnable 

"GC task thread#6 (ParallelGC)" prio=10 tid=0x00007fc20802a800 nid=0x2bd1 runnable 

"GC task thread#7 (ParallelGC)" prio=10 tid=0x00007fc20802c800 nid=0x2bd2 runnable 

"GC task thread#8 (ParallelGC)" prio=10 tid=0x00007fc20802e800 nid=0x2bd3 runnable 

"GC task thread#9 (ParallelGC)" prio=10 tid=0x00007fc208030800 nid=0x2bd4 runnable 

"VM Periodic Task Thread" prio=10 tid=0x00007fc20824f800 nid=0x2bdd waiting on condition 

JNI global references: 151
View Code

多次输入jstack观察:

 

jstack 输出 nid=0x2d0f  查看这个线程ID,转换为十进制为11535

查看进程下的线程,下面这几个都是没有释放的线程

 ps -T -p 11209

 

 11209  11534 ?        00:00:15 java

 11209  11535 ?        00:00:17 java

 11209  11536 ?        00:00:17 java

 11209  11537 ?        00:00:00 java

 

在使用countDownLatch的await时,加入一个超时等待时间,这样就可以避免这个问题,但是问题的根源是什么呢?

cnt.await(1, TimeUnit.HOURS);

继续定位,怀疑是哪个地方没有设置超时机制,导致一直等待,或许哪里出现了死循环? jstack打印任务的线程都已经看不到了,只能看到进程
代码里面没看到死循环的地方,怀疑是GC的问题,导致停机了
最后再定位GC,发现有个List的超级大,最大的时候有30多w,堆设置的有点小。。。
果断加大堆内存空间得到解决