RegionServers又崩溃了,真是让人头疼。
1.日志:
2019-11-20 03:47:34,174 INFO [sync.3] : Slow sync cost: 464 ms, current pipeline: [DatanodeInfoWithStorage[125.94.213.41:50010,DS-cfd2851f-a298-4976-b0e9-f0546a472cb0,DISK], DatanodeInfoWithStorage[125.94.213.5:50010,DS-795d6f28-78f1-4e11-b0d6-7e87654a3306,DISK], DatanodeInfoWithStorage[125.94.213.48:50010,DS-5796c1b4-95a9-4fac-b588-d3166c44fe0d,DISK]]
2019-11-20 03:47:35,428 INFO [sync.0] : Slow sync cost: 210 ms, current pipeline: [DatanodeInfoWithStorage[125.94.213.41:50010,DS-cfd2851f-a298-4976-b0e9-f0546a472cb0,DISK], DatanodeInfoWithStorage[125.94.213.5:50010,DS-795d6f28-78f1-4e11-b0d6-7e87654a3306,DISK], DatanodeInfoWithStorage[125.94.213.48:50010,DS-5796c1b4-95a9-4fac-b588-d3166c44fe0d,DISK]]
2019-11-20 03:49:39,055 INFO [hdpv-014,16020,1574074564013_ChoreService_4] $CompactionChecker: Chore: CompactionChecker missed its start time
2019-11-20 03:49:39,055 INFO [hdpv-014,16020,1574074564013_ChoreService_4] $PeriodicMemstoreFlusher: Chore: hdpv-014,16020,1574074564013-MemstoreFlusherChore missed its start time
2019-11-20 03:49:39,055 INFO [hdpv-014,16020,1574074564013_ChoreService_4] $HeapMemoryTunerChore: Chore: hdpv-014,16020,1574074564013-HeapMemoryTunerChore missed its start time
2019-11-20 03:49:39,268 WARN [regionserver/hdpv-014/125.94.213.41:16020] : We slept 124370ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see /#
2019-11-20 03:49:39,269 WARN [JvmPauseMonitor] : Detected pause in JVM or host machine (eg GC): pause of approximately 121995ms
GC pool 'ParNew' had collection(s): count=1 time=120630ms
2019-11-20 03:49:39,423 INFO [RS_OPEN_REGION-hdpv-014:16020-0-SendThread(hdpv-007:2181)] : Client session timed out, have not heard from server in 157651ms for sessionid 0x16e7ce845dc02b8, closing socket connection and attempting reconnect
2019-11-20 03:49:39,423 INFO [hdpv-014,16020,1574074564013_ChoreService_4] $MovedRegionsCleaner: Chore: MovedRegionsCleaner for region hdpv-014,16020,1574074564013 missed its start time
2019-11-20 03:49:39,423 INFO [regionserver/hdpv-014/125.94.213.41:16020-SendThread(hdpv-001:2181)] : Client session timed out, have not heard from server in 157651ms for sessionid 0x26e7ce845ed0284, closing socket connection and attempting reconnect
2019-11-20 03:49:39,990 WARN [DataStreamer for file /apps/hbase/data/data/default/MIRROR_YY_ACCOUNT_GAME/c9bf3530466b67c29162e1484a18ba7d/.tmp/49c75f29a55740769ede127a4f3c986f block BP-1202337336-125.94.213.13-1419656350533:blk_1526126839_452537137] : DataStreamer Exception
: Broken pipe
at .write0(Native Method)
at (:47)
at (:93)
at (:65)
at (:471)
at $Writer.performIO(:63)
at (:142)
at (:159)
at (:117)
at (:122)
at (:107)
at (:176)
at $DataStreamer.run(:611)
2019-11-20 03:49:39,990 WARN [DataStreamer for file /apps/hbase/data/WALs/hdpv-014,16020,1574074564013/hdpv-014%2C16020%.1574192168896 block BP-1202337336-125.94.213.13-1419656350533:blk_1526124074_452534361] : DataStreamer Exception
…………
2019-11-20 03:49:49,167 ERROR [sync.2] : Error syncing, request close of WAL
(): No lease on /apps/hbase/data/oldWALs/hdpv-014%2C16020%.1574192168896 (inode 595750975): File is not open for writing. Holder DFSClient_NONMAPREDUCE_-2054740082_1 does not have any open files.
at (:3674)
at (:3574)
at (:883)
at (:526)
at $ClientNamenodeProtocol$2.callBlockingMethod()
at $Server$ProtoBufRpcInvoker.call(:640)
at $Server.call(:982)
at $Handler$1.run(:2351)
at $Handler$1.run(:2347)
at (Native Method)
at (:422)
at (:1866)
at $Handler.run(:2345)
at (:1554)
at (:1498)
at (:1398)
at $Invoker.invoke(:233)
at .$Proxy16.getAdditionalDatanode(Unknown Source)
at (:484)
at .invoke0(Native Method)
at (:62)
at (:43)
at (:498)
at (:291)
at (:203)
at (:185)
at .$Proxy17.getAdditionalDatanode(Unknown Source)
at .invoke0(Native Method)
at (:62)
at (:43)
at (:498)
at $1.invoke(:283)
at .$Proxy18.getAdditionalDatanode(Unknown Source)
at $DataStreamer.addDatanode2ExistingPipeline(:1102)
at $DataStreamer.setupPipelineForAppendOrRecovery(:1268)
at $DataStreamer.processDatanodeError(:993)
at $DataStreamer.run(:500)
…………
2019-11-20 03:50:21,676 ERROR [RS_CLOSE_REGION-hdpv-014:16020-1] : Memstore size is 76160064
2019-11-20 03:50:21,745 INFO [StoreCloserThread-YD_ONLINE_GUID,\x19,1574079589337.1645967ddd012b9a875e863266751f58.-1] : Closed USER
2019-11-20 03:50:21,785 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] : Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,785 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] : Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,847 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] : Shutting down ParallelWriterIndexCommitter because Indexer is being stopped
2019-11-20 03:50:21,847 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] : Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,847 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] : Shutting down ParallelWriterIndexCommitter because Indexer is being stopped
2019-11-20 03:50:21,847 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] : Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,866 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed BELONG
2019-11-20 03:50:21,905 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed BIND_PHONE
2019-11-20 03:50:21,906 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed GAME_LABEL
2019-11-20 03:50:21,907 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed LOGIN
2019-11-20 03:50:21,907 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed LOST
2019-11-20 03:50:21,908 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed PAYMENT
2019-11-20 03:50:21,973 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] : Shutting down TrackingParallelWriterIndexCommitter
2019-11-20 03:50:21,973 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] : Shutting down TrackingParallelWriterIndexCommitter
2019-11-20 03:50:21,973 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] : Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,973 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] : Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:21,976 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed PERIOD_GAME_PAYMENT
2019-11-20 03:50:21,977 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed PERIOD_LOGIN
2019-11-20 03:50:21,978 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed PERIOD_PAYMENT_AVERAGE
2019-11-20 03:50:21,979 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed PERIOD_PAYMENT_TOTAL
2019-11-20 03:50:21,980 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed PLATFORM
2019-11-20 03:50:21,980 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed REFERER
2019-11-20 03:50:21,981 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed REGISTER
2019-11-20 03:50:21,981 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed ROLE_LABEL
2019-11-20 03:50:21,982 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed ROW_UPDATE_TIME
2019-11-20 03:50:21,982 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed SOUND_LABEL
2019-11-20 03:50:21,989 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed STYLE_LABEL
2019-11-20 03:50:21,991 INFO [StoreCloserThread-MIRROR_YY_ACCOUNT,\x0F,1571210055101.cb261e03532961c77ae8233ecf2edd96.-1] : Closed SUBJECT_LABEL
2019-11-20 03:50:22,118 INFO [RS_CLOSE_REGION-hdpv-014:16020-2] : Closed YD_ONLINE_GUID,\x19,1574079589337.1645967ddd012b9a875e863266751f58.
2019-11-20 03:50:22,128 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] : Closed MIRROR_SQW_ACCOUNT,\x09,1573025684248.4707732dccb430927c79c82eae116dd1.
2019-11-20 03:50:22,128 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed BELONG
2019-11-20 03:50:22,148 INFO [StoreCloserThread-YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.-1] : Closed DEVICE
2019-11-20 03:50:22,165 INFO [StoreCloserThread-YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.-1] : Closed GAME
2019-11-20 03:50:22,210 INFO [StoreCloserThread-YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.-1] : Closed TIME
2019-11-20 03:50:22,374 INFO [StoreCloserThread-YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.-1] : Closed USER
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] : Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] : Shutting down ParallelWriterIndexCommitter because Indexer is being stopped
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] : Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] : Shutting down TrackingParallelWriterIndexCommitter
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] : Shutting down task runner because Indexer is being stopped
2019-11-20 03:50:22,375 INFO [RS_CLOSE_REGION-hdpv-014:16020-1] : Closed YD_ONLINE_GUID,\x14,1574079589337.c6b2933c34bdbcfd106f72aadf62a3d6.
2019-11-20 03:50:22,376 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] : Closed BELONG
2019-11-20 03:50:22,421 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed BIND_PHONE
2019-11-20 03:50:22,421 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] : Closed BIND_PHONE
2019-11-20 03:50:22,422 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed GAME_LABEL
2019-11-20 03:50:22,422 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] : Closed GAME_LABEL
2019-11-20 03:50:22,423 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed LOGIN
2019-11-20 03:50:22,423 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] : Closed LOGIN
2019-11-20 03:50:22,423 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed LOST
2019-11-20 03:50:22,423 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] : Closed LOST
2019-11-20 03:50:22,425 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed PAYMENT
2019-11-20 03:50:22,425 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x04,1571122122524.0a3100486babb7562610ae1d9990a94a.-1] : Closed PAYMENT
2019-11-20 03:50:22,585 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed PERIOD_GAME_PAYMENT
2019-11-20 03:50:22,586 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed PERIOD_LOGIN
2019-11-20 03:50:22,587 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed PERIOD_PAYMENT_AVERAGE
2019-11-20 03:50:22,588 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed PERIOD_PAYMENT_TOTAL
2019-11-20 03:50:22,588 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed PLATFORM
2019-11-20 03:50:22,588 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed REFERER
2019-11-20 03:50:22,589 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed REGISTER
2019-11-20 03:50:22,589 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed ROLE_LABEL
2019-11-20 03:50:22,590 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed ROW_UPDATE_TIME
2019-11-20 03:50:22,590 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed SOUND_LABEL
2019-11-20 03:50:22,591 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed STYLE_LABEL
2019-11-20 03:50:22,591 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed SUBJECT_LABEL
2019-11-20 03:50:22,592 INFO [StoreCloserThread-MIRROR_XZL_ACCOUNT,\x16,1571122122524.357a6c75728c4c204253bf7235ee34a5.-1] : Closed VIP
…………
2019-11-20 03:50:37,468 WARN [regionserver/hdpv-014/125.94.213.41:16020] : regionserver:16020-0x26e7ce845ed0283, quorum=hdpv-001:2181,hdpv-003:2181,hdpv-005:2181,hdpv-007:2181,hdpv-009:2181, baseZNode=/hbase-unsecure Unable to list children of znode /hbase-unsecure/replication/rs/hdpv-014,16020,1574074564013
$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/replication/rs/hdpv-014,16020,1574074564013
at (:127)
at (:51)
at (:1472)
at (:292)
at (:455)
at (:483)
at (:1462)
at (:1384)
at (:1266)
at (:196)
at (:302)
at (:202)
at (:194)
at (:2269)
at (:1118)
at (:745)
2019-11-20 03:50:37,739 ERROR [regionserver/hdpv-014/125.94.213.41:16020] : regionserver:16020-0x26e7ce845ed0283, quorum=hdpv-001:2181,hdpv-003:2181,hdpv-005:2181,hdpv-007:2181,hdpv-009:2181, baseZNode=/hbase-unsecure Received unexpected KeeperException, re-throwing exception
$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/replication/rs/hdpv-014,16020,1574074564013
at (:127)
at (:51)
at (:1472)
at (:292)
at (:455)
at (:483)
at (:1462)
at (:1384)
at (:1266)
at (:196)
at (:302)
at (:202)
at (:194)
at (:2269)
at (:1118)
at (:745)
2019-11-20 03:50:37,840 INFO [regionserver/hdpv-014/125.94.213.41:16020] : Stopping server on 16020
2019-11-20 03:50:37,840 INFO [,port=16020] : ,port=16020: stopping
2019-11-20 03:50:37,984 INFO [] : : stopped
2019-11-20 03:50:37,984 INFO [] : : stopping
2019-11-20 03:50:38,901 WARN [regionserver/hdpv-014/125.94.213.41:16020] : Failed deleting my ephemeral node
$SessionExpiredException: KeeperErrorCode = Session expired for /hbase-unsecure/rs/hdpv-014,16020,1574074564013
at (:127)
at (:51)
at (:873)
at (:178)
at (:1222)
at (:1211)
at (:1528)
at (:1126)
at (:745)
2019-11-20 03:50:39,198 INFO [regionserver/hdpv-014/125.94.213.41:16020] : stopping server hdpv-014,16020,1574074564013; zookeeper connection closed.
2019-11-20 03:50:39,198 INFO [regionserver/hdpv-014/125.94.213.41:16020] : regionserver/hdpv-014/125.94.213.41:16020 exiting
2019-11-20 03:50:40,717 ERROR [main] : Region server exiting
: HRegionServer Aborted
at (:68)
at (:87)
at (:76)
at (:126)
at (:2801)
2019-11-20 03:50:42,062 INFO [pool-4-thread-1] : Shutdown hook starting; =true; fsShutdownHook=$Cache$ClientFinalizer@344344fa
2019-11-20 03:50:42,062 INFO [pool-4-thread-1] : Starting fs shutdown hook thread.
2019-11-20 03:50:42,063 ERROR [Thread-9022] : Failed to close inode 595756106
(): BP-1202337336-125.94.213.13-1419656350533:blk_1526126839_452537137 does not exist or is not under Constructionnull
at (:6683)
at (:6751)
at (:930)
at (:966)
at $ClientNamenodeProtocol$2.callBlockingMethod()
at $Server$ProtoBufRpcInvoker.call(:640)
at $Server.call(:982)
at $Handler$1.run(:2351)
at $Handler$1.run(:2347)
at (Native Method)
at (:422)
at (:1866)
at $Handler.run(:2345)
at (:1554)
at (:1498)
at (:1398)
at $Invoker.invoke(:233)
at .$Proxy16.updateBlockForPipeline(Unknown Source)
at (:948)
at .invoke0(Native Method)
at (:62)
at (:43)
at (:498)
at (:291)
at (:203)
at (:185)
at .$Proxy17.updateBlockForPipeline(Unknown Source)
at .invoke0(Native Method)
at (:62)
at (:43)
at (:498)
at $1.invoke(:283)
at .$Proxy18.updateBlockForPipeline(Unknown Source)
at $DataStreamer.setupPipelineForAppendOrRecovery(:1281)
at $DataStreamer.processDatanodeError(:993)
at $DataStreamer.run(:500)
2019-11-20 03:50:42,534 INFO [pool-4-thread-1] : Shutdown hook finished.
日志:
2019-11-20T03:32:24.081+0800: 117381.659: [GC (Allocation Failure) 2019-11-20T03:32:24.082+0800: 117381.659: [ParNew: 1445776K->120613K(1504064K), 21.4794441 secs] 3237704K->1942467K(8221504K), 21.4797147 secs] [Times: user=59.65 sys=0.34, real=21.48 secs]
2019-11-20T03:33:13.014+0800: 117430.592: [GC (Allocation Failure) 2019-11-20T03:33:13.037+0800: 117430.614: [ParNew: 1457573K->117558K(1504064K), 3.8057408 secs] 3279427K->1954686K(8221504K), 3.8282921 secs] [Times: user=13.91 sys=0.05, real=3.83 secs]
2019-11-20T03:33:21.314+0800: 117438.892: [GC (Allocation Failure) 2019-11-20T03:33:21.314+0800: 117438.892: [ParNew: 1454518K->94887K(1504064K), 7.5804580 secs] 3291646K->1948109K(8221504K), 7.5807123 secs] [Times: user=7.80 sys=0.08, real=7.58 secs]
2019-11-20T03:42:44.684+0800: 118002.262: [GC (Allocation Failure) 2019-11-20T03:42:44.790+0800: 118002.368: [ParNew: 1431847K->167104K(1504064K), 16.3961827 secs] 3285069K->2024750K(8221504K), 16.5019800 secs] [Times: user=16.41 sys=0.18, real=16.50 secs]
2019-11-20T03:47:36.965+0800: 118294.543: [GC (Allocation Failure) 2019-11-20T03:47:38.419+0800: 118295.997: [ParNew: 1504064K->106763K(1504064K), 120.6296245 secs] 3361710K->2097638K(8221504K), 122.0840424 secs] [Times: user=171.54 sys=2.40, real=122.06 secs]
2019-11-20T03:50:38.743+0800: 118476.321: [GC (Allocation Failure) 2019-11-20T03:50:38.743+0800: 118476.321: [ParNew: 1443723K->33648K(1504064K), 0.1142207 secs] 3434598K->2024523K(8221504K), 0.1143976 secs] [Times: user=0.39 sys=0.00, real=0.11 secs]
Heap
par new generation total 1504064K, used 1150286K [0x00000005c0000000, 0x0000000626000000, 0x0000000626000000)
eden space 1336960K, 83% used [0x00000005c0000000, 0x0000000604277878, 0x00000006119a0000)
from space 167104K, 20% used [0x00000006119a0000, 0x0000000613a7c1b8, 0x000000061bcd0000)
to space 167104K, 0% used [0x000000061bcd0000, 0x000000061bcd0000, 0x0000000626000000)
concurrent mark-sweep generation total 6717440K, used 1990874K [0x0000000626000000, 0x00000007c0000000, 0x00000007c0000000)
Metaspace used 166258K, capacity 181568K, committed 181636K, reserved 1206272K
class space used 20691K, capacity 24674K, committed 24740K, reserved 1048576K
3.原因分析:
在03:47分时,程序进行了一次GC,并且耗时比较长,达到了122秒。GC过程中程序是停止的,称之为“stop the world”。而zk的超时时间是120秒,GC结束后,发现zk连接超时了,region master已经认为它挂掉,把它从集群服务里剔除了,让其它regionserver负它的工作。接替的regionserver会读取wal进行恢复工作,并继续处理,完成后删除wal文件。从GC恢复过来的regionserver,发现找不到wal了,所以报“: Error syncing, request close of WAL”,并且得知自己被集群剔除了,就主动关闭自已。
4.解决方案:
regionserver的垃圾回收改成G1,zk的超时120秒已经够长了,就不调整。