Oracle listener crashes unexpectedly with Linux Error: 32: Broken pipe

Date: 2021-10-02 15:57:41

Oracle 10.2.0.4

The listener on a production system stopped unexpectedly, and listener.log reported the following errors:

TNS-12518: TNS:listener could not hand off client connection
 TNS-12547: TNS:lost contact
  TNS-12560: TNS:protocol adapter error
   TNS-00517: Lost contact
    Linux Error: 32: Broken pipe

The operating system log /var/log/messages also showed an error similar to the following:

tnslsnr[5841]: segfault at 0000000000000018 rip 0000003eab66854d rsp 0000007fbfff9230 error 4 

It also contained messages about excessive CPU and memory load, and about the system killing processes on its own.
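The segfault line above can be picked apart with a short script. This is only a sketch: the regular expression is written against the one line quoted in this post, the error field is assumed to be hexadecimal, and the bit meanings are the usual x86 page-fault error code bits (bit 0: page was present, bit 1: write access, bit 2: user mode). The function and key names are made up for illustration.

```python
import re

# Pattern written against the single segfault line quoted in this post.
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-fA-F]+) "
    r"rip (?P<rip>[0-9a-fA-F]+) rsp (?P<rsp>[0-9a-fA-F]+) "
    r"error (?P<err>[0-9a-fA-F]+)"
)

def decode_segfault(line):
    """Return the fields of a kernel segfault message, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # assumption: the kernel prints this in hex
    return {
        "program": m.group("prog"),
        "pid": int(m.group("pid")),
        "address": int(m.group("addr"), 16),
        "page_present": bool(err & 1),  # bit 0: fault on a present page
        "write_access": bool(err & 2),  # bit 1: fault on a write
        "user_mode": bool(err & 4),     # bit 2: fault raised in user mode
    }

line = ("tnslsnr[5841]: segfault at 0000000000000018 rip 0000003eab66854d "
        "rsp 0000007fbfff9230 error 4")
info = decode_segfault(line)
print(info)
```

For this line the result is a user-mode read of an unmapped page at address 0x18. A fault at such a low address typically points to a NULL pointer dereference inside tnslsnr itself.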

Metalink has a document on this: 549932.1

Versions:

Oracle Net Services - Version 10.2.0.1 to 11.1.0.7 [Release 10.2 to 11.1]
Generic UNIX
***Checked for relevance on 22-MAR-2013***

Symptoms:

  • There may be heavy load on the CPU shooting up to 100%.
  • The number of sessions in the database is well below the upper or maximum limit defined in the parameter file.
  • The listener crashes suddenly during this heavy CPU load generating the core.
  • (Optional) Listener.Ora has SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF.
Cause:

Extensive paging/swapping activity is a clear indication that the system is running out of the physical memory.
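One way to put a number on "paging/swapping activity" is to sample the pswpin and pswpout counters in /proc/vmstat twice and look at the difference. The sketch below is not from the Metalink note; the function names are made up, and the two snapshots are invented numbers used only to show the arithmetic.

```python
def parse_vmstat(text):
    """Return the swap counters from /proc/vmstat-style text as integers."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters

def read_vmstat(path="/proc/vmstat"):
    """Read the live counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())

def swap_rate(before, after, seconds):
    """Pages swapped in and out per second between two snapshots."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in ("pswpin", "pswpout")
    }

# Illustrative snapshots with made-up values, assumed to be 10 seconds apart.
snap1 = "pgpgin 1000\npswpin 200\npswpout 500\n"
snap2 = "pgpgin 1900\npswpin 260\npswpout 1500\n"
rates = swap_rate(parse_vmstat(snap1), parse_vmstat(snap2), 10)
print(rates)
```

On a live system, call read_vmstat() twice with a time.sleep() in between. Rates that stay well above zero suggest the box is swapping heavily.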

Solution:

1. Increase the physical memory of the system. 
                             OR 
2. Apply the Patch 6139856 for unpublished Bug 6139856 if available for your platform.

                             OR

3. Configure Hugepages on the OS. Ref : Note 361323.1
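For option 3, a rough first estimate of vm.nr_hugepages is the SGA size divided by the huge page size, rounded up. The sketch below only shows that arithmetic. It is not the method from Note 361323.1, the 8 GiB SGA is a hypothetical value, and 2048 kB is the common x86_64 huge page size (check the Hugepagesize line in /proc/meminfo on the actual host).

```python
def hugepages_needed(sga_bytes, hugepage_kb=2048):
    """Smallest number of huge pages that covers sga_bytes."""
    page_bytes = hugepage_kb * 1024
    # Integer ceiling division, so a partly used page still counts as one.
    return -(-sga_bytes // page_bytes)

sga = 8 * 1024**3  # hypothetical 8 GiB SGA, not taken from this system
pages = hugepages_needed(sga)
print(f"vm.nr_hugepages = {pages}")
```

A real setting would usually leave a little headroom above this figure.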

--------------------------------------------------------------------------------------------------------------

So this counts as an Oracle bug: when the operating system runs short of physical memory and swap/page space is exhausted, the listener crashes unexpectedly.

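The operating system log also shows messages about Linux killing processes on its own (I am writing this up after the fact, and the logs sit on an internal network where my access is limited, so I cannot paste their contents here).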

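So my understanding is this: when physical memory usage stays high, the operating system kills some processes on its own, such as ones it considers idle, and unluckily the process it killed this time happened to be the Oracle listener, which is why the listener crashed.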

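I say "happened to be" because /var/log/messages also shows earlier messages about Oracle processes being killed, yet the listener did not stop at those times. So I suspect that what was killed back then was not the listener process but possibly some other, non-local process.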
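One way to check this guess is to pull the process names out of the kernel's kill messages, since the name in parentheses tells a listener (tnslsnr) apart from a database server process (oracle). The sketch below assumes the messages contain the phrase "Killed process <pid> (<name>)"; the exact wording varies by kernel version. The two sample lines are illustrative only and are not the logs from this incident, which I cannot paste.

```python
import re

# Assumed message shape; adjust the pattern to match the local kernel.
KILLED_RE = re.compile(r"[Kk]illed process (?P<pid>\d+) \((?P<name>[^)]+)\)")

def killed_processes(log_lines):
    """Return (pid, name) for every kill message found in the lines."""
    hits = []
    for line in log_lines:
        m = KILLED_RE.search(line)
        if m:
            hits.append((int(m.group("pid")), m.group("name")))
    return hits

def listener_was_killed(log_lines, listener_name="tnslsnr"):
    """True if any kill message names the listener executable."""
    return any(name == listener_name
               for _, name in killed_processes(log_lines))

# Illustrative lines with made-up PIDs, not taken from the real system.
sample = [
    "kernel: Out of Memory: Killed process 1111 (oracle).",
    "kernel: Out of Memory: Killed process 2222 (tnslsnr).",
]
print(killed_processes(sample))
print(listener_was_killed(sample[:1]), listener_was_killed(sample))
```

To run it against the real log, pass the open file object for /var/log/messages as log_lines.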