关于WCF服务在高并发情况下报目标积极拒绝的异常处理

时间:2022-07-01 16:06:31

最近弄了个wcf的监控服务,偶尔监控到目标服务会报一个目标积极拒绝的错误。一开始以为服务停止了,上服务器检查目标服务好好的活着。于是开始查原因。

一般来说目标积极拒绝(TCP 10061)的异常主要是2种可能:

1:服务器关机或者服务关闭

2:Client调用的端口错误或者服务器防火墙没开相应的端口

但是我们的服务本身是可以调用的,只是偶尔报这个错误,说明并不是这2个问题造成的。继续google,在*上看到这样一篇:传送门

If this happens always, it literally means that the machine exists but that it has no services listening on the specified port, or there is a firewall stopping you.

If it happens occasionally - you used the word "sometimes" - and retrying succeeds, it is likely because the server has a full 'backlog'.

When you are waiting to be accepted on a listening socket, you are placed in a backlog. This backlog is finite and quite short - values of 1, 2 or 3 are not unusual - and so the OS might be unable to queue your request for the 'accept' to consume.

The backlog is a parameter on the listen function - all languages and platforms have basically the same API in this regard, even the C# one. This parameter is often configurable if you control the server, and is likely read from some settings file or the registry. Investigate how to configure your server.

If you wrote the server, you might have heavy processing in the accept of your socket, and this can be better moved to a separate worker-thread so your accept is always ready to receive connections. There are various architecture choices you can explore that mitigate queuing up clients and processing them sequentially.

Regardless of whether you can increase the server backlog, you do need retry logic in your client code to cope with this issue - as even with a long backlog the server might be receiving lots of other requests on that port at that time.

There is a rare possibility where a NAT router would give this error should it's ports for mappings be exhausted. I think we can discard this possibility as too much of a long shot though, since the router has 64K simultaneous connections to the same destination address/port before exhaustion.

大概意思就是如果这个错误是一直发生的那么可能是服务器或者防火墙的问题,如果这个问题是“Sometime”发生的,那么可能是backlog的问题。backlog是tcp层面的请求队列,当你调用socket发起请求的时候服务端会排成一个队列,在高并发情况下服务端来不及处理请求,那么有些请求就被直接被丢弃,于是就报了目标积极拒绝TCP10061的异常。

有了backlog于是继续google关键字“WCF backlog”发现wcf binding配置确实有一个listenBacklog的项目,默认值是10,于是把服务的listenBacklog改成100,问题搞定。

对了添加listenBacklog属性的时候有个注意的是一定要移除一个默认的endpoint      <endpoint address="mex" binding="mexTcpBinding" bindingConfiguration="" contract="IMetadataExchange" />这个endpoint是用来给vs等发现元数据用的,如果这个不移走启动服务的时候会报端口已经被监听的错误。

参考:

http://*.com/questions/2972600/no-connection-could-be-made-because-the-target-machine-actively-refused-it

https://msdn.microsoft.com/en-us/library/ee377061(v=bts.10).aspx