Sockets aren't released the moment they are closed. The end of a TCP connection that closes first enters a TIME_WAIT state, which keeps the connection around for a while after the close. Remember, TCP tries to ensure reliable data transfer, so TIME_WAIT gives the protocol an opportunity to make sure the final acknowledgment of the close is reliably delivered. It also lets stray delayed packets from the old connection die out before the same address and port can be reused.
Bind: Address Already in Use
Or How to Avoid this Error when Closing TCP Connections
Normal Closure
In order for a network connection to close, both ends have to send FIN (final) packets, which indicate they will not send any additional data, and both ends must ACK (acknowledge) each other's FIN packets. The FIN packets are initiated by the application performing a close(), a shutdown(), or an exit(). The ACKs are handled by the kernel after the close() has completed. Because of this, it is possible for the process to complete before the kernel has released the associated network resource, and the port cannot be bound to another process until the kernel has decided that it is done.
Figure 1
Figure 1 shows all of the possible states that can occur during a normal closure, depending on the order in which things happen. Note that if you initiate closure, there is a TIME_WAIT state that is absent from the other side. This TIME_WAIT is necessary in case the ACK you sent wasn't received, or in case spurious packets show up for other reasons. I'm really not sure why this state isn't necessary on the other side, when the remote end initiates closure, but this is definitely the case. TIME_WAIT is the state that typically ties up the port for several minutes after the process has completed. The length of the associated timeout varies on different operating systems, and may be dynamic on some operating systems; however, typical values are in the range of one to four minutes.
If both ends send a FIN before either end receives it, both ends will have to go through TIME_WAIT.
Normal Closure of Listen Sockets
A socket which is listening for connections can be closed immediately if there are no connections pending, and the state proceeds directly to CLOSED. If connections are pending, however, FIN_WAIT_1 is entered, and a TIME_WAIT is inevitable.
Note that it is impossible to completely guarantee a clean closure here. While you can check the connections using a select() call before closure, a tiny but real possibility exists that a connection could arrive after the select() but before the close().
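The select() check can be sketched in C as follows. This is a minimal sketch for POSIX systems, not the author's code: open_loopback_listener(), has_pending_connection() and close_listener_if_idle() are hypothetical names, and the loopback listener exists only so the example can be tried on a single machine. As the paragraph says, it narrows the race rather than closing it.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: listen on 127.0.0.1 at a port chosen by the kernel
 * and report the bound address. Returns the fd, or -1 on error. */
int open_loopback_listener(struct sockaddr_in *addr_out)
{
    struct sockaddr_in addr = { 0 };
    socklen_t len = sizeof addr;
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* 0 = let the kernel pick */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 8) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *addr_out = addr;
    return fd;
}

/* Returns 1 if listen_fd has a completed connection waiting to be accepted,
 * 0 if none appears within timeout_ms milliseconds, -1 on error. select()
 * marks a listening socket readable when accept() would not block. */
int has_pending_connection(int listen_fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int n = select(listen_fd + 1, &readfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return n > 0 && FD_ISSET(listen_fd, &readfds);
}

/* Closes listen_fd only when nothing is queued: returns 0 once closed,
 * 1 if connections are pending (socket left open), -1 on error. A connection
 * can still arrive between the check and close(), so this only narrows the
 * window; it cannot rule out the TIME_WAIT case. */
int close_listener_if_idle(int listen_fd)
{
    int pending = has_pending_connection(listen_fd, 0);
    if (pending != 0)
        return pending;
    return close(listen_fd);
}
```

A caller that gets 1 back would accept() and serve the queued connection, then try again.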
Abnormal Closure
If the remote application dies unexpectedly while the connection is established, the local end will have to initiate closure. In this case TIME_WAIT is unavoidable. If the remote end disappears due to a network failure, or the remote machine reboots (both are rare), the local port will be tied up until each state times out. Worse, some older operating systems do not implement a timeout for FIN_WAIT_2, and it is possible to get stuck there forever, in which case restarting your server could require a reboot.
If the local application dies while a connection is active, the port will be tied up in TIME_WAIT. This is also true if the application dies while a connection is pending.
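On Linux, FIN_WAIT_2 does time out. Orphaned sockets in that state are bounded by the net.ipv4.tcp_fin_timeout sysctl, and the Linux-specific TCP_LINGER2 socket option (documented in tcp(7)) overrides that for a single socket. The sketch below assumes Linux. set_fin_wait2_timeout() and get_fin_wait2_timeout() are hypothetical wrapper names, and the option is not portable to other systems.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Linux-only: limit how long an orphaned socket of ours may wait in
 * FIN_WAIT_2 for the peer's FIN. TCP_LINGER2 overrides the system-wide
 * net.ipv4.tcp_fin_timeout for this socket only; the kernel does not accept
 * arbitrarily large values, and the option does not change TIME_WAIT.
 * Returns 0 on success, -1 on error (errno set by setsockopt()). */
int set_fin_wait2_timeout(int fd, int seconds)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_LINGER2, &seconds, sizeof seconds);
}

/* Reads back the effective FIN_WAIT_2 timeout in seconds, or -1 on error
 * (a negative value stored in the option is also reported as -1). */
int get_fin_wait2_timeout(int fd)
{
    int seconds;
    socklen_t len = sizeof seconds;
    if (getsockopt(fd, IPPROTO_TCP, TCP_LINGER2, &seconds, &len) < 0)
        return -1;
    return seconds;
}
```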
Strategies for Avoidance
SO_REUSEADDR
You can use setsockopt() to set the SO_REUSEADDR socket option, which explicitly allows a process to bind to a port which remains in TIME_WAIT (it still only allows a single process to be bound to that port). This is both the simplest and the most effective option for reducing the "address already in use" error.
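Here is a minimal sketch of the problem and of the SO_REUSEADDR fix, assuming Linux and the loopback interface. bind_loopback() and rebind_after_active_close() are hypothetical names. The "server" closes an accepted connection first, exits, and then tries to bind the same port again. The sketch sets SO_REUSEADDR on every socket the server creates, as a restarted server running the same code would. On Linux the kernel checks the option on the old socket (here inherited from the listener) as well as on the new one.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: TCP socket bound to 127.0.0.1:port (0 = any port),
 * with SO_REUSEADDR set first when reuseaddr is nonzero.
 * Returns the fd, or -1 with errno set (e.g. EADDRINUSE). */
int bind_loopback(unsigned short port, int reuseaddr)
{
    struct sockaddr_in addr = { 0 };
    int opt = 1;
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if ((reuseaddr &&
         setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof opt) < 0) ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        int saved = errno;              /* keep bind()'s errno across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* The server accepts one connection and closes it first (active close);
 * the client then closes too, leaving the server's end in TIME_WAIT. The
 * listener is closed as if the server exited, and the same port is bound
 * again. Returns 0 if that second bind() works, else the errno it hit. */
int rebind_after_active_close(int reuseaddr)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int err = 0, cfd = -1, sfd = -1;
    char byte;

    int lfd = bind_loopback(0, reuseaddr);
    if (lfd < 0)
        return errno;
    if (listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0 ||
        (cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        (sfd = accept(lfd, NULL, NULL)) < 0) {
        err = errno;
    } else {
        close(sfd);                     /* server sends FIN first */
        recv(cfd, &byte, 1, 0);         /* client reads EOF */
    }
    if (cfd >= 0)
        close(cfd);                     /* client's FIN completes the close */
    close(lfd);                         /* the "server" exits */
    if (err != 0)
        return err;

    int fd = bind_loopback(ntohs(addr.sin_port), reuseaddr);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

With reuseaddr set to 0, the second bind() is expected to fail with EADDRINUSE; with it set to 1, the bind should succeed even though the old connection is still in TIME_WAIT.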
Oddly, using SO_REUSEADDR can actually lead to more difficult "address already in use" errors. SO_REUSEADDR permits you to use a port that is stuck in TIME_WAIT, but you still cannot use that port to establish a connection to the last place it connected to. What? Suppose I pick local port 1010, and connect to foobar.com port 300, and then close locally, leaving that port in TIME_WAIT. I can reuse local port 1010 right away to connect to anywhere except for foobar.com port 300.
A situation where this might be a problem is if my program is trying to find a reserved local port (< 1024) to connect to some service which likes reserved ports. If I used SO_REUSEADDR, then each time I run the program on my machine, I'll keep getting the same local reserved port, even if it is stuck in TIME_WAIT, and I risk getting a "connect: Address already in use" error if I go back to any place I've been to in the last few minutes. The solution here is to avoid SO_REUSEADDR.
Some folks don't like SO_REUSEADDR because it has a security stigma attached to it. On some operating systems it allows the same port to be used with a different address on the same machine by different processes at the same time. This is a problem because most servers bind to the port but not to a specific address; instead they use INADDR_ANY (this is why things show up in netstat output as *.8080). So if the server is bound to *.8080, another malicious user on the local machine can bind to local-machine.8080, which will intercept all of your connections since it is more specific. This is only a problem on multi-user machines that don't have restricted logins; it is NOT a vulnerability from outside the machine. And it is easily avoided by binding your server to the machine's address.
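Binding to the machine's address instead of INADDR_ANY only changes the address passed to bind(). Here is a minimal sketch, assuming IPv4. listen_on_address() is a hypothetical name, and the caller supplies the address string, for example the machine's own IP.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listens on one specific local IPv4 address (dotted-quad string) rather
 * than INADDR_ANY, so that on systems with the behaviour described above
 * another local process cannot bind a "more specific" address on the same
 * port. SO_REUSEADDR is still set so restarts are not blocked by TIME_WAIT.
 * Returns the listening fd, or -1 on error. */
int listen_on_address(const char *ip, unsigned short port)
{
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                       /* not a valid IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int opt = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof opt) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

netstat then shows the server as address.port rather than *.port.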
Additionally, others don't like that a busy server may have hundreds or thousands of these TIME_WAIT sockets stacking up and using kernel resources. For these reasons, there's another option for avoiding this problem.
Client Closes First
Looking at the diagram above, it is clear that TIME_WAIT can be avoided if the remote end initiates the closure. So the server can avoid problems by letting the client close first. The application protocol must be designed so that the client knows when to close. The server can safely close in response to an EOF from the client; however, it will also need to set a timeout when it is expecting an EOF, in case the client has left the network ungracefully. In many cases simply waiting a few seconds before the server closes will be adequate.
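The server side of "client closes first" might look like the sketch below. It assumes a blocking socket and POSIX poll(). wait_for_peer_close() and close_after_peer_eof() are hypothetical names, and the timeout value is left to the caller. The sketch waits for the client's EOF, discarding anything else the client sends, and closes only after the EOF arrives or the timeout expires.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Waits up to timeout_ms for the peer's EOF on a connected, blocking socket,
 * reading and discarding any data that arrives first. The timeout restarts
 * after each chunk of data, which is a simplification.
 * Returns 1 on EOF (the peer closed first), 0 on timeout, -1 on error. */
int wait_for_peer_close(int fd, int timeout_ms)
{
    char buf[512];
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        int n = poll(&pfd, 1, timeout_ms);
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return 0;                    /* no EOF in time: peer may be gone */
        ssize_t got = recv(fd, buf, sizeof buf, 0);
        if (got == 0)
            return 1;                    /* orderly EOF */
        if (got < 0 && errno != EINTR)
            return -1;                   /* e.g. ECONNRESET */
        /* got > 0: discard the data and keep waiting */
    }
}

/* Closes fd after the peer's EOF, or after timeout_ms regardless, so the
 * client performs the active close whenever it cooperates. Returns 1 if the
 * peer closed first, 0 if we gave up waiting, -1 on error. */
int close_after_peer_eof(int fd, int timeout_ms)
{
    int result = wait_for_peer_close(fd, timeout_ms);
    close(fd);
    return result;
}
```

If the timeout fires, the server ends up closing first and TIME_WAIT lands on the server side after all, so the timeout should comfortably exceed what a well-behaved client needs. To try it without a network, use a socketpair(AF_UNIX, SOCK_STREAM, 0, ...): closing one end makes the other end's wait return 1.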
It probably makes more sense to call this method "Remote Closes First", because otherwise it depends on what you are calling the client and the server. If you are developing some system where a cluster of client programs sit on one machine and contact a variety of different servers, then you would want to foist the responsibility for closure onto the servers, to protect the resources on the client.
For example, I wrote a script that uses rsh to contact all of the machines on our network, and it does it in parallel, keeping some number of connections open at all times. rsh source ports are arbitrary available ports less than 1024. I initially used "rsh -n", which it turns out causes the local end to close first. After a few tests, every single free port less than 1024 was stuck in TIME_WAIT and I couldn't proceed. Removing the "-n" option causes the remote (server) end to close first (understanding why is left as an exercise for the reader), and should've eliminated the TIME_WAIT problem. However, without the -n, rsh can hang waiting for input. And, if you close input at the local end, this can again result in the port going into TIME_WAIT. I ended up avoiding the system-installed rsh program, and developing my own implementation in perl. My current implementation, multi-rsh, is available for download.
Reduce Timeout
If (for whatever reason) neither of these options works for you, it may also be possible to shorten the timeout associated with TIME_WAIT. Whether this is possible and how it should be accomplished depends on the operating system you are using. Also, making this timeout too short could have negative side-effects, particularly in lossy or congested networks.
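1 The Problem
Background: when a server is restarted or crashes, starting it again often fails with "Address already in use"; after a few minutes it can be started normally.
The questions are:
A) Why does this happen?
B) How can it be fixed so that the server can be restarted immediately?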
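2 Analysis
It turns out that when the server is restarted or crashes, its end of each open connection enters the TIME_WAIT state and stays there for 2MSL; during that time the restarted server cannot bind the port again.
So why does the server end up in TIME_WAIT rather than CLOSED? The answer lies in how TCP closes a connection.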
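2.1 The TCP Connection Closing Sequence
In TCP, the side that performs the active close enters TIME_WAIT; in the figure's example, it is the client that enters TIME_WAIT.
After entering TIME_WAIT, it waits for 2MSL. MSL (Maximum Segment Lifetime) is 2 minutes, 1 minute, or 30 seconds depending on the implementation, and RFC 793 suggests 2 minutes, so TIME_WAIT lasts between 1 and 4 minutes.
For reference, the TCP state transition diagram follows.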
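2.2 What TIME_WAIT Is For
TIME_WAIT serves two purposes:
1) If the final ACK sent by the actively closing side is lost, the other side will retransmit its FIN. The TIME_WAIT state keeps the connection's state around so that this FIN can be answered with another ACK.
– If the actively closing side discarded the connection immediately, the retransmitted FIN would arrive after TCP had already forgotten the connection, so TCP would answer with a RST (reset), and the peer would see an error instead of an orderly termination.
2) It lets delayed segments from the old connection die out in the network, so they cannot be mistaken for segments of a new connection that uses the same 4-tuple.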
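2.3 How Can TIME_WAIT End Early?
This is known as TIME_WAIT Assassination. Two situations can cause it.
A) Accidental termination.
As the figure below shows, suppose a delayed segment arrives while HOST1, which performed the active close, is in TIME_WAIT. Because the segment's sequence number lies outside the window HOST1 can currently accept, HOST1 answers with an ACK stating the sequence number it expects. The other side has already closed and is in CLOSED, so it replies to that ACK with a RST, which makes HOST1 end TIME_WAIT immediately.
In other words, TIME_WAIT has been assassinated. Can this be avoided?
Yes: some implementations simply ignore RST segments while in TIME_WAIT (the remedy proposed in RFC 1337).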
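B) Deliberate termination.
An application can call setsockopt() to set SO_LINGER with l_onoff = 1 and l_linger = 0; close() then skips the four-way closing handshake, sends a RST, and never enters TIME_WAIT.
About SO_LINGER
– By default, when an application closes a connection, close() (closesocket() on Windows) returns immediately; if data is still left in the socket's send buffer, the system tries to deliver it to the peer, but the application cannot tell whether delivery succeeded.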
2.4 Conclusion about TIME_WAIT
A robust application should never interfere with TIME_WAIT: it is an important part of TCP's reliability mechanism.
3 Analysis of the Server Problem
4 How to Restart the Server Immediately
Set SO_REUSEADDR before calling bind().
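That might seem to settle it, but as we just saw, TIME_WAIT serves two purposes. Could reusing the address here cause problems for the server?
Yes, it could. If the new connection has the same 4-tuple (local IP and port, remote IP and port) as the old one, and a delayed packet's sequence number falls within the window the new connection will accept, that stale packet can be mistaken for data on the new connection.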
Someone asked about this on *:
Using SO_REUSEADDR - What happens to previously open socket?
The answer: "The SO_REUSEADDR option overrides that behavior, allowing you to reuse the port immediately. Effectively, you're saying: 'I understand the risks and would like to use the port anyway.'"
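Today I ran into a port problem. In socket programming, note that calling close(sock_id) does not release the socket sock_id immediately. This is a property of TCP, mainly intended to give both sides enough time to complete the four-way closing handshake.
So after close() is called on a connection that this side closes first, the socket moves from ESTABLISHED through FIN_WAIT_1 and FIN_WAIT_2 into TIME_WAIT, and the port is not released during this time. Calling bind() on the port in the meantime fails with "can't bind server socket: address already in use". Some systems let you change how long TIME_WAIT lasts (on Linux it has traditionally been fixed at 60 seconds by the kernel constant TCP_TIMEWAIT_LEN); related kernel settings are shown below.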
The following is taken from the article "Solving large numbers of TIME_WAIT sockets on Linux" (linux下解决大量的TIME_WAIT):
[root@web02 ~]# vi /etc/sysctl.conf
Add the following:
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_syncookies=1
Apply the kernel parameters:
[root@web02 ~]# sysctl -p
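readme:
net.ipv4.tcp_syncookies=1: enables SYN cookies, which are used when the SYN queue overflows to fend off SYN floods; it does not affect TIME_WAIT.
net.ipv4.tcp_tw_reuse=1: allows TIME_WAIT sockets to be reused for new connections, which is very effective for web servers with a large number of connections. It only applies to outgoing connections made with connect(); it does not make a server's bind() succeed.
net.ipv4.tcp_tw_recycle=1: enables fast recycling of TIME_WAIT sockets. It breaks connections from clients behind NAT and was removed in Linux 4.12, so it is best avoided.
net.ipv4.tcp_fin_timeout=30: shortens the time spent in FIN_WAIT_2, so the system can handle more connections.
net.ipv4.tcp_keepalive_time=1800: shortens the idle time before TCP keepalive probing starts, so the system can handle more connections.
net.ipv4.tcp_max_syn_backlog=8192: lengthens the TCP SYN queue, so the system can handle more concurrent connections.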
1. If you still want bind() to succeed, you can work around TIME_WAIT by using setsockopt() to allow the address to be reused, so that bind() no longer fails. For example:
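#include <sys/socket.h>
int sock, opt = 1;  /* opt = 0 would disable reuse */
sock = socket(AF_INET, SOCK_STREAM, 0);
setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));  /* must be called before bind() */
bind(...);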
For the details of setsockopt(), see its prototype: int setsockopt(int socket, int level, int option_name, const void *option_value, socklen_t option_len);
2. Avoid the TIME_WAIT state altogether.
The SO_LINGER option of setsockopt() controls whether closing a socket lingers, i.e. is delayed.
struct linger {       /* declared in <sys/socket.h>; shown for reference */
int l_onoff; /* 0 = off, nonzero = on */
int l_linger; /* linger time, in seconds */
};
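#include <sys/socket.h>
int server_fd;
server_fd = socket(AF_INET, SOCK_STREAM, 0);
int opt = 1;
setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));  /* allow bind() while the port is in TIME_WAIT */
struct linger li;
li.l_onoff = 1;   /* enable lingering on close()... */
li.l_linger = 0;  /* ...with a zero timeout: close() sends RST and skips TIME_WAIT */
setsockopt(server_fd, SOL_SOCKET, SO_LINGER, &li, sizeof(li));  /* Windows' setsockopt() expects (const char *)&li */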
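In my tests, the code above forcibly closes the socket immediately: running sudo netstat -anp | grep 8080 in a terminal afterwards shows that port 8080 is not in TIME_WAIT.
If your program uses a client/server model, then with the code above, when the server closes the socket the client does not get a "peer reset" error and exit on its own. With lingering disabled this way, close() does not perform the four-way closing handshake: the server drops the connection and sends only a RST, which the client notices only when it next reads from or writes to the socket.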
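An abortive close does notify the peer, but only through a RST segment. The client finds out the next time it reads from or writes to the socket, and that call fails with ECONNRESET ("Connection reset by peer"). The loopback experiment below sketches this, assuming Linux. abortive_close() and client_error_after_abortive_close() are hypothetical names. Because no TIME_WAIT is left behind, the server's port can be bound again right away, even without SO_REUSEADDR.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Abortive close: SO_LINGER with l_onoff = 1 and l_linger = 0 makes close()
 * drop unsent data and send a RST instead of a FIN, so this end skips the
 * closing handshake and never enters TIME_WAIT. Always closes fd. */
int abortive_close(int fd)
{
    struct linger li = { .l_onoff = 1, .l_linger = 0 };
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &li, sizeof li) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Loopback experiment: the server closes its end of an accepted connection
 * abortively, then the client tries to read. Returns the errno from the
 * client's recv() (a RST produces ECONNRESET), 0 if recv() did not fail,
 * or -1 if the connection could not be set up. The server's port is stored
 * in *port_out so the caller can check that it is free again. */
int client_error_after_abortive_close(unsigned short *port_out)
{
    struct sockaddr_in addr = { 0 };
    socklen_t len = sizeof addr;
    int result = -1, cfd = -1, sfd = -1;
    char byte;
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(lfd, 1) == 0 &&
        getsockname(lfd, (struct sockaddr *)&addr, &len) == 0 &&
        (cfd = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        (sfd = accept(lfd, NULL, NULL)) >= 0) {
        abortive_close(sfd);            /* RST, no FIN/ACK exchange */
        result = recv(cfd, &byte, 1, 0) < 0 ? errno : 0;
        *port_out = ntohs(addr.sin_port);
    }
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return result;
}
```

The same check can be done by hand with the netstat command above: after an abortive close the server's port does not appear in TIME_WAIT.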
Fine's Home