GCE network load balancer hangs after a period of inactivity

I am setting up a new service on Google Cloud with a "Network Load Balancer". I am only using one backend server during setup. After a period of inactivity, the first request to the load balancer address doesn't appear to be delivered to the backend server.

The load balancer reports the backend server healthy.

I see health check requests coming in every 5 seconds in the backend server's access log.

If I make a request directly to the backend server's health check URL I get a successful response.

If I make a request to the health check URL (or any other URL) through the load balancer, the browser hangs indefinitely and it does not appear that any request is ever made to the backend server.

If I then refresh the browser, the page loads fine, and further requests continue to load fine until some period of inactivity (hours?) has passed.

This doesn't seem like it would be a problem in production, but any unexplained issue like this makes me uncomfortable using the GCE load balancer for a live site.

This question is half help request and half bug report. Is this a known issue, or am I doing something wrong?

Thanks for any help.

UPDATE

Looking back at the server logs, I see that the hung request does show up in the access log, but not until I refresh the browser, and it shows a 0 sec response time.

Here are the two requests. The first is the hung request. The second is the refresh.

173.51.253.242 - devuser [09/May/2015:19:15:27 +0000] "GET /api/ HTTP/1.1" 499 0 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36" "-"
173.51.253.242 - devuser [09/May/2015:19:15:27 +0000] "GET /api/ HTTP/1.1" 200 5 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36" "-"

Notice that both requests come in during the same second (when I refresh), and the failed request has response code 499 and response time 0. 499 means the connection was closed by the client, so I take this to mean that the request was not initiated by the load balancer until I hit refresh, at which time it made both connections to the backend server.

1 Answer

#1

Updating this with an actual answer, as mensi's comment above seems to have gotten to the root of the problem.

Adding the TCP keepalive to the balanced hosts solved the problem.
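
The answer does not say how keepalive was enabled; it may well have been done system-wide through kernel settings rather than in application code. As a rough illustration only, here is a minimal sketch of turning on TCP keepalive for a single socket in Python, assuming a Linux backend. The helper name `enable_keepalive` and the timing values (60 s idle, 60 s interval, 5 probes) are hypothetical choices for the example, not values taken from the question or the answer.

```python
import socket


def enable_keepalive(sock, idle=60, interval=60, probes=5):
    """Turn on TCP keepalive for one socket (hypothetical helper).

    idle/interval/probes are illustrative values, not recommendations
    from the original answer.
    """
    # Ask the kernel to send keepalive probes on this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The per-socket tuning options below exist on Linux; guard them so
    # the sketch still runs on platforms that lack them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idleness before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes allowed before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

The idea is that periodic probes keep an otherwise idle connection from looking dead to anything tracking connection state between the client and the backend. On Linux the system-wide equivalents are the `net.ipv4.tcp_keepalive_time`, `net.ipv4.tcp_keepalive_intvl`, and `net.ipv4.tcp_keepalive_probes` sysctls, which only affect sockets that have `SO_KEEPALIVE` set.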
