Load balancing sockets on a horizontally scaling WebSocket server?

Time: 2021-02-14 18:04:04

Every few months, while thinking through a personal project that involves sockets, I find myself asking: "How would you properly load balance sockets on a dynamic, horizontally scaling WebSocket server?"

I understand the theory behind horizontally scaling WebSockets and using pub/sub models to get data to the server that holds the socket connection for a specific user. I think I understand ways to effectively identify the server with the fewest current socket connections, which is the one I would want to route a new socket connection to. What I don't understand is how to effectively route new socket connections to the server you've picked with low socket count.

I don't imagine the answer would be tied to a specific server implementation; rather, it could be applied to most servers. I could easily see myself implementing this with Vert.x, Node.js, or even Perfect.

2 Answers

#1


5  

First off, you need to define the bounds of the problem you're asking about. If you're truly talking about dynamic horizontal scaling where you spin up and down servers based on total load, then that's an even more involved problem than just figuring out where to route the latest incoming new socket connection.

To solve that problem, you need a way of "moving" a socket from one host to another so you can clear connections from a host that you want to spin down (I'm assuming here that true dynamic scaling goes both up and down). The usual way I've seen that done is with a cooperating client: you tell the client to reconnect, and when it reconnects it is load balanced onto a different server, which clears off the one you wanted to spin down. If your client already has auto-reconnect logic (as socket.io does), you can just have the server close the connection and the client will automatically reconnect.
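
As a rough illustration of that draining step, here is a minimal Python sketch. `Connection` and `SocketServer` are hypothetical stand-ins, not any real framework's API; the only real-world detail is close code 1001 ("going away") from RFC 6455. Closing in batches is my own assumption, one way to keep every client from reconnecting at the same instant.

```python
from typing import Optional, Set, Tuple

GOING_AWAY = 1001  # RFC 6455 close code: the endpoint is going away


class Connection:
    """Hypothetical stand-in for a real WebSocket connection object."""

    def __init__(self, client_id: str) -> None:
        self.client_id = client_id
        self.closed_with: Optional[Tuple[int, str]] = None

    def close(self, code: int, reason: str) -> None:
        # A real implementation would send a close frame here.
        self.closed_with = (code, reason)


class SocketServer:
    """Hypothetical server process that can be drained before spin-down."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.accepting = True
        self.connections: Set[Connection] = set()

    def accept(self, conn: Connection) -> bool:
        # Refuse new sockets once draining has started.
        if not self.accepting:
            return False
        self.connections.add(conn)
        return True

    def drain(self, batch_size: Optional[int] = None) -> int:
        """Stop accepting and close up to batch_size connections (all if None).

        Returns the number of connections closed in this call.
        """
        self.accepting = False
        victims = list(self.connections)[:batch_size]
        for conn in victims:
            conn.close(GOING_AWAY, "server draining, please reconnect")
            self.connections.discard(conn)
        return len(victims)
```

Calling `drain(batch_size=...)` repeatedly until it returns 0 empties the host; clients with auto-reconnect then land wherever the balancer sends them next.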

As for load balancing the incoming client connections, you have to decide what load metric you want to use. Ultimately, you need a score for each server process that tells you how "busy" you think it is so you can put new connections on the least busy server. A rudimentary score would just be the number of current connections. If you have large numbers of connections per server process (tens of thousands) and there's no particular reason in your app that some might be much busier than others, then the law of large numbers probably averages out the load, so you could get away with using just the number of connections each server has. If connection usage is not that even, then you may also have to factor in some sort of moving average of the CPU load along with the total number of connections.
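
To make the scoring idea concrete, here is a minimal Python sketch of one possible score. The class name, the 50,000-connection capacity, the smoothing factor and the 50/50 weighting are all assumptions chosen for illustration; the right blend depends on your app.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ServerLoad:
    """Load tracked for one server process (all names are illustrative)."""

    name: str
    connections: int = 0
    cpu_ewma: float = 0.0  # smoothed CPU utilisation in the range [0, 1]

    def record_cpu(self, sample: float, alpha: float = 0.2) -> None:
        # Exponentially weighted moving average: recent samples count more.
        self.cpu_ewma = alpha * sample + (1 - alpha) * self.cpu_ewma

    def score(self, max_connections: int = 50_000, cpu_weight: float = 0.5) -> float:
        # Blend how full the process is with its smoothed CPU load.
        # Lower score means less busy.
        fill = self.connections / max_connections
        return (1 - cpu_weight) * fill + cpu_weight * self.cpu_ewma


def least_busy(servers: Iterable[ServerLoad]) -> ServerLoad:
    """Return the server with the lowest current score."""
    return min(servers, key=lambda s: s.score())
```

With `cpu_weight=0` this collapses to the rudimentary connection-count score; raising it lets a server with few but heavy connections stop attracting new ones.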

If you're going to load balance across multiple physical servers, then you will need a load balancer or proxy service that everyone connects to initially. That service can look at the metrics for all currently running servers in the pool and assign the connection to the one with the lowest current score. The assignment can be done either with a proxy scheme or (more scalably) via a redirect, so that the service gets out of the way after the initial assignment.

You could then also have a process that regularly examines the load score (however you decided to calculate it) of every server in the cluster and decides when to spin a new server up, when to spin one down, or when a given server is too far out of balance and needs to be told to kick several connections off, forcing them to rebalance.
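
A sketch of what one pass of such a monitoring process might decide, in Python. The function name, the action labels and all three thresholds are hypothetical; they are only there to show the shape of the decision, assuming scores are normalised to roughly [0, 1].

```python
from typing import Dict, List, Optional, Tuple

Action = Tuple[str, Optional[str]]  # (action label, target server or None)


def plan_actions(
    scores: Dict[str, float],
    scale_up_at: float = 0.75,
    scale_down_at: float = 0.25,
    imbalance: float = 0.3,
) -> List[Action]:
    """Decide what to do from the latest per-server load scores."""
    if not scores:
        return [("spin_up", None)]

    actions: List[Action] = []
    mean = sum(scores.values()) / len(scores)

    if mean > scale_up_at:
        # The farm as a whole is busy: add capacity.
        actions.append(("spin_up", None))
    elif mean < scale_down_at and len(scores) > 1:
        # The farm is mostly idle: drain the emptiest server.
        emptiest = min(scores, key=lambda name: scores[name])
        actions.append(("spin_down", emptiest))

    for name, score in sorted(scores.items()):
        # A server far above the average should shed some connections.
        if score - mean > imbalance:
            actions.append(("shed_connections", name))

    return actions
```

Run it on a timer against whatever store holds the scores, and hand each action to the code that actually starts hosts or tells a server to drop clients.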

"What I don't understand is how to effectively route new socket connections to the server you've picked with low socket count."

As described above, you use either a proxy scheme or a redirect scheme. I favor the redirect scheme, despite its slightly higher cost at connection time, because it's more scalable when running and creates fewer points of failure for an existing connection. All clients connect to your incoming-connection gateway server, which is responsible for knowing the current load score of each server in the farm. Based on those scores, the gateway assigns each incoming connection to the host with the lowest score, and the client is then redirected to reconnect directly to that specific server in your farm.
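
Here is a minimal Python sketch of the gateway's bookkeeping in the redirect scheme. The `Gateway` class, the `wss://` URLs and the `bump` parameter are hypothetical. The bump is my own assumption: it nudges a server's score on each assignment so that a burst of arrivals doesn't all land on one host before its next real report.

```python
from typing import Dict


class Gateway:
    """Hypothetical incoming-connection gateway for the redirect scheme."""

    def __init__(self, bump: float = 0.0) -> None:
        self.bump = bump
        self._scores: Dict[str, float] = {}  # server URL -> latest load score

    def report(self, url: str, score: float) -> None:
        # Called by (or on behalf of) each server to publish its score.
        self._scores[url] = score

    def remove(self, url: str) -> None:
        # Called when a server is being spun down.
        self._scores.pop(url, None)

    def assign(self) -> str:
        """Return the URL the client should reconnect to directly."""
        if not self._scores:
            raise RuntimeError("no servers available")
        url = min(self._scores, key=lambda u: self._scores[u])
        self._scores[url] += self.bump
        return url
```

The client would call the gateway first (for example over plain HTTP), receive the URL, and open its WebSocket straight to that server, so the gateway is not in the path of the long-lived connection.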

I have also seen load balancing done purely by a custom DNS implementation. A client requests the IP address for farm.somedomain.com, and the custom DNS server gives it the IP address of the host it wants that client assigned to. Each client that looks up the IP address for farm.somedomain.com may get a different IP address. You spin hosts up or down by adding them to or removing them from the custom DNS server, and it is that custom DNS server that has to contain the load balancing logic and know the current load scores of all the running hosts.
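
The decision logic inside such a DNS server could look roughly like the Python sketch below. It is a toy model, not a real DNS server: `DnsBalancer` and its methods are hypothetical, the addresses come from the reserved documentation range 192.0.2.0/24, and the short TTL is an assumption intended to keep resolvers from caching one answer for long.

```python
from typing import Dict, Tuple


class DnsBalancer:
    """Toy model of a custom DNS server's host-selection logic."""

    def __init__(self, name: str, ttl_seconds: int = 5) -> None:
        self.name = name
        self.ttl_seconds = ttl_seconds
        self._hosts: Dict[str, float] = {}  # IP address -> latest load score

    def add_host(self, ip: str, score: float = 0.0) -> None:
        # Spinning a host up: make it eligible for answers.
        self._hosts[ip] = score

    def remove_host(self, ip: str) -> None:
        # Spinning a host down: stop handing out its address.
        self._hosts.pop(ip, None)

    def update_score(self, ip: str, score: float) -> None:
        if ip in self._hosts:
            self._hosts[ip] = score

    def resolve(self, name: str) -> Tuple[str, int]:
        """Return (IP address, TTL) of the least loaded host for name."""
        if name != self.name or not self._hosts:
            raise LookupError("no record for " + name)
        ip = min(self._hosts, key=lambda host: self._hosts[host])
        return ip, self.ttl_seconds
```

Removing a host only stops new lookups from reaching it; connections already open to that host still have to be drained separately.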

#2


0  

Route the websocket requests to a load balancer that makes the decision about where to send the connections.

As an example, HAProxy has a leastconn balancing method, intended for long-lived connections, that picks the server with the lowest connection count, choosing the least recently used one when several servers are tied.
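
A minimal configuration fragment showing where leastconn goes might look like this. The frontend and backend names, server names, addresses and the one-hour tunnel timeout are placeholders I made up for illustration, and a real config would also need its global and defaults sections.

```
frontend ws_in
    bind *:80
    default_backend ws_servers

backend ws_servers
    # Send each new connection to the server with the fewest connections.
    balance leastconn
    # Keep idle upgraded (WebSocket) connections open for up to an hour.
    timeout tunnel 1h
    server ws1 192.0.2.11:8080 check
    server ws2 192.0.2.12:8080 check
```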

The HAProxy backend server weightings can also be modified by external inputs; @jfriend00 detailed the technicalities of weighting in their answer.
