What is the best way to scale out work to multiple machines?

Time: 2022-10-24 20:16:44

We're developing a .NET app that must make up to tens of thousands of small web service calls to a 3rd party web service. We would prefer a more 'chunky' call, but the 3rd party does not support it. We've designed the client to use a configurable number of worker threads, and through testing have code that is fairly well optimized for one multicore machine. However, we still want to improve the speed, and are looking at spreading the work across multiple machines. We're well versed in typical client/server/database apps, but new to designing for multiple machines. So, a few questions related to that:

  • Is there any other client-side optimization, besides multithreading, that we should look at that could improve the speed of an HTTP request/response? (I should note this is a non-standard web service, so it is implemented using WebClient, not a WCF or SOAP client.)

  • Our current thinking is to use WCF to publish chunks of work to MSMQ, and run clients on one or more machines to pull work off of the queue. We have experience with WCF + MSMQ, but want to be sure we're not missing better options. Are there other, better ways to do this today?

  • I've seen some 3rd party tools like Digipede and Microsoft's HPC offerings, but these seem like overkill. Any experience with those products, or reasons we should consider them over rolling our own?

4 Solutions

#1


Sounds like your goal is to execute all these web service calls as quickly as you can, and get the results tabulated. Given that, your greatest efficiency control is going to be through scaling the number of concurrent requests you can make.

Be sure to look at your client-side connection limits. I think the system default is 2 connections. I haven't tried this myself, but by upping the number of connections with this property, you should theoretically see a multiplier effect: more connections from a single machine means more requests in flight at once. There's more info on MS forums.
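
To see why the connection limit matters, here is a rough back-of-the-envelope sketch. It is written in Python purely for illustration and is not .NET code. It assumes the work is latency-bound, every call takes about the same time, and the third party does not throttle you; the function name and all the numbers are made up for the example.

```python
import math


def estimated_seconds(total_calls: int, latency_s: float, connections: int) -> float:
    """Rough wall-clock estimate when each connection handles one call at a time."""
    # Calls go out in "rounds" of at most `connections` simultaneous requests.
    rounds = math.ceil(total_calls / connections)
    return rounds * latency_s


if __name__ == "__main__":
    # Hypothetical workload: 20,000 calls at 250 ms each.
    for connections in (2, 16, 64):
        print(connections, "connections:", estimated_seconds(20_000, 0.25, connections), "s")
```

If the estimate holds even roughly, raising the per-machine limit is worth trying before adding machines, since it costs nothing in infrastructure. Real gains will flatten out once the remote service, the network, or the CPU becomes the bottleneck.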

The MSMQ option works well. I'm running that configuration myself. ActiveMQ is also a fine solution, but MSMQ is already on the server.

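
The queue-plus-workers shape is the same whichever broker you pick. Below is a minimal single-process sketch of it in Python, where an in-memory `queue.Queue` stands in for MSMQ or ActiveMQ and threads stand in for worker machines. `call_service` and `run_workers` are hypothetical names, and the "service call" is a placeholder, not the real third-party API.

```python
import queue
import threading


def call_service(item: int) -> int:
    """Hypothetical stand-in for one small web service call."""
    return item * 2


def run_workers(chunks, worker_count=4):
    """Publish chunks to a shared queue and let workers compete for them."""
    work = queue.Queue()      # stand-in for the message queue
    results = {}              # chunk index -> list of results
    lock = threading.Lock()   # protects `results`

    def worker():
        while True:
            message = work.get()
            if message is None:          # sentinel: no more work for this worker
                work.task_done()
                return
            index, chunk = message
            try:
                output = [call_service(item) for item in chunk]
                with lock:
                    results[index] = output
            finally:
                work.task_done()         # acknowledge the message either way

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for index, chunk in enumerate(chunks):
        work.put((index, chunk))
    for _ in threads:
        work.put(None)                   # one sentinel per worker
    work.join()
    for thread in threads:
        thread.join()
    return results
```

Each chunk is taken by exactly one worker, which is the property you want from the real queue as well. A real broker adds what this sketch leaves out: durability, delivery across machines, and handling of messages whose worker died.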

You have a good starting point. Get that in operation, then move on to performance and throughput.

#2


At CodeMash this year, Wesley Faler did an interesting presentation on this sort of problem. His solution was to store "jobs" in a DB, then use clients to pull down work and mark status when complete.

He then pushed the whole infrastructure up to Amazon's EC2.

Here are his slides from the presentation; they should give you the basic idea:

I've done something similar with multiple PCs locally; the basics of managing the workload were similar to Faler's approach.
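
The job-table idea described above (store jobs in a DB, let clients claim one, mark it when complete) can be sketched roughly as follows. This is not Faler's code. It is a minimal illustration in Python using SQLite, and the table layout, column names, and function names are all assumptions made for the example.

```python
import sqlite3


def open_db(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode,
    # so the explicit BEGIN/COMMIT statements below control transactions.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id      INTEGER PRIMARY KEY,
               payload TEXT NOT NULL,
               status  TEXT NOT NULL DEFAULT 'pending',
               worker  TEXT
           )"""
    )
    return conn


def add_jobs(conn, payloads):
    conn.executemany("INSERT INTO jobs (payload) VALUES (?)", [(p,) for p in payloads])


def claim_job(conn, worker):
    """Atomically take the oldest pending job, or return None if there is none."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running', worker = ? WHERE id = ?",
                (worker, row[0]),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


def complete_job(conn, job_id):
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

The important part is that selecting a job and marking it as taken happen in one transaction, so two clients cannot claim the same row. On a server database you would use its own locking mechanism for that step, and you would probably also want a timeout that returns stale 'running' jobs to 'pending' when a client dies.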

#3


If you have optimized the code, you could look into optimizing the network side to minimize the number of packets sent:

  • reuse HTTP sessions (i.e., send multiple transactions over one session by keeping the connection open, which reduces TCP overhead)

  • reduce the number of HTTP headers to the minimum in the request to save bandwidth

  • if supported by the server, use gzip to compress the body of the request (you need to balance the CPU cost of compression against the bandwidth you save)
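
The three points above can be sketched together. This is an illustration in Python using only the standard library, not .NET code; `prepare_body`, `post_all`, the 1024-byte threshold, and any host or path you pass in are assumptions for the example. It also assumes the server accepts a gzip-encoded request body, which you would have to confirm with the third party.

```python
import gzip
import http.client


def prepare_body(body: bytes, min_size: int = 1024):
    """Return (payload, headers), compressing only when it actually shrinks the body."""
    headers = {}  # keep request headers to the minimum
    if len(body) >= min_size:
        packed = gzip.compress(body)
        if len(packed) < len(body):
            headers["Content-Encoding"] = "gzip"
            return packed, headers
    # Tiny bodies are sent as-is: the gzip framing can make them larger.
    return body, headers


def post_all(host: str, path: str, bodies):
    """Send every body over a single kept-open HTTP/1.1 connection."""
    conn = http.client.HTTPConnection(host, timeout=30)
    try:
        replies = []
        for body in bodies:
            payload, headers = prepare_body(body)
            conn.request("POST", path, body=payload, headers=headers)
            response = conn.getresponse()
            # The response must be read fully before the connection is reused.
            replies.append((response.status, response.read()))
        return replies
    finally:
        conn.close()
```

For the small calls described in the question, the size check matters: compressing a body of a few dozen bytes costs CPU and usually saves nothing, so connection reuse is likely the bigger win of the two.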

#4


You might want to consider Rhino Service Bus instead of MSMQ. The source is available here.

