Why is EventMachine so much slower than Node?

Time: 2022-06-08 15:38:38

In my specific case, at least. Not trying to make general statements here.

I've got this web crawler that I wrote in Node.js. I'd love to use Ruby instead, so I re-wrote it in EventMachine. Since the original was in CoffeeScript, it was actually surprisingly easy, and the code is very much the same, except that in EventMachine I can actually trap and recover from exceptions (since I'm using fibers).

The problem is that tests that run in under 20 seconds on the Node.js code take up to and over 5 minutes on EventMachine. When I watch the connection count it almost looks like they are not even running in parallel (they queue up into the hundreds, then very slowly work their way down), though logging shows that the code points are hit in parallel.

I realize that without code you can't really know what exactly is going on, but I was just wondering if there is some kind of underlying difference and I should give up, or if they really should be able to run about as fast (a small slowdown is fine) and I should keep trying to figure out what the issue is.

I did the following, but it didn't really seem to have any effect:

puts "Running with ulimit: " + EM.set_descriptor_table_size(60000).to_s
EM.set_effective_user('nobody')
EM.kqueue

Oh, and I'm very sure that I don't have any blocking calls in EventMachine. I've combed through every line about 10 times looking for anything that could be blocking. All my network calls are EM::HttpRequest.

1 solution

#1


"The problem is that tests that run in under 20 seconds on the Node.js code take up to and over 5 minutes on EventMachine. When I watch the connection count it almost looks like they are not even running in parallel (they queue up into the hundreds, then very slowly work their way down), though logging shows that the code points are hit in parallel."

If they're not running in parallel then it's not asynchronous. So you're blocking.

Basically, you need to figure out which blocking IO call from the standard Ruby library you are making, remove it, and replace it with an EventMachine non-blocking IO call.
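
One way to confirm that something is holding the reactor before hunting for the exact call is to measure how late a periodic timer fires. The following is only a sketch under assumptions: `LagMonitor` and `watch_reactor` are names made up for this example and are not part of EventMachine, and the interval and threshold values are arbitrary. Only `EM.add_periodic_timer` is an actual EventMachine call.

```ruby
# Tracks how late a periodic tick fires. On a single-threaded reactor,
# a tick that arrives late means something held the thread in between.
# (LagMonitor is a hypothetical helper, not part of EventMachine.)
class LagMonitor
  attr_reader :worst_lag

  def initialize(interval)
    @interval = interval
    @last_tick = nil
    @worst_lag = 0.0
  end

  # Record a tick at time `now` (in seconds) and return how late it was.
  # The first tick has no baseline, so it always reports 0.0.
  def tick(now = Process.clock_gettime(Process::CLOCK_MONOTONIC))
    lag = @last_tick ? [now - @last_tick - @interval, 0.0].max : 0.0
    @last_tick = now
    @worst_lag = lag if lag > @worst_lag
    lag
  end
end

# Wires the monitor into a running reactor. Needs the eventmachine gem
# and must be called from inside EM.run. The defaults are assumptions.
def watch_reactor(interval: 0.1, warn_above: 0.05)
  require 'eventmachine'
  monitor = LagMonitor.new(interval)
  EM.add_periodic_timer(interval) do
    lag = monitor.tick
    warn format('reactor stalled for %.3fs', lag) if lag > warn_above
  end
  monitor
end
```

If the warnings line up with the moments when the connection count stops draining, some call on the reactor thread is almost certainly blocking.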

Your code may not have any blocking calls, but are you using third-party code, anything that is neither your own nor from EM? It may block. Even something as simple as a debug print or a log call can block.

"All my network calls are EM::HttpRequest."

What about file IO? What about TCP? What about anything else that can block? What about 3rd party libraries?
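
To narrow down which of those is the culprit, you could wrap each suspect call (file writes, logging, third-party library calls) in a timer and record the ones that hold the thread too long. This is a rough sketch, not a definitive tool: `SlowCallLog`, its `measure` method, the labels, and the 10 ms budget are all invented for illustration, and the `sleep` stands in for whatever blocking call you might actually have.

```ruby
# Collects calls that held the calling thread longer than a budget.
# (SlowCallLog is a hypothetical helper written for this example.)
class SlowCallLog
  attr_reader :entries

  def initialize(budget_seconds)
    @budget = budget_seconds
    @entries = []
  end

  # Runs the block, returns its result, and records the label together
  # with the elapsed time if the block ran longer than the budget.
  def measure(label)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @entries << [label, elapsed] if elapsed > @budget
  end
end

slow = SlowCallLog.new(0.01)                 # 10 ms budget, an arbitrary choice
slow.measure('parse') { 1 + 1 }              # fast, should not be recorded
slow.measure('suspect call') { sleep 0.05 }  # stand-in for a blocking call
slow.entries.each do |label, seconds|
  puts format('%s held the thread for %.3fs', label, seconds)
end
```

Anything that shows up in the log while running on the reactor thread is a candidate for replacement with a non-blocking EM equivalent, or for moving onto EM's thread pool with `EM.defer`.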

We really need to see some code here, either to identify a bottleneck in your code or to find a blocking call.

Node.js should not be more than an order of magnitude faster than EM.
