Fastest way to split a text file at spaces in JavaScript

Date: 2023-01-22 21:10:59

I'm looking at doing some text processing in the browser and am trying to get a rough idea of whether I will be CPU-bound or I/O-bound. To test the CPU side of the equation, I am measuring how quickly I can split a piece of text (~8.9 MB; it's Project Gutenberg's Sherlock Holmes repeated a number of times) in JavaScript once it is in memory. At the moment I'm simply doing:


pieces = theText.split(" ");

and executing it 100 times and taking the average. On a 2011 MacBook Pro i5, the average split takes 92.81 ms in Firefox and 237.27 ms in Chrome. So for Firefox that is 1000 / 92.81 ms * 8.9 MB ≈ 95.9 MB/s on the CPU, which is probably a little faster than the hard disk I/O, but not by much.
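A minimal sketch of that measurement, for anyone who wants to reproduce it. The helper names `averageMs` and `throughputMBps` are hypothetical, and the stand-in text is far smaller than the ~8.9 MB file used for the numbers above, so the timings it prints will not be comparable:

```javascript
// Hypothetical helper: run fn() `runs` times and return the mean duration in ms.
function averageMs(fn, runs) {
  // performance.now() exists in browsers and recent Node.js; fall back to Date.now().
  const now = (typeof performance !== "undefined" && typeof performance.now === "function")
    ? () => performance.now()
    : () => Date.now();
  let total = 0;
  for (let i = 0; i < runs; i++) {
    const start = now();
    fn();
    total += now() - start;
  }
  return total / runs;
}

// Hypothetical helper: convert a mean time per run into throughput in MB/s.
function throughputMBps(sizeMB, meanMs) {
  return (1000 / meanMs) * sizeMB;
}

// Stand-in text (an assumption); substitute the real in-memory file contents.
const theText = "the quick brown fox jumps over the lazy dog ".repeat(2000);
const meanMs = averageMs(() => theText.split(" "), 100);
console.log(meanMs.toFixed(2) + " ms per split");
```

Calling `throughputMBps(8.9, 92.81)` reproduces the Firefox figure quoted above.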


So my question is really three parts:


  • Are there JavaScript alternatives to split() that tend to be faster for simple text processing (e.g. splitting at spaces, newlines, etc.)?
  • Are the lackluster CPU results I'm seeing here likely due to fundamental string matching/algorithmic constraints, or is the JavaScript execution just slow?
  • If you think JavaScript is likely the limiting factor, can you demonstrate substantially better performance on a comparable machine and comparable text in any other programming language?

Edit: I also suspect this could be sped up with Web Workers, though for now I am primarily interested in single-threaded approaches.


1 Answer

#1


2  

As far as I know, split() followed by a for loop is the fastest way to do simple text processing in JavaScript. It is faster than a regex; here is the jsPerf link: http://jsperf.com/query-str-parsing-regex-vs-split/2
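To make the comparison concrete, here is a rough sketch of the approaches being contrasted, using word counting as a stand-in task. The function names are hypothetical, and the third variant (an indexOf scan that skips the intermediate array) is an added assumption rather than something covered by the linked benchmark. The sketch only shows that the variants agree on the result; which one is faster depends on the engine and should be measured:

```javascript
// Approach A: split on a literal space, then walk the pieces with a plain for loop.
function countWordsSplit(text) {
  const pieces = text.split(" ");
  let count = 0;
  for (let i = 0; i < pieces.length; i++) {
    if (pieces[i].length > 0) count++; // skip empty strings left by repeated spaces
  }
  return count;
}

// Approach B: use a regular expression to match runs of non-space characters.
function countWordsRegex(text) {
  const matches = text.match(/[^ ]+/g);
  return matches === null ? 0 : matches.length;
}

// Approach C (assumption, not from the answer): scan with indexOf and never
// build the array of pieces.
function countWordsIndexOf(text) {
  let count = 0;
  let start = 0;
  while (start <= text.length) {
    let end = text.indexOf(" ", start);
    if (end === -1) end = text.length; // last piece runs to the end of the text
    if (end > start) count++;          // non-empty piece between start and end
    start = end + 1;
  }
  return count;
}

const sample = "To  Sherlock Holmes she is always the woman ";
console.log(countWordsSplit(sample), countWordsRegex(sample), countWordsIndexOf(sample));
```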
