How can I speed up my regex?

Time: 2021-02-17 23:39:53

I'm writing a script to change all the urls of my content over to a new place.

var regex = /.*cloudfront.net/;
var pDistro = "newDistro.cloudfront.net/";

for (var i = 0; i < strings.length; i++) {
    strings[i] = strings[i].replace(regex, pDistro);
}

The strings I'm running the replace on average about 140 characters each. They're URLs that follow the format: https://[thing to replace].cloudfront.net/[something]/[something]/[something]

But this operation is terribly slow, taking about 4.5 seconds to process an average-sized array.

Why is this so slow? How can I make this faster?

If this question would be better suited to the Code Review Stack Exchange, or some other site, let me know and I'll move it there.

EDIT:

The data, as it appeared in the DB I was pulling from, looked to be about 140 characters per string. During the pull, some virtualization happened that appended 400 more characters onto each string, so it's no wonder the regex took so long.

The 140-character-string loop takes considerably less time, as others have pointed out.

The moral of the story: "Make sure the data you have is what you expect it to be" and "If your regex is taking too long, use smaller strings and a more specific regex (i.e. no wildcard)"
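
One way to act on the first half of that moral is to look at the actual string lengths before timing anything. The sketch below is only an illustration: lengthStats is a hypothetical helper name, and the sample array stands in for whatever the pull process really returns.

```javascript
// Hypothetical helper: summarize the lengths of the strings about to be processed.
function lengthStats(strings) {
    var min = Infinity, max = 0, total = 0;
    for (var i = 0; i < strings.length; i++) {
        var len = strings[i].length;
        if (len < min) { min = len; }
        if (len > max) { max = len; }
        total += len;
    }
    // An empty array has no meaningful min or mean, so report zeros.
    if (strings.length === 0) {
        return { count: 0, min: 0, max: 0, mean: 0 };
    }
    return { count: strings.length, min: min, max: max, mean: total / strings.length };
}

// Made-up sample data, not the asker's real URLs.
var sample = [
    "https://d111abc.cloudfront.net/a/b/c.png",
    "https://d222xyz.cloudfront.net/some/longer/path/file.jpg"
];
console.log(lengthStats(sample));
```

If the reported mean is far from the expected 140 characters, the data is worth inspecting before tuning the regex.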

2 answers

#1


6  

Perhaps it would run a little faster like this:

https:\/\/[a-zA-Z0-9]+\.cloudfront\.net

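Generally, the more restrictive your character classes are, the faster the regular expression will run. A greedy .* first consumes the rest of the string and then has to backtrack until cloudfront.net matches, whereas [a-zA-Z0-9]+ stops at the first character that cannot be part of the distribution name.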
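
As a rough sketch of how that pattern might slot into the loop from the question: rewriteHosts and the sample URLs below are hypothetical, and the replacement string is assumed to include the https:// scheme, because the tighter pattern now matches the scheme as well.

```javascript
// Hypothetical wrapper around the loop from the question.
function rewriteHosts(strings) {
    // Dots and slashes are escaped; the character class can only match
    // the distribution name, so the match cannot run past the host.
    var regex = /https:\/\/[a-zA-Z0-9]+\.cloudfront\.net/;
    // Assumption: scheme included and no trailing slash, since the
    // pattern consumes "https://" but not the slash after ".net".
    var pDistro = "https://newDistro.cloudfront.net";
    var out = [];
    for (var i = 0; i < strings.length; i++) {
        out[i] = strings[i].replace(regex, pDistro);
    }
    return out;
}

// Made-up sample data for illustration.
var rewritten = rewriteHosts([
    "https://d111abc.cloudfront.net/a/b/c.png",
    "https://d222xyz.cloudfront.net/x/y/z.jpg"
]);
console.log(rewritten[0]); // https://newDistro.cloudfront.net/a/b/c.png
```

Strings that do not contain a matching host come back unchanged, because replace() returns the original string when the pattern does not match.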


Thanks to @sbedulin for providing a jsperf link

#2


4  

For such a simple replacement, a regex is likely not the fastest way to search and replace. For example, if you do the search with .indexOf() and then use .slice() to build the replacement, you can speed it up 12-50x (depending on the browser).

I wasn't sure of the exact replacement logic you want to simulate, but here's a non-regex method that is a lot faster:

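// urls is assumed to be the input array of URL strings.
var pos, str, target = "cloudfront.net/";
var pDistro = "https://newDistro.cloudfront.net/";
var results = [];
for (var i = 0; i < urls.length; i++) {
    str = urls[i];
    // Find the old host; everything after it is the path to keep.
    pos = str.indexOf(target);
    if (pos !== -1) {
        results[i] = pDistro + str.slice(pos + target.length);
    } else {
        // No CloudFront host found: keep the string unchanged.
        results[i] = str;
    }
}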

Here's a comparison that also includes the more specific regex suggested by others. The more specific pattern definitely helps the regex approach, but it is still slower than just using .indexOf() and .slice(), and the difference is most pronounced in Firefox:

See jsperf here: http://jsperf.com/fast-replacer
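
For readers who want to repeat the comparison locally, a minimal timing sketch might look like the following. It is not the jsperf test case: the generated URLs, the function names, and the round count are all made up for illustration, and Date.now() gives only coarse wall-clock numbers that will vary by engine and machine.

```javascript
// Hypothetical sample input; real data would come from the asker's database.
var urls = [];
for (var n = 0; n < 1000; n++) {
    urls.push("https://d" + n + "abc.cloudfront.net/images/" + n + "/photo.png");
}

var NEW_PREFIX = "https://newDistro.cloudfront.net/";

// Regex approach with the more specific pattern.
function replaceWithRegex(list) {
    var regex = /https:\/\/[a-zA-Z0-9]+\.cloudfront\.net\//;
    return list.map(function (s) { return s.replace(regex, NEW_PREFIX); });
}

// indexOf/slice approach; unmatched strings are returned unchanged.
function replaceWithIndexOf(list) {
    var target = "cloudfront.net/";
    return list.map(function (s) {
        var pos = s.indexOf(target);
        return pos === -1 ? s : NEW_PREFIX + s.slice(pos + target.length);
    });
}

// Runs fn over the sample data `rounds` times; returns elapsed milliseconds.
function timeIt(fn, rounds) {
    var start = Date.now();
    for (var r = 0; r < rounds; r++) { fn(urls); }
    return Date.now() - start;
}

console.log("regex:   " + timeIt(replaceWithRegex, 200) + " ms");
console.log("indexOf: " + timeIt(replaceWithIndexOf, 200) + " ms");
```

Both functions return the same output for this sample data, so any difference in the printed times reflects the search strategy rather than different results.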
