How often does JavaScript recompile a regular expression inside a function?

Date: 2022-02-02 13:44:43

Given this function:

function doThing(values,things){
  var thatRegex = /^http:\/\//i; // is this created once or on every execution?
  if (values.match(thatRegex)) return values;
  return things;
}

How often does the JavaScript engine have to create the regex? Once per execution or once per page load/script parse?

To prevent needless answers or comments, I personally favor putting the regex outside the function, not inside. The question is about the behavior of the language, because I'm not sure where to look this up, or if this is an engine issue.


EDIT:

I was reminded I didn't mention that this was going to be used in a loop. My apologies:

var newList = [];
ListOfItems1.forEach(function (item1) {
  ListOfItems2.forEach(function (item2) {
    // doThing runs once for every (item1, item2) pair
    newList.push(doThing(item1, item2));
  });
});

So, given that it will be used many times in a loop, it makes sense to define the regex outside the function; that's the idea.

Also note that the script is deliberately generic, so that only the behavior and cost of creating the regex are being examined.

4 Answers

#1


3  

There are two "regular expression" type objects in JavaScript: regular expression instances and the RegExp object.

Also, there are two ways to create regular expression instances:

  1. using the /regex/ syntax, and
  2. using new RegExp('regex');

Each of these creates a new regular expression instance every time it is evaluated.

However, there is only ONE global RegExp object.

var input = 'abcdef';
var r1 = /(abc)/;
var r2 = /(def)/;
r1.exec(input);
alert(RegExp.$1); //outputs 'abc'
r2.exec(input);
alert(RegExp.$1); //outputs 'def'

When you use Syntax 1, the actual pattern is compiled as the script is loaded.

The pattern argument is compiled into an internal format before use. For Syntax 1, pattern is compiled as the script is loaded. For Syntax 2, pattern is compiled just before use, or when the compile method is called.
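
As a rough illustration of the two syntaxes named in the quote, the sketch below builds the same pattern both ways. The variable names and the example URL are made up for this illustration, and the moment at which an engine compiles each pattern is an implementation detail that script code cannot observe; only the matching behavior is shown.

```javascript
// Syntax 1: regex literal (the pattern is fixed in the source text).
var fromLiteral = /^http:\/\//i;

// Syntax 2: RegExp constructor (the pattern is a string built at run time).
// Forward slashes need no escaping inside a string pattern.
var fromConstructor = new RegExp('^http://', 'i');

var url = 'HTTP://example.com';
console.log(fromLiteral.test(url));           // true
console.log(fromConstructor.test(url));       // true
console.log(fromLiteral === fromConstructor); // false: two separate objects
```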

But you could still get a different regular expression instance on each function call. Test this in Chrome vs. Firefox:

function testregex() {
    var localreg = /abc/;
    if (testregex.reg != null){
        alert(localreg === testregex.reg);
    };
    testregex.reg = localreg;
}
testregex();
testregex();

It's VERY little overhead, but if you want exactly one regex, it's safest to create a single instance outside of your function.

#2


13  

From Mozilla's JavaScript Guide on regular expressions:

Regular expression literals provide compilation of the regular expression when the script is evaluated. When the regular expression will remain constant, use this for better performance.

And from the ECMA-262 spec, §7.8.5 Regular Expression Literals:

A regular expression literal is an input element that is converted to a RegExp object (see 15.10) each time the literal is evaluated.

In other words, the pattern is compiled once, when the script is first parsed and evaluated.

It's worth noting also, from the ES5 spec, that two literals will compile to two distinct instances of RegExp, even if the literals themselves are the same. Thus if a given literal appears twice within your script, it will be compiled twice, to two distinct instances:

Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical.

...

... each time the literal is evaluated, a new object is created as if by the expression new RegExp(Pattern, Flags) where RegExp is the standard built-in constructor with that name.
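
The two spec passages can be checked with a small sketch like the following. The variable names are made up for this illustration, and the loop result assumes an ES5-conforming engine.

```javascript
// Two literals with identical contents are still two different objects.
var a = /abc/;
var b = /abc/;
console.log(a === b); // false

// One literal evaluated several times (here, once per loop iteration).
var seen = [];
for (var i = 0; i < 3; i++) {
  seen.push(/abc/);
}
console.log(seen[0] === seen[1]);               // false in an ES5-conforming engine
console.log(seen[0].source === seen[1].source); // true: same pattern text
```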

#3


5  

The regex will be compiled every time you call the function if it's not in literal form.
Since you are including it in a literal form, you've got nothing to worry about.

Here's a quote from websina.com:

Regular expression literals provide compilation of the regular expression when the script is evaluated. When the regular expression will remain constant, use this for better performance.

Calling the constructor function of the RegExp object, as follows:
re = new RegExp("ab+c")

Using the constructor function provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.
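
For the user-input case mentioned in the quote, a minimal sketch might look like this. Both escapeRegExp and makePrefixRegex are hypothetical helpers written for this illustration, not built-in functions, and the input string stands in for real user input.

```javascript
// Hypothetical helper: escape characters that have special meaning in a
// regex so that user-supplied text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical function: build a case-insensitive "starts with" regex from
// text that is only known at run time.
function makePrefixRegex(prefix) {
  return new RegExp('^' + escapeRegExp(prefix), 'i');
}

var userInput = 'a.b';                // stands in for real user input
var re = makePrefixRegex(userInput);
console.log(re.test('A.B and more')); // true
console.log(re.test('axb'));          // false: the dot was escaped
```

Because the pattern is only known at run time, a literal cannot be used here; if the same input is reused many times, the resulting RegExp object can still be stored and reused rather than rebuilt on every call.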

#4


4  

The provided answers don't clearly distinguish between two different processes behind the scenes: regexp compilation, and regexp object creation each time a regexp-creating expression is evaluated.

Yes, by using regexp literal syntax you gain the performance benefit of one-time regexp compilation.

But if your code executes in an ES5+ environment, every time the code path enters the doThing() function in your example, it actually creates a new RegExp object, though without needing to compile the regexp again and again.

In ES5, literal syntax produces a new RegExp object every time the code path hits an expression that creates a regexp via a literal:

function getRE() {
    var re = /[a-z]/;
    re.foo = "bar";
    return re;
}

var reg = getRE(),
    re2 = getRE();

console.log(reg === re2); // false
reg.foo = "baz";
console.log(re2.foo); // "bar"

To put actual numbers on the statements above, take a look at the performance difference between the storedRegExp and inlineRegExp tests in this jsperf.

storedRegExp would be about 5-20% faster across browsers than inlineRegExp; the difference is the overhead of creating (and garbage collecting) a new RegExp object every time.
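
The original jsperf test code is not reproduced here, so the following is only a guess at what such a comparison could look like. The function bodies, the time helper, the example URL and the iteration count are all assumptions made for this sketch, and the timings it prints will vary by engine and machine.

```javascript
// Hypothetical test functions named after the two jsperf cases.
var stored = /^http:\/\//i;
function storedRegExp(value) {
  return stored.test(value);        // reuses one RegExp object
}
function inlineRegExp(value) {
  return /^http:\/\//i.test(value); // literal evaluated on every call
}

// Crude timer: runs fn many times and returns elapsed milliseconds.
function time(fn, iterations) {
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    fn('http://example.com');
  }
  return Date.now() - start;
}

console.log('stored:', time(storedRegExp, 1000000), 'ms');
console.log('inline:', time(inlineRegExp, 1000000), 'ms');
```

A loop like this is far cruder than a real benchmark harness, so treat its output as a rough indication at best.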

Conclusion:
If you're heavily using your literal regexps, consider caching them outside the scope where they are needed, so that not only are they compiled once, but the actual regexp objects for them are created only once as well.
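
One possible way to apply this to the doThing function from the question is to keep the regex in a closure. This is a sketch of one caching approach, not the only one; a plain variable declared outside the function would work just as well. The example arguments are made up.

```javascript
// The regex is created once, when this IIFE runs, and is then shared by
// every call to doThing through the closure.
var doThing = (function () {
  var thatRegex = /^http:\/\//i;
  return function (values, things) {
    if (values.match(thatRegex)) return values;
    return things;
  };
})();

console.log(doThing('http://example.com', 'fallback')); // "http://example.com"
console.log(doThing('ftp://example.com', 'fallback'));  // "fallback"
```

One thing to keep in mind: a shared regex with the g or y flag keeps its lastIndex between calls to test or exec, which is not an issue for this pattern because it uses neither flag.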
