Handling a potential + in a regular expression that currently throws an exception

Date: 2022-03-21 13:14:59

I have a list of keywords (that are dynamically generated) that are iterated over and inserted into a regular expression.

For example:

const keywords = ['Pascal', 'Elixir', 'A++', 'Elm'];

keywords.forEach((keyword) => {
  const reg = new RegExp(`\\b(${keyword})\\b`, 'ig');

  // ... do stuff with regex here ...
});

This pattern works except for keywords such as A++ (or C++, or any similar variation), where I get the following error:

Uncaught SyntaxError: Invalid regular expression: /\b($a++)\b/: Nothing to repeat

Given that I don't know what the specific values are at runtime, how would I handle this edge case?

2 Answers

#1


2  

You must escape regular expression special characters.

In your case the keyword is A++, which must be written as A\+\+ inside the pattern.

Because you don't know in advance what the list will contain, run each dynamic string through an escaping function before using it as part of a regex.

For a head start you may use something like this:

// Escape characters that have a special meaning in a regular expression.
function escSpecialChars(str) {
  return str.replace(/([.*+?^=!:${}()|\[\]\/\\])/g, "\\$1");
}

const keywords = ['Pascal', 'Elixir', 'A++', 'Elm'];

keywords.forEach((keyword) => {
  const escaped = escSpecialChars(keyword);
  // A trailing \b fails after "+", so require whitespace or end of input instead.
  const pattern = "\\b" + escaped + "(?=\\s|$)";
  const re = new RegExp(pattern);

  console.log(re.exec("Pascal Elixir cascel Elm A++"));

  // do your stuff here
});

A list of the characters that have special meaning in regular expressions can be found here.
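
To make the effect of escaping concrete, here is a minimal sketch that prints what a few keywords look like after escaping. The helper name escapeRegExp and the sample keywords are assumptions made for illustration; the character class is a commonly used one and is not taken from the answer above.

```javascript
// Hypothetical helper: prefix every regex metacharacter with a backslash.
// "$&" in the replacement string stands for the character that was matched.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Sample keywords chosen for illustration only.
const samples = ['Pascal', 'A++', 'C#', 'F*', '.NET'];
const escaped = samples.map(escapeRegExp);

console.log(escaped.join(' ')); // prints: Pascal A\+\+ C# F\* \.NET

// Every escaped string now compiles without throwing.
escaped.forEach((source) => new RegExp(source));
```

Keywords without metacharacters pass through unchanged, so the helper can be applied to every entry in the list without checking it first.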

#2


1  

The problem is that + is a special character in a regex: it is a quantifier meaning "the preceding item, one or more times". In JavaScript, two + characters in a row are a syntax error ("Nothing to repeat"). If you are dynamically populating a regex with strings, you need to make sure that those strings are valid regex strings.
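
The following short sketch shows both behaviours described above: a single + acting as a quantifier, and a doubled + being rejected when the pattern is compiled. The variable name errorName is an assumption made for illustration.

```javascript
// A single "+" is a quantifier: one or more of the preceding "A".
console.log(/A+/.exec('AAAB')[0]); // prints "AAA"

// A second "+" has nothing left to quantify, so compiling the pattern throws.
let errorName = null;
try {
  new RegExp('A++');
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // prints "SyntaxError"
```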

You can either dynamically escape them (look through the string, find special characters and prefix them with a \), or you can make sure that they are valid regex strings before they get to your program.
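
As a rough sketch of the second option, a keyword can be checked by trying to compile it. The helper name isValidPattern and the extra keyword 'C+' are assumptions made for illustration. Note that a string can be a valid pattern and still not match itself literally, which is why escaping is usually the safer of the two options.

```javascript
// Hypothetical helper: report whether a string compiles as a regular expression.
function isValidPattern(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    return false;
  }
}

// 'C+' is added to the question's list for illustration.
const keywords = ['Pascal', 'Elixir', 'A++', 'Elm', 'C+'];
const accepted = keywords.filter(isValidPattern);

console.log(accepted.join(', ')); // prints: Pascal, Elixir, Elm, C+

// 'C+' is valid, but it means "one or more C", not the literal text "C+".
console.log(new RegExp('^C+$').test('CCC')); // prints true
console.log(new RegExp('^C+$').test('C+'));  // prints false
```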

Also, you use the \b word boundary assertion in your regex. A + is not a word character, so the regex \bA\+\+\b will not match the string A++ when it is followed by whitespace or the end of the string: there is no word boundary between a + and another non-word character or the end of input. You might have to rethink your main regex.
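
The boundary problem can be seen in a few lines. The lookaround-based pattern below is one possible replacement, not something proposed in the answer itself, and it assumes a JavaScript engine that supports lookbehind assertions. The names text and re are chosen for illustration.

```javascript
const text = 'Pascal Elixir Elm A++';

// The trailing \b needs a word character on exactly one side.
// After the final "+" comes the end of the string, so it fails.
console.log(/\bA\+\+\b/.test(text)); // prints false

// With a word character right after the "+", the trailing \b succeeds.
console.log(/\bA\+\+\b/.test('A++x')); // prints true

// Possible alternative: require that no word character touches the keyword.
const re = /(?<!\w)A\+\+(?!\w)/;
console.log(re.test(text));   // prints true
console.log(re.test('A++x')); // prints false
console.log(re.test('xA++')); // prints false
```

The lookarounds express "not glued to a word character" directly, so they behave the same whether the keyword ends in a letter or in punctuation.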
