I have a list of dynamically generated keywords. I iterate over them and insert each one into a regular expression.
For example:
const keywords = ['Pascal', 'Elixir', 'A++', 'Elm'];
keywords.forEach((keyword) => {
  const reg = new RegExp(`\\b(${keyword})\\b`, 'ig');
  // ... do stuff with regex here ...
});
This pattern works fine except for A++ (or C++, or any similar variation), where I get the following error:
Uncaught SyntaxError: Invalid regular expression: /\b($a++)\b/: Nothing to repeat
Since I don't know the specific values ahead of time, how should I handle this edge case?
2 solutions
#1
2
You must escape regular expression special characters.
In your case the keyword is A++, which has to be written as A\+\+ inside the pattern.
Since you don't know what the list will contain, run each dynamic string through a sanitizer that escapes special characters before you use it as part of a regex.
As a starting point, you could use something like this:
// Escape characters that have special meaning in a regular expression.
function escSpecialChars(str) {
  return str.replace(/([.*+?^=!:${}()|\[\]\/\\])/g, "\\$1");
}

const keywords = ['Pascal', 'Elixir', 'A++', 'Elm'];
keywords.forEach((keyword) => {
  const tmpReg = escSpecialChars(keyword);
  // A trailing \b fails when "+" is followed by whitespace or the end
  // of the input, so use a lookahead for those two cases instead.
  const pattern = "\\b" + tmpReg + "(?=\\s|$)";
  const re = new RegExp(pattern);
  console.log(re.exec("Pascal Elixir cascel Elm A++"));
  // do your stuff here
});
A list of characters with special meaning in regular expressions can be found here.
#2
1
The problem is that + is a special character in a regex: it means "the previous item, one or more times". Two + characters in a row are a syntax error in JavaScript. If you are dynamically populating a regex with strings, you need to make sure that those strings are valid regex strings.
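To make the failure concrete, here is a minimal sketch that tries to compile each keyword into the question's pattern and collects the ones the engine rejects. The helper name `compiles` is hypothetical and not part of the original question.

```javascript
// Hypothetical helper: returns true if `source` compiles as a regular expression.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    // Invalid patterns throw a SyntaxError, e.g. "Nothing to repeat".
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const keywords = ['Pascal', 'Elixir', 'A++', 'Elm'];

// Build the same pattern as in the question and keep the keywords that fail.
const rejected = keywords.filter((k) => !compiles(`\\b(${k})\\b`));
console.log(rejected); // only 'A++' fails to compile
```

A check like this only detects the problem. It does not make A++ searchable, which is why escaping is still needed.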
You can either escape them dynamically (scan the string, find the special characters, and prefix each one with a \), or make sure they are valid regex strings before they reach your program.
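The first option might look like the sketch below. The function name `escapeRegExp` and the exact character class are assumptions for illustration; the class covers the characters that are special in a JavaScript pattern outside a character class.

```javascript
// Hypothetical helper: prefix every regex metacharacter with a backslash.
function escapeRegExp(text) {
  // "$&" in the replacement string stands for the matched character.
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const keywords = ['Pascal', 'Elixir', 'A++', 'Elm'];
const escaped = keywords.map(escapeRegExp);
console.log(escaped); // 'A++' becomes 'A\+\+'; the others are unchanged

// Each escaped keyword now compiles without throwing.
const patterns = escaped.map((source) => new RegExp(source, 'i'));
console.log(patterns[2].test('I like a++')); // true
```

Escaping at the point where the regex is built keeps the keyword list itself as plain text, so the same list can still be used for display or comparison.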
Also, you use the \b word boundary assertion in your regex. A + character is not a word character, so the regex \bA\+\+\b will not match the string A++. There is no word boundary after a + character unless a word character follows it. You might have to rethink your main regex.
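One way to rethink the boundary is to replace \b with lookarounds, sketched here under the assumption that a keyword counts as a match whenever no word character touches it on either side. The names `escapeRegExp` and `keywordRegex` are hypothetical, and the lookbehind `(?<!...)` requires a reasonably recent JavaScript engine.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: match `keyword` only when no word character
// touches it on either side. Unlike \b, this also works when the
// keyword itself starts or ends with a non-word character such as "+".
function keywordRegex(keyword) {
  return new RegExp(`(?<!\\w)${escapeRegExp(keyword)}(?!\\w)`, 'ig');
}

const text = 'Pascal Elixir cascel Elm A++';

console.log(/\bA\+\+\b/i.test(text));        // false: no boundary after the final "+"
console.log(keywordRegex('A++').test(text)); // true
console.log(keywordRegex('Elm').test(text)); // true
console.log(keywordRegex('El').test(text));  // false: "El" is followed by a word character
```

Whether a keyword followed by punctuation should count as a match is a design decision; this sketch treats any non-word neighbor as acceptable.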