I have to build a RegExp object that searches for words from an array and matches whole words only.
For example, given the words array ['יל','ילד'], the RegExp should match 'יל' or 'ילד', but not 'ילדד'.
This is my code:
var text = 'ילד ילדדד יל';
var matchWords = ['יל','ילד'];
text = text.replace(/\n$/g, '\n\n').replace(new RegExp('\\b(' + matchWords.join('|') + ')\\b','g'), '<mark>$&</mark>');
console.log(text);
What I have tried:
First I tried this code:
new RegExp('(יל|ילד)','g');
It finds the words, but it also matches inside longer words like "ילדדדד". I need to match whole words only.
I also tried this code:
new RegExp('\\b(יל|ילד)\\b','g');
but this regular expression doesn't match any word at all!
How should I build my RegExp?
2 solutions
#1
The word boundary \b is not Unicode-aware in JavaScript: it only treats ASCII letters, digits and _ as word characters, so it never finds a boundary next to Hebrew letters. Use XRegExp to build a Unicode-aware word boundary:
var text = 'ילד ילדדד יל';
var matchWords = ['יל','ילד'];
var re = XRegExp('(^|[^_0-9\\pL])(' + matchWords.join('|') + ')(?![_0-9\\pL])','ig');
text = XRegExp.replace(text.replace(/\n$/g, '\n\n'), re, '$1<mark>$2</mark>');
console.log(text);
<script src="http://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Here, (^|[^_0-9\\pL]) is capturing group 1; it matches either the start of the string or any character other than a Unicode letter, an ASCII digit or _ (acting as a leading word boundary). The negative lookahead (?![_0-9\\pL]) fails the match if the word is followed by _, an ASCII digit or a Unicode letter. Because group 1 consumes the preceding character, the replacement string puts it back with $1.
#2
//Words to join
var words = ['apes', 'cats', 'bazooka'];
//String to search
var str = 'it\'s good that cats and dogs dont wear bazookas';
//Delimiter: start of string, end of string or whitespace
var end = '(^|$|\\s)';
//Regular expression string
var regex = end + "(" + words.join('|') + ")" + end;
//Build RegExp
var re = new RegExp(regex, "gi");
//Log results
console.log(str.match(re));
Or as a function:
var findWholeWordInString = (function() {
  //Delimiter: start of string, end of string or whitespace
  var end = '(^|$|\\s)';
  //The actual function
  return function(str, words) {
    //Regular expression string
    var regex = end + "(" + words.join('|') + ")" + end;
    //Build RegExp
    var re = new RegExp(regex, "gi");
    //Return results (null when nothing matches)
    return str.match(re);
  };
})();
//Run test
console.log(findWholeWordInString('it\'s good that cats and dogs dont wear bazookas', ['apes', 'cats', 'bazooka']));
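One caveat with the pattern above: the delimiter groups consume the surrounding whitespace, so the results include it (' cats ') and two wanted words separated by a single space cannot both match. A possible variant, assuming ES2018 lookbehind support, uses lookarounds so the delimiters are checked but not consumed. The name findWholeWords is hypothetical, and the words are escaped as an extra precaution the original does not take.

```javascript
// Hypothetical variant: whitespace-delimited whole words via lookarounds.
function findWholeWords(str, words) {
  // Escape regex metacharacters so each word is matched literally.
  var escaped = words.map(function (w) {
    return w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  });
  // (?<!\S) : not preceded by a non-whitespace char (start of string or whitespace)
  // (?!\S)  : not followed by a non-whitespace char (end of string or whitespace)
  var re = new RegExp('(?<!\\S)(?:' + escaped.join('|') + ')(?!\\S)', 'gi');
  // Return an empty array instead of null when nothing matches.
  return str.match(re) || [];
}

console.log(findWholeWords("it's good that cats and dogs dont wear bazookas",
                           ['apes', 'cats', 'bazooka'])); // [ 'cats' ]
console.log(findWholeWords('cats cats', ['cats']));       // [ 'cats', 'cats' ]
```

Like the original, this only treats whitespace as a delimiter, so a word followed by punctuation ("cats,") is still not matched.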