This question already has an answer here:
- How to ignore acute accent in a javascript regex match? 3 answers
- ignore accent in regex [duplicate] 3 answers
I have to check for forbidden words in a text area when a user tries to validate it. The list of forbidden words is stored in the jsBlackList array, and this is part of my code so far:
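var fieldValue = value;
var hasForbiddenWord = false;
for (var i = 0; i < jsBlackList.length; i++) {
    // Case-insensitive, whole-word pattern for the current forbidden word.
    var regex = new RegExp("\\b" + jsBlackList[i] + "\\b", "gi");
    // Check before masking: once the word is replaced, nothing is left to match.
    hasForbiddenWord = hasForbiddenWord || fieldValue.search(regex) !== -1;
    fieldValue = fieldValue.replace(regex, '***');
}
value = fieldValue;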
But the problem is that jsBlackList contains some accented characters, while the user may type them without accents (for example, jsBlackList can contain "déjà", and the user types "deja", "déja" or "dejà").
How can I check for missing accents?
NB about "Marked as duplicate": the duplicate questions are about "regexp without accents to check text with accents"; mine is about "regexp with accents to check text with potentially missing accents".
3 solutions
#1
One way to accomplish this is to change your blacklist a bit:
Replace every accented character with an alternation of its accented and unaccented forms.
For example, "déjà" becomes "d(é|e)j(à|a)".
If your blacklist is big, then you will probably want to automate these replacements, but in the end it is convenient to have the blacklist written like this.
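The automation mentioned above could look roughly like the sketch below. The helper names `toAccentOptionalPattern` and `censor` are made up for illustration, and it assumes an engine with `String.prototype.normalize`, Unicode property escapes and lookbehind (recent browsers or Node.js). It also swaps `\b` for explicit letter lookarounds, because JavaScript's `\b` only knows ASCII word characters and so does not see a boundary between "à" and a following space.

```javascript
// Hypothetical helper: rewrite each accented letter as "(accented|plain)".
function toAccentOptionalPattern(word) {
  return Array.from(word.normalize("NFC"), (ch) => {
    // NFD splits "é" into "e" plus a combining mark; dropping marks leaves the base letter.
    const base = ch.normalize("NFD").replace(/\p{M}/gu, "");
    if (base !== "" && base !== ch) {
      return "(" + ch + "|" + base + ")";
    }
    // Escape regex metacharacters in everything else.
    return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }).join("");
}

// Hypothetical helper: mask blacklisted words, accents optional.
function censor(text, blackList) {
  let found = false;
  let result = text.normalize("NFC");
  for (const word of blackList) {
    // Lookarounds stand in for \b so that accented letters count as word characters.
    const regex = new RegExp(
      "(?<![\\p{L}\\p{N}_])" + toAccentOptionalPattern(word) + "(?![\\p{L}\\p{N}_])",
      "giu"
    );
    result = result.replace(regex, () => {
      found = true;
      return "***";
    });
  }
  return { value: result, hasForbiddenWord: found };
}

console.log(toAccentOptionalPattern("déjà")); // d(é|e)j(à|a)
console.log(censor("C'est deja vu, déja, dejà, déjà.", ["déjà"]).value);
// C'est *** vu, ***, ***, ***.
```

With this approach the blacklist itself can stay in its natural accented form, and the patterns are generated when needed.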
#2
You need to create a list of equivalences and, in your regex, OR all the equivalent characters at each position:
d(é|e)j(à|a)
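One possible reading of the "list of equivalences" idea is a table keyed by base letter, with each letter of the blacklisted word expanded into a character class. The table `EQUIVALENCES` and the function `toEquivalencePattern` below are assumptions for illustration only; the table is French-oriented and far from exhaustive.

```javascript
// Assumed equivalence table (illustrative, not exhaustive).
const EQUIVALENCES = {
  a: "aàâä",
  c: "cç",
  e: "eéèêë",
  i: "iîï",
  o: "oôö",
  u: "uùûü",
};

// Hypothetical helper: expand each letter into the class of its equivalents.
function toEquivalencePattern(word) {
  return Array.from(word.normalize("NFC").toLowerCase(), (ch) => {
    const group = Object.values(EQUIVALENCES).find((chars) => chars.includes(ch));
    // Letters without an entry are kept as-is, with regex metacharacters escaped.
    return group ? "[" + group + "]" : ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  }).join("");
}

const pattern = toEquivalencePattern("déjà");
console.log(pattern); // d[eéèêë]j[aàâä]
console.log(new RegExp(pattern, "i").test("dèja")); // true
```

Unlike a plain accented-or-unaccented alternation, this also catches a wrong accent such as "dèja", at the cost of maintaining the table by hand.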
#3
I think your best bet is to:
- replace all accented characters in the blacklist with their non-accented equivalents,
- process the text to replace accented characters with their non-accented equivalents.
Then you can compare without worrying about accents.
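A minimal sketch of this normalize-both-sides idea, assuming `String.prototype.normalize` is available (ES2015 and later). The names `stripAccents`, `escapeRegExp` and `findForbiddenWords` are hypothetical. It only detects forbidden words: masking them inside the stripped copy would also drop the accents from the rest of the user's text.

```javascript
// NFD splits "é" into "e" plus a combining mark; the replace drops those marks.
const stripAccents = (s) => s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");

// Escape regex metacharacters so blacklist entries are matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: return the blacklist entries found in the text.
function findForbiddenWords(text, blackList) {
  const plainText = stripAccents(text);
  return blackList.filter((word) => {
    // \b behaves as expected here because both sides are reduced to unaccented letters.
    // Letters with no decomposition (such as "œ") are left untouched by stripAccents.
    const regex = new RegExp("\\b" + escapeRegExp(stripAccents(word)) + "\\b", "i");
    return regex.test(plainText);
  });
}

console.log(stripAccents("déjà")); // deja
console.log(findForbiddenWords("Du déja vu", ["déjà", "voilà"])); // [ 'déjà' ]
```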