Matching accented characters with JavaScript regular expressions

Time: 2021-07-19 22:53:07

Here's a fun snippet I ran into today:

/\ba/.test("a") --> true
/\bà/.test("à") --> false

However,

/à/.test("à") --> true

Firstly, wtf?

Secondly, if I want to match an accented character at the start of a word, how can I do that? (I'd really like to avoid using over-the-top selectors like /(?:^|\s|'|\(\) ....)

3 answers

#1

This worked for me:

/^[a-z\u00E0-\u00FC]+$/i

With help from here
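
A quick check of the regex above. The range \u00E0-\u00FC is a span of code points, not a list of letters. It includes U+00F7 (÷) and leaves out ý, þ and ÿ (U+00FD to U+00FF). The sketch also shows one alternative that is not part of the original answer: the Unicode property escape \p{L}. It assumes an ES2018+ engine and requires the u flag.

```javascript
// Regex from the answer: ASCII letters plus the code-point range U+00E0..U+00FC,
// made case-insensitive by the i flag.
const latin1Word = /^[a-z\u00E0-\u00FC]+$/i;

console.log(latin1Word.test("àbc"));  // true
console.log(latin1Word.test("Élan")); // true: the i flag also accepts É (U+00C9)
console.log(latin1Word.test("÷"));    // true: U+00F7 is inside the range, though not a letter
console.log(latin1Word.test("ÿ"));    // false: U+00FF is outside the range

// Assumed alternative (ES2018+): any Unicode letter via \p{L}, which needs the u flag.
const anyLetters = /^\p{L}+$/u;

console.log(anyLetters.test("àbc")); // true
console.log(anyLetters.test("÷"));   // false: ÷ is a math symbol, not a letter
console.log(anyLetters.test("ÿ"));   // true
```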

#2

The reason /\bà/.test("à") doesn't match is that "à" is not a word character. The escape sequence \b matches only at a boundary between a word character and a non-word character. /\ba/.test("a") matches because "a" is a word character. The start of the string counts as a non-word position, so there is a boundary between it and the letter "a".

In JavaScript regexes, word characters are defined as [a-zA-Z0-9_].
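
A small demonstration of the two points above: \w is ASCII-only, and \b only fires next to a word character. Adding the u flag does not change this for "à". The last lines show a side effect: accented letters split "words" as \w and \b see them.

```javascript
// \w is ASCII-only in JavaScript: equivalent to [A-Za-z0-9_].
console.log(/\w/.test("a")); // true
console.log(/\w/.test("à")); // false

// \b needs a word character on exactly one side of the position.
// The start of the string is a non-word position and "à" is also non-word,
// so there is no boundary before "à".
console.log(/\ba/.test("a")); // true
console.log(/\bà/.test("à")); // false

// The u flag does not make "à" a word character either.
console.log(/\bà/u.test("à")); // false

// Consequence: accented letters break words apart for \b and \w.
const words = "café résumé".match(/\b\w+\b/g);
console.log(words); // [ 'caf', 'r', 'sum' ]
```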

To match an accented character at the start of a string, just put the ^ character at the beginning of the regex (e.g. /^à/). ^ means the beginning of the string, or of a line with the m flag. This is unlike \b, which matches at any word boundary within the string. It's the most basic, standard regular expression syntax, so it's definitely not over the top.
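
The ^ anchor covers the start-of-string case. The question also asks about the start of a word. For that case, one option is a negative lookbehind that rejects a preceding letter, combining mark, digit or underscore. This is a swapped-in technique, not part of the answer. It assumes a recent engine: lookbehind and \p{...} are ES2018, and matchAll is ES2020.

```javascript
// Start of string: ^ matches only at index 0 (or at line starts with the m flag).
const startOfString = /^à/;
console.log(startOfString.test("à la carte"));    // true
console.log(startOfString.test("voilà à Paris")); // false: no "à" at index 0

// Start of word (assumed ES2018+): "à" not preceded by a letter, combining mark,
// digit, or underscore. The \p{...} escapes require the u flag.
const startOfWord = /(?<![\p{L}\p{M}\p{N}_])à/gu;

// Index of every "à" that begins a word.
const hits = [..."voilà à Paris".matchAll(startOfWord)].map((m) => m.index);
console.log(hits); // [ 6 ]: the "à" ending "voilà" (index 4) is skipped
```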

#3

Stack Overflow also had an issue with non-ASCII characters in regexes; you can find it here. It doesn't deal with word boundaries, but it may still give you useful hints.

There is another page, but that asker wants to match strings, not words.

I don't know of an anchor for your problem, and I couldn't find one just now. But given the monster regexes used in my first link, the group you want to avoid is not over the top. In my opinion, it is your solution.
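
For comparison, here is a sketch of the alternation-group approach from the question, which needs no lookbehind. The separator set is an assumption loosely based on the truncated pattern in the question; extend it to suit your text. It covers start of string, whitespace, apostrophe and opening parenthesis. The group consumes the separator, so the helper (a hypothetical name) recovers the position of "à" from the capture.

```javascript
// Return the index of every "à" that starts a word, where a word start is
// assumed to be: start of string, whitespace, apostrophe, or "(".
function wordStartIndices(text) {
  // Fresh regex per call so the g flag's lastIndex state is never shared.
  const wordStartA = /(?:^|[\s'(])(à)/g;
  const indices = [];
  let m;
  while ((m = wordStartA.exec(text)) !== null) {
    // m[0] includes the consumed separator; the captured "à" sits at its end.
    indices.push(m.index + m[0].length - m[1].length);
  }
  return indices;
}

console.log(wordStartIndices("à (à) l'à voilà")); // [ 0, 3, 8 ]
```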
