What's a good regex to include accented characters in a simple way?

Right now my regex is something like this:

[a-zA-Z0-9], but it does not include accented characters, which I would like it to. I would also like the characters - ' , to be included.

3 solutions

#1

Accented Characters: DIY Character Range Subtraction

If your regex engine supports inline modifiers such as (?i) and lookaheads (many do), this will work:

(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$

Please see the demo (you can add characters to test).
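
The same kind of test can be run locally. Below is a minimal sketch, assuming Python's re module as the engine (the answer does not name one); the function name is_allowed and the sample strings are ours, chosen only for illustration.

```python
import re

# Answer #1's pattern, unchanged. Python expects a global flag such as (?i)
# at the very start of the pattern, which is where it sits here.
ACCENTED = re.compile(r"(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$")


def is_allowed(text: str) -> bool:
    """True if the whole string is made of characters the pattern accepts."""
    return ACCENTED.match(text) is not None


# Arbitrary samples: the last three fail because of × (excluded), ß (in the
# lookahead) and Ł (outside the À-ÿ range).
for sample in ["Zoë", "O'Brien-Müller", "Ångström2", "a×b", "Straße", "Łódź"]:
    print(f"{sample!r}: {is_allowed(sample)}")
```

Two caveats for Python specifically: $ also matches just before a trailing newline, so use re.fullmatch without ^ and $ if that matters; and decomposed input (a plain letter followed by a combining accent) will not match, so normalizing with unicodedata.normalize("NFC", text) first usually helps.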

Explanation

(?i) sets case-insensitive mode

The ^ anchor asserts that we are at the beginning of the string

(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ]) matches one character...

The lookahead (?![×Þß÷þø]) asserts that the char is not one of those in the brackets

[-'0-9a-zÀ-ÿ] allows the dash, apostrophe, digits, letters, and characters in the wide accented range À-ÿ, from which the lookahead subtracts the unwanted characters

The + matches that one or more times

The $ anchor asserts that we are at the end of the string
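
The subtraction is needed because À-ÿ (U+00C0 to U+00FF, the upper half of Latin-1) is not made up only of letters. A short Python sketch using the standard unicodedata module can confirm which characters in that range are non-letters; the variable names are ours. It shows that × and ÷ are the only non-letters, while Þ, ß, ø and þ are letters (used in Icelandic, German, Danish and Norwegian) that the lookahead drops as well, so remove them from the brackets if you want to accept them.

```python
import unicodedata

# The characters answer #1 removes with its lookahead.
EXCLUDED = set("×Þß÷þø")

# Every character in À-ÿ (U+00C0 to U+00FF) whose Unicode general category
# is not a letter category (Lu, Ll, ... all start with "L").
non_letters = [
    chr(code)
    for code in range(0xC0, 0x100)
    if not unicodedata.category(chr(code)).startswith("L")
]

# Letters that the lookahead removes anyway, sorted by code point.
excluded_letters = sorted(EXCLUDED - set(non_letters))

print("non-letters in À-ÿ:", non_letters)
print("letters also excluded:", excluded_letters)
```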

Reference

Extended ASCII Table

#2

Use a POSIX character class (http://www.regular-expressions.info/posixbrackets.html):

[-'[:alpha:]0-9] or [-'[:alnum:]]

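The [:alpha:] character class matches whatever your locale considers "alphabetic characters". [:alnum:] is [:alpha:] plus digits, so both forms above match the same set. Not every engine supports POSIX classes: grep, sed and PCRE do, but Python's re and JavaScript do not.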
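
Since Python's re has no POSIX bracket classes, this sketch swaps in a different technique: [^\W_], which with Unicode str patterns means "a word character other than underscore", roughly what [:alnum:] gives in a UTF-8 locale. The pattern and the function name matches_alnum_like are hypothetical, not taken from the answer.

```python
import re

# Hypothetical stand-in for [-'[:alnum:]]. With str patterns, \w matches
# alphanumerics as defined by str.isalnum(), plus underscore, so [^\W_]
# keeps the alphanumerics and drops the underscore.
ALNUM_LIKE = re.compile(r"(?:[^\W_]|['-])+")


def matches_alnum_like(text: str) -> bool:
    """True if text consists only of letters, digits, apostrophes and dashes."""
    return ALNUM_LIKE.fullmatch(text) is not None


for sample in ["Zoë-O'Brien", "Straße", "Łódź", "R2-D2", "a×b", "a_b", "a b"]:
    print(f"{sample!r}: {matches_alnum_like(sample)}")
```

Unlike the Latin-1 ranges in the other answers, this accepts Ł in Łódź and letters from any script, while still rejecting ×, the underscore and spaces.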

#3

A version without the lookahead exclusion rules, where the ranges themselves skip × and ÷:

^[-'a-zA-ZÀ-ÖØ-öø-ÿ]+$

Explanation

The ^ anchor asserts that we are at the beginning of the string

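[...] allows the dash, apostrophe, ASCII letters, and the accented ranges À-Ö, Ø-ö and ø-ÿ, which together cover À-ÿ minus × and ÷. Unlike #1, it has no 0-9, so add that inside the brackets if you also need digits.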

The + matches that one or more times

The $ anchor asserts that we are at the end of the string
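
To see how this version differs from #1 in practice, the following Python sketch (again assuming Python's re as the engine; variable names are ours) feeds every character from U+00C0 to U+00FF to both patterns. Because the ranges only skip × and ÷, #3 accepts Þ, ß, ø and þ, which #1 rejects, and also Ø, since the (?i) flag makes #1's ø exclusion case-insensitive. It also confirms that #3 rejects digits.

```python
import re

# Answer #1 (lookahead subtraction) and answer #3 (split ranges), unchanged.
WITH_LOOKAHEAD = re.compile(r"(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$")
RANGES_ONLY = re.compile(r"^[-'a-zA-ZÀ-ÖØ-öø-ÿ]+$")

# Every character from U+00C0 to U+00FF, tested one at a time.
upper_latin1 = [chr(code) for code in range(0xC0, 0x100)]

rejected_by_ranges = [ch for ch in upper_latin1 if not RANGES_ONLY.match(ch)]
only_ranges_accept = [
    ch for ch in upper_latin1
    if RANGES_ONLY.match(ch) and not WITH_LOOKAHEAD.match(ch)
]

print("rejected by #3:", rejected_by_ranges)
print("accepted by #3 but not #1:", only_ranges_accept)
print("#3 accepts digits:", RANGES_ONLY.match("42") is not None)
```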

Reference

Extended ASCII Table
