I am looking for a way to match every possible special character in a string. I have a list of cities of the world, and many of the names contain special and accented characters. So I am looking for a regular expression that returns TRUE for any kind of special character. All the ones I found match only some of them, but I need one that covers every possible special character, including spaces at the beginning of the string. Is this possible?
This is the one I found, but it does not match all the different characters I may encounter in the name of a city:
preg_match('/[#$%^&*()+=\-\[\]\';,.\/{}|":<>?~\\\\]/', $string);
4 solutions
#1
1
You're going to need UTF-8 mode, which is enabled with the u modifier ("#pattern#u"): http://nl3.php.net/manual/en/reference.pcre.pattern.modifiers.php
Then you can use the Unicode escape sequences: http://nl3.php.net/manual/en/regexp.reference.unicode.php
So preg_match("#\p{L}+#u", "København", $match) will match the whole name. Use + rather than *: \p{L}* also matches the empty string, so it succeeds on any input.
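As a minimal sketch of the idea above, the following wraps the `\p{L}` escape and the `u` modifier in a small helper. The function name `firstLetterRun` is hypothetical and only used for illustration; it uses `+` so that an empty match is never reported.

```php
<?php
// Hypothetical helper: returns the first run of Unicode letters in $text,
// or null when the string contains no letter at all.
function firstLetterRun(string $text): ?string
{
    // \p{L} matches any Unicode letter; the u modifier makes PCRE treat
    // the pattern and the subject as UTF-8.
    if (preg_match('#\p{L}+#u', $text, $match) === 1) {
        return $match[0];
    }
    return null;
}

var_dump(firstLetterRun('København'));   // the whole name
var_dump(firstLetterRun('  São Paulo')); // "São" (leading spaces are skipped)
var_dump(firstLetterRun('12345'));       // NULL
```

Note that the subject string must be valid UTF-8 for the `u` modifier to work; otherwise preg_match() returns false.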
#2
0
Use Unicode properties:
\pL stands for any letter.
To match a city name, I'd do this (I assume that - and space are valid characters):
preg_match('/^\s*[\pL\s-]+$/u', $string);
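A hedged sketch of how such a check could be used as a validator. The helper name `isPlainCityName` is an assumption, and so is the rule that only letters, whitespace, and hyphens are allowed. The hyphen is placed last in the character class so that it is read as a literal character rather than as a range operator, and the pattern is anchored so the whole string is checked.

```php
<?php
// Hypothetical validator: true when $name consists only of Unicode letters,
// whitespace and hyphens (optionally preceded by leading whitespace).
function isPlainCityName(string $name): bool
{
    // \pL = any Unicode letter, \s = whitespace, trailing - = literal hyphen.
    return preg_match('/^\s*[\pL\s-]+$/u', $name) === 1;
}

var_dump(isPlainCityName('Baden-Baden'));  // bool(true)
var_dump(isPlainCityName(' Århus'));       // bool(true)
var_dump(isPlainCityName("Sant'Angelo"));  // bool(false): apostrophe not allowed
var_dump(isPlainCityName('Paris 2'));      // bool(false): digit not allowed
```

Real city names also contain apostrophes, periods, and other marks, so the allowed set would need to be widened for production data.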
#3
0
You can simply negate your pattern. To match everything that is not a letter a-z, a digit 0-9, "-", "_" or ".", you would use:
preg_match('/[^-_a-z0-9.]/iu', $string);
The ^ at the start of the character class negates it.
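To illustrate the negated class, here is a small sketch that lists every character it flags. The helper name `findSpecialChars` is hypothetical. Because a-z only covers ASCII letters, accented letters and spaces are reported as special, which is what the question asks for.

```php
<?php
// Hypothetical helper: returns every character of $text that is NOT an
// ASCII letter, a digit, "-", "_" or ".".
function findSpecialChars(string $text): array
{
    // The u modifier makes each match a whole UTF-8 character, not a byte.
    preg_match_all('/[^-_a-z0-9.]/iu', $text, $matches);
    return $matches[0];
}

print_r(findSpecialChars('København'));  // one entry: ø
print_r(findSpecialChars(' São-Paulo')); // two entries: the leading space and ã
print_r(findSpecialChars('Oslo'));       // empty array
```

To get a plain TRUE/FALSE answer instead of the list, test whether the returned array is non-empty.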
#4
0
I had the same problem when I wanted to split name parts that also contained special characters.
For example, if you want to split a bunch of names of the form:
<lastname>,<forename(s)> <initial(s)> <suffix(es)>
Forenames and suffixes are separated by whitespace.
Each initial is followed by a ".", with a maximum of 6 initials.
You could use:
$nameparts = preg_split("/(\w*),((?:\w+[\s\-]*)*)((?:\w\.){1,6})(?:\s*)(.*)/u", $displayname, -1, PREG_SPLIT_DELIM_CAPTURE);
// The first and last parts are always empty, so drop them
array_splice($nameparts, 5, 1);
array_splice($nameparts, 0, 1);
print_r($nameparts);
Input: Powers,Björn B.A. van der
Output: Array ( [0] => Powers [1] => Björn [2] => B.A. [3] => van der )
Tip: the regular expression looks like it came from outer space, but regex101.com comes to the rescue!