Can anybody explain how to use the preg_split() function? I don't understand pattern parameters like this one: "/[\s,]+/".
For example:
I have this subject: "is is."
and I want the result to be:
array (
0 => 'is',
1 => 'is',
)
That is, it should ignore the space and the full stop. How can I do that?
4 answers
#1 (score: 27)
preg means "PCRE REGexp", which is kind of redundant, since "PCRE" already stands for "Perl Compatible Regular Expressions".
Regexps are a nightmare for beginners. I still don’t fully understand them, and I’ve been working with them for years.
Broken down, the example you have there is:
"/[\s,]+/"
/ = delimiter marking the start and end of the pattern
[ ... ] = character class: matches any one of the characters listed inside
+ = one or more of the preceding character or group
\s = any whitespace character (space, tab, newline, etc.)
, = the literal comma character
So you have a search pattern that means "split the string wherever there is a run of one or more characters, each of which is either whitespace or a comma".
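A small sketch of that pattern in action (the first sample string is made up for illustration). Note that the full stop is not in the class, so on the question's subject it stays attached to the last word.

```php
<?php
// Split on runs of one or more whitespace characters and/or commas.
// ", " and ",\n" are each a single run, so they count as one delimiter.
$parts = preg_split('/[\s,]+/', "red, green\tblue,\nyellow");
// → ['red', 'green', 'blue', 'yellow']

// On the question's subject, "." is not in the class, so it is kept.
$pieces = preg_split('/[\s,]+/', 'is is.');
// → ['is', 'is.']
```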
Other common metacharacters are:
. = any single character (except a newline, by default)
* = zero or more of the preceding character or group
^ (at start of pattern) = the start of the string
$ (at end of pattern) = the end of the string
^ (as the first character inside [...]) = negates the class: matches any character NOT listed
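A quick sketch of some of these metacharacters with preg_split() and preg_match(); the sample strings are only illustrative. The negated class happens to solve the question's example too, assuming the words only ever contain lowercase letters.

```php
<?php
// Negated class: split on any run of characters that are NOT lowercase a-z.
$words = preg_split('/[^a-z]+/', 'is is.', -1, PREG_SPLIT_NO_EMPTY);
// → ['is', 'is']

// ^ and $ anchor the match to the start and end of the subject.
$startsWithIs = preg_match('/^is/', 'is is.');  // 1: subject starts with "is"
$endsWithDot  = preg_match('/\.$/', 'is is.');  // 1: "\." is an escaped, literal dot

// An unescaped "." matches any single character (here the "s").
$dotMatches = preg_match('/^i.$/', 'is');       // 1
```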
For PHP, there is good information in the official documentation.
#2 (score: 6)
This should work:
$words = preg_split("/(?<=\w)\b\s*[!?.]*/", 'is is.', -1, PREG_SPLIT_NO_EMPTY);
echo '<pre>';
print_r($words);
echo '</pre>';
The output would be:
Array
(
[0] => is
[1] => is
)
Before I explain the regex, a note on PREG_SPLIT_NO_EMPTY: it tells preg_split to return only the non-empty pieces. This ensures that the array $words holds actual data and no empty strings, which can otherwise appear when a delimiter matches at the very start or end of the subject (here, the trailing full stop). The -1 before the flag is the limit argument; -1 means "no limit".
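A minimal sketch of what the flag changes, using the same pattern and subject as above:

```php
<?php
$pattern = '/(?<=\w)\b\s*[!?.]*/';

// Without the flag, the trailing "." is a delimiter at the very end of the
// subject, so an empty string is left as the last piece.
$withEmpty = preg_split($pattern, 'is is.');
// → ['is', 'is', '']

// limit -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops the empty piece.
$noEmpty = preg_split($pattern, 'is is.', -1, PREG_SPLIT_NO_EMPTY);
// → ['is', 'is']
```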
That regex can be broken down like this using this tool:
NODE EXPLANATION
--------------------------------------------------------------------------------
(?<= look behind to see if there is:
--------------------------------------------------------------------------------
\w word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
[!?.]* any character of: '!', '?', '.' (0 or more
times (matching the most amount possible))
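To see why a lookbehind is used instead of a plain \w, here is a sketch comparing the two. The lookbehind is zero-width, so the word character it checks is not consumed; a plain \w becomes part of the delimiter and is removed from the output.

```php
<?php
// Lookbehind: the "s" before the boundary is only checked, not consumed.
$lookbehind = preg_split('/(?<=\w)\b\s*[!?.]*/', 'is is.');
// → ['is', 'is', '']

// Plain \w: the "s" is matched as part of the delimiter and is lost.
$plain = preg_split('/\w\b\s*[!?.]*/', 'is is.');
// → ['i', 'i', '']
```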
A nicer explanation can be found by entering the full regex pattern /(?<=\w)\b\s*[!?.]*/ into this other tool:
- (?<=\w) Positive Lookbehind: assert that the regex below can be matched
  - \w matches any word character [a-zA-Z0-9_]
- \b asserts position at a word boundary (^\w|\w$|\W\w|\w\W)
- \s* matches any whitespace character [\r\n\t\f ]
  - Quantifier: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
- [!?.]* matches a single character in the list !?. literally
  - Quantifier: between zero and unlimited times, as many times as possible, giving back as needed [greedy]
That last explanation can be boiled down by a human (also known as me) as follows:
Split at every point that directly follows a word character and sits on a word boundary, consuming any whitespace there and then any run of the punctuation marks !, ? and . (full stop). Because the lookbehind is zero-width, the word characters themselves are not consumed and stay in the pieces.
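One way to check this reading is to list what the pattern actually matches with preg_match_all(): only the whitespace and the punctuation are matched, and those matches are what preg_split() removes.

```php
<?php
$pattern = '/(?<=\w)\b\s*[!?.]*/';

// Collect every delimiter the pattern matches in the question's subject.
$count = preg_match_all($pattern, 'is is.', $matches);

// $count      → 2
// $matches[0] → [' ', '.']  (the space, then the full stop)
```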
#3 (score: 1)
PHP's str_word_count may be a better choice here.
str_word_count($string, 2)
will output an array of all words in the string, including duplicates. With format 2 the array keys are the character offsets of the words; str_word_count($string, 1) returns a plain 0-indexed list instead.
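A short sketch of the three formats on the question's subject:

```php
<?php
$string = 'is is.';

$count    = str_word_count($string);     // format 0: number of words → 2
$list     = str_word_count($string, 1);  // format 1: plain list → ['is', 'is']
$byOffset = str_word_count($string, 2);  // format 2: keyed by offset → [0 => 'is', 3 => 'is']
```

Format 1 gives exactly the array asked for in the question.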
#4 (score: 1)
Documentation says:
The preg_split() function operates exactly like split(), except that regular expressions are accepted as input parameters for pattern.
(split() was deprecated in PHP 5.3.0 and removed in PHP 7.0.0, so on current PHP versions use preg_split(), or explode() for a fixed delimiter.)
So, the following code:
<?php
$ip = "123 ,456 ,789 ,000";
$iparr = preg_split ("/[\s,]+/", $ip);
print "$iparr[0] <br />";
print "$iparr[1] <br />" ;
print "$iparr[2] <br />" ;
print "$iparr[3] <br />" ;
?>
This will produce the following result:
123
456
789
000
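For comparison, a sketch using explode(), which splits on a fixed string rather than a pattern. Splitting on "," alone leaves the space before each comma attached to the pieces, which is why the character class also includes \s.

```php
<?php
$ip = "123 ,456 ,789 ,000";

// Literal delimiter: the space before each comma stays in the pieces.
$literal = explode(',', $ip);            // → ['123 ', '456 ', '789 ', '000']

// Pattern delimiter: any run of whitespace and/or commas is removed.
$pattern = preg_split('/[\s,]+/', $ip);  // → ['123', '456', '789', '000']
```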
So, if you have this subject: is is
and you want: array ( 0 => 'is', 1 => 'is', )
you can simplify your regex to "/[\s]+/" (whitespace only).
If your subject also contains commas, as in is ,is
you need the regex you already have, "/[\s,]+/".
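Note that the question's subject ends with a full stop ("is is."), which neither regex above removes. A sketch of one option, assuming you only need to strip whitespace, commas and full stops: add "." to the class (inside [...] it is a literal dot) and pass PREG_SPLIT_NO_EMPTY to drop the empty piece left after the final full stop.

```php
<?php
$subject = 'is is.';

// Whitespace only: the full stop stays attached to the last word.
$spacesOnly = preg_split('/[\s]+/', $subject);  // → ['is', 'is.']

// Whitespace, commas and full stops; empty pieces are dropped.
$clean = preg_split('/[\s,.]+/', $subject, -1, PREG_SPLIT_NO_EMPTY);  // → ['is', 'is']

// The same pattern also handles the comma variant.
$withComma = preg_split('/[\s,.]+/', 'is ,is.', -1, PREG_SPLIT_NO_EMPTY);  // → ['is', 'is']
```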