How to use preg_split() in PHP?

Time: 2021-02-20 22:09:24

Can anybody explain to me how to use the preg_split() function? I don't understand pattern parameters like "/[\s,]+/".

For example:

I have this subject: "is is." and I want the result to be:

array (
  0 => 'is',
  1 => 'is',
)

so that it ignores the space and the full stop. How can I do that?

4 solutions

#1


27  

preg means "PCRE REGexp", which is kind of redundant, since "PCRE" already means "Perl Compatible Regular Expressions".

Regexps are a nightmare to the beginner. I still don’t fully understand them and I’ve been working with them for years.

Broken down, the example you have there is:

"/[\s,]+/"

/ = the delimiters marking the start and end of the pattern
[ ... ] = a character class: matches any one of the characters listed inside it
+ = one or more of the preceding character or group
\s = any whitespace character (space, tab, newline, etc.)
, = the literal comma character

So you have a search pattern that says "split on any run of one or more characters, each of which is either whitespace or a comma".

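A minimal sketch of the pattern in action (the first subject is made up for illustration). It also shows why "/[\s,]+/" alone does not answer the question: the full stop is neither whitespace nor a comma, so it stays attached to the last word.

```php
<?php
// Split on every run of one or more whitespace characters and/or commas.
$parts = preg_split('/[\s,]+/', 'hypertext language, programming');
// ['hypertext', 'language', 'programming']

// With the subject from the question, "." is not in the class,
// so it is not treated as a delimiter.
$words = preg_split('/[\s,]+/', 'is is.');
// ['is', 'is.']

print_r($parts);
print_r($words);
```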

Other common characters are:

. = any single character (except a newline, by default)
* = zero or more of the preceding character or group
^ (at start of pattern) = the start of the string
$ (at end of pattern) = the end of the string
^ (first character inside [...]) = "NOT": matches any character that is not listed after it

For PHP there is good information in the official documentation.

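To see the characters listed above at work on the question's subject, here is a small sketch. It assumes the words contain only lowercase ASCII letters (that is what the [^a-z] class encodes), which happens to be enough for "is is.".

```php
<?php
$subject = 'is is.';

// ^ as the first character inside [...] negates the class:
// [^a-z]+ means "one or more characters that are NOT a to z".
$withEmpty = preg_split('/[^a-z]+/', $subject);                          // ['is', 'is', '']
$words     = preg_split('/[^a-z]+/', $subject, -1, PREG_SPLIT_NO_EMPTY); // ['is', 'is']

// ^ and $ outside a class anchor the match to the start and end of the string.
$startsWithIs = preg_match('/^is/', $subject); // 1
$endsWithDot  = preg_match('/\.$/', $subject); // 1 (\. is a literal full stop)

// . matches any single character and * repeats it zero or more times (greedily).
preg_match('/^i.*s/', $subject, $m);           // $m[0] is 'is is'

print_r($words);
```

The empty last element in $withEmpty comes from the final "." being a delimiter with nothing after it.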

#2


6  

This should work:

$words = preg_split("/(?<=\w)\b\s*[!?.]*/", 'is is.', -1, PREG_SPLIT_NO_EMPTY);

echo '<pre>';
print_r($words);
echo '</pre>';

The output would be:

Array
(
    [0] => is
    [1] => is
)

Before I explain the regex, a word on PREG_SPLIT_NO_EMPTY: with this flag set, preg_split() returns only the non-empty pieces. That guarantees every element of $words actually contains data, rather than the empty strings that appear when a delimiter sits at the very start or end of the subject (as the final "." does here). The -1 before it is the limit argument, meaning "no limit".

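A quick way to see what the flag changes is to run the same pattern with and without it. This is a minimal sketch reusing the answer's pattern and subject:

```php
<?php
$pattern = '/(?<=\w)\b\s*[!?.]*/';
$subject = 'is is.';

// No flags: the trailing "." is consumed as a delimiter,
// which leaves an empty string as the last piece.
$raw = preg_split($pattern, $subject);                            // ['is', 'is', '']

// limit = -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops the empty pieces.
$words = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY); // ['is', 'is']

var_dump(count($raw), count($words));
```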

Using this tool, that regex can be broken down like this:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  [!?.]*                   any character of: '!', '?', '.' (0 or more
                           times (matching the most amount possible))
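
To check which parts of the subject the lookbehind, \b, \s* and [!?.]* actually match, you can run the same pattern through preg_match_all() with PREG_OFFSET_CAPTURE; for this subject, those matches are exactly the delimiters preg_split() removes. A small sketch:

```php
<?php
$pattern = '/(?<=\w)\b\s*[!?.]*/';

// Collect every match together with its byte offset in the subject.
preg_match_all($pattern, 'is is.', $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$text, $offset]) {
    // var_export() makes the space visible as ' '
    printf("delimiter %s at offset %d\n", var_export($text, true), $offset);
}
// delimiter ' ' at offset 2
// delimiter '.' at offset 5
```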

A nicer explanation can be found by entering the full regex pattern /(?<=\w)\b\s*[!?.]*/ into this other tool:

  • (?<=\w) Positive Lookbehind - Assert that the regex below can be matched
  • \w match any word character [a-zA-Z0-9_]
  • \b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
  • \s* match any white space character [\r\n\t\f ]
  • Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  • [!?.]* match a single character in the list !?. literally (same greedy quantifier)

Boiled down by a human (also known as me), that last explanation amounts to the following:

Split at every position that directly follows a word character and sits on a word boundary, removing any whitespace found there and then any of the punctuation marks !, ? and . that come right after it.

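One consequence of that ordering (whitespace first, then punctuation) is worth knowing if the real input has more than one sentence: a space that follows punctuation is not consumed, so it stays at the start of the next word. A minimal sketch with a made-up subject, comparing the answer's pattern with a hypothetical simpler variant that treats any run of whitespace and !?. characters as one delimiter:

```php
<?php
$subject = 'is is. it is!';

// The answer's pattern: in "is. it", "." is the delimiter and the space after it is kept.
$a = preg_split('/(?<=\w)\b\s*[!?.]*/', $subject, -1, PREG_SPLIT_NO_EMPTY);
// ['is', 'is', ' it', 'is']

// Hypothetical variant: any run of whitespace and/or ! ? . counts as one delimiter.
$b = preg_split('/[\s!?.]+/', $subject, -1, PREG_SPLIT_NO_EMPTY);
// ['is', 'is', 'it', 'is']

print_r($b);
```

For the question's own subject, "is is.", both patterns give ['is', 'is'].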

#3


1  

PHP's str_word_count may be a better choice here.

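str_word_count($string, 1) returns an array of all the words in the string, including duplicates; str_word_count($string, 2) returns the same words, but keyed by each word's numeric position in the string.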

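A short sketch of the different return formats, using the question's subject. Per the PHP manual, str_word_count() treats runs of letters (which may also contain, but not start with, ' and -) as words, so the full stop is dropped automatically:

```php
<?php
$subject = 'is is.';

$count      = str_word_count($subject);    // 2 (format 0 returns the number of words)
$list       = str_word_count($subject, 1); // ['is', 'is']
$byPosition = str_word_count($subject, 2); // [0 => 'is', 3 => 'is'] (keys are string offsets)

print_r($list);
```

Format 1 is the one that produces exactly the array ( 0 => 'is', 1 => 'is' ) asked for in the question.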

#4


1  

The documentation (from the PHP versions that still had split(), which was removed in PHP 7.0) says:

The preg_split() function operates exactly like split(), except that regular expressions are accepted as input parameters for pattern.

So, the following code...

<?php

$ip = "123 ,456 ,789 ,000";
$iparr = preg_split("/[\s,]+/", $ip);
print "$iparr[0] <br />";
print "$iparr[1] <br />";
print "$iparr[2] <br />";
print "$iparr[3] <br />";

?>

This will produce the following result:

123
456
789
000 

So, if you have the subject "is is" and you want array ( 0 => 'is', 1 => 'is', ),

you only need "/[\s]+/" as your regex (which is the same as "/\s+/").

If the words can also be separated by commas, as in "is ,is", you need the regex you already have, "/[\s,]+/".

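The difference between the two regexes shows up as soon as a comma appears in the subject. A minimal sketch with made-up subjects:

```php
<?php
// Whitespace only: enough when the words are separated by spaces.
$a = preg_split('/[\s]+/', 'is is');   // ['is', 'is']

// With a comma in the subject, whitespace alone leaves the comma attached...
$b = preg_split('/[\s]+/', 'is ,is');  // ['is', ',is']

// ...so the comma has to be part of the character class.
$c = preg_split('/[\s,]+/', 'is ,is'); // ['is', 'is']

print_r($c);
```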
