PHP: How to split a string by dashes and by everything between brackets (preg_split or preg_match)

Time: 2020-12-31 22:07:53

I've been wrapping my head around this for days now, but nothing seems to give the desired result.

Example:

$var = "Some Words - Other Words (More Words) Dash-Binded-Word";

Desired result:

array(
[0] => Some Words
[1] => Other Words
[2] => More Words
[3] => Dash-Binded-Word
)

I was able to get most of this working with preg_match_all, but then "Dash-Binded-Word" was broken up as well. Requiring spaces around the dash in the pattern didn't work either, as it then broke up all the words except the dash-bound ones.

The preg_match_all statement I used (which broke up the dash bound words too) is this:

preg_match_all('#\(.*?\)|\[.*?\]|[^?!\-|\(|\[]+#', $var, $array);

I'm certainly no expert on preg_match or preg_split, so any help here would be greatly appreciated.
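
To see why the hyphenated word falls apart, it helps to run the pattern as posted. The negated class [^?!\-|\(|\[] excludes the hyphen outright, so every run of text stops at any "-", whether or not it has spaces around it. A minimal sketch follows; the trim step and the $pieces variable are additions for readability and were not part of the original attempt:

```php
<?php
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";

// Pattern exactly as posted in the question.
preg_match_all('#\(.*?\)|\[.*?\]|[^?!\-|\(|\[]+#', $var, $array);

// Trim each match so the stray spaces around the pieces do not obscure the result.
$pieces = array_map('trim', $array[0]);
print_r($pieces);
// Pieces: "Some Words", "Other Words", "(More Words)", "Dash", "Binded", "Word"
```

The bracketed text is kept with its parentheses because the first alternative matches it whole, while "Dash-Binded-Word" is cut at both hyphens.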

4 solutions

#1


4  

You can use a simple preg_match_all:

\w+(?:[- ]\w+)*

  • \w+ - 1 or more alphanumeric or underscore characters
  • (?:[- ]\w+)* - 0 or more sequences of...
    • [- ] - a hyphen or a space (you may change the space to \s to match any whitespace)
    • \w+ - 1 or more alphanumeric or underscore characters

IDEONE demo:

$re = '/\w+(?:[- ]\w+)*/'; 
$str = "Some Words - Other Words (More Words) Dash-Binded-Word"; 
preg_match_all($re, $str, $matches);
print_r($matches[0]);

Result:

Array
(
    [0] => Some Words
    [1] => Other Words
    [2] => More Words
    [3] => Dash-Binded-Word
)
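
As a rough sketch, the same pattern can be wrapped in a small function so it is easy to try on other inputs. The function name splitPhrases and the second sample string are hypothetical, chosen only for illustration:

```php
<?php
// Hypothetical helper; the pattern is the one from the answer above.
function splitPhrases(string $input): array
{
    // \w+ starts a phrase; (?:[- ]\w+)* extends it across single spaces or hyphens
    // that are immediately followed by another word.
    preg_match_all('/\w+(?:[- ]\w+)*/', $input, $matches);
    return $matches[0];
}

print_r(splitPhrases("Some Words - Other Words (More Words) Dash-Binded-Word"));
print_r(splitPhrases("Well-known fact - another one"));
```

Because a phrase can only continue through a single space or hyphen followed directly by a word character, " - " and the brackets end the current phrase, while the hyphens inside "Dash-Binded-Word" do not. Note that \w covers only letters, digits and underscore, so words containing apostrophes or other punctuation would be cut at those characters.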

#2


2  

You can split by:

/\s*(?<!\w(?=.\w))[\-[\]()]\s*/

Explanation:

  1. The match is attempted against the character class [\-[\]()], which matches any one of those characters. You can add any other delimiter character to that class.
  2. It uses a negative lookbehind, (?<!\w ...), for the condition "not preceded by a word character".
  3. Nested inside that lookbehind is a lookahead, (?=.\w), which narrows the condition: the split is rejected only when the delimiter is preceded by a word character and is also followed by one (the . stands for the delimiter itself, and \w checks the character after it).
  4. \s* at the beginning and the end trims the surrounding whitespace.

Code:

$input_line = "Some Words - Other Words (More Words) Dash-Binded-Word";
$result = preg_split("/\s*(?<!\w(?=.\w))[\-[\]()]\s*/", $input_line);
var_dump($result);

Output:

array(4) {
  [0]=>
  string(10) "Some Words"
  [1]=>
  string(11) "Other Words"
  [2]=>
  string(10) "More Words"
  [3]=>
  string(16) "Dash-Binded-Word"
}
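
One case the sample string does not exercise is input that starts or ends with a delimiter: preg_split then returns an empty string for the piece before or after it. A minimal sketch, assuming you want those empty pieces dropped; the wrapper name splitOnDelimiters and the use of PREG_SPLIT_NO_EMPTY are additions, not part of the answer:

```php
<?php
// Hypothetical wrapper; the pattern is copied from the answer above.
function splitOnDelimiters(string $input): array
{
    $pattern = '/\s*(?<!\w(?=.\w))[\-[\]()]\s*/';
    // PREG_SPLIT_NO_EMPTY drops the empty piece produced when the input
    // starts or ends with a delimiter.
    $parts = preg_split($pattern, $input, -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [] : $parts;
}

print_r(splitOnDelimiters("Some Words - Other Words (More Words) Dash-Binded-Word"));
print_r(splitOnDelimiters("Some Words - Other Words (More Words)"));
```

The second call ends with ")", which would otherwise leave a trailing empty string in the result.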

Capturing parens

As stated in another comment, if you want to also capture parentheses:

$result = preg_split("/\s*(?:(?<!\w)-(?!\w)|(\(.*?\)|\[.*?]))\s*/", $input_line, -1, PREG_SPLIT_DELIM_CAPTURE);
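
The answer does not show what this variant returns, so here is a small runnable sketch. It uses the same pattern, written in single quotes, and adds PREG_SPLIT_NO_EMPTY as an assumption so that empty pieces and empty captures are discarded:

```php
<?php
$input_line = "Some Words - Other Words (More Words) Dash-Binded-Word";

// Same pattern as above, in single quotes so the backslashes reach PCRE unchanged.
$pattern = '/\s*(?:(?<!\w)-(?!\w)|(\(.*?\)|\[.*?]))\s*/';

// DELIM_CAPTURE returns the bracketed group as its own element;
// NO_EMPTY (an addition here) discards empty pieces and empty captures.
$result = preg_split($pattern, $input_line, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($result);
// Elements: "Some Words", "Other Words", "(More Words)", "Dash-Binded-Word"
```

Here the standalone hyphen is consumed as a plain delimiter, while the bracketed text is both the delimiter and a captured element, so it comes back with its parentheses intact.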

#3


0  

Try this (a combination of str_replace and explode). It is not optimal, but it may work for this case:

$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
$arr = Array(" - ", " (", ") ");
$var2 = str_replace($arr, "|", $var);
$final = explode('|', $var2);
var_dump($final);

Output:

array(4) { [0]=> string(10) "Some Words" [1]=> string(11) "Other Words" [2]=> string(10) "More Words" [3]=> string(16) "Dash-Binded-Word" }
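
Since this approach matches literal delimiter strings, it depends on the exact spacing in the input. A short sketch of that limitation; the helper name splitLiteral and the second input are hypothetical:

```php
<?php
// Hypothetical helper wrapping the str_replace/explode approach shown above.
function splitLiteral(string $input): array
{
    $delimiters = array(" - ", " (", ") ");
    return explode('|', str_replace($delimiters, "|", $input));
}

print_r(splitLiteral("Some Words - Other Words (More Words) Dash-Binded-Word"));

// No space follows the closing parenthesis here, so ") " never matches
// and the bracket stays attached to the last piece.
print_r(splitLiteral("Some Words - Other Words (More Words)"));
```

The second call returns "More Words)" as its last element. A literal "|" in the input would also be treated as a split point, so the placeholder character has to be one that cannot occur in the data.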

#4


0  

$var = "Some Words - Other Words (More Words) Dash-Binded-Word";

$var = preg_replace('/[^A-Za-z\-]/', ' ', $var); // Replace everything except letters and hyphens with a space.
$var = str_replace('-', ' ', $var); // Replace all hyphens with spaces.
// Collapse runs of whitespace into one space, then split on spaces.
print_r(explode(' ', preg_replace('!\s+!', ' ', $var)));

Output:

Array ( [0] => Some [1] => Words [2] => Other [3] => Words [4] => More [5] => Words [6] => Dash [7] => Binded [8] => Word ) 
