I've been trying to wrap my head around this for days now, but nothing seems to give the desired result.
Example:
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
Desired result:
array(
[0] => Some Words
[1] => Other Words
[2] => More Words
[3] => Dash-Binded-Word
)
I was able to get this mostly working using preg_match_all, but then "Dash-Binded-Word" was broken up as well. Trying to match the hyphen with surrounding spaces didn't work either, as it would break up all the words except the hyphenated ones.
The preg_match_all statement I used (which broke up the dash bound words too) is this:
preg_match_all('#\(.*?\)|\[.*?\]|[^?!\-|\(|\[]+#', $var, $array);
I'm certainly no expert on preg_match or preg_split, so any help here would be greatly appreciated.
4 Solutions
#1
4
You can use a simple preg_match_all:
\w+(?:[- ]\w+)*
- \w+ - 1 or more alphanumeric or underscore characters
- (?:[- ]\w+)* - 0 or more sequences of:
  - [- ] - a hyphen or a space (you may change the space to \s to match any whitespace)
  - \w+ - 1 or more alphanumeric or underscore characters
$re = '/\w+(?:[- ]\w+)*/';
$str = "Some Words - Other Words (More Words) Dash-Binded-Word";
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Result:
Array
(
[0] => Some Words
[1] => Other Words
[2] => More Words
[3] => Dash-Binded-Word
)
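The answer notes that the space in the character class can be swapped for \s. A minimal sketch of that variant follows; the helper name split_phrases and the second, tab-separated input are assumptions made for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the answer): returns the phrases found in $text.
function split_phrases(string $text): array
{
    // \w+ starts a phrase; (?:[-\s]\w+)* extends it across a single hyphen
    // or a single whitespace character that sits directly between word characters.
    preg_match_all('/\w+(?:[-\s]\w+)*/', $text, $matches);
    return $matches[0];
}

// The input from the question.
print_r(split_phrases("Some Words - Other Words (More Words) Dash-Binded-Word"));

// A tab between two words is now treated like a space.
print_r(split_phrases("Some\tWords [Other Words]"));
```

Because only one separator character is allowed between words, two words separated by a double space would end up as separate matches.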
#2
2
You can split by:
/\s*(?<!\w(?=.\w))[\-[\]()]\s*/
Explanation:
- The match is attempted against the character class [\-[\]()] (it matches any one of those characters). You could also add any other character you want to that character class.
- It uses a negative lookbehind (?<!\w) for the condition "not preceded by a word character".
- It also has a nested lookahead (?=.\w) that narrows the condition: the split is only rejected when that preceding word character is followed by any character (the one used to split) and then another word character.
- \s* at the beginning and the end trims whitespace.
Code:
$input_line = "Some Words - Other Words (More Words) Dash-Binded-Word";
$result = preg_split("/\s*(?<!\w(?=.\w))[\-[\]()]\s*/", $input_line);
var_dump($result);
Output:
array(4) {
[0]=>
string(10) "Some Words"
[1]=>
string(11) "Other Words"
[2]=>
string(10) "More Words"
[3]=>
string(16) "Dash-Binded-Word"
}
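One edge case may be worth noting: when the input ends with one of the split characters, preg_split returns a trailing empty string. The sketch below reuses the pattern from this answer on an assumed input that ends in a bracket, and passes the PREG_SPLIT_NO_EMPTY flag to drop the empty piece. The input string and variable names are illustrative only.

```php
<?php
// Pattern from the answer, written as a single-quoted string.
$pattern = '/\s*(?<!\w(?=.\w))[\-[\]()]\s*/';

// Assumed input that ends with a closing bracket.
$input = "Some Words - Other [More Words]";

// Default behaviour: the closing bracket leaves an empty string at the end.
$with_empty = preg_split($pattern, $input);

// PREG_SPLIT_NO_EMPTY keeps only the non-empty pieces.
$without_empty = preg_split($pattern, $input, -1, PREG_SPLIT_NO_EMPTY);

var_dump($with_empty);
var_dump($without_empty);
```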
Capturing parens
As stated in another comment, if you want to also capture parentheses:
$result = preg_split("/\s*(?:(?<!\w)-(?!\w)|(\(.*?\)|\[.*?]))\s*/", $input_line, -1, PREG_SPLIT_DELIM_CAPTURE);
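For reference, here is a self-contained sketch of that call on the input from the question. With PREG_SPLIT_DELIM_CAPTURE, only the text captured by the parenthesised group is returned alongside the pieces, so the bracketed part keeps its brackets while the standalone hyphen is discarded. The variable names are the ones used above; the pattern is written single-quoted here, which should be equivalent.

```php
<?php
$input_line = "Some Words - Other Words (More Words) Dash-Binded-Word";

// Alternative 1: a hyphen with no word character on either side (not captured).
// Alternative 2: a (...) or [...] group, captured so that it is kept in the result.
$pattern = '/\s*(?:(?<!\w)-(?!\w)|(\(.*?\)|\[.*?]))\s*/';

$result = preg_split($pattern, $input_line, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($result);
```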
#3
0
Try this (a combination of str_replace and explode). It is not optimal, but it may work for this case:
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
$arr = Array(" - ", " (", ") ");
$var2 = str_replace($arr, "|", $var);
$final = explode('|', $var2);
var_dump($final);
Output:
array(4) { [0]=> string(10) "Some Words" [1]=> string(11) "Other Words" [2]=> string(10) "More Words" [3]=> string(16) "Dash-Binded-Word" }
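The approach above relies on "|" never appearing in the input. As a hedged alternative sketch, the same literal delimiters can be turned into a pattern for preg_split, which avoids the placeholder character altogether. The helper name split_on_literals is hypothetical.

```php
<?php
// Hypothetical helper: split $text on any of the literal $delimiters.
function split_on_literals(string $text, array $delimiters): array
{
    // Escape every delimiter so that characters such as "(" are matched literally.
    $escaped = array_map(function ($delimiter) {
        return preg_quote($delimiter, '/');
    }, $delimiters);

    // Join the escaped delimiters into one alternation and split on it.
    $pattern = '/' . implode('|', $escaped) . '/';
    return preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
}

$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
$final = split_on_literals($var, array(" - ", " (", ") "));
var_dump($final);
```

As with the original snippet, this only splits on the exact delimiters listed, so a hyphen without surrounding spaces is left untouched.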
#4
0
$var = "Some Words - Other Words (More Words) Dash-Binded-Word";
$var=preg_replace('/[^A-Za-z\-]/', ' ', $var);
$var=str_replace('-', ' ', $var); // Replaces all hyphens with spaces.
print_r (explode(" ",preg_replace('!\s+!', ' ', $var))); //replaces all multiple spaces with one and explode creates array split where there is space
OUTPUT :-
Array ( [0] => Some [1] => Words [2] => Other [3] => Words [4] => More [5] => Words [6] => Dash [7] => Binded [8] => Word )