PHP preg_split: using the delimiter as the array key

Time: 2021-07-17 03:32:36

I need to split a string by a regex delimiter, but I need each delimiter as the key in the resulting array.


Here is an example string:


*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times

The delimiter is an asterisk (*) followed by two uppercase letters or digits. I use this regex pattern: /\*[A-Z0-9]{2}/


This is my preg_split call:


$attributes = preg_split('/\*[A-Z0-9]{2}/', $line);

This works, but I need each matching delimiter as the key of the value in an associative array.


What I get looks like this:


$matches = [
    0 => 'the title',
    1 => 'the author',
    2 => 'other useless infos',
    3 => 'other useful infos',
    4 => 'some delimiters can be there multiple times',
];
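For reference: because the sample string starts with a delimiter, a plain preg_split on it actually returns an empty string at index 0. The list above corresponds to passing PREG_SPLIT_NO_EMPTY. A quick check, assuming the sample string is in $line:

```php
<?php
$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';

// Without flags: the text before the first delimiter is '', so it shows up at index 0.
$plain = preg_split('/\*[A-Z0-9]{2}/', $line);

// PREG_SPLIT_NO_EMPTY drops empty pieces, which gives the 0-based list shown above.
$noEmpty = preg_split('/\*[A-Z0-9]{2}/', $line, -1, PREG_SPLIT_NO_EMPTY);
print_r($noEmpty);
```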

It should look like this:


$matches = [
    '*01' => 'the title',
    '*35' => 'the author',
    '*A7' => 'other useless infos',
    '*AE' => [
        'other useful infos',
        'some delimiters can be there multiple times',
    ],
];

Does anyone have suggestions on how to achieve this?


3 solutions



Use the PREG_SPLIT_DELIM_CAPTURE flag of preg_split to also get the delimiters back (see the documentation). The flag only returns parenthesized subpatterns, so the delimiter has to be wrapped in a capturing group.


So in your case:


# The -1 is the limit parameter (no limit)
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);
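Dumping the result for the sample string (assumed to be in $line) makes the layout visible: an empty string at index 0, then delimiters and values alternating.

```php
<?php
$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';

// The parentheses make the delimiter a capturing group, so preg_split keeps it.
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

// Index 0: text before the first delimiter (empty here).
// Odd indexes: delimiters. Even indexes from 2 on: the value after each delimiter.
print_r($attributes);
```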

Element 0 of $attributes is now everything before the first delimiter; after that, the captured delimiters and the values that follow them alternate. Assuming you don't want to keep the leading part, you can build your $matches array like this:


$matches = [];
for ($i = 1; $i < sizeof($attributes) - 1; $i += 2) {
    $matches[$attributes[$i]] = $attributes[$i + 1];
}
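Running that loop on the sample string (assumed to be in $line) shows why repeated delimiters need extra handling: the second *AE overwrites the first, so 'other useful infos' is lost.

```php
<?php
$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

$matches = [];
// Start at 1 to skip the leading text; step by 2 over (delimiter, value) pairs.
for ($i = 1; $i < count($attributes) - 1; $i += 2) {
    $matches[$attributes[$i]] = $attributes[$i + 1]; // last value for a key wins
}
print_r($matches);
```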

To handle delimiters that occur more than once, adjust the line inside the for loop to check whether the key already exists and, if it does, turn its value into an array.


Edit: one way to create the array only when necessary:


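$matches = [];
for ($i = 1; $i < sizeof($attributes) - 1; $i += 2) {
    $key = $attributes[$i];
    if (array_key_exists($key, $matches)) {
        // Wrap the existing value only once, so a third occurrence is not nested.
        if (!is_array($matches[$key])) {
            $matches[$key] = [$matches[$key]];
        }
        $matches[$key][] = $attributes[$i + 1];
    } else {
        $matches[$key] = $attributes[$i + 1];
    }
}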

The code can certainly be simplified, especially if you put every value in an array (possibly a single-element one).
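A minimal sketch of that always-an-array variant, assuming the same sample string in $line: appending with $matches[$key][] creates the array on first use, so the existence check disappears. The trade-off is that single values also come back as one-element arrays.

```php
<?php
$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

$matches = [];
for ($i = 1; $i < count($attributes) - 1; $i += 2) {
    // [] on a missing key creates a new array, so no array_key_exists() is needed.
    $matches[$attributes[$i]][] = $attributes[$i + 1];
}
print_r($matches);
```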




You can match and capture each key into Group 1, and everything up to the next delimiter that differs from the captured one into Group 2. Then loop over the key/value pairs and split any value that still contains the repeated delimiter.


The regex is:

(\*[A-Z0-9]{2})(.*?)(?=(?!\1)\*[A-Z0-9]{2}|$)



  • (\*[A-Z0-9]{2}) - Delimiter, Group 1: a * and two uppercase letters or digits

  • (.*?) - Value, Group 2: any 0+ chars other than line break chars, as few as possible

  • (?=(?!\1)\*[A-Z0-9]{2}|$) - up to the delimiter pattern (\*[A-Z0-9]{2}) that is not equal to the text captured in Group 1 ((?!\1)) or end of string ($).
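A minimal sketch that only prints the Group 1/Group 2 pairs for the sample string, to show what the lookahead produces. For *AE the repeated delimiter stays inside Group 2, which is why each value is split again in the full code.

```php
<?php
$re = '/(\*[A-Z0-9]{2})(.*?)(?=(?!\1)\*[A-Z0-9]{2}|$)/';
$str = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';

preg_match_all($re, $str, $m, PREG_SET_ORDER);
// Each entry: [0] full match, [1] delimiter (Group 1), [2] value (Group 2).
foreach ($m as $kvp) {
    echo $kvp[1], ' => ', $kvp[2], PHP_EOL;
}
```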

In PHP:


$re = '/(\*[A-Z0-9]{2})(.*?)(?=(?!\1)\*[A-Z0-9]{2}|$)/';
$str = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
$res = [];
if (preg_match_all($re, $str, $m, PREG_SET_ORDER, 0)) {
    foreach ($m as $kvp) {
        // Split the value again in case the same delimiter was repeated.
        // {2} rather than + so a value starting with A-Z or 0-9 is not eaten.
        $tmp = preg_split('~\*[A-Z0-9]{2}~', $kvp[2]);
        if (count($tmp) > 1) {
            $res[$kvp[1]] = $tmp;
        } else {
            $res[$kvp[1]] = $kvp[2];
        }
    }
}
print_r($res);


Output:

Array
(
    [*01] => the title
    [*35] => the author
    [*A7] => other useless infos
    [*AE] => Array
        (
            [0] => other useful infos
            [1] => some delimiters can be there multiple times
        )

)




OK, I'm answering my own question on how to handle repeated delimiters. Thanks to @markus-ankenbrand for the starting point:


$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);
$matches = [];
for ($i = 1; $i < sizeof($attributes) - 1; $i += 2) {
    if (isset($matches[$attributes[$i]]) && is_array($matches[$attributes[$i]])) {
        // Key already holds an array: append.
        $matches[$attributes[$i]][] = $attributes[$i + 1];
    } elseif (isset($matches[$attributes[$i]]) && !is_array($matches[$attributes[$i]])) {
        // Second occurrence: wrap the existing string, then append.
        $currentValue = $matches[$attributes[$i]];
        $matches[$attributes[$i]] = [$currentValue];
        $matches[$attributes[$i]][] = $attributes[$i + 1];
    } else {
        // First occurrence: store the plain string.
        $matches[$attributes[$i]] = $attributes[$i + 1];
    }
}

The long if/else statement doesn't look very nice, but it does what it needs to do.
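For comparison, a more compact sketch of the same logic (not the poster's code). It relies on PHP's (array) cast, which wraps a string in a one-element array and returns an existing array unchanged. The sample string is assumed to be in $line.

```php
<?php
$line = '*01the title*35the author*A7other useless infos*AEother useful infos*AEsome delimiters can be there multiple times';
$attributes = preg_split('/(\*[A-Z0-9]{2})/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

$matches = [];
for ($i = 1; $i < count($attributes) - 1; $i += 2) {
    $key = $attributes[$i];
    $value = $attributes[$i + 1];
    if (!array_key_exists($key, $matches)) {
        // First occurrence: keep the plain string.
        $matches[$key] = $value;
    } else {
        // Repeat: (array) turns 'x' into ['x'] and leaves an array as-is.
        $matches[$key] = (array) $matches[$key];
        $matches[$key][] = $value;
    }
}
print_r($matches);
```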
