Using preg_split to split chords and words

Date: 2021-08-14 22:08:34

I'm working on a small piece of code that handles song tabs, but I'm stuck on a problem.

I need to parse each line of a song tab and split it into chunks, with chords on one side and words on the other.

Each chunk would look like this:

$line_chunk = array(
    0 => 'part of the line containing one or several chords',
    1 => 'part of the line containing words',
);

They should stay "grouped": the split should happen only where the function reaches the boundary between chords and words.

I guess I should use preg_split to achieve this. I ran some tests, but I was only able to split on individual chords, not on "groups" of chords:

$line_chunks = preg_split('/(\[[^]]*\])/', $line, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
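
To make the problem concrete, here is a small sketch of what that call returns for two of the sample lines from this question. The variable names `$mixed`, `$chordsOnly`, `$mixedChunks` and `$chordChunks` are only illustrative.

```php
<?php
// Same pattern and flags as in the attempt above.
$pattern = '/(\[[^]]*\])/';
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// Sample lines taken from the examples in this question.
$mixed = '[C#]I’m looking for [Fm]you [G#]';
$chordsOnly = '[C#] [Fm]';

// Chords and lyrics come back as one flat list, not as [chords, words] pairs.
$mixedChunks = preg_split($pattern, $mixed, -1, $flags);

// Each chord becomes its own element, and the space between chords is a separate element.
$chordChunks = preg_split($pattern, $chordsOnly, -1, $flags);

var_export($mixedChunks);
var_export($chordChunks);
```

The second result shows the issue: a run of chords is broken into one element per chord instead of staying together as a group.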

These examples show what I would like to get:

On a line containing no chords:

$input = '{intro}';

$results = array(
    array(
        0 => null,
        1 => '{intro}'
    )
);

On a line containing only chords:

$input = '[C#] [Fm] [C#] [Fm] [C#] [Fm]';

$results = array(
    array(
        0 => '[C#] [Fm] [C#] [Fm] [C#] [Fm]',
        1 => null
    )
);

On a line containing both:

$input = '[C#]I’m looking for [Fm]you [G#]';

$results = array(
    array(
        0 => '[C#]',
        1 => 'I’m looking for'
    ),
    array(
        0 => '[Fm]',
        1 => 'you '
    ),
    array(
        0 => '[G#]',
        1 => null
    ),
);

Any ideas on how to do this?

Thanks!

2 answers

#1


1  

preg_split isn't the way to go. Most of the time, when you have a complicated splitting task, it's easier to match what you are interested in than to split on a separator that is hard to define.

A preg_match_all approach:

$pattern = '~ \h*
(?|        # open a "branch reset group"
    ( \[ [^]]+ ] (?: \h* \[ [^]]+ ] )*+ ) # one or more chords in capture group 1
    \h*
    ( [^[\n]* (?<=\S) )  # optional lyrics (group 2)
  |                      # OR
    ()                   # no chords (group 1)
    ( [^[\n]* [^\s[] )   # lyrics (group 2)
)          # close the "branch reset group"
~x';

if (preg_match_all($pattern, $input, $matches, PREG_SET_ORDER)) {
    $result = array_map(function($i) { return [$i[1], $i[2]]; }, $matches);
    print_r($result);
}

A branch reset group preserves the same group numbering for each branch.
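
As a minimal illustration of that numbering behaviour, the sketch below compares a plain alternation with the same alternation wrapped in `(?| ... )`. The patterns and the variable names `$plain` and `$reset` are made up for this example.

```php
<?php
// Without branch reset, each alternative gets its own group number.
preg_match('/(a)|(b)/', 'b', $plain);

// With (?| ... ), both alternatives capture into group 1.
preg_match('/(?|(a)|(b))/', 'b', $reset);

var_export($plain); // group 1 is empty, 'b' ends up in group 2
var_export($reset); // 'b' ends up in group 1
```

This is why the pattern above can always read chords from group 1 and lyrics from group 2, whichever branch matched.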

Note: feel free to add:

if (empty($i[1])) $i[1] = null;    
if (empty($i[2])) $i[2] = null;

in the map function if you want to obtain null items instead of empty strings.

Note 2: if you work line by line, you can remove the \n from the pattern.
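
Putting the pattern and the null conversion together, one possible sketch is the helper below. The function name `splitTabLine` is hypothetical, and the strict `!== ''` comparison is a substitution for `empty()`, chosen here so that a lyric consisting of "0" is not turned into null.

```php
<?php
/**
 * Hypothetical helper: split one tab line into [chords, lyrics] pairs
 * using the pattern from this answer, with null for a missing part.
 */
function splitTabLine(string $line): array
{
    $pattern = '~ \h*
    (?|
        ( \[ [^]]+ ] (?: \h* \[ [^]]+ ] )*+ )  # group 1: one or more chords
        \h*
        ( [^[\n]* (?<=\S) )                    # group 2: optional lyrics
      |
        ()                                     # group 1: no chords
        ( [^[\n]* [^\s[] )                     # group 2: lyrics
    )
    ~x';

    // preg_match_all returns 0 when nothing matches and false on error.
    if (!preg_match_all($pattern, $line, $matches, PREG_SET_ORDER)) {
        return [];
    }

    return array_map(function ($m) {
        // Empty captures become null; anything else is kept as-is.
        return [
            $m[1] !== '' ? $m[1] : null,
            $m[2] !== '' ? $m[2] : null,
        ];
    }, $matches);
}

print_r(splitTabLine('[C#]I’m looking for [Fm]you [G#]'));
```

Note that the pattern drops trailing whitespace from the lyrics, so the second chunk comes back as 'you' rather than 'you ' as written in the question.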

#2


0  

I would go with PHP's explode:

/*
 * Process data
 */
$input = '[C#]I’m looking for [Fm]you [G#]';
$parts = explode("[", $input);
$results = array();

foreach ($parts as $item)
{
    // Skip the empty piece that explode() yields before a leading "["
    if ($item === "")
    {
        continue;
    }

    // Limit to 2 pieces: the chord name and whatever follows it
    $pieces = explode("]", $item, 2);

    if (count($pieces) < 2)
    {
        // No closing "]": this piece is plain lyrics with no chord
        $arrayitem = array( "Chord" => "",
                            "Lyric" => $pieces[0]);
    }
    else
    {
        $arrayitem = array( "Chord" => $pieces[0],
                            "Lyric" => $pieces[1]);
    }

    $results[] = $arrayitem;
}

/*
 * Echo results
 */
foreach ($results as $str)
{
    echo "Chord: " . $str["Chord"] . "\n";
    echo "Lyric: " . $str["Lyric"] . "\n";
}

Edge cases are not tested in this code, and leftover whitespace is not trimmed, but it is a base to work from.
