Perl split list except within brackets?

Time: 2021-04-25 21:41:16

I have a database with a number of fields containing comma separated values. I need to split these fields in Perl, which is straightforward enough except that some of the values are followed by nested CSVs contained in brackets that I do not want to split.

Example:

recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education

Splitting on ", " gives me:

recycling
environmental science
interdisciplinary (e.g.
consumerism
waste management
chemistry
toxicology
government policy
and ethics)
consumer education

What I want is:

recycling
environmental science
interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics)
consumer education

Can any Perl regex(perts) lend a hand?

I have tried modifying a regex I found in a similar SO post, but it returns no results:

#!/usr/bin/perl

use strict;
use warnings;

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};

my @parts = $s =~ m{\A(\w+) ([0-9]) (\([^\(]+\)) (\w+) ([0-9]) ([0-9]{2})};

use Data::Dumper;
print Dumper \@parts;

4 solutions

#1


9  

Try this:

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};

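# Split on ", " only when no ")" appears ahead before the next "(",
# i.e. when the comma is not inside a (single level of) parentheses.
my @parts = split /(?![^(]+\)), /, $s;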
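A quick standalone check of the split above. It confirms the four fields for the example string and also shows the limit of the lookahead: it only asks whether a ")" comes before the next "(", so with a second level of nesting (the hypothetical string in $nested below, not from the question) the comma after "c" is split wrongly.

```perl
use strict;
use warnings;

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};

# Split on ", " unless a ")" appears ahead before the next "(".
my @parts = split /(?![^(]+\)), /, $s;
print "$_\n" for @parts;    # four lines, as desired

# Hypothetical input with one more level of nesting.
my $nested = q{a, b (c, d (e, f), g), h};
my @bad = split /(?![^(]+\)), /, $nested;
print join(' | ', @bad), "\n";    # a | b (c | d (e, f), g) | h
```

For one level of brackets, as in the question, the one-liner is enough; for deeper nesting, the recursive pattern in the next answer is needed.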

#2


3  

The solution you have chosen is superior, but for anyone who says regular expressions cannot match nested parentheses: Perl regexes have a recursion feature ((?N), available since Perl 5.10) that can. The following works fine,
use strict;
use warnings;

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};

my @parts;

# "+" rather than "*" on the outer group, so no empty field is matched at the end
push @parts, $1 while $s =~ /
((?:
  [^(),]+ |
  ( \(
    (?: [^()]+ | (?2) )*
  \) )
)+)
(?: ,\s* | $)
/xg;


print "$_\n" for @parts;

even if the parentheses are nested further. No, it's not pretty, but it does work!
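To see the recursion at work, here is the same pattern with comments, applied to a hypothetical string with two levels of nesting (not from the question). (?2) re-enters capture group 2, so a "(...)" chunk may itself contain further "(...)" chunks. The outer quantifier is "+" so that no empty string is matched at the very end of the input.

```perl
use strict;
use warnings;

my $s = q{a, b (c, d (e, f), g), h};    # hypothetical nested input

my @parts;
push @parts, $1 while $s =~ /
    ( (?:                         # group 1: one whole field
        [^(),]+                   #   plain text without commas or parens
      | ( \(                      #   group 2: a parenthesised chunk
            (?: [^()]+ | (?2) )*  #   text, or a nested chunk via recursion
          \) )
    )+ )
    (?: ,\s* | $)                 # field ends at a comma or end of string
/xg;

print "$_\n" for @parts;
```

This prints three fields: "a", "b (c, d (e, f), g)" and "h", keeping the nested group intact.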

#3


0  

Did anyone say you have to do it in one step? You could slice off values in a loop. Given your example, you could use something like this:

use strict;
use warnings;
use 5.010;

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};

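my @parts;
while (1) {
        # A plain field: word characters and spaces up to the next comma.
        my ($elem, $rest) = $s =~ m/^((?:\w|\s)+)(?:,\s*(.*))?$/;
        if (not $elem) {
                # Otherwise the field must be text followed by a "(...)" group.
                ($elem, $rest) = $s =~ m/^((?:\w|\s)+\s*\([^\)]+\))(?:,\s*(.*))?$/;
        }
        die "cannot parse: $s\n" unless defined $elem;
        push @parts, $elem;
        $s = $rest;
        last if not $s;
}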

use Data::Dumper;
print Dumper \@parts;
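The same "take one field, then continue with the rest" idea can also be written with a single pattern, anchoring each match with \G at the point where the previous match stopped instead of copying the remainder into $rest. This is a sketch assuming only one level of parentheses, as in the example. Unlike the (?:\w|\s)+ patterns above, a field may contain any character except "," and "(" outside the brackets. The /c flag keeps pos() after the final failed match, so the code can check that the whole string was consumed.

```perl
use strict;
use warnings;
use 5.010;

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};

my @parts;
# Each match starts exactly where the last one ended (\G) and takes one field:
# runs of text without "," or "(", or a whole "(...)" group (one level only).
while ($s =~ /\G\s*((?:[^,(]+|\([^)]*\))+)(?:,|\z)/gc) {
    push @parts, $1;
}

# Thanks to /c, pos() still points at the end of the last successful match.
die "could not parse from offset ", pos($s) // 0, "\n"
    unless (pos($s) // 0) == length $s;

say for @parts;
```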

#4


0  

Another approach that uses loops and split. I haven't tested the performance, but shouldn't this be faster than the look-ahead regexp solutions (as the length of $str increases)?

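use strict;
use warnings;
my $str = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};

my @elems = split /,/, $str;
my @answer;
while (@elems) {
    # Copy plain elements until one opens a parenthesis (stop when none are left).
    push @answer, shift @elems while @elems && $elems[0] !~ /\(/;
    last unless @elems;
    # Gather elements up to and including the one that closes it, then rejoin.
    my @parens;
    push @parens, shift @elems while @elems && $elems[0] !~ /\)/;
    push @parens, shift @elems if @elems;
    push @answer, join ",", @parens;
}
s/^\s+// for @answer;    # drop the space that followed each comma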
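For comparison, the loop idea can be taken down to single characters: keep a parenthesis depth counter and split only on commas at depth 0. The helper name split_top_level is hypothetical, and this is an unbenchmarked sketch, not a claim about which approach is fastest. It handles any nesting depth and ignores a ")" that has no matching "(".

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that are not inside parentheses.
sub split_top_level {
    my ($str) = @_;
    my @fields;
    my $depth   = 0;
    my $current = '';
    for my $ch (split //, $str) {
        if    ($ch eq '(') { $depth++ }
        elsif ($ch eq ')') { $depth-- if $depth > 0 }
        elsif ($ch eq ',' && $depth == 0) {
            push @fields, $current;    # a top-level comma ends the field
            $current = '';
            next;
        }
        $current .= $ch;
    }
    push @fields, $current;            # the text after the last comma
    s/^\s+|\s+$//g for @fields;        # trim surrounding whitespace
    return @fields;
}

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
print "$_\n" for split_top_level($s);

print join(' | ', split_top_level(q{a, b (c, d (e, f), g), h})), "\n";
# a | b (c, d (e, f), g) | h
```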