In Perl, how can I use one regex grouping to capture more than one occurrence that matches it, into several array elements?
For example, for a string:
var1=100 var2=90 var5=hello var3="a, b, c" var7=test var3=hello
to process this with code:
$string = "var1=100 var2=90 var5=hello var3=\"a, b, c\" var7=test var3=hello";
my @array = $string =~ <regular expression here>
for ( my $i = 0; $i < scalar( @array ); $i++ )
{
print $i.": ".$array[$i]."\n";
}
I would like to see as output:
0: var1=100
1: var2=90
2: var5=hello
3: var3="a, b, c"
4: var7=test
5: var3=hello
What would I use as a regex?
The commonality between things I want to match here is an assignment string pattern, so something like:
my @array = $string =~ m/(\w+=[\w\"\,\s]+)*/;
Where the * is meant to match repeated occurrences of the group (strictly speaking, * means zero or more occurrences).
(I ruled out split() because some values contain spaces themselves (e.g. var3...), so splitting on whitespace would not give the desired result.)
With the above regex, I only get:
0: var1=100 var2
Is this possible in a single regex, or is additional code required?
I have already looked at existing answers when searching for "perl regex multiple group", but they did not give me enough clues:
- Dealing with multiple capture groups in multiple records
- Multiple matches within a regex group?
- Regex: Repeated capturing groups
- Regex match and grouping
- How do I regex match with grouping with unknown number of groups
- awk extract multiple groups from each line
- Matching multiple regex groups and removing them
- Perl: Deleting multiple reccuring lines where a certain criterion is met
- Regex matching into multiple groups per line?
- PHP RegEx Grouping Multiple Matches
- How to find multiple occurrences with regex groups?
9 answers
#1
39
my $string = "var1=100 var2=90 var5=hello var3=\"a, b, c\" var7=test var3=hello";
while($string =~ /(?:^|\s+)(\S+)\s*=\s*("[^"]*"|\S*)/g) {
print "<$1> => <$2>\n";
}
Prints:
<var1> => <100>
<var2> => <90>
<var5> => <hello>
<var3> => <"a, b, c">
<var7> => <test>
<var3> => <hello>
Explanation:
Last piece first: the g flag at the end means that you can apply the regex to the string multiple times. The second time it will continue matching where the last match ended in the string.
Now for the regex: (?:^|\s+) matches either the beginning of the string or a run of one or more whitespace characters. This is needed so that when the regex is applied the next time, the whitespace between the key/value pairs is skipped. The ?: means that the content of the parentheses is not captured as a group (we don't need the whitespace, only the key and the value). \S+ matches the variable name. Then we skip any amount of whitespace around the equals sign. Finally, ("[^"]*"|\S*) matches either two quotes with any number of characters in between, or any number of non-whitespace characters, as the value. Note that the quote matching is pretty fragile and won't handle escaped quotes properly, e.g. "\"quoted\"" would result in "\".
EDIT:
Since you really want to get the whole assignment, and not the single keys/values, here's a one-liner that extracts those:
my @list = $string =~ /(?:^|\s+)((?:\S+)\s*=\s*(?:"[^"]*"|\S*))/g;
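To make the "continues where the last match ended" behaviour of the g flag concrete, here is a minimal sketch. The helper name match_end_positions and the simplified \w+=\w+ pattern are assumptions made for illustration only; they are not part of the answer above. It records pos() after every successful match in scalar context:

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset at which each successive /g match ended.
sub match_end_positions {
    my ($text) = @_;
    my @ends;
    # In scalar context, every evaluation of m//g resumes at pos($text).
    while ($text =~ /(\w+)=(\w+)/g) {
        push @ends, pos($text);    # offset just past the current match
    }
    return @ends;
}

print join(',', match_end_positions('a=1 bb=22 c=3')), "\n";    # prints 3,9,13
```

Once the match fails, pos() is reset, which is why the while loop terminates and a later match starts again from the beginning of the string.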
#2
8
With regular expressions, use a technique that I like to call tack-and-stretch: anchor on features you know will be there (tack) and then grab what's between (stretch).
In this case, you know that a single assignment matches
\b\w+=.+
and you have many of these repeated in $string. Remember that \b means word boundary:
A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.
The values in the assignments can be a little tricky to describe with a regular expression, but you also know that each value will terminate with whitespace—although not necessarily the first whitespace encountered!—followed by either another assignment or end-of-string.
To avoid repeating the assignment pattern, compile it once with qr// and reuse it in your pattern along with a look-ahead assertion (?=...) to stretch the match just far enough to capture the entire value while also preventing it from spilling into the next variable name.
Matching against your pattern in list context with m//g gives the following behavior:
The /g modifier specifies global pattern matching, that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
The pattern $assignment uses non-greedy .+? to cut off the value as soon as the look-ahead sees another assignment or the end of the string. Remember that the match returns the substrings from all capturing subpatterns, so the look-ahead's alternation uses non-capturing (?:...). The qr// pattern is likewise interpolated as a non-capturing group, so the overall pattern contains no capturing parentheses and m//g returns each whole match.
#! /usr/bin/perl
use warnings;
use strict;
my $string = <<'EOF';
var1=100 var2=90 var5=hello var3="a, b, c" var7=test var3=hello
EOF
my $assignment = qr/\b\w+ = .+?/x;
my @array = $string =~ /$assignment (?= \s+ (?: $ | $assignment))/gx;
for ( my $i = 0; $i < scalar( @array ); $i++ )
{
print $i.": ".$array[$i]."\n";
}
Output:
0: var1=100
1: var2=90
2: var5=hello
3: var3="a, b, c"
4: var7=test
5: var3=hello
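The list-context behaviour of m//g quoted above is easy to check in isolation. The following is only a sketch: the helper names whole_matches and captured_parts and the simplified \w+=\w+ pattern are assumptions for illustration. It contrasts a pattern without capturing parentheses with one that has two of them:

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# No capturing parentheses: list-context m//g returns every whole match.
sub whole_matches {
    my ($text) = @_;
    return $text =~ /\b\w+=\w+/g;
}

# Two capturing groups: m//g returns the captures of every match, flattened.
sub captured_parts {
    my ($text) = @_;
    return $text =~ /\b(\w+)=(\w+)/g;
}

my $sample = 'a=1 b=2 c=3';
print join(',', whole_matches($sample)),  "\n";    # prints a=1,b=2,c=3
print join(',', captured_parts($sample)), "\n";    # prints a,1,b,2,c,3
```

This is why stray capturing groups inside a look-ahead would add unwanted elements to the result list.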
#3
7
I'm not saying this is what you should do, but what you're trying to do is write a Grammar. Now your example is very simple for a Grammar, but Damian Conway's module Regexp::Grammars is really great at this. If you have to grow this at all, you'll find it will make your life much easier. I use it quite a bit here - it is kind of perl6-ish.
use Regexp::Grammars;
use Data::Dumper;
use strict;
use warnings;
my $parser = qr{
<[pair]>+
<rule: pair> <key>=(?:"<list>"|<value=literal>)
<token: key> var\d+
<rule: list> <[MATCH=literal]> ** (,)
<token: literal> \S+
}xms;
q[var1=100 var2=90 var5=hello var3="a, b, c" var7=test var3=hello] =~ $parser;
die Dumper {%/};
Output:
$VAR1 = {
'' => 'var1=100 var2=90 var5=hello var3="a, b, c" var7=test var3=hello',
'pair' => [
{
'' => 'var1=100',
'value' => '100',
'key' => 'var1'
},
{
'' => 'var2=90',
'value' => '90',
'key' => 'var2'
},
{
'' => 'var5=hello',
'value' => 'hello',
'key' => 'var5'
},
{
'' => 'var3="a, b, c"',
'key' => 'var3',
'list' => [
'a',
'b',
'c'
]
},
{
'' => 'var7=test',
'value' => 'test',
'key' => 'var7'
},
{
'' => 'var3=hello',
'value' => 'hello',
'key' => 'var3'
}
]
};
#4
4
A bit over the top maybe, but an excuse for me to look into http://p3rl.org/Parse::RecDescent. How about making a parser?
#!/usr/bin/perl
use strict;
use warnings;
use Parse::RecDescent;
use Regexp::Common;
my $grammar = <<'_EOGRAMMAR_'
INTEGER: /[-+]?\d+/
STRING: /\S+/
QSTRING: /$Regexp::Common::RE{quoted}/
VARIABLE: /var\d+/
VALUE: ( QSTRING | STRING | INTEGER )
assignment: VARIABLE "=" VALUE /[\s]*/ { print "$item{VARIABLE} => $item{VALUE}\n"; }
startrule: assignment(s)
_EOGRAMMAR_
;
$Parse::RecDescent::skip = '';
my $parser = Parse::RecDescent->new($grammar);
my $code = q{var1=100 var2=90 var5=hello var3="a, b, c" var7=test var8=" haha \" heh " var3=hello};
$parser->startrule($code);
yields:
var1 => 100
var2 => 90
var5 => hello
var3 => "a, b, c"
var7 => test
var8 => " haha \" heh "
var3 => hello
PS. Note the duplicate var3: if you want the later assignment to overwrite the first one, you can store the values in a hash and use them afterwards.
PPS. My first thought was to split on '=', but that would fail if a string contained '='. And since regexps are almost always bad for parsing, I ended up trying a parser instead, and it works.
Edit: Added support for escaped quotes inside quoted strings.
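The hash idea from the PS can be sketched without the parser. This is a minimal illustration under stated assumptions: the helper name assignments_to_hash is hypothetical, and the simple regex used here does not handle escaped quotes the way the grammar above does.

```perl
use strict;
use warnings;

# Hypothetical helper: collect assignments into a hash.
# A key that appears twice keeps the value of its last assignment.
sub assignments_to_hash {
    my ($text) = @_;
    my %value_for;
    # Assumption: values are either "double quoted" (no escapes) or a run of non-whitespace.
    while ($text =~ /(\w+)=("[^"]*"|\S+)/g) {
        $value_for{$1} = $2;
    }
    return %value_for;
}

my %h = assignments_to_hash('var1=100 var3="a, b, c" var3=hello');
print "$_ => $h{$_}\n" for sort keys %h;
# prints:
# var1 => 100
# var3 => hello
```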
#5
3
I've recently had to parse x509 certificates "Subject" lines. They had similar form to the one you have provided:
echo 'Subject: C=HU, L=Budapest, O=Microsec Ltd., CN=Microsec e-Szigno Root CA 2009/emailAddress=info@e-szigno.hu' | \
perl -wne 'my @a = m/(\w+\=.+?)(?=(?:, \w+\=|$))/g; print "$_\n" foreach @a;'
C=HU
L=Budapest
O=Microsec Ltd.
CN=Microsec e-Szigno Root CA 2009/emailAddress=info@e-szigno.hu
Short description of the regex:
(\w+\=.+?) - capture word characters followed by '=' and any subsequent characters, in non-greedy mode
(?=(?:, \w+\=|$)) - which are followed by either another ", KEY=val" or the end of the line.
The interesting parts of the regex are:
- .+? - non-greedy mode
- (?:pattern) - non-capturing group
- (?=pattern) - zero-width positive look-ahead assertion
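The same combination of a non-greedy match and a look-ahead can be written as a small script rather than a one-liner. This is a sketch only: the sub name split_subject is an assumption, and the pattern is the one above with the unnecessary backslashes before '=' dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: split an X.509 "Subject" line into KEY=value fields.
sub split_subject {
    my ($line) = @_;
    # .+? stops as soon as the look-ahead sees ", KEY=" or the end of the line.
    return $line =~ /(\w+=.+?)(?=, \w+=|$)/g;
}

my $subject = 'Subject: C=HU, L=Budapest, O=Microsec Ltd., '
            . 'CN=Microsec e-Szigno Root CA 2009/emailAddress=info@e-szigno.hu';
print "$_\n" for split_subject($subject);
```

Because the look-ahead is zero-width, the ", " separator is not consumed and does not end up in any field.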
#6
2
This one also handles the common backslash escaping inside double quotes, as in var3="a, \"b, c".
@a = /(\w+=(?:\w+|"(?:[^\\"]*(?:\\.[^\\"]*)*)*"))/g;
In action:
echo 'var1=100 var2=90 var42="foo\"bar\\" var5=hello var3="a, b, c" var7=test var3=hello' |
perl -nle '@a = /(\w+=(?:\w+|"(?:[^\\"]*(?:\\.[^\\"]*)*)*"))/g; $,=","; print @a'
var1=100,var2=90,var42="foo\"bar\\",var5=hello,var3="a, b, c",var7=test,var3=hello
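For readability, the quoted-string part can be spelled out with the /x flag. The sketch below is a variant, not the exact regex above: it drops the outer *, which should not be needed for matching, and the names $quoted and extract_assignments are assumptions for illustration.

```perl
use strict;
use warnings;

# A double-quoted string in which a backslash escapes the following character.
my $quoted = qr/
    "               # opening quote
    [^\\"]*         # ordinary characters: anything but backslash or quote
    (?:
        \\.         # an escaped character, e.g. \" or \\
        [^\\"]*     # followed by more ordinary characters
    )*
    "               # closing quote
/x;

# Hypothetical helper: return every NAME=value assignment found in the text.
sub extract_assignments {
    my ($text) = @_;
    return $text =~ /(\w+=(?:\w+|$quoted))/g;
}

my $input = q{var3="a, \"b, c" var7=test};
print "$_\n" for extract_assignments($input);
# prints:
# var3="a, \"b, c"
# var7=test
```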
#7
2
#!/usr/bin/perl
use strict; use warnings;
use Text::ParseWords;
use YAML;
my $string =
"var1=100 var2=90 var5=hello var3=\"a, b, c\" var7=test var3=hello";
my @parts = shellwords $string;
print Dump \@parts;
@parts = map { { split /=/ } } @parts;
print Dump \@parts;
#8
1
You asked for a RegEx solution or other code. Here is a (mostly) non-regex solution using only core modules. The only regex is \s+ to determine the delimiter; in this case one or more spaces.
use strict; use warnings;
use Text::ParseWords;
my $string="var1=100 var2=90 var5=hello var3=\"a, b, c\" var7=test var3=hello";
my @array = quotewords('\s+', 0, $string);
for ( my $i = 0; $i < scalar( @array ); $i++ )
{
print $i.": ".$array[$i]."\n";
}
The output is:
0: var1=100
1: var2=90
2: var5=hello
3: var3=a, b, c
4: var7=test
5: var3=hello
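Note that the quotes around the var3 value are gone in this output, because the second argument to quotewords (the keep flag) is 0. If the quotes should be preserved, as in the output the question asks for, passing a true keep flag should do it. A minimal sketch:

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $string = 'var1=100 var2=90 var5=hello var3="a, b, c" var7=test var3=hello';

# A true second argument (keep) leaves quotes and backslashes in the tokens.
my @kept = quotewords('\s+', 1, $string);

print "$_: $kept[$_]\n" for 0 .. $#kept;    # element 3 is var3="a, b, c"
```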
If you really want a regex solution, Alan Moore's comment linking to his code on IDEone is the gas!
#9
0
It is possible to do this with regexes, but the result is fragile.
my $string = "var1=100 var2=90 var5=hello var3=\"a, b, c\" var7=test var3=hello";
my $regexp = qr/( (?:\w+=[\w\,]+) | (?:\w+=\"[^\"]*\") )/x;
my @matches = $string =~ /$regexp/g;