How do I split a string using a regular expression in Perl?

Time: 2020-12-11 21:38:04

I have a string in Perl: 'CCCCCCCC^hC^iC^*C^"C^8A'.

I want to split this string using a regular expression: "^[any_character]C". In other words, I want to split it by the actual character ^, followed by any character, followed by a specific letter (in this case C, but it could be A, or any other character).

I have tried looking at other questions/posts and finally came up with my @split_str = split(/\^(\.)C/, $letters), but this seems not to be working.

I'm sure I'm doing something wrong, but I don't know what.

3 solutions

#1


6  

You were very close. There were just a couple of errors in your code. Before I explain them, here's the code I was using to test solutions.

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

use Data::Dumper;

$_ = 'CCCCCCCC^hC^iC^*C^"C^8A';

my @data = split /\^(\.)C/;

say Dumper @data;

Running this with your original regex, we get this output:

$VAR1 = 'CCCCCCCC^hC^iC^*C^"C^8A';

No splitting has taken place at all. That's because your regex includes \. (an escaped dot). An unescaped dot matches any character except a newline, but by escaping it with the backslash you have told Perl to treat it as a literal dot. There are no dots in your string, so the regex doesn't match and the string is not split.
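
The difference between the two forms of the dot can be checked in isolation. The following is a minimal sketch, not part of the original answer; the variable names and the second test string 'AB^.CD' are made up for illustration.

```perl
use strict;
use warnings;

my $letters = 'CCCCCCCC^hC^iC^*C^"C^8A';

# Escaped dot: only a literal "." can match, and the string has none.
my $escaped = ($letters =~ /\^\.C/) ? 1 : 0;

# Unescaped dot: any character (except newline) may sit between ^ and C.
my $unescaped = ($letters =~ /\^.C/) ? 1 : 0;

# A string that really contains "^.C" is matched by the escaped form.
my $with_dot = 'AB^.CD';
my $literal  = ($with_dot =~ /\^\.C/) ? 1 : 0;

print "escaped dot matches: $escaped\n";      # 0
print "unescaped dot matches: $unescaped\n";  # 1
print "literal dot matches: $literal\n";      # 1
```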

If we remove the backslash, we get this output:

$VAR1 = 'CCCCCCCC';
$VAR2 = 'h';
$VAR3 = '';
$VAR4 = 'i';
$VAR5 = '';
$VAR6 = '*';
$VAR7 = '';
$VAR8 = '"';
$VAR9 = '^8A';

This is better. Some splitting has taken place. But because we have parentheses around the dot ((.)), Perl has "captured" the characters that the dot matches and added them to the list of values that split() returns.

If we remove those parentheses, we get only the values between the split markers.

$VAR1 = 'CCCCCCCC';
$VAR2 = '';
$VAR3 = '';
$VAR4 = '';
$VAR5 = '^8A';

Note that we get a few empty elements. That's because in places like "^hC^iC" in your string, there is no data between two adjacent split markers.
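
If the empty elements are not wanted, one possible approach (an assumption about what the asker might need, not something the original answer prescribes) is to filter the list with grep:

```perl
use strict;
use warnings;

my $letters = 'CCCCCCCC^hC^iC^*C^"C^8A';

# grep { length } keeps only the fields that contain at least one character.
my @fields = grep { length } split /\^.C/, $letters;

print join(',', @fields), "\n";   # prints: CCCCCCCC,^8A
```

Note that this also discards empty fields that might be meaningful, so it only fits cases where the position of each field does not matter.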

By moving the parentheses around the whole of the regex (split /(\^.C)/), we can get a list which includes all of the split markers together with the data between them.

$VAR1 = 'CCCCCCCC';
$VAR2 = '^hC';
$VAR3 = '';
$VAR4 = '^iC';
$VAR5 = '';
$VAR6 = '^*C';
$VAR7 = '';
$VAR8 = '^"C';
$VAR9 = '^8A';

Which of these options is most useful to you depends on exactly what you're trying to do.
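
The question mentions that the final letter could be A or some other character. A rough sketch of how the letter might be supplied from a variable follows; the name $letter is hypothetical, and \Q...\E is used so that a letter which happens to be a regex metacharacter is still treated literally.

```perl
use strict;
use warnings;

my $letters = 'CCCCCCCC^hC^iC^*C^"C^8A';
my $letter  = 'A';    # hypothetical: the letter that ends each split marker

# \Q...\E quotes any regex metacharacters in the interpolated value.
my @parts = split /\^.\Q$letter\E/, $letters;

print scalar(@parts), " field(s): @parts\n";
```

With 'A', only the final ^8A matches. Because split drops trailing empty fields by default, the result is a single field holding everything before that marker.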

#2


5  

When you say [any_character], you presumably mean the . pattern. A dot matches any character except line break characters; with the s modifier, it matches any character, line breaks included.

So, in your case, you just should not have escaped the dot:

@split_str = split /\^.C/, $letters;
                      ^

Or, with an s modifier:

@split_str = split /\^.C/s, $letters;
                         ^
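
The two forms only behave differently when a newline sits between the caret and the letter. A small sketch with a made-up input string (the question's own string contains no newlines, so this is purely illustrative):

```perl
use strict;
use warnings;

my $text = "A^\nCB";   # hypothetical input: a newline between ^ and C

my @plain  = split /\^.C/,  $text;   # dot cannot match "\n", so nothing is split
my @dotall = split /\^.C/s, $text;   # with /s the dot matches "\n" as well

print scalar(@plain),  "\n";   # 1
print scalar(@dotall), "\n";   # 2
```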

The caret should be escaped to denote a literal caret symbol in a regex pattern.
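
To see why the escape matters, compare what each pattern captures from the string in the question. This is an illustrative sketch only; the variable names are invented here.

```perl
use strict;
use warnings;

my $letters = 'CCCCCCCC^hC^iC^*C^"C^8A';

# Unescaped ^ is an anchor: it matches the start of the string, not a caret.
my ($anchored) = $letters =~ /(^.C)/;

# Escaped \^ matches a literal caret character.
my ($literal) = $letters =~ /(\^.C)/;

print "$anchored\n";   # CC
print "$literal\n";    # ^hC
```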

#3


-4  

Try it like this: @split_str = split(/\^/, $letters)
