I have found one method, but I don't understand the principle:
#remove lines starting with //
$file =~ s/(?<=\n)[ \t]*?\/\/.*?\n//sg;
How does (?<=\n)[ \t]*? work?
1 solution
#1
3
The critical piece is the lookbehind (?<=...). It is a zero-width assertion, which means that it does not consume its match; it only asserts that the pattern given inside is indeed in the string, right before the pattern that follows it.
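To see what "zero-width" means in practice, here is a minimal sketch that runs the pattern from the question as a plain match and inspects where the match begins. The sample string and the variable names ($text, $start, $matched) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input: a code line followed by an indented comment line.
my $text = "int x;\n  // note\nint y;\n";

my ( $start, $matched );

# Same pattern as in the question, used as a plain match instead of s///.
if ( $text =~ /(?<=\n)[ \t]*?\/\/.*?\n/s ) {
    $start   = $-[0];                                # offset where the match begins
    $matched = substr $text, $-[0], $+[0] - $-[0];   # the text that was consumed
}

# The newline checked by the lookbehind sits at offset 6, just before the
# match, and is not part of the matched text.
print "match starts at offset $start\n";
print "matched text: [$matched]\n";
```

The match starts on the first space of the comment line, not on the newline before it, so a substitution leaves that newline in place.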
So (?<=\n)[ \t] matches either a space or a tab, [ \t], that has a newline before it. With the quantifier, [ \t]*, it matches a space or tab any number of times (possibly zero). Then we have the // (each slash escaped by \). Then it matches any character any number of times up to the first newline, .*?\n.
Here ? makes .* non-greedy so that it stops at the first match of the following pattern.
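The effect of the non-greedy quantifier is easiest to see side by side. The sketch below applies the substitution from the question to a made-up sample string, then repeats it with a greedy .* for comparison; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; the first line is itself a comment.
my $file = "// header\nint x;\n  // note\nint y;\n// tail\n";

# The substitution from the question, applied to a copy of the string.
( my $lazy = $file ) =~ s/(?<=\n)[ \t]*?\/\/.*?\n//sg;

# Same pattern with a greedy .* instead. Under the s modifier the dot also
# matches newlines, so the greedy form runs on to the last newline.
( my $greedy = $file ) =~ s/(?<=\n)[ \t]*?\/\/.*\n//sg;

print "lazy:\n$lazy\n";
print "greedy:\n$greedy\n";
```

With the non-greedy form only the comment lines go away. The greedy form deletes everything from the first matched comment to the end of the string, including "int y;". In both cases the comment on the very first line stays, because there is no newline before it for the lookbehind to find.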
This can be done in other ways, too.
$file =~ s{ ^ \s* // .*? \n }{}gmx;
The modifier m makes the anchors ^ and $ (unused here) match the beginning and end of each line. I use {}{} as delimiters so that I don't have to escape /. The modifier x allows use of spaces (and comments and newlines) inside the pattern for readability.
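As a rough illustration of what the x modifier buys, the same substitution can be spread over several lines with a comment on each part. The sample string and the $stripped variable are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical sample input, including a comment on the very first line.
my $file = "// header\nint x;\n  // note\nint y;\n";

( my $stripped = $file ) =~ s{
    ^        # start of any line, because of the m modifier
    \s*      # optional leading whitespace
    //       # the comment marker, no escaping needed with these delimiters
    .*? \n   # the rest of the line and its newline
}{}gmx;

print $stripped;
```

Unlike the lookbehind version, ^ also matches at the start of the string, so a comment on the first line is removed too. One thing to be aware of: \s matches newlines as well, so a blank line directly above a comment line is removed along with it; [ \t]* (or \h* on Perl 5.10 and later) avoids that if it matters.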
You can also do it by splitting the string on newlines and passing the lines through grep:
my $new_file = join "\n", grep { not m|^\s*//.*| } split /\n/, $file;
The split returns a list of lines, and this is the input for grep, which passes on those lines for which the code in the block evaluates to true. The list that it returns is then joined back, if you wish to again have a multiline string. If you want the lines themselves, remove join "\n" and assign to an array instead.
The regex in the grep block is now far simpler, but the whole thing may be an eyeful in comparison with the previous regex. However, this approach can turn hard jobs into easy ones: instead of going for a monster master regex, break the string and process the pieces easily.
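Here is a small sketch of the split and grep approach in both forms, as a list of lines and as a rejoined string. The sample input and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $file = "// header\nint x;\n  // note\nint y;\n";

# Keep the surviving lines as a list.
my @lines = grep { not m|^\s*//.*| } split /\n/, $file;

# Or glue them back together. The separator must be in double quotes;
# in single quotes it would be a literal backslash followed by n.
my $new_file = join "\n", @lines;

print "$_\n" for @lines;
```

Note that the rejoined string has no trailing newline: split drops the empty field after the final newline, and join only puts the separator between elements. Append "\n" yourself if you need it.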