
Date: 2022-09-17

I have found one method, but I don't understand the principle:


#remove lines starting with //

$file =~ s/(?<=\n)[ \t]*?\/\/.*?\n//sg;

How does (?<=\n)[ \t]*? work?

The critical piece is the lookbehind (?<=...). It is a zero-width assertion, which means that it does not consume what it matches: it only asserts that the pattern given inside it occurs in the string immediately before the pattern that follows it.
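To see what "zero-width" means in practice, here is a small sketch. The sample string and the variable names are made up for illustration; the point is only to compare a pattern that consumes the newline with one that merely asserts it.

```perl
use strict;
use warnings;

# Hypothetical sample: two short lines.
my $text = "a\nb\n";

# Consuming version: the newline is part of the match, so it is replaced too.
(my $consumed = $text) =~ s/\nb/X/;

# Lookbehind version: the newline must be there, but it is not part of the match.
(my $asserted = $text) =~ s/(?<=\n)b/X/;

print $consumed eq "aX\n"   ? "newline was consumed\n" : "unexpected\n";
print $asserted eq "a\nX\n" ? "newline was kept\n"     : "unexpected\n";
```

In the second substitution only the b is replaced, because the lookbehind checks for the newline without including it in the matched text.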


So (?<=\n)[ \t] matches either a space or a tab, [ \t], that has a newline before it. With the quantifier, [ \t]*?, it matches a space-or-tab any number of times (possibly zero); the trailing ? makes that quantifier non-greedy, which does not change what gets matched here since // must follow anyway. Then we have the // (each slash escaped by \). Then it matches any character any number of times up to the first newline, .*?\n.


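Here the ? makes .* non-greedy, so that it stops at the first place where the following pattern (the \n) matches. This matters because the s modifier lets . match a newline as well; a greedy .* would run on to the last newline in the string.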
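A quick way to convince yourself is to run the regex from the question next to a greedy variant. This is only a sketch; the sample input and the names $lazy and $greedy are my own.

```perl
use strict;
use warnings;

# Hypothetical sample input: two comment lines between three code lines.
my $file = "int a;\n  // first comment\nint b;\n// second comment\nint c;\n";

# The regex from the question: the lazy .*? stops at the first newline.
(my $lazy = $file) =~ s/(?<=\n)[ \t]*?\/\/.*?\n//sg;

# Same regex with a greedy .* for comparison: under /s it runs to the last newline.
(my $greedy = $file) =~ s/(?<=\n)[ \t]*?\/\/.*\n//sg;

print "lazy:\n$lazy";      # the three code lines remain
print "greedy:\n$greedy";  # only "int a;" remains
```

The greedy version swallows everything from the first comment to the end of the string, which is exactly what the ? prevents.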

This can be done in other ways, too.


$file =~ s{ ^ \s* // .*? \n }{}gmx;

The modifier m makes anchors ^ and $ (unused here) match the beginning and end of each line. I use {}{} as delimiters so that I don't have to escape /. The modifier x allows use of spaces (and comments and newlines) inside for readability.
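Spread over several lines with comments, the same substitution might look like the sketch below. The sample input and the variable names are assumptions of mine, chosen to include a comment on the very first line and a blank line above a comment.

```perl
use strict;
use warnings;

# Hypothetical sample input with a comment on the very first line.
my $file = "// header comment\nint a;\n  // indented comment\nint b;\n";

(my $stripped = $file) =~ s{
    ^      # start of any line, via the m modifier
    \s*    # optional leading whitespace
    //     # comment marker, unescaped because / is not the delimiter
    .*?    # rest of the line
    \n     # newline that ends the line
}{}gmx;

print $stripped;   # prints "int a;" and "int b;" on separate lines

# \s also matches "\n", so a blank line directly above a comment is removed too.
(my $blank = "int a;\n\n// note\nint b;\n") =~ s{ ^ \s* // .*? \n }{}gmx;

print $blank;      # the blank line is gone as well
```

Because ^ also matches at the very start of the string, a comment on the first line is removed here, whereas (?<=\n) needs a newline before it. If eating the blank line is not what you want, [ \t]* in place of \s* should avoid it.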


You can also do it by splitting the string on newlines and passing the lines through grep:


my $new_file = join "\n", grep { not m|^\s*//.*| } split /\n/, $file;

The split returns a list of lines, which is the input for grep; grep passes through those lines for which the code in the block evaluates to true. The list that it returns is then joined back together if you want a multiline string again. Note the double quotes in join "\n": in single quotes, '\n' is a literal backslash followed by n, not a newline. If you want the lines themselves, drop the join "\n" and assign to an array instead.
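Here is that pipeline taken apart into steps, as a minimal sketch with a made-up sample string; @kept is a name I introduced for the intermediate list.

```perl
use strict;
use warnings;

# Hypothetical sample input.
my $file = "int a;\n// comment\nint b;\n";

# split drops the newlines (and trailing empty fields); grep keeps non-comment lines.
my @kept = grep { not m|^\s*//| } split /\n/, $file;

# Double quotes are needed so that "\n" is a real newline.
my $new_file = join "\n", @kept;

print scalar(@kept), " lines kept\n";   # 2 lines kept
print "$new_file\n";
```

One thing to keep in mind: join puts the newline only between elements, so $new_file does not end with a newline even though $file did. Append one yourself if you need it.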

The regex in the grep block is now far simpler, but the whole thing may be an eyeful in comparison with the previous regex. However, this approach can turn hard jobs into easy ones: instead of going for a monster master regex, break the string up and process the pieces easily.




