How can Perl match "//" comments in a Verilog file?

Time: 2022-09-17 12:16:36

I have found one method, but I don't understand how it works:

#remove lines starting with //

$file =~ s/(?<=\n)[ \t]*?\/\/.*?\n//sg;

How does (?<=\n)[ \t]*? work?

1 solution

#1

The critical piece is the lookbehind (?<=...). It is a zero-width assertion, which means that it does not consume what it matches; it only asserts that the pattern inside it is present in the string immediately before the pattern that follows it.

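So (?<=\n)[ \t] matches a single space or tab, [ \t], that is immediately preceded by a newline. With the quantifier, [ \t]*? matches a space or tab any number of times (possibly zero); its trailing ? only makes the repetition non-greedy, which does not change the result here because // has to follow directly. Then comes // (each / escaped as \/ because / is also the delimiter of s///). Finally, .*?\n matches any characters up to and including the first newline.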
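
A quick way to watch the zero-width lookbehind at work is to run the question's substitution on a small string. The Verilog text below is made up for illustration. Because the lookbehind only checks for \n without consuming it, the newline that ends "module top;" survives, and a comment on the very first line is left alone, since nothing precedes it for the lookbehind to see.

```perl
use strict;
use warnings;

# Made-up Verilog text; note the comment on the very first line.
my $file = "// header comment\n"
         . "module top;\n"
         . "    // indented comment\n"
         . "  wire a;\n"
         . "endmodule\n";

# The substitution from the question, applied to a copy of $file.
(my $out = $file) =~ s/(?<=\n)[ \t]*?\/\/.*?\n//sg;

# The indented comment is gone, but "// header comment" stays:
# at offset 0 there is no preceding "\n" for the lookbehind to see.
print $out;
```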

Here the ? makes .* non-greedy, so it stops at the first match of the pattern that follows it, i.e. at the first \n.
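
The non-greedy ? matters more than usual here because of the /s modifier, which lets . match a newline as well. The sketch below, again on a made-up string, compares the question's pattern with a greedy variant written only for comparison: the greedy .*\n runs on to the last newline in the string and deletes the code after the comment too.

```perl
use strict;
use warnings;

my $src = "module top;\n// comment\nwire a;\nendmodule\n";

# Non-greedy, as in the question: stops at the first newline after //.
(my $lazy = $src) =~ s/(?<=\n)[ \t]*?\/\/.*?\n//sg;

# Greedy variant, for comparison only: with /s, . also matches "\n",
# so .* runs to the LAST newline and swallows the following code.
(my $greedy = $src) =~ s/(?<=\n)[ \t]*?\/\/.*\n//sg;

print "lazy:\n$lazy";
print "greedy:\n$greedy";
```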

This can be done in other ways, too.

$file =~ s{ ^ \s* // .*? \n }{}gmx;

The modifier m makes the anchors ^ and $ (the latter unused here) match at the beginning and end of each line, not just of the whole string. I use {}{} as delimiters so that I don't have to escape /. The modifier x makes the regex ignore unescaped whitespace and allows comments (and newlines) inside the pattern, for readability.
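
Here is a similar made-up input run through this version. Two behaviours differ from the lookbehind pattern: ^ under /m also matches at the very start of the string, so a comment on the first line is removed as well; and \s* can match newlines, so a blank line directly above a comment is swallowed along with it. The second substitution, a variant added here for comparison, swaps \s* for [ \t]* to keep such blank lines; under /x, whitespace inside a bracketed character class is still literal, so [ \t] still means space or tab.

```perl
use strict;
use warnings;

# Made-up Verilog text with a blank line just above an indented comment.
my $file = "// header comment\n"
         . "module top;\n"
         . "\n"
         . "    // indented comment\n"
         . "  wire a;\n"
         . "endmodule\n";

# The /m /x version: ^ matches at the start of every line.
(my $out = $file) =~ s{ ^ \s* // .*? \n }{}gmx;

# Variant for comparison: [ \t]* cannot cross a newline,
# so the blank line above the comment is kept.
(my $keep = $file) =~ s{ ^ [ \t]* // .*? \n }{}gmx;

print $out;
print "----\n";
print $keep;
```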

You can also do it by splitting the string on newlines and passing the lines through grep:

my $new_file = join "\n", grep { not m|^\s*//.*| } split /\n/, $file;

split returns a list of lines, which is the input for grep; grep passes on only the lines for which the code in the block evaluates to true. The list it returns is then joined back together, if you want a multiline string again. The separator must be "\n" in double quotes: in single quotes, '\n' is a literal backslash followed by n. If you want the lines themselves, remove the join "\n" and assign the result to an array instead.
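
A runnable sketch of the split/grep/join approach on a made-up string, keeping both the array of lines and the rejoined string. One detail the one-liner glosses over: split /\n/ discards trailing empty fields, so the final newline is lost; this sketch appends one back, assuming the original text ended with a newline.

```perl
use strict;
use warnings;

my $file = "module top;\n// comment\n  wire a;\n   // another\nendmodule\n";

# The lines themselves: drop those whose first non-blank characters are //.
my @lines = grep { not m{^\s*//} } split /\n/, $file;

# Back to one string; "\n" must be double-quoted to be a real newline.
# split removed the final newline, so add it back (assumes the input had one).
my $new_file = join("\n", @lines) . "\n";

print $new_file;
```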

The regex in the grep block is now far simpler, although the whole statement may be more of an eyeful than the previous regex. Still, this approach can turn hard jobs into easy ones: instead of going for one monster master regex, break the string into pieces and process each piece simply.
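
The same break-it-into-pieces idea also works when reading the file itself: process it one line at a time instead of slurping it into a single string. To keep the sketch self-contained it reads from an in-memory filehandle over a made-up string; with a real Verilog file you would open its path instead.

```perl
use strict;
use warnings;

# In-memory filehandle so the sketch runs without a real file;
# for an actual Verilog file, open its path instead.
my $verilog = "module top;\n// comment\n  wire a;\nendmodule\n";
open my $in, '<', \$verilog or die "cannot open: $!";

my $out = '';
while (my $line = <$in>) {
    next if $line =~ m{^\s*//};   # skip whole-line // comments
    $out .= $line;                # keep everything else unchanged
}
close $in;

print $out;
```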
