FLEX - Regular expressions - Matching comments in AWK

I am working on a program to scan a file and highlight (g)awk tokens. I am using Flex to generate the lexical scanner for (g)awk.

My problem: I am inexperienced at writing regular expressions for Flex, and right now I cannot figure out how to write one that matches an entire comment. Here is a sample .awk program that will be scanned:

#!/usr/bin/awk -f
###############################################################################
#
# @(#) solve.awk - sudoku solver in awk using efficient backtracking algorithm
# @(#) $Id: solve.awk,v 1.16 2008/03/24 04:04:44 bduncan Exp bduncan $
# @(#) Copyright (C) 2005-2008, Bill Duncan, <bduncan-sudoku@beachnet.org>
#
# License:
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see http://www.gnu.org/licenses/.
#
# Description:
# - uses simple recursive backtracking algorithm with up-front and
# ongoing elimination of invalid tries by tracking for each row,
# column and region..
#
# Notes:
# - precalc of regmap didn't seem to make a difference
# - removed unmark() function and added a parameter to mark()
#
# Variables:
# regmap[r,c] ; pre-compiled, which region sector is r,c
# master[r,c] ; master matrix
# C[col, elem] ; true if elem in Column
# R[row, elem] ; true if elem in Row
# Q[reg, elem] ; true if elem in Region (Quadrant)
#
###############################################################################
BEGIN {
SUBSEP = "," # so we can dump and it looks nice
ORDER = 9
DEBUG = 0
count = 0
# precompile region map for faster lookup
# for (i = 0; i < ORDER; i++)
# for (j = 0; j < ORDER; j++)
# regmap[i+1,j+1] = int(i/3)*3+int(j/3)+1
}
function dump( i,j) {
printf "\n"
for (i=1;i<=ORDER;i++) {
if (!((i-1)%3)) printf "\n"
for (j=1;j<=ORDER;j++) {
if (!((j-1)%3)) printf " "
printf " %1d",master[i,j]
}
printf "\n"
}
printf "\n"
}
function fregmap(r,c) {
# return regmap[r,c]
return int((r-1)/3)*3+int((c-1)/3)+1
}
function inuse(r,c,try) {
# q = fregmap(r,c)
# can we use it or is it in use? returns true if already used, not avail
return (C[c,try] || R[r,try] || Q[fregmap(r,c),try])
}
function mark(r,c,try, flag, q) {
q = fregmap(r,c)
Q[q,try] = flag
R[r,try] = flag
C[c,try] = flag
master[r,c] = flag ? try : 0
}
function search(r,c, q,i,a,try) {
# find the next empty slot from here r,c
# if we've reached the end (no more empty) do check?
# for each available number, recurse search
count++
while (master[r,c]) {
if (++c > ORDER) {
c = 1
if (++r > ORDER) {
# then we're done filling! return goodness
return 1
}
}
}
# for each of the available numbers for this slot
for (try=1; try <= ORDER; try++) {
if (! inuse(r,c,try)) {
mark(r,c,try, 1)
if (search(r,c)) return 1
# else zero returned -- unwind
mark(r,c,try, 0) # unmark
}
}
return 0
}
############
# PATTERNS #
############
NF == 0 { next }
$1 ~ /^#/ { next }
NF != ORDER {
printf "error on line %d, NF=%d\n", FNR, NF
exit 1
}
{
++row
for (col=1; col <= ORDER; col++) {
mark(row,col,$col, 1)
}
}
END {
search(1,1)
printf "\n# Searches=%d\n", count
dump()
}

I am currently using "^#+" to match comments. This matches all of the "#" characters, but it does not match the rest of the characters on that line. How do you match everything that follows a "#"?

Flex pattern structures can be reviewed in the Flex manual.

1 Solution

#1

The + modifier means 'one or more of the preceding pattern', which here is the literal #, so ^#+ only matches the leading run of one or more consecutive hashes starting in column 1. Nothing in the pattern covers the text after the hashes.

For the anchored (start of line) match, you will need:

^#.*

This means a line starting with # followed by zero or more other characters of any type (except newline). AFAICR, the . does not match newlines in Flex, so the match stops at the end of the line.
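As a minimal sketch of how that pattern could sit in a complete scanner, the following Flex specification prints each anchored comment and echoes everything else. The COMMENT(...) output format and the file name comment.l are assumptions made for illustration; they are not part of the question.

```lex
%option noyywrap
%{
#include <stdio.h>
%}

%%
^#.*      { printf("COMMENT(%s)\n", yytext); /* whole line from # to end of line */ }
.|\n      { ECHO; /* pass everything else through unchanged */ }
%%

int main(void)
{
    yylex();    /* scan stdin until end of input */
    return 0;
}
```

Assuming the file is saved as comment.l, something like `flex comment.l && cc lex.yy.c -o comment` should build it, and `./comment < solve.awk` would then run it over the sample program. Because of the ^ anchor, only comments that begin in column 1 are reported by this version.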

Don't forget that awk comments are not constrained to start at the beginning of a line:

awk '{
         # This comment is indented by a number of spaces
         print $1; # And this is preceded by a command
     }'

Within broad limits, any time you come across a #, it's the start of a comment. The limits exclude the bodies of strings and regular expressions:

awk '{ print "# Not a comment" }'

awk '/#.*/ { print "Line contains a # comment: ", $0; }'

Etc. So, you need your regex rule and string rule to kick in before the comment rule does.
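One way this could look is sketched below. It is only an illustration under stated assumptions, not a complete gawk lexer: the token names are made up, and the after_operand flag is a hypothetical heuristic for telling a regex literal from a division sign, which the answer above does not cover. Note that Flex picks the longest match and breaks ties in favour of the earlier rule, so what really protects a # inside a string or regex is that the string or regex rule consumes the whole literal before the scanner ever tries to match at the #.

```lex
%option noyywrap
%{
#include <stdio.h>
/* Hypothetical helper flag (an assumption, not from the answer): nonzero when
   the previous token was an operand, so a following '/' means division. */
static int after_operand = 0;
%}

%%
\"([^"\\\n]|\\.)*\"       { printf("STRING  %s\n", yytext); after_operand = 1; }
"/"([^/\\\n]|\\.)+"/"     {
                            if (after_operand) {
                                yyless(1);   /* keep only the '/', rescan the rest */
                                printf("DIVIDE  /\n");
                                after_operand = 0;
                            } else {
                                printf("REGEX   %s\n", yytext);
                                after_operand = 1;
                            }
                          }
"/"                       { printf("DIVIDE  /\n"); after_operand = 0; }
"#".*                     { printf("COMMENT %s\n", yytext); }
[A-Za-z_][A-Za-z0-9_]*    { after_operand = 1; }
[0-9]+(\.[0-9]+)?         { after_operand = 1; }
[)\]]                     { after_operand = 1; }
[ \t]+                    { /* whitespace does not change the flag */ }
\n                        { after_operand = 0; }
.                         { after_operand = 0; }
%%

int main(void)
{
    yylex();
    return 0;
}
```

With rules like these, `print "# Not a comment"` should come out as a STRING and `/#.*/` at the start of a pattern as a REGEX, while the comment rule, now unanchored, picks up both indented and trailing comments. The division check matters for the sample program: without it, an expression such as `int(i/3)*3+int(j/3)+1` would have the text between its two slashes mistaken for a regex. Regex literals that contain a / inside a bracket expression are not handled by this sketch.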
