I'm working on a lexer for the Python grammar (written in Flex) for a compiler construction class and I'm having trouble getting a properly working regular expression to catch when there is no white space at the beginning of a line (to account for the end of an indented block).
The rule checking for no indentation appears after the rules for comments, blank lines, and indentation, and before the rules for everything else. Here's what it looks like right now:
    <INITIAL>^[^ \t] {
        printf("DEBUG: Expression ^[^ \\t] matches string: %s\n", yytext);
        /* Dedent to 0 if not mid-expression */
        if(!lineJoin && bracketDepth() == 0)
            changeIndent(0);
        /* Treat line as normal */
        REJECT;
    }
As I understand it, the rule above should print that debug line for every line in the lexed file that contains actual Python code and doesn't start with indentation. However, as it stands now, very few lines in my many test cases trigger it.
For example, the debug output appears nowhere for this test case (it also misses the dedent entirely on line 4):
    myList = [1,2,3,4]
    for index in range(len(myList)):
        myList[index] += 1
    print( myList )
but appears for every line in this one:
    a = 1 + 1
    b = 2 % 3
    c = 1 ^ 1
    d = 1 - 1
    f = 1 * 1
    g = 1 / 1
Given that most of the other rules work properly, I'm led to believe that the regex is the problem in the above rule but I don't see why this one is failing most of the time. Does anyone have any insight?
1 Answer
I don't know flex, but I observe that every line that worked starts with a single-character token, while every line that didn't work starts with a longer one. Perhaps flex is matching against entire tokens instead of single characters? You might try adding a + after the character class.
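For what it's worth, the matching rule described in the flex manual is consistent with this guess: the scanner takes the longest match it can find, and rule order only decides between matches of the same length. ^[^ \t] can never match more than one character, so a later rule that matches all of myList or for beats it, while one-letter names such as a tie with it and the earlier rule wins. Below is a minimal sketch of the + variant. The stubbed helpers and the NAME rule are hypothetical stand-ins for the question's real code, \n is excluded on the assumption that blank lines are handled by their own rule, and it has not been tried against the full grammar.

```lex
%{
#include <stdio.h>

/* Hypothetical stand-ins for the helpers used in the question. */
static int lineJoin = 0;
static int bracketDepth(void) { return 0; }
static void changeIndent(int level) { printf("DEBUG: dedent to %d\n", level); }
%}

%option noyywrap

%%

^[^ \t\n]+  {
                /* One or more non-blank characters at column 0, so this
                   match is at least as long as the first real token. */
                if (!lineJoin && bracketDepth() == 0)
                    changeIndent(0);
                /* Hand the text back to the ordinary token rules. */
                REJECT;
            }

[A-Za-z_][A-Za-z0-9_]*  { printf("NAME %s\n", yytext); /* stand-in token rule */ }

.|\n        { /* ignore everything else in this sketch */ }

%%

int main(void)
{
    return yylex();
}
```

One thing to watch: REJECT also falls back to shorter prefixes of the same text, so the action may run more than once for a line such as print( myList ). That should be harmless as long as calling changeIndent(0) repeatedly has no extra effect.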