Lex rule for C keywords does not match correctly

I wrote a simple lex file to identify C keywords. My rules look like:

keyword do|while|char|if
%%
{keyword}  { printf("Keyword %s found.", yytext); }

The problem is that the rule correctly identifies char in source code, but it also reports the char inside words like putchar as a keyword. How can I force the rule to match char only when it stands alone as a keyword, and not when it appears inside another word?

2 solutions

#1

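You need a rule for identifiers, and the keyword rule has to come before it. That's all. Lex always takes the longest match, so putchar is matched as a single identifier; when two rules match the same length, as with a bare char, the rule listed first wins.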

%%

IF|ELSE|etc {action for keywords }

[a-zA-Z_][a-zA-Z0-9_]* {action for identifiers}

%%

#2

Your lexer has to match other things (including something that will match the "put" substring) to allow it to distinguish between keywords and non-keywords.

If I were writing the lexer, I would simplify it by matching possible identifiers and using a lookup table to identify keywords in the resulting tokens.
