I am making a lexical analyzer using Flex on Unix. If you've ever used it, you know that you mainly just define the regexes for the tokens of whatever language you are writing the lexical analyzer for. I am stuck on the final part: I need the correct regex for multi-line comments that allows something like
/* This is a comment \*/
but also allows
/* This **** //// is another type of comment */
Can anyone help with this?
4 Answers
#1
14
You don't match C-style comments with a simple regular expression in Flex; they require a more complex matching method based on start states. The Flex FAQ says how (well, it does for the /*...*/ form; handling the other form in just the <INITIAL> state should be simple).
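To show what the start-state approach does, here is a plain C model of it rather than Flex input: the scanner is either in an initial state, passing text through, or in a comment state, discarding text until the closing delimiter, much as BEGIN(comment) and BEGIN(INITIAL) switch start conditions in a Flex rule set. This is a hand-written sketch under stated assumptions: strip_comments, the state names, and the choice to keep newlines inside comments (so line counts stay correct) are all hypothetical and illustrative.

```c
#include <stddef.h>

// Two scanner states, mirroring Flex's INITIAL and an exclusive comment
// start condition (these names are illustrative, not Flex's own).
enum scan_state { ST_INITIAL, ST_COMMENT };

// Copies `in` to `out` with /* ... */ comments removed, keeping newlines
// that occur inside comments so line numbers are preserved. `out` must
// have room for strlen(in) + 1 bytes. Returns the length written, or -1
// if the input ends inside an unterminated comment.
long strip_comments(const char *in, char *out) {
    enum scan_state st = ST_INITIAL;
    size_t o = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        if (st == ST_INITIAL) {
            if (in[i] == '/' && in[i + 1] == '*') {
                st = ST_COMMENT;   // like BEGIN(comment) on "/*"
                i++;               // also consume the '*'
            } else {
                out[o++] = in[i];  // ordinary text passes through
            }
        } else {
            if (in[i] == '*' && in[i + 1] == '/') {
                st = ST_INITIAL;   // like BEGIN(INITIAL) on "*/"
                i++;               // also consume the '/'
            } else if (in[i] == '\n') {
                out[o++] = '\n';   // keep line structure
            }                      // everything else is discarded
        }
    }
    out[o] = '\0';
    return st == ST_COMMENT ? -1 : (long)o;
}
```

Because the state persists across characters, a comment can span any number of lines without a single pattern having to describe the whole comment, which is the point of using start states.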
#2
8
If you're required to make do with just regex, however, there is indeed a not-too-complex solution:
"/*"([^*]|(\*+[^*/]))*\*+\/
The full explanation and derivation of that regex are laid out in detail here.
In short:
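The pattern reads as: "/*", then any mix of characters other than '*' and runs of '*' followed by something other than '*' or '/', then one or more '*' and a final '/'. Since a '*' directly before '/' can only be consumed by that last part, a match always ends at the first closing delimiter. As a minimal sketch, assuming a POSIX system, the same pattern can be exercised from C through <regex.h>, written in ERE syntax; find_comment is a hypothetical helper name.

```c
#include <regex.h>

// ERE form of the Flex pattern "/*"([^*]|(\*+[^*/]))*\*+\/ : the slash-star
// opener, then non-'*' characters or '*' runs not followed by '/', then one
// or more '*' and the closing '/'. Without REG_NEWLINE, [^*] also matches '\n'.
static const char *COMMENT_ERE = "/\\*([^*]|\\*+[^*/])*\\*+/";

// Hypothetical helper: returns the length of the first C-style comment in
// `text` and stores its offset in *start; returns -1 if there is none (or
// if the pattern fails to compile).
long find_comment(const char *text, long *start) {
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, COMMENT_ERE, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, text, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;
    *start = (long)m.rm_so;
    return (long)(m.rm_eo - m.rm_so);
}
```

Both comment forms from the question match in full, and on input with two comments only the first is returned, because the pattern cannot run past a "*/".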
#3
0
The lex specification at http://www.lysator.liu.se/c/ANSI-C-grammar-l.html does it like this:
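%{
static void comment(void);   /* defined after the second %% */
%}
%%
"/*"    { comment(); }
%%
/* Skip the rest of a comment whose opening delimiter was just matched,
   echoing its text as the original does. Depending on the lex/flex
   version, input() reports end of file as 0 or EOF, so check both. */
static void comment(void)
{
    int c, c1;   /* int rather than char so EOF is representable */
loop:
    while ((c = input()) != '*' && c != 0 && c != EOF)
        putchar(c);
    if (c == 0 || c == EOF)
        return;                     /* unterminated comment */
    if ((c1 = input()) != '/' && c1 != 0 && c1 != EOF) {
        unput(c1);                  /* a '*' without a '/': rescan c1 */
        goto loop;
    }
    if (c1 == '/')
        putchar(c1);
}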
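To check the control flow of comment() without generating a scanner, here is a plain C sketch in which an index into a string stands in for input() and unput(). skip_comment is a hypothetical name; unlike comment() it only skips and does not echo, and it assumes it is called with the index just past the opening delimiter.

```c
#include <stddef.h>

// Plain-C model of comment(): s[pos] plays the role of input(), and
// leaving pos unchanged plays the role of unput(). Assumes pos starts just
// past the opening slash-star. Unlike comment(), nothing is echoed.
// Returns the index just past the closing delimiter, or -1 if the text
// ends first (an unterminated comment).
long skip_comment(const char *s, size_t pos) {
    for (;;) {
        char c;
        while ((c = s[pos]) != '*' && c != '\0')  // read until a '*'
            pos++;
        if (c == '\0')
            return -1;                            // end of text
        pos++;                                    // consume the '*'
        if (s[pos] == '/')
            return (long)(pos + 1);               // found the closer
        if (s[pos] == '\0')
            return -1;
        // Not a '/': leave s[pos] unread (like unput) and keep scanning.
    }
}
```

Leaving the character after a lone '*' unread matters for runs such as "**/": the second '*' is rescanned and can still start the closing delimiter.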
A related question whose answers would also solve this is "How do I write a non-greedy match in LEX / FLEX?"
#4
-2
I don't know Flex, but I do know regexes. /\/\*.*?\*\//s should match both types (in PCRE), but if you need to differentiate them in your analyzer, you may want to then iterate over the list of matches to see whether they're the second type with /\*\*\s+\/{4}/
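Flex has no non-greedy quantifier, but the effect of the lazy PCRE pattern above is easy to reproduce by hand: find the first opening delimiter, then the first closing delimiter after it. The following C sketch does that with strstr; find_comment_lazy is a hypothetical helper, not part of Flex or PCRE.

```c
#include <string.h>

// Emulates the lazy PCRE pattern from the answer: find the first opener,
// then the FIRST closer after it, which is what .*? achieves. Returns the
// match length and stores its offset in *start, or returns -1.
long find_comment_lazy(const char *text, long *start) {
    const char *open = strstr(text, "/*");
    if (open == NULL)
        return -1;
    const char *close = strstr(open + 2, "*/");  // search past the opener
    if (close == NULL)
        return -1;
    *start = (long)(open - text);
    return (long)(close + 2 - open);
}
```

Starting the second search at open + 2 keeps "/*/" from being treated as a complete comment, matching what the PCRE pattern does.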