How to specify a particular string in a regex

I'm tinkering with flex and bison to create a small calculator program. The input will look something like this:

read A
read B

sum := A + B
write sum

read and write are keywords: read takes a value in, and write sends a value to the output. ":=" is the assignment operator. A and B are identifiers, which can be strings. There will also be line comments (//comment) and block comments (/* asdfsd */).

Would these regular expressions be correct for the little grammar I described?

[:][=]    //assignment operator
[ \t]     //skipping whitespace
[a-zA-Z0-9]+      //identifiers
[Rr][Ee][Aa][Dd]   //read symbols, not case-sensitive
[/][/]         //comment

For the assignment operator and the comment regex, can I just do this instead? Would flex and bison accept it?

":="      //assignment operator
"//"      //comment

2 Answers

#1

You can start with the following (with the ignore-case option enabled):

(read|write)\s+[a-z]+ will match a read/write statement;
[a-z]+\s:=[a-z+\/* -]* will match an assignment with simple arithmetic;
\/\/.* will match an inline comment;
\/\*[\s\S]*\*\/ will match multi-line comments.

Keep in mind that these are basic regexes and may not suit more complex syntaxes.

You can try them out on Regex101.com, for example.
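
As a rough way to try the four patterns locally, here is a minimal Python sketch. The `classify` helper and the constant names are made up for illustration and are not part of the answer. The patterns are used in their Perl-style form, as on Regex101; flex's own pattern syntax differs (it has no `\s` class, as far as I know), so this only checks the regexes themselves, not a flex scanner.

```python
import re

# Patterns from the answer above; the first two use the ignore-case flag.
READ_WRITE = re.compile(r"(read|write)\s+[a-z]+", re.IGNORECASE)
ASSIGNMENT = re.compile(r"[a-z]+\s:=[a-z+\/* -]*", re.IGNORECASE)
LINE_COMMENT = re.compile(r"\/\/.*")
BLOCK_COMMENT = re.compile(r"\/\*[\s\S]*\*\/")

# Hypothetical helper: tries each pattern in order against the whole string.
PATTERNS = [
    ("read/write", READ_WRITE),
    ("assignment", ASSIGNMENT),
    ("line comment", LINE_COMMENT),
    ("block comment", BLOCK_COMMENT),
]


def classify(text):
    """Return the name of the first pattern that matches all of `text`, or None."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return name
    return None
```

Calling `classify("sum := A + B")` should report an assignment, while `classify("sum:=A")` should return `None`, because the assignment pattern requires exactly one whitespace character before `:=`. That is one example of where these basic patterns may be too strict.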

#2

Yes, ":=" and "//" will work, though the comment rule should really be "//".* because you want to skip everything after the // (until the end of the line). If you match only "//", flex will try to tokenize what comes after it, which you don't want, because a comment doesn't have to consist of valid tokens (and even if it did, those tokens shouldn't be seen by the parser).
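
To illustrate the difference, here is a small Python sketch (not flex; the helper name `remainder_after_comment` and the sample line are invented for the example). It shows what is left in the input after each version of the comment pattern has matched. As in flex, `.` does not match a newline here, so `//.*` stops at the end of the line.

```python
import re

# Sample input line, made up for this example.
line = "sum := A + B // add $ the two @ values"

# Matches only the two slashes; the comment text stays in the input.
marker_only = re.compile(r"//")
# Matches the slashes and everything up to (not including) the newline.
to_end_of_line = re.compile(r"//.*")


def remainder_after_comment(pattern, text):
    """Return whatever is left in `text` after the first match of `pattern`."""
    match = pattern.search(text)
    return text[match.end():] if match else text
```

With `marker_only`, the remainder still contains the comment text, including characters such as `$` and `@` that no rule in the question would accept. With `to_end_of_line`, nothing is left on that line for the scanner to tokenize.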

Further, [Rr][Ee][Aa][Dd] should be placed before the identifier rule. Otherwise it will never be matched, because when two rules match the same lexeme, flex picks the one that comes first in the file. It can also be written more succinctly as (?i:read), or you can enable case insensitivity globally with %option caseless and just write read.
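
The rule-ordering behaviour can be imitated in a few lines of Python. This is only a sketch of the selection logic (longest match wins, ties go to the earlier rule), not how flex is implemented; `pick_rule` and the rule names are hypothetical. Python's `re` module happens to accept the same `(?i:read)` syntax.

```python
import re


def pick_rule(rules, text):
    """Mimic flex's choice at the start of `text`: the longest match wins,
    and ties go to the rule listed first."""
    best_name, best_len = None, 0
    for name, pattern in rules:
        match = re.match(pattern, text)
        # Strictly greater, so an earlier rule keeps a tie.
        if match and match.end() > best_len:
            best_name, best_len = name, match.end()
    return best_name, text[:best_len]


IDENT = ("IDENT", r"[a-zA-Z0-9]+")   # identifier rule from the question
READ = ("READ", r"(?i:read)")        # case-insensitive keyword rule

ident_first = [IDENT, READ]    # keyword listed after the identifier rule
keyword_first = [READ, IDENT]  # keyword listed before the identifier rule
```

With `ident_first`, the input `read A` is reported as an identifier, so the keyword rule is never chosen. With `keyword_first`, `ReAd A` is reported as the keyword, while `reader` is still an identifier because the longer match takes priority over rule order.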
