Lex/Flex: Regular expression for string literals in C/C++?

Date: 2022-08-19 09:38:07

I am looking at the ANSI C grammar here.

The page contains many Lex/Flex regular expressions for ANSI C.

I am having trouble understanding the regular expression for string literals.

They give the regular expression as \"(\\.|[^\\"])*\"

As I understand it, \" stands for a double quote, \\ for the escape character, . for any character except the escape character, and * for zero or more repetitions.

[^\\"] matches any character except \ and " .

So, in my opinion, the regular expression should be \"(\\.)*\".

Can you give some strings for which the above regular expression fails?

or

Why did they use [^\\"]?

1 answer

#1


4  

The regex \"(\\.)*\" that you proposed matches only strings made up of \ symbols, each followed by an arbitrary character, such as:

"\z\x\p\r"

This regular expression would therefore fail to match a string like:

"hello"

The string "hello" would be matched by the regex \".*\", but that regex would also match """" or "\", neither of which is a valid string literal.

To get rid of these invalid matches we can use \"[^\\"]*\", but this now fails to match a string like "\a\a\a", which is a valid string.
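The behaviour of the three candidate patterns discussed so far can be checked outside Lex. The sketch below uses Python's `re` module as a stand-in, on the assumption that these particular patterns behave the same way there as in Lex/Flex (Lex's `\"` is written as a plain `"`). The helper name `whole` and the constant names are made up for illustration.

```python
import re

# Pattern proposed in the question: only backslash + character pairs.
PROPOSED = re.compile(r'"(\\.)*"')
# Anything between two quotes: too permissive.
ANY_CHARS = re.compile(r'".*"')
# No backslashes or quotes between the quotes: too strict.
NO_SPECIALS = re.compile(r'"[^\\"]*"')

def whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

print(whole(PROPOSED, r'"\z\x\p\r"'))    # True
print(whole(PROPOSED, '"hello"'))        # False

print(whole(ANY_CHARS, '"hello"'))       # True
print(whole(ANY_CHARS, '""""'))          # True, although invalid
print(whole(ANY_CHARS, r'"\"'))          # True, although invalid

print(whole(NO_SPECIALS, '"hello"'))     # True
print(whole(NO_SPECIALS, '""""'))        # False
print(whole(NO_SPECIALS, r'"\a\a\a"'))   # False, although valid
```

Each pattern gets some cases right and others wrong, which is why neither the proposed pattern nor either of the two simpler ones is enough on its own.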

As we saw, \"(\\.)*\" does match this string, so all we need to do is combine the two to get \"(\\.|[^\\"])*\": each character inside the quotes is either an escape sequence (a \ followed by any character) or a character that is neither \ nor ".
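As a rough check of the combined pattern, here is a minimal sketch, again using Python's `re` module as an assumed stand-in for Lex/Flex. The names `STRING_LITERAL`, `samples` and `source` are hypothetical and chosen only for this example.

```python
import re

# Combined pattern: between the quotes, each item is either an escape
# sequence (backslash + any character) or a character that is neither
# a backslash nor a double quote.
STRING_LITERAL = re.compile(r'"(\\.|[^\\"])*"')

samples = ['"hello"', r'"\a\a\a"', r'"say \"hi\""', '""""', r'"\"']
for s in samples:
    print(s, STRING_LITERAL.fullmatch(s) is not None)
# "hello"        True
# "\a\a\a"       True
# "say \"hi\""   True
# """"           False
# "\"            False

# Scanning a line of C source: escaped quotes stay inside the literal.
source = r'printf("a \"b\" c", "d");'
found = [m.group(0) for m in STRING_LITERAL.finditer(source)]
print(found)  # two literals: "a \"b\" c" and "d"
```

The escape alternative is what lets an escaped quote sit inside a literal without ending it, while the character class stops an unescaped quote or a lone backslash from being swallowed.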
