What is the most effective way to parse C-like definition strings?

I have a set of function definitions written in a C-like language with some additional keywords that can be placed before some arguments (the same way as "unsigned" or "register", for example). I need to analyze these lines, along with some function stubs, and generate actual C code from them.

Is it correct that Flex/Yacc is the most appropriate way to do this?

Will it be slower than writing a Shell or Python script using regexps, given that I have zero experience with analysers/parsers (though I know how LALR does its job)? I suspect the regexp approach may become a big pain if the number of additional keywords grows and their effects differ substantially.

Are there any good materials on Lex/Yacc that cover similar problems? All the papers I could find use the same primitive example of a "toy" calculator.

Any help will be appreciated.

5 answers

#1

ANTLR is commonly used (as are Lex/Yacc).

ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.

#2

There is also the Lemon Parser, which features a less restrictive grammar. The downside is that you're married to Lemon: rewriting a parser's grammar for some other tool when you discover a limitation is painful. The upside is that it's really easy to use and self-contained. You can drop it into your source tree and not worry about checking for the presence of other tools.

SQLite3 uses it, as do several other popular projects. I'm not saying use it because SQLite does, but perhaps give it a try if time permits.

#3

That depends entirely on your definition of "effective". If you have all the time in the world, the fastest parser would be a hand-written pull parser. Such parsers take a long time to develop and debug, but today no parser generator beats hand-written code in terms of runtime performance.
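
To give a feel for what "hand-written" means here, the following is a minimal sketch of a recursive-descent parser (one common hand-written style, not necessarily the pull parser this answer has in mind) for simple prototypes. The extra keywords `in` and `out`, the `Parser` class, and the `to_c` helper are all hypothetical; the question does not name its keywords. It deliberately ignores arrays, function pointers, and `(void)` parameter lists.

```python
import re

EXTRA = {"in", "out"}  # hypothetical extra keywords, not taken from the question


def tokenize(text):
    # Identifiers or single punctuation characters; whitespace is skipped.
    return re.findall(r"[A-Za-z_]\w*|[(),;*]", text)


class Parser:
    def __init__(self, text):
        self.toks = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def next(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, tok):
        got = self.next()
        if got != tok:
            raise SyntaxError(f"expected {tok!r}, got {got!r}")

    def declarator(self):
        """Collect words and '*' up to a delimiter; the last word is the name."""
        words, extras = [], []
        while self.peek() not in (",", ")", "(", ";", None):
            tok = self.next()
            (extras if tok in EXTRA else words).append(tok)
        if not words or words[-1] == "*":
            raise SyntaxError("missing name in declaration")
        *type_words, name = words
        return {"type": " ".join(type_words), "name": name, "extras": extras}

    def prototype(self):
        """prototype := declarator '(' [declarator {',' declarator}] ')' ';'"""
        head = self.declarator()
        self.expect("(")
        params = []
        if self.peek() != ")":
            params.append(self.declarator())
            while self.peek() == ",":
                self.next()
                params.append(self.declarator())
        self.expect(")")
        self.expect(";")
        return {"ret": head["type"], "name": head["name"], "params": params}


def to_c(proto):
    # Emit plain C by dropping the extra keywords collected during parsing.
    params = ", ".join(f"{p['type']} {p['name']}" for p in proto["params"])
    return f"{proto['ret']} {proto['name']}({params});"
```

For example, `to_c(Parser("int copy(in char *src, out char *dst);").prototype())` yields a prototype with the hypothetical keywords stripped, while the parsed `extras` lists remain available for whatever analysis the keywords require.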

If you want something that can parse valid C within a week or so, use a parser generator. The code will be fast enough, and most parser generators already come with a grammar for C that you can use as a starting point (avoiding 90% of the common mistakes).

Note that regexps are not suitable for parsing recursive structures. This approach would be both slower than using a generator and more error-prone than a hand-written pull parser.
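
As a small illustration of the nesting problem, consider a parameter that is itself a function pointer. This sketch uses Python's standard `re` module; the sample declaration and the names `naive_params` and `split_params` are made up for the example. The regex stops at the first closing parenthesis, whereas a few lines of explicit depth tracking handle the nesting.

```python
import re


def naive_params(decl):
    # Grabs everything between the first '(' and the first ')'.
    return re.search(r"\(([^)]*)\)", decl).group(1)


def split_params(decl):
    """Split the outermost parameter list at top-level commas by tracking depth."""
    start = decl.index("(")
    depth, current, params = 0, [], []
    for ch in decl[start:]:
        if ch == "(":
            depth += 1
            if depth == 1:
                continue  # skip the opening parenthesis of the list itself
        elif ch == ")":
            depth -= 1
            if depth == 0:
                break  # reached the end of the outermost list
        elif ch == "," and depth == 1:
            params.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    params.append("".join(current).strip())
    return params


decl = "void on_event(int id, void (*cb)(int, int));"
print(naive_params(decl))   # truncated at the first ')'
print(split_params(decl))   # two complete parameters
```

The depth counter is already a tiny hand-written parser, which is the point: once nesting appears, the regex alone is no longer enough.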

#4

Actually, it depends on how complex your language is and whether it's really close to C or not.

Still, you could use lex as a first step, even if you go the regular-expression route.
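
A lexing first step can be sketched in a few lines even without lex itself. The following is a minimal regex-driven tokenizer in Python, not lex; the `EXTRA_KEYWORDS` set is a hypothetical stand-in for the question's unnamed keywords, and the token classes are an assumption about what the input contains.

```python
import re

# Hypothetical extra qualifiers; the question does not name the real ones.
EXTRA_KEYWORDS = {"in", "out"}

TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<punct>[()\[\],;*])
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, value) pairs; raise on unknown characters."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind = m.lastgroup  # name of the alternative that matched
        if kind == "ws":
            continue
        value = m.group()
        if kind == "ident" and value in EXTRA_KEYWORDS:
            kind = "extra"  # reclassify the additional keywords
        tokens.append((kind, value))
    return tokens
```

Separating tokens from structure this way means that adding a keyword only touches the keyword set, while whatever consumes the token stream (a regex pass, a hand-written parser, or a generated one) stays unchanged.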

I would go for lex + Menhir and OCaml.

But any flex/yacc combination would be fine.

The main problem with regular Bison (the GNU implementation of yacc) stems from C typing: you have to describe your whole tree (and all the manipulation functions) yourself. Using OCaml would be much easier.

#5

For what you want to do, our DMS Software Reengineering Toolkit is likely a very effective solution.

DMS is designed specifically to support custom analyzers/code generators of the type you are discussing. It provides very strong facilities for defining arbitrary language parsers/analyzers (tested on 30+ real languages including several complete dialects of C, C++, Java, C#, and COBOL).

DMS automates the construction of ASTs (so you don't have to do anything but get the grammar right to have a usable AST), enables the construction of custom analyses of exactly the pattern-directed kind you indicated, can construct new C-specific ASTs representing the code you want to generate, and can emit them as compilable C source text. The pre-existing definitions of C for DMS can likely be bent to cover your C-like language.
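
The general shape of "build a tree, inspect it, print C" can be shown with a toy sketch. This is plain Python and has nothing to do with DMS or its API; the `Param` and `Function` classes, the `nonnull` keyword, and the null-check it triggers are all hypothetical, chosen only to show an extra keyword driving generated code.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    ctype: str
    name: str
    extras: list = field(default_factory=list)  # extra keywords seen on this parameter


@dataclass
class Function:
    ret: str
    name: str
    params: list


def emit_c(fn):
    """Print a C stub for fn, adding a check for each hypothetical 'nonnull' parameter."""
    args = ", ".join(f"{p.ctype} {p.name}" for p in fn.params)
    lines = [f"{fn.ret} {fn.name}({args})", "{"]
    for p in fn.params:
        if "nonnull" in p.extras:  # hypothetical keyword and hypothetical effect
            lines.append(f"    if ({p.name} == NULL) return -1;")
    lines.append("    return 0;")
    lines.append("}")
    return "\n".join(lines)


fn = Function("int", "send", [Param("const char *", "buf", ["nonnull"]), Param("int", "len")])
print(emit_c(fn))
```

Keeping the keywords as data on the tree, rather than acting on them during parsing, is what lets each keyword's effect be implemented as a separate pass over the same structure.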