解析表达式(带有自定义函数和操作)

时间:2022-12-20 00:07:01

I have a string, which contains a custom expression, I have to parse and evaluate:

我有一个字符串,它包含一个自定义表达式,我必须解析和计算:

For example:

例如:

(FUNCTION_A(5,4,5) UNION FUNCTION_B(3,3)) 
INTERSECT (FUNCTION_C(5,4,5) UNION FUNCTION_D(3,3))

FUNCTION_X represent functions, which are implemented in C# and return ILists. UNION or INTERSECT are custom functions which should be applied to the lists, which are returned from those functions.

FUNCTION_X表示函数,这些函数在c#中实现并返回ILists。UNION或INTERSECT是应该应用于列表的自定义函数,这些函数返回列表。

Union and intersect are implemented via Enumerable.Intersect/Enumerable.Union.

联合和相交是通过可枚举的。

How can the parsing and evaluating be implemented in an elegant and expandable manner?

如何以优雅和可扩展的方式实现解析和评估?

1 个解决方案

#1


5  

It depends on how complex your expressions will become, how many different operators are going to be available, and a whole number of different variables. Whichever way you do it, you will probably need to first determine a grammar for your mini-language.

它取决于你的表达式有多复杂,有多少不同的运算符可用,还有很多不同的变量。无论使用哪种方法,您都可能需要首先确定迷你语言的语法。

For simple grammars, you can just write a custom parser. In the case of many calculators and similar applications, a recursive descent parser is expressive enough to handle the grammar and is intuitive to write. The linked Wikipedia page gives a sample grammar and the implementation of a C parser for it. Eric White also has a blog post on building recursive descent parsers in C#.

对于简单的语法,您可以编写一个自定义解析器。在许多计算器和类似应用程序的情况下,递归下降解析器具有足够的表达能力来处理语法,并且编写起来很直观。链接的Wikipedia页面给出了一个示例语法和一个C解析器的实现。Eric White也有一篇关于在c#中构建递归下降解析器的博客文章。

For more complex grammars, you will likely want to skip the work of creating this yourself and use a lex/yacc-type lexer and parser toolset. Normally you give as input to these a grammar in EBNF or similar syntax, and they will produce the code necessary to parse the input for you. The parser will typically return a syntax tree which you can traverse, allowing you to apply logic for each token in the input stream (each node in the tree). For C#, I have worked with GPLex and GPPG, but others such as ANTLR are also available.

对于更复杂的语法,您可能想跳过创建这个自己的工作,并使用lex/ yaccc类型的lexer和解析器工具集。通常,您将EBNF或类似语法作为这些语法的输入,它们将生成解析输入所需的代码。解析器通常返回一个可以遍历的语法树,允许您为输入流中的每个标记(树中的每个节点)应用逻辑。对于c#,我曾与GPLex和GPPG一起工作过,但也有其他的,比如ANTLR。

Basic Parsing Concepts

In general, you want to be able to split each item in the input into a meaningful token, and build a tree based on those tokens. Once the tree is built, you can traverse the tree and perform the necessary action at each node. A syntax tree for FUNCTION_A(5,4,5) UNION FUNCTION_B(3,3) might look like this, where the node types are in capital letters and their values are in parenthesis:

通常,您希望能够将输入中的每个项分割为一个有意义的令牌,并基于这些令牌构建树。一旦构建了树,您可以遍历树并在每个节点上执行必要的操作。FUNCTION_A(5,4,5) UNION FUNCTION_B(3,3)的语法树可能是这样的,其中节点类型用大写字母表示,其值用括号括起来:

                        PROGRAM
                           |
                           |
                         UNION
                           |
            ------------------------------
           |                              |
  FUNCTION (FUNCTION_A)          FUNCTION(FUNCTION_B)
        |                                 |
  -------------                       ----------
 |      |      |                     |          |
INT(5) INT(4) INT(5)                INT(3)     INT(3)

The parser needs to be smart enough to know that when a UNION is found, it needs to be supplied with two items to union, etc. Given this tree, you would start at the root (PROGRAM) and do a depth-first traversal. At the UNION node, the action would be to first visit all children, and then union the results together. At a FUNCTION node, the action would be to first visit all of the children, find their values, and use those values as parameters to the function, and secondly to evaluate the function on those inputs and return the value.

解析器需要足够智能才能知道,当找到一个联合时,需要为它提供两个条目来进行联合,等等。在UNION节点上,操作是首先访问所有儿童,然后将结果联合在一起。在函数节点上,操作将首先访问所有子节点,找到它们的值,并将这些值作为函数的参数,然后在这些输入上评估函数并返回值。

This would continue for all tokens, for any expression you can come up with. In this way, if you spend the time to get the parser to produce the right tree and each node knows how to perform whatever action it needs to, your design is very extensible and can handle any input that matches the grammar it was designed for.

对于所有令牌,对于您能想到的任何表达式,这都将继续。通过这种方式,如果您花时间让解析器生成正确的树,并且每个节点都知道如何执行它需要的任何操作,那么您的设计是非常可扩展的,并且可以处理与它设计的语法匹配的任何输入。

#1


5  

It depends on how complex your expressions will become, how many different operators are going to be available, and a whole number of different variables. Whichever way you do it, you will probably need to first determine a grammar for your mini-language.

它取决于你的表达式有多复杂,有多少不同的运算符可用,还有很多不同的变量。无论使用哪种方法,您都可能需要首先确定迷你语言的语法。

For simple grammars, you can just write a custom parser. In the case of many calculators and similar applications, a recursive descent parser is expressive enough to handle the grammar and is intuitive to write. The linked Wikipedia page gives a sample grammar and the implementation of a C parser for it. Eric White also has a blog post on building recursive descent parsers in C#.

对于简单的语法,您可以编写一个自定义解析器。在许多计算器和类似应用程序的情况下,递归下降解析器具有足够的表达能力来处理语法,并且编写起来很直观。链接的Wikipedia页面给出了一个示例语法和一个C解析器的实现。Eric White也有一篇关于在c#中构建递归下降解析器的博客文章。

For more complex grammars, you will likely want to skip the work of creating this yourself and use a lex/yacc-type lexer and parser toolset. Normally you give as input to these a grammar in EBNF or similar syntax, and they will produce the code necessary to parse the input for you. The parser will typically return a syntax tree which you can traverse, allowing you to apply logic for each token in the input stream (each node in the tree). For C#, I have worked with GPLex and GPPG, but others such as ANTLR are also available.

对于更复杂的语法,您可能想跳过创建这个自己的工作,并使用lex/ yaccc类型的lexer和解析器工具集。通常,您将EBNF或类似语法作为这些语法的输入,它们将生成解析输入所需的代码。解析器通常返回一个可以遍历的语法树,允许您为输入流中的每个标记(树中的每个节点)应用逻辑。对于c#,我曾与GPLex和GPPG一起工作过,但也有其他的,比如ANTLR。

Basic Parsing Concepts

In general, you want to be able to split each item in the input into a meaningful token, and build a tree based on those tokens. Once the tree is built, you can traverse the tree and perform the necessary action at each node. A syntax tree for FUNCTION_A(5,4,5) UNION FUNCTION_B(3,3) might look like this, where the node types are in capital letters and their values are in parenthesis:

通常,您希望能够将输入中的每个项分割为一个有意义的令牌,并基于这些令牌构建树。一旦构建了树,您可以遍历树并在每个节点上执行必要的操作。FUNCTION_A(5,4,5) UNION FUNCTION_B(3,3)的语法树可能是这样的,其中节点类型用大写字母表示,其值用括号括起来:

                        PROGRAM
                           |
                           |
                         UNION
                           |
            ------------------------------
           |                              |
  FUNCTION (FUNCTION_A)          FUNCTION(FUNCTION_B)
        |                                 |
  -------------                       ----------
 |      |      |                     |          |
INT(5) INT(4) INT(5)                INT(3)     INT(3)

The parser needs to be smart enough to know that when a UNION is found, it needs to be supplied with two items to union, etc. Given this tree, you would start at the root (PROGRAM) and do a depth-first traversal. At the UNION node, the action would be to first visit all children, and then union the results together. At a FUNCTION node, the action would be to first visit all of the children, find their values, and use those values as parameters to the function, and secondly to evaluate the function on those inputs and return the value.

解析器需要足够智能才能知道,当找到一个联合时,需要为它提供两个条目来进行联合,等等。在UNION节点上,操作是首先访问所有儿童,然后将结果联合在一起。在函数节点上,操作将首先访问所有子节点,找到它们的值,并将这些值作为函数的参数,然后在这些输入上评估函数并返回值。

This would continue for all tokens, for any expression you can come up with. In this way, if you spend the time to get the parser to produce the right tree and each node knows how to perform whatever action it needs to, your design is very extensible and can handle any input that matches the grammar it was designed for.

对于所有令牌,对于您能想到的任何表达式,这都将继续。通过这种方式,如果您花时间让解析器生成正确的树,并且每个节点都知道如何执行它需要的任何操作,那么您的设计是非常可扩展的,并且可以处理与它设计的语法匹配的任何输入。