解析:F#中的延迟初始化和相互递归的monad

时间:2022-12-05 10:00:11

I've been writing a little monadic parser-combinator library in F# (somewhat similar to FParsec) and now tried to implement a small parser for a programming language.

我一直在F#中编写一个小的monadic解析器 - 组合器库(有点类似于FParsec),现在尝试为编程语言实现一个小的解析器。

I first implemented the code in Haskell (with Parsec) which ran perfectly well. The parsers for infix expressions are designed mutually recursive.

我首先在Haskell(使用Parsec)中实现了代码,该代码运行得非常好。中缀表达式的解析器是相互递归设计的。

parseInfixOp :: Parser String -> Parser Expression -> Parser Expression
parseInfixOp operatorParser subParser = ignoreSpaces $ do
                                          x <- ignoreSpaces $ subParser
                                          do
                                            op <- ignoreSpaces $ operatorParser
                                            y  <- parseInfixOp operatorParser subParser
                                            return $ BinaryOp op x y
                                           <|> return x

parseInfix :: [String] -> Parser Expression -> Parser Expression
parseInfix list = parseInfixOp (foldl1 (<|>) $ map string list)

parseExpr :: Parser Expression
parseExpr = parseInfix0

parseInfix0 = parseInfix ["==", "<>", "And", "Or", "Xor", "<", ">", "<=", ">="] parseInfix1
parseInfix1 = parseInfix ["+", "-", "Mod"] parseInfix2
parseInfix2 = parseInfix ["*", "/", "\\"] parseInfix3
parseInfix3 = parseInfix ["^"] parseInfix4
parseInfix4 = parseFactor

parseFactor :: Parser Expression
parseFactor = parseFactor' <|> (betweenChars '(' ')' parseExpr)

parseFactor' :: Parser Expression
parseFactor' = parseString
           <|> parseBool
           <|> parseNumber
           <|> parseVariable
           <|> (try parseFunCall) <|> parseIdentifier  

Since the order of functions doesn't matter and Haskell is evaluating in a non-strict way, this is OK, but F# is evaluating strictly.

由于函数的顺序无关紧要且Haskell正在以非严格的方式进行评估,这是可以的,但F#正在严格评估。

let rec parseExpr = parseInfix0
and parseFactor = (parseFactor') <|> (betweenChars '(' ')' parseExpr)
and parseInfix2 = parseInfix ["^"] parseFactor BinaryOp
and parseInfix1 = parseInfix ["*"; "/"] parseInfix2 BinaryOp
and parseInfix0 = parseInfix ["+"; "-"] parseInfix1 BinaryOp
and parseFunCall = parser {
        let! first = letter
        let! rest = many (letter <|> digit)
        let  funcName = toStr $ first::rest

        do! ignoreSpace
        let! args = betweenChars '(' ')' $ sepBy (parseExpr) ","

        return FunCall(funcName, args)
    }
and parseFactor' =
    parseNumber
<|> parseString
<|> parseBool
<|> parseFunCall
<|> parseIdentifier

F# now either complains about recursive objects and just throws a *Exception at runtime due to an infinite loop or it even doesn't compile the source because "a value would be part of its own definion".

F#现在或者抱怨递归对象,并且由于无限循环而在运行时抛出一个*Exception,或者它甚至不编译源代码,因为“一个值将成为它自己定义的一部分”。

What's the best way to prevent this errors. The debugger advices me to make use functions or lazys instead but what should I make lazy here?

什么是防止这种错误的最佳方法。调试器建议我使用函数或lazys代替但是我应该在这里做什么?

3 个解决方案

#1


What is the warning about recursive objects? Show the text; there's one such warning that is ignorable (indeed, in a sense desirable) for this case.

关于递归对象的警告是什么?显示文字;对于这种情况,有一个这样的警告是可忽略的(实际上,在某种意义上是可取的)。

If it doesn't compile because of recursive values, you simply need to turn the 'syntactic values' into 'syntactic functions'. That is, rather than

如果由于递归值而无法编译,则只需将“语法值”转换为“语法函数”即可。那就是,而不是

...
and parseInfix2 = body
...

use

...
and parseInfix2 x = (body) x
...

even though the type of 'parseInfix2' is the same function type either way... F# (unlike Haskell) will sometimes require you to be explicit (do eta-conversion as above).

即使'parseInfix2'的类型是相同的函数类型,任何一种方式...... F#(与Haskell不同)有时需要你是显式的(如上所述进行eta转换)。

I'd ignore suggestions about inserting 'lazy', parsers are indeed functions, not values, so the eta-conversion will cover the same issue (none of this will be evaluated eagerly at all, it all needs to 'wait' until you pass the string to be parsed before anything starts to 'run').

我忽略了关于插入'lazy'的建议,解析器确实是函数,而不是值,因此eta转换将涵盖相同的问题(根本不会对这些问题进行评估,所有这些都需要'等待'直到你通过在任何事情开始“运行”之前要解析的字符串。

Regarding *Exceptions, if you post a looping snippet of the stack it may help, but you maybe can see for yourself e.g. if you have a left-recursive portion of the grammar that doesn't consume any input and gets caught in a loop. I think that's an easy pitfall for most parsing technologies in most languages.

关于*Exceptions,如果您发布堆栈的循环片段可能有所帮助,但您可以自己看看,例如如果你有一个语法的左递归部分,它不消耗任何输入并被一个循环捕获。我认为这对于大多数语言中的大多数解析技术来说都是一个容易陷阱。

#2


η-conversion is not necessarily a great solution - if you do this, you'll have to prove that the delayed function is run at most once, or pay a lot of overhead for calling it during parsing.

η-转换不一定是一个很好的解决方案 - 如果你这样做,你必须证明延迟函数最多运行一次,或者在解析过程中调用它需要花费很多开销。

You want something like this:

你想要这样的东西:

let rec p1 = lazy (...)
and p2 = ... p1.Value ..

A better way, if you have a workflow builder, is to define the Delay member to do this for you:

如果您有工作流程构建器,更好的方法是定义Delay成员为您执行此操作:

type Builder() =
    member this.Delay(f: unit -> Parser<_>) : Parser<_> = 
        let promise = Lazy.Create f
        Return () |> Bind (fun () -> promise.Value)

let parse = new Builder()


let rec p1 =
    parse {
        ...
    }

and p2 =
    parse {
        ...
    }

#3


Neither the eta rewrite nor the lazy delaying is a sure thing. The F# compiler seems to have a hard time dealing with deep recursions. What worked for me was to collapse the recursion into a single top level function (by passing the function to be called recursively as an argument). This top level is written eta style.

eta重写和延迟延迟都不是肯定的。 F#编译器似乎很难处理深度递归。对我有用的是将递归折叠成单个*函数(通过将函数传递为递归调用作为参数)。这个*是eta风格。

Top level, I have:

*,我有:

let rec myParser s = (parseExpr myParser) s

Note wikipedia says: "In a strict language like OCaml, we can avoid the infinite recursion problem by forcing the use of a closure. ". That is what worked for me.

注意*说:“在像OCaml这样的严格语言中,我们可以通过强制使用闭包来避免无限递归问题。”这对我有用。

#1


What is the warning about recursive objects? Show the text; there's one such warning that is ignorable (indeed, in a sense desirable) for this case.

关于递归对象的警告是什么?显示文字;对于这种情况,有一个这样的警告是可忽略的(实际上,在某种意义上是可取的)。

If it doesn't compile because of recursive values, you simply need to turn the 'syntactic values' into 'syntactic functions'. That is, rather than

如果由于递归值而无法编译,则只需将“语法值”转换为“语法函数”即可。那就是,而不是

...
and parseInfix2 = body
...

use

...
and parseInfix2 x = (body) x
...

even though the type of 'parseInfix2' is the same function type either way... F# (unlike Haskell) will sometimes require you to be explicit (do eta-conversion as above).

即使'parseInfix2'的类型是相同的函数类型,任何一种方式...... F#(与Haskell不同)有时需要你是显式的(如上所述进行eta转换)。

I'd ignore suggestions about inserting 'lazy', parsers are indeed functions, not values, so the eta-conversion will cover the same issue (none of this will be evaluated eagerly at all, it all needs to 'wait' until you pass the string to be parsed before anything starts to 'run').

我忽略了关于插入'lazy'的建议,解析器确实是函数,而不是值,因此eta转换将涵盖相同的问题(根本不会对这些问题进行评估,所有这些都需要'等待'直到你通过在任何事情开始“运行”之前要解析的字符串。

Regarding *Exceptions, if you post a looping snippet of the stack it may help, but you maybe can see for yourself e.g. if you have a left-recursive portion of the grammar that doesn't consume any input and gets caught in a loop. I think that's an easy pitfall for most parsing technologies in most languages.

关于*Exceptions,如果您发布堆栈的循环片段可能有所帮助,但您可以自己看看,例如如果你有一个语法的左递归部分,它不消耗任何输入并被一个循环捕获。我认为这对于大多数语言中的大多数解析技术来说都是一个容易陷阱。

#2


η-conversion is not necessarily a great solution - if you do this, you'll have to prove that the delayed function is run at most once, or pay a lot of overhead for calling it during parsing.

η-转换不一定是一个很好的解决方案 - 如果你这样做,你必须证明延迟函数最多运行一次,或者在解析过程中调用它需要花费很多开销。

You want something like this:

你想要这样的东西:

let rec p1 = lazy (...)
and p2 = ... p1.Value ..

A better way, if you have a workflow builder, is to define the Delay member to do this for you:

如果您有工作流程构建器,更好的方法是定义Delay成员为您执行此操作:

type Builder() =
    member this.Delay(f: unit -> Parser<_>) : Parser<_> = 
        let promise = Lazy.Create f
        Return () |> Bind (fun () -> promise.Value)

let parse = new Builder()


let rec p1 =
    parse {
        ...
    }

and p2 =
    parse {
        ...
    }

#3


Neither the eta rewrite nor the lazy delaying is a sure thing. The F# compiler seems to have a hard time dealing with deep recursions. What worked for me was to collapse the recursion into a single top level function (by passing the function to be called recursively as an argument). This top level is written eta style.

eta重写和延迟延迟都不是肯定的。 F#编译器似乎很难处理深度递归。对我有用的是将递归折叠成单个*函数(通过将函数传递为递归调用作为参数)。这个*是eta风格。

Top level, I have:

*,我有:

let rec myParser s = (parseExpr myParser) s

Note wikipedia says: "In a strict language like OCaml, we can avoid the infinite recursion problem by forcing the use of a closure. ". That is what worked for me.

注意*说:“在像OCaml这样的严格语言中,我们可以通过强制使用闭包来避免无限递归问题。”这对我有用。