How do I deal with comments in my AST?

Time: 2023-01-24 23:11:40

I am writing a Delphi code parser using Parsec. My current AST data structures look like this:

module Text.DelphiParser.Ast where

data TypeName = TypeName String [String] deriving (Show)
type UnitName = String
data ArgumentKind = Const | Var | Out | Normal deriving (Show)
data Argument = Argument ArgumentKind String TypeName deriving (Show)
data MethodFlag = Overload | Override | Reintroduce | Static | StdCall deriving (Show)
data ClassMember = 
      ConstField String TypeName
    | VarField String TypeName
    | Property String TypeName String (Maybe String)
    | ConstructorMethod String [Argument] [MethodFlag]
    | DestructorMethod String [Argument] [MethodFlag]
    | ProcMethod String [Argument] [MethodFlag]
    | FunMethod String [Argument] TypeName [MethodFlag]
    | ClassProcMethod String [Argument] [MethodFlag]
    | ClassFunMethod String [Argument] TypeName [MethodFlag]
     deriving (Show)
data Visibility = Private | Protected | Public | Published deriving (Show)
data ClassSection = ClassSection Visibility [ClassMember] deriving (Show)
data Class = Class String [ClassSection] deriving (Show)
data Type = ClassType Class deriving (Show)
data Interface = Interface [UnitName] [Type] deriving (Show)
data Implementation = Implementation [UnitName]  deriving (Show)
data Unit = Unit String Interface Implementation deriving (Show)

I want to preserve comments in my AST data structures and I'm currently trying to figure out how to do this.

My parser is split into a lexer and a parser (both written with Parsec) and I have already implemented lexing of comment tokens.

unit SomeUnit;

interface

uses
  OtherUnit1, OtherUnit2;

type
  // This is my class that does blabla
  TMyClass = class
  var
    FMyAttribute: Integer;
  public
    procedure SomeProcedure;
    { The constructor takes an argument ... }
    constructor Create(const Arg1: Integer);
  end;

implementation

end.

The token stream looks like this:

[..., Type, LineComment " This is my class that does blabla", Identifier "TMyClass", Equals, Class, ...]

The parser translates this into:

Class "TMyClass" ...

The Class data type doesn't have any way to attach comments, and comments (especially block comments) could appear almost anywhere in the token stream. Would I have to add an optional comment to all data types in the AST?

How can I deal with comments in my AST?

1 solution

#1


A reasonable approach for dealing with annotated data on an AST is to thread an extra type parameter through that can contain whatever metadata you like. Apart from being able to selectively include or ignore comments, this will also let you include other sorts of information with your tree.

First, you would rewrite all your AST types with an extra parameter:

data TypeName a = TypeName a String [String]
{- ... -}
data ClassSection a = ClassSection a Visibility [ClassMember a]
{- ... -}

It would be useful to add deriving Functor to all of them as well, making it easy to transform the annotations on a given AST.
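As a rough sketch of what deriving Functor buys you, the block below annotates a cut-down version of the AST and uses fmap to rewrite or discard the annotations. The Comment newtype, the helper names stripAnnotations and commentCounts, and the sample value are assumptions made for illustration; only two ClassMember constructors are reproduced.

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Data.Functor (void)

-- Hypothetical annotation payload: the text of one comment.
newtype Comment = Comment String deriving (Show, Eq)

data TypeName a = TypeName a String [String]
  deriving (Show, Eq, Functor)

data Visibility = Private | Protected | Public | Published
  deriving (Show, Eq)

-- Only two of the original constructors, to keep the sketch short.
data ClassMember a
  = ConstField a String (TypeName a)
  | VarField a String (TypeName a)
  deriving (Show, Eq, Functor)

data ClassSection a = ClassSection a Visibility [ClassMember a]
  deriving (Show, Eq, Functor)

-- Discard every annotation; the result carries () at each node.
stripAnnotations :: ClassSection a -> ClassSection ()
stripAnnotations = void

-- Replace each node's comment list with the number of comments on it.
commentCounts :: ClassSection [Comment] -> ClassSection Int
commentCounts = fmap length

-- A section whose single field carries one comment.
sample :: ClassSection [Comment]
sample =
  ClassSection [] Public
    [ VarField [Comment " The backing field"] "FMyAttribute"
        (TypeName [] "Integer" [])
    ]
```

Because the annotation is just a type parameter, passes that do not care about comments can work on ClassSection () and never mention them.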

Now an AST with the comments remaining would have the type Class Comment or something to that effect. You could also reuse this for additional information like scope analysis, where you would include the current scope with the relevant part of the AST.
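To make the "Class Comment" idea concrete, here is one possible way the comment tokens from the question's token stream could end up in the annotation slot. It works on a plain token list rather than Parsec so that it stays self-contained; the BlockComment constructor, the ClassHeader node, and the functions asComment, leadingComments and classHeader are all hypothetical names for this sketch, not part of the original parser.

```haskell
-- Token constructors follow the stream shown in the question;
-- BlockComment is an assumed name for the { ... } comment token.
data Token
  = Type
  | Class
  | Equals
  | Identifier String
  | LineComment String
  | BlockComment String
  deriving (Show, Eq)

newtype Comment = Comment String deriving (Show, Eq)

-- Hypothetical annotated node: just the class name plus its annotation.
data ClassHeader a = ClassHeader a String deriving (Show, Eq)

-- Turn a comment token into a Comment; other tokens give Nothing.
asComment :: Token -> Maybe Comment
asComment (LineComment text)  = Just (Comment text)
asComment (BlockComment text) = Just (Comment text)
asComment _                   = Nothing

-- Split off the run of comment tokens at the front of the stream.
leadingComments :: [Token] -> ([Comment], [Token])
leadingComments (tok : rest)
  | Just c <- asComment tok =
      let (cs, rest') = leadingComments rest in (c : cs, rest')
leadingComments tokens = ([], tokens)

-- Recognise "<comments> Name = class" and keep the comments as annotation.
classHeader :: [Token] -> Maybe (ClassHeader [Comment], [Token])
classHeader tokens =
  case leadingComments tokens of
    (comments, Identifier name : Equals : Class : rest) ->
      Just (ClassHeader comments name, rest)
    _ -> Nothing
```

In a Parsec token parser the same shape could presumably be written by running many over a comment-token parser before each declaration; that part is left out here.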

If you wanted multiple annotations at once, the simplest solution would be to use a record, although that's a bit awkward because (at least for now¹) we can't easily write code polymorphic over record fields. (I.e., we can't easily write the type "any record with a comments :: Comment field".)
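A minimal sketch of the record approach follows. The Scope type, the two record types, and the hand-written HasComments class are assumptions for illustration; the class is one possible workaround for the missing field polymorphism, not something the answer itself prescribes.

```haskell
newtype Comment = Comment String deriving (Show, Eq)

-- Hypothetical result of a scope-analysis pass.
newtype Scope = Scope [String] deriving (Show, Eq)

-- Annotation produced by the parser alone.
newtype ParseAnn = ParseAnn
  { parseComments :: [Comment]
  } deriving (Show, Eq)

-- Annotation after scope analysis: comments plus the enclosing scope.
data ScopedAnn = ScopedAnn
  { scopedComments :: [Comment]
  , scopedScope    :: Scope
  } deriving (Show, Eq)

-- Stand-in for "any record with a comments field".
class HasComments ann where
  commentsOf :: ann -> [Comment]

instance HasComments ParseAnn where
  commentsOf = parseComments

instance HasComments ScopedAnn where
  commentsOf = scopedComments

data TypeName a = TypeName a String [String] deriving (Show, Eq)

-- Works for every annotation type that exposes comments.
typeNameDocs :: HasComments ann => TypeName ann -> [String]
typeNameDocs (TypeName ann _ _) = [text | Comment text <- commentsOf ann]
```

The cost is one instance per annotation record, which is the boilerplate the answer's caveat is pointing at.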

One additional neat thing you can do is use PatternSynonyms (available from GHC 7.8) to have a suite of patterns that work just like your current unannotated AST, letting you reuse your existing case statements. (To do this, you'll also have to rename the constructors for the annotated types so they don't overlap.)

pattern TypeName a as <- TypeName' _ a as
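Spelled out as a complete module, the pattern synonym idea might look like the sketch below. The pattern type signature and the COMPLETE pragma assume a newer compiler than 7.8 (roughly GHC 8.2 or later); typeNameBase is a hypothetical stand-in for an existing function written against the unannotated AST.

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Annotated type with a renamed constructor, as the answer suggests.
data TypeName' a = TypeName' a String [String]
  deriving (Show, Eq)

-- Unidirectional synonym: matches like the old constructor and
-- ignores the annotation. It cannot be used to build values.
pattern TypeName :: String -> [String] -> TypeName' a
pattern TypeName name rest <- TypeName' _ name rest

-- Tell the exhaustiveness checker that TypeName alone covers the type.
{-# COMPLETE TypeName #-}

-- Code written against the unannotated AST keeps its old shape.
typeNameBase :: TypeName' a -> String
typeNameBase (TypeName name _) = name
```

Construction still goes through TypeName', since an annotation has to come from somewhere; only pattern matching is shared with the old code.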

Footnotes

¹ Hopefully part 2 of the revived overloaded record fields proposal will help in this regard when it actually gets added to the language.
