Trees that Grow

29 September 2023

Trees that Grow is one of my favorite programming patterns for compilers, but I don’t think the paper does a great job of explaining it.

The core idea is pretty simple: many passes in a compiler will want to modify or extend the syntax tree in some way. For example, just after parsing, you’ll probably represent all variables with strings, but in the renamer you’ll want to replace them with a more efficient name representation. Similarly, you’ll probably want to extend many expressions with their types after type checking if you need those for codegen or a language server and if you’re lowering to something like GHC Core, you’ll probably want to extend your syntax tree with explicit type abstractions / applications. Even removing constructors from your AST is reasonable if you desugar some constructor from an earlier stage into other language concepts.

So how do you do this in Haskell?

Let’s say you start of with a syntax tree that looks something like this

data Expr
    = Var Text
    | Lambda Text Expr
    | App Expr Expr
    | Literal Integer

Step 1: Parameterize your AST over the compiler pass it came from

I like to use a data kind here, but you could use separate types. Doing so makes things a little less clear but might allow external code like compiler plugins to add its own passes

data Pass = Parsed | Renamed | Typed

data Expr (p :: Pass)
    = Var Text
    | Lambda Text (Expr p)
    | App (Expr p) (Expr p)
    | Literal Integer

Step 2: Replace all varying types in your AST with type families parameterized over the pass

In our case, the names might vary between passes, so we will replace them with a type family XName¹

data Pass = Parsed | Renamed | Typed

data Expr (p :: Pass)
    = Var (XName p)
    | Lambda (XName p) (Expr p)
    | App (Expr p) (Expr p)
    | Literal Integer

type family XName (p :: Pass) where
    XName Parsed  = Text
    XName Renamed = MuchBetterName
    XName Typed   = MuchBetterName

If you used separate types rather than a data kind, you would need to use an open type family here.

Step 3: Add extension fields to existing constructors.

The paper recommends adding an extension field to every constructor, but I prefer only doing so on demand when necessary. This might break code more often, so if your project is GHC-sized your mileage may vary, but I’ve found it to be much less hassle in practice.

The actual extension field is once again given by a type family. To specify that an extension should not be used, you can set the extension field to () and to disable a constructor completely, you can just set it to Void (the uninhabited type).

In our example, we extend variables with their types after typing and let’s say the renamer desugars literals to church encoded numbers so we can get rid of the Literal constructor afterwards.

data Pass = Parsed | Renamed | Typed

data Expr (p :: Pass)
    = Var (XVar p) (XName p)
    | Lambda (XName p) (Expr p)
    | App (Expr p) (Expr p)
    | Literal (XLiteral p) Integer

type family XName (p :: Pass) where
    XName Parsed  = Text
    XName Renamed = MuchBetterName
    XName Typed   = MuchBetterName

type family XVar (p :: Pass) where
    XVar Parsed  = ()
    XVar Renamed = ()
    XVar Typed   = Type

type family XLiteral (p :: Pass) where
    XLiteral Parsed  = ()
    XLiteral Renamed = Void
    XLiteral Typed   = Void

Step 4: Add an extension constructor

This is what allows us to add additional constructors to the AST. Unlike extension fields, this one shouldn’t exist at all when it’s unused so we’re just going to set it to Void in that case.

data Pass = Parsed | Renamed | Typed

data Expr (p :: Pass)
    = Var (XVar p) (XName p)
    | Lambda (XName p) (Expr p)
    | App (Expr p) (Expr p)
    | Literal (XLiteral p) Integer
    | ExprX (XExpr p)

type family XName (p :: Pass) where
    XName Parsed  = Text
    XName Renamed = MuchBetterName
    XName Typed   = MuchBetterName

type family XVar (p :: Pass) where
    XVar Parsed  = ()
    XVar Renamed = ()
    XVar Typed   = Type

type family XLiteral (p :: Pass) where
    XLiteral Parsed  = ()
    XLiteral Renamed = Void
    XLiteral Typed   = Void

type family XExpr (p :: Pass) where
    XLiteral Parsed  = Void
    XLiteral Renamed = Void
    XLiteral Typed   = TypedExprExt

data TypedExprExt
    = TypeLambda (XName Typed) (Expr Typed)
    | TypeApp (Expr Typed) Type

Step 5: Profit

With a tiny syntax tree like this, Trees That Grow might seem like a lot of effort for little gain, but in practice it’s really not. In a typical compiler, most constructors will not change between passes so the number of extension families really isn’t that large compared to the size of the tree that would need to be duplicated without them.

Also, doing this allows us to write functions that are polymorphic in the compiler pass. It’s even possible to write functions over only a subset of valid expression types (e.g. ones that use MuchBetterName for names) by using an equality constraint. For example:

f :: (XName p ~ MuchMoreEfficientName) => Expr p -> ...

How would you do this in OCaml?

You… wouldn’t. Seriousy, I’ve tried. It’s not pretty. Your first instinct might be that this would be a great use of functors and you wouldn’t be alone². But it’s not. The issue is mutual recursion. Say you want to annotate every variable with its type. If you tried to do this with modules, your attempt would probably look something like this

module AST(Ext : sig 
    type var_ext
end) = struct 
    type expr =
        | ...
        | Var of var_ext * name

    type type_ = ...
end

module Parsed = AST(struct type var_ext = unit end)
module Renamed = AST(struct type var_ext = unit end)
module Typed = AST(struct type var_ext = Typed.type_ end)

Except that doesn’t work, because modules don’t let you tie the knot like that and include a module in the argument to the functor application of its own definition. Mutually recursive modules can get close, but those don’t work without explicit module types so they’re not an option either if you want to avoid duplicating the syntax tree for every single pass.

Trying to get clever with explicit staging might even work for this example, but will break donw once you try to include an expression in an extension field for an expression.

In Polaris, I used a (pretty cursed) PPX rewriter that syntactically inlines a functor application to allow extension fields to reference other types defined in the same pass.

The X stands for “extension” but if you want to read it as “EXTREME” that’s also fine.↩︎
I almost wrote a blog post about this approach once. That would have been very embarassing in retrospect…↩︎