Trees that Grow
29 September 2023Trees that Grow is one of my favorite programming patterns for compilers, but I don’t think the paper does a great job of explaining it.
The core idea is pretty simple: many passes in a compiler will want to modify or extend the syntax tree in some way. For example, just after parsing, you’ll probably represent all variables with strings, but in the renamer you’ll want to replace them with a more efficient name representation. Similarly, you’ll probably want to extend many expressions with their types after type checking if you need those for codegen or a language server and if you’re lowering to something like GHC Core, you’ll probably want to extend your syntax tree with explicit type abstractions / applications. Even removing constructors from your AST is reasonable if you desugar some constructor from an earlier stage into other language concepts.
So how do you do this in Haskell?
Let’s say you start of with a syntax tree that looks something like this
data Expr
= Var Text
| Lambda Text Expr
| App Expr Expr
| Literal Integer
Step 1: Parameterize your AST over the compiler pass it came from
I like to use a data kind here, but you could use separate types. Doing so makes things a little less clear but might allow external code like compiler plugins to add its own passes
data Pass = Parsed | Renamed | Typed
data Expr (p :: Pass)
= Var Text
| Lambda Text (Expr p)
| App (Expr p) (Expr p)
| Literal Integer
Step 2: Replace all varying types in your AST with type families parameterized over the pass
In our case, the names might vary between passes, so we will replace
them with a type family XName
1
data Pass = Parsed | Renamed | Typed
data Expr (p :: Pass)
= Var (XName p)
| Lambda (XName p) (Expr p)
| App (Expr p) (Expr p)
| Literal Integer
type family XName (p :: Pass) where
XName Parsed = Text
XName Renamed = MuchBetterName
XName Typed = MuchBetterName
If you used separate types rather than a data kind, you would need to use an open type family here.
Step 3: Add extension fields to existing constructors.
The paper recommends adding an extension field to every constructor, but I prefer only doing so on demand when necessary. This might break code more often, so if your project is GHC-sized your mileage may vary, but I’ve found it to be much less hassle in practice.
The actual extension field is once again given by a type family. To
specify that an extension should not be used, you can set the extension
field to ()
and to disable a constructor completely, you
can just set it to Void
(the uninhabited type).
In our example, we extend variables with their types after typing and
let’s say the renamer desugars literals to church encoded numbers so we
can get rid of the Literal
constructor afterwards.
data Pass = Parsed | Renamed | Typed
data Expr (p :: Pass)
= Var (XVar p) (XName p)
| Lambda (XName p) (Expr p)
| App (Expr p) (Expr p)
| Literal (XLiteral p) Integer
type family XName (p :: Pass) where
XName Parsed = Text
XName Renamed = MuchBetterName
XName Typed = MuchBetterName
type family XVar (p :: Pass) where
XVar Parsed = ()
XVar Renamed = ()
XVar Typed = Type
type family XLiteral (p :: Pass) where
XLiteral Parsed = ()
XLiteral Renamed = Void
XLiteral Typed = Void
Step 4: Add an extension constructor
This is what allows us to add additional constructors to the AST.
Unlike extension fields, this one shouldn’t exist at all when
it’s unused so we’re just going to set it to Void
in that
case.
data Pass = Parsed | Renamed | Typed
data Expr (p :: Pass)
= Var (XVar p) (XName p)
| Lambda (XName p) (Expr p)
| App (Expr p) (Expr p)
| Literal (XLiteral p) Integer
| ExprX (XExpr p)
type family XName (p :: Pass) where
XName Parsed = Text
XName Renamed = MuchBetterName
XName Typed = MuchBetterName
type family XVar (p :: Pass) where
XVar Parsed = ()
XVar Renamed = ()
XVar Typed = Type
type family XLiteral (p :: Pass) where
XLiteral Parsed = ()
XLiteral Renamed = Void
XLiteral Typed = Void
type family XExpr (p :: Pass) where
XLiteral Parsed = Void
XLiteral Renamed = Void
XLiteral Typed = TypedExprExt
data TypedExprExt
= TypeLambda (XName Typed) (Expr Typed)
| TypeApp (Expr Typed) Type
Step 5: Profit
With a tiny syntax tree like this, Trees That Grow might seem like a lot of effort for little gain, but in practice it’s really not. In a typical compiler, most constructors will not change between passes so the number of extension families really isn’t that large compared to the size of the tree that would need to be duplicated without them.
Also, doing this allows us to write functions that are polymorphic in
the compiler pass. It’s even possible to write functions over only a
subset of valid expression types (e.g. ones that use
MuchBetterName
for names) by using an equality constraint.
For example:
f :: (XName p ~ MuchMoreEfficientName) => Expr p -> ...
How would you do this in OCaml?
You… wouldn’t. Seriousy, I’ve tried. It’s not pretty. Your first instinct might be that this would be a great use of functors and you wouldn’t be alone2. But it’s not. The issue is mutual recursion. Say you want to annotate every variable with its type. If you tried to do this with modules, your attempt would probably look something like this
module AST(Ext : sig
type var_ext
end) = struct
type expr =
| ...
| Var of var_ext * name
type type_ = ...
end
module Parsed = AST(struct type var_ext = unit end)
module Renamed = AST(struct type var_ext = unit end)
module Typed = AST(struct type var_ext = Typed.type_ end)
Except that doesn’t work, because modules don’t let you tie the knot like that and include a module in the argument to the functor application of its own definition. Mutually recursive modules can get close, but those don’t work without explicit module types so they’re not an option either if you want to avoid duplicating the syntax tree for every single pass.
Trying to get clever with explicit staging might even work for this example, but will break donw once you try to include an expression in an extension field for an expression.
In Polaris, I used a (pretty cursed) PPX rewriter that syntactically inlines a functor application to allow extension fields to reference other types defined in the same pass.