Blazingly Fast™ Type Class Resolution with Tries
19 February 2024One of the more fundamental hurdles when implementing type classes is instance resolution. A programmer might write any possible number of instances for a given class, so how do we find the one we are looking for?
For the remaining examples, let’s assume that our program contains the following instances
class C(a)
instance C(Int)
instance C(List(Bool))
instance C(List(String))
instance forall a. C((a, Int))
instance forall a. C((Bool, a))
Now, the easiest way to find an instance for a constraint, say
C((String, Int))
, is to linearly search through every
single instance until one matches the constraint arguments.
For example, in this case, we would try (and fail) to match every
single instance until we hit forall a. C((a, Int))
.
Somewhat surprisingly, this is actually the strategy GHC uses.1 But with a large number of instances, this is not exactly efficient and as it turns out, with a little effort, we can do much better!
We need to trie harder2
If you already know what tries are, the context in which you learned about them was probably very different.
The traditional use case for tries is matching sequences of text. The
prime example here is text auto-completion. Imagine that our set of
valid words consists of
["dog", "cat", "car", "candle", "dune", "can", "current", "dose", "bat", "category"]
and the user has typed in "cat"
. How do we find the set of
valid completions?
Just like above, we could linearly walk through the set of valid
words and pick out all the ones that contain the prefix
cat
. For reasons that will become apparent soon, we can
visualize this flat set of all valid words as a flat tree.
This is obviously pretty inefficient. If you were to try and optimize
this, you might try the following: All words that contain the prefix
cat
start with a c
, so we can group our words
by their first character to immediately filter out all the ones that
don’t start with a c
.
With this optimization, our tree, now represented as a map from characters to sets of words, looks like this.
Cool! This already cut the number of words we need to search through roughly in half. To speed things up further, you might notice that exactly the same argument applies to the second character, as well as the third, fourth, and every one after that!
So if we apply this optimization to every character, effectively unrolling the words into fancy interlinked lists of characters, we end up with a trie!
Now, instead of being linear in the number of words times their length, our search is only linear in the size of the prefix we are looking for. In practice, where words tend to share prefixes, this turns out to be much more efficient!
Back to the world of types
So, how can we use tries for type classes? Well, first of all, we need a way to unroll our complicated nested types into a linear string of atomic units that play a similar role to the characters in our word trie.
For most types, this will be pretty straightforward. We can unroll
each of them into a linear string of head constructors. You can
view this transformation as a depth-first search on our type that
separates every type constructor from its arguments in every step. For
example, Either(List(String), Array((Int, Bool)))
will be
unrolled into
Either(_, _) -> List(_) -> String -> Array(_) -> (_, _) -> Int -> Bool
.
We can now already construct a trie to hold our instances from above.
You might even think that we could resolve instances by simply searching through the trie just like we did in our text prediction example above, but it’s not quite that simple.
Universally quantified complications
The issue is that some instances can contain universally quantified
type variables! In our case, we have two such instances:
forall a. C(a, Int)
and
forall a. C(Bool, a)
.
Trying to resolve, say C(Bool, Int)
, needs to try both
instances and report an ambiguity error since both of them match.
More generally, at every inner node in our trie, there are either 0, 1, or 2 matching subtries that we need to check.
For example, looking up this instance in the trie above would traverse all of the red nodes.
This might sound like it could lead to exponential blow-up, but it actually doesn’t! In the absolute worst case, instance lookup needs to traverse every path through our trie, but that is exactly equivalent to what our naive linear search strategy did so it’s at least definitely not slower than that.
Okay, so we traverse every possible path through our trie that matches the types we are looking for. Is that it?
Not quite. Consider that variables might be used more than once in an
instance. For example, given an instance
forall a. C((a, a))
, we need to check that the two
arguments to the tuple correspond to the same type.
To achieve this, we need to remember the first type we match a variable against along a given path. If we come across this variable again, we patch the trie by inserting the (unrolled) type where the variable would be.
For example, say we have a trie with an instance for
forall a. C((a, (a, Bool)))
. Now, matching this trie
against C((List(Int), (List(Int), Bool)))
follows this
path. Once the traversal comes across the second occurrence of
a
, it locally patches the trie by replacing a
with the blue path.
And that’s it! This is enough to fully implement type class resolution in languages like Haskell, Rust or PureScript.
What about entailment?
This can easily match simple instances, but what about instances like
forall a. C(a) => C(List(a))
? Here,
C(List(a))
entails C(a)
, such that
C(List(a))
only holds if C(a)
holds. How do we
take that into account?
Well, following GHC’s footsteps, this is very easy: we don’t! Type
class resolution only matches the “instance head” (i.e. the part after
the =>
), so we only need to emit a new
constraint for C(a)
when we land on that instance (using
our substitution to find the correct type for a
).
This might seem a bit limiting and there are more elaborate systems that can use entailment information, but in an open world setting like type classes where any module might add new instances, these can have some serious subtle downsides around modularity. Only matching on the instance head is a very reasonable pragmatic choice that is enough to cover most real world cases while keeping excellent performance.
Row Polymorphism
The system we covered so far works very well for classic ML-style type systems that consist exclusively of (possibly parameterized) type constructors, but not every type system is this simple.
A common type system extension is row polymorphism, which can be used to model extensible records, polymorphic variants, and even algebraic effects. How do we include this in our trie?
The fundamental issue with row polymorphism is that the order of
labels in a row does not matter. In other words,
{ x : Int, y : Bool }
and
{ y : Bool, x : Int }
are the same type (and hence,
depending on the instantiations of a
and b
,
even { x : Int | a }
and { y : Bool | b }
might be equal). Depending on the exact flavor of your type system, rows
might be allowed to have duplicate labels. In that case, only the order
of distinct labels is irrelevant. For example,
{ x : Int, y : Bool, x : String }
is equal to
{ y : Bool, x : Int, x : String }
, but not
{ x : String, x : Int, y : Bool }
.
How could we possibly represent type class instances for these types in a trie?
The answer is actually surprisingly easy. We can treat the record with only its labels as a head constructor, similar to what we did before. Unlike before, we canonicalize rows by sorting their fields. This is especially important if we want to use the trie to detect two instances for exactly the same type (since we should probably throw an error in that case).
A beautiful property of the duplicate label semantics outlined above is that canonicalizing records with duplicate labels boils down to a stable sort since it only needs to preserve the order of duplicate labels.
For example, { y : Bool, x : Int }
will be canonicalized
to { x : Int, y : Bool }
and subsequently unrolled to
and { x : Int, y : Bool, x : String | a }
becomes
Now, when matching a type against a row head constructor like this, we move the types from the head constructor labels to the front in canonical order. This ensures that they are matched against the correct trie fields and keeps the remaining row to be matched against the extension variable if applicable.
For example, this is how one possible match could behave
Overlapping instances
Usually, requiring type class resolution to come up with a single unambiguous instance is the right choice, but sometimes it makes sense to disambiguate explicitly and let the compiler choose one instance over another. Can we do this with our trie?
Yes! In fact, since we report every matching candidate, we can perform this disambiguation entirely on the result of type class resolution without ever touching our trie. The exact rules for how to do this can be a bit subtle, but one reasonable approach is the one taken by GHC, which is documented in the fantastic GHC User’s Guide3.
Conclusion
Even if you don’t plan on implementing a programming language with type classes any time soon (why not?), there are a few things I would like you to take away from this.
- Tries are awesome!
- Picking the right data structures is incredibly important, even in domains like type checkers where they typically don’t receive that much love.
- There is a lot to be gained by applying knowledge from one area of computer science (e.g. text processing) to another (e.g. type checking)
If nothing else, this gave me a great opportunity to try out GraphViz support on my blog.