Namespaced De Bruijn indices for duplicate record fields
07 May 2024Namespaced De Bruijn indices are a really neat trick for representing names in programming language interpreters that I learned from @fullmoon. In short, the idea is pretty much that you can access shadowed variables by starting at their last occurence and counting up. So for example, in this code
let x = 1
let y = 2
let x = 3
let y = 4
let x = 5
x
(which is equivalent to x@0
) is assigned
5, x@1
is 3, x@2
is 1, y
and
y@0
are 4 and y@1
is 2.
The way Gabby uses this is mostly intended for interpreters that need to do capture avoiding substitution. But even in a compiler, these are actually quite useful in combination with a feature that might seem completely unrelated at first!
Row polymorphism with duplicate labels
At this point, row polymorphism isn’t actually that uncommon. It’s a
way to represent anonymous record types and abstract over the fields of
a record without needing subtyping (which would make type inference
difficult and might lose information). For example, a function that
takes any record that has an x
field of type
Int
and a y
field of type String
and that replaces the value of y
with something of type
Bool
can look like this. (where r
is the row
that represents the remaining fields)
let f : forall r. { x : Int, y : String | r } -> { x : Int, y : Bool | r }
let f(r) = { r with y = r.x > 0 }
Now, one problem that quickly comes up with row polymorphism is how to do record extensions. It would be quite convenient to be able to write a function that adds a field to a record like this
let f : forall r. { | r } -> { x : Int | r }
let f(r) = { r extend x = 5 }
This seems innocent enough, but consider what happens if you call
this function on { x = "a" }
. The record already has an
x
field so you can’t exactly add another, can you? You
could give up here and make the programmer ensure that f
is
never called on a record that already has an x
field, and
some row polymorphism implementations really do this in practice! (for
example with qualified types / type class constraints).
But a much more elegant answer is given in the paper Extensible
records with scoped labels that e.g. Elm and Purescript[^purescript]
are based on. Its answer to row extensions is: just allow duplicate
fields! If you apply f
to { x = "a" }
, you
just get { x = 5, x = "a" }
, which has type
{ x : Int, x : String }
. This means that the order of
fields is not entirely irrelevant anymore, but fortunately the type
system only needs to preserve the order of fields with the same
name so it’s barely noticable in practice.
If you paid attention, you might now be asking “how do I access the
second x
field?”. And the traditional answer to this is:
you don’t! The row extension has shadowed the inner x
, so
the only way to access it now is to remove the outer field. If your type
system even supports that (most don’t). Buuut…
We can use namespaced De Bruijn indices!
If you want to access the second x
field of a record,
you can just write r.x@1
! And if you want to access the,
say, 27th field, you just write r.x@26
! This also works
seamlessly with polymorphism and type inference. For example, a generic
function like f(r) = r.x@2
would easily be inferred to have
type
f : forall r a b c. { x : a, x : b, x : c | r } -> c
.
I think this is pretty cute :) [^purescript]: As far as I know.
Don’t quote me on this ^^ Well, kind
of