Namespaced De Bruijn indices for duplicate record fields

07 May 2024

Namespaced De Bruijn indices are a really neat trick for representing names in programming language interpreters that I learned from @fullmoon. In short, the idea is pretty much that you can access shadowed variables by starting at their last occurence and counting up. So for example, in this code

let x = 1
let y = 2
let x = 3
let y = 4
let x = 5

x (which is equivalent to x@0) is assigned 5, x@1 is 3, x@2 is 1, y and y@0 are 4 and y@1 is 2.

The way Gabby uses this is mostly intended for interpreters that need to do capture avoiding substitution. But even in a compiler, these are actually quite useful in combination with a feature that might seem completely unrelated at first!

Row polymorphism with duplicate labels

At this point, row polymorphism isn’t actually that uncommon. It’s a way to represent anonymous record types and abstract over the fields of a record without needing subtyping (which would make type inference difficult and might lose information). For example, a function that takes any record that has an x field of type Int and a y field of type String and that replaces the value of y with something of type Bool can look like this. (where r is the row that represents the remaining fields)

let f : forall r. { x : Int, y : String | r } -> { x : Int, y : Bool | r }
let f(r) = { r with y = r.x > 0 }

Now, one problem that quickly comes up with row polymorphism is how to do record extensions. It would be quite convenient to be able to write a function that adds a field to a record like this

let f : forall r. { | r } -> { x : Int | r }
let f(r) = { r extend x =  5 }

This seems innocent enough, but consider what happens if you call this function on { x = "a" }. The record already has an x field so you can’t exactly add another, can you? You could give up here and make the programmer ensure that f is never called on a record that already has an x field, and some row polymorphism implementations really do this in practice! (for example with qualified types / type class constraints).

But a much more elegant answer is given in the paper Extensible records with scoped labels that e.g. Elm and Purescript[^purescript] are based on. Its answer to row extensions is: just allow duplicate fields! If you apply f to { x = "a" }, you just get { x = 5, x = "a" }, which has type { x : Int, x : String }. This means that the order of fields is not entirely irrelevant anymore, but fortunately the type system only needs to preserve the order of fields with the same name so it’s barely noticable in practice.

If you paid attention, you might now be asking “how do I access the second x field?”. And the traditional answer to this is: you don’t! The row extension has shadowed the inner x, so the only way to access it now is to remove the outer field. If your type system even supports that (most don’t). Buuut…

We can use namespaced De Bruijn indices!

If you want to access the second x field of a record, you can just write r.x@1! And if you want to access the, say, 27th field, you just write r.x@26! This also works seamlessly with polymorphism and type inference. For example, a generic function like f(r) = r.x@2 would easily be inferred to have type f : forall r a b c. { x : a, x : b, x : c | r } -> c.

I think this is pretty cute :) [^purescript]: ~~As far as I know. Don’t quote me on this ^^~~ Well, kind of