Haskell's tuples should behave like multi-element newtypes

22 November 2023

Hear me out, I swear the title makes sense!

What I mean is that tuples should have the same laziness characteristics as newtypes, just extended to more than one element.

I’ll explain what this means in a second, but let’s get everyone on the same page about laziness semantics first. Laziness is usually “measured” by including a sort of pseudo-value representing non-terminating expressions, i.e. infinite loops or imprecise exceptions. For contrived plot reasons, this value is called “bottom” or ⊥. If a data constructor is lazy in its argument (and therefore needs to allocate a thunk at runtime), that is equivalent to saying that the argument can be bottom.

Now, let’s say we have two independent expressions of type a and b respectively. What values can those have? Well, each one can either be bottom or a value of type a or b (which might contain more bottoms but we don’t care about those here) so they can be any of the following four possibilities.

a b
a
b
a b

There shouldn’t be any surprises so far, so let’s look at the values of a tuple of type (a, b). Operationally, a tuple is really just two values glued together so any reasonable programmer would expect this to look almost identical to the table above, right?

(a, b)
(⊥, ⊥)
(a, ⊥)
(⊥, b)
(a, b)

No! This table has 5 rows but the previous one only had 4! The tuple constructor has added another case () on top of combining the laziness of the two subexpressions.

Written out like this, the difference might seem pretty abstract, but it can be a real source of space leaks!

Laziness bad, just use a strict tuple

I said the magic word, so you’re probably already dying to tell me to add strictness annotations to our tuple type to make it less lazy. Well, let’s try that!

data Pair a b = Pair !a !b
syntax highlighting by codehost

Okay, our tuple is now strict in both values so let’s see how that affects laziness

Pair a b
Pair a b

Oh no, that is not what we want! We got rid of three rows instead of one and now our a and b are linked together. There is no way to evaluate one without also forcing evaluation of the other. This might be a reasonable data structure in many cases, but it’s definitely not what the default tuple in a lazy language should look like.

So is there a way to get rid of only the initial ⊥ row without affecting the rest?

Yes! Unboxed tuples!

If we have an unboxed tuple, it is not a lifted type, so it cannot be bottom and therefore has the exact same laziness characteristics (and almost the same runtime representation!) as our two independent expressions from the beginning

(# a, b #)
(# ⊥, ⊥ #)
(# a, ⊥ #)
(# ⊥, b #)
(# a, b #)

Buuut we pay a price. Because this type cannot be bottom (it is “unlifted”) and it has a different runtime representation than regular types, it doesn’t have kind Type, but TYPE (TupleRep [LiftedRep, LiftedRep]). This means that it pretty much cannot participate in regular polymorphism at all.

So even this is not a solution for regular Haskell programs, despite having the exact laziness profile we are after.

Well, what would the ideal tuple look like then?

We cannot remove the initial ⊥ row since we still want our tuple to be lifted and we cannot touch the individual components since they should retain their own laziness. But there is another row that is entirely redundant! Tuples only have a single constructor, so there is really no reason to treat (⊥, ⊥) and ⊥ differently. If we could just get rid of the (⊥, ⊥) row, effectively ignoring the (,) constructor and only focusing on its components, our tuple would behave exactly like two independent values without sacrificing its kind.

Hey, remember the title?

Newtypes, by virtue of being erased at runtime, behave exactly like their underlying types with regards to laziness.

Identity a
Identity a

Because newtype constructors don’t actually exist, they don’t show up in laziness profiles. That’s exactly how we want tuples to behave, just with more than one underlying value!

Now wouldn’t it be nice if GHC actually let us do this :)