Fast Map Union and Local Instances Through Instance Types
26 February 2023In part 31 of my crusade against GHC’s coherence guarantees, I have actually done it! This time we will end up with a way to generate local type class instances without any asterisks about code breaking with optimizations. On the way, we are going to end up solving the dreaded Fast Map Union Problem, combining two of my favorite Haskell tricks, and discovering a bug in a previous version of GHC.
But before we get to that, let’s start at the beginning
What even is this ‘fast Map union problem’?
Let’s pretend that Haskell did have consistent locally
overridable type class instances. In that case, the interface for
Map
would be completely broken.
Look at the type of Data.Map.insert
insert :: Ord k => k -> v -> Map k v -> Map k v
See the issue? Map
is some kind of ordered tree
internally, so it depends on the Ord
instance being
consistent across different operations, but with local instances, we
don’t have any guarantees like that.
-- This uses the regular instance for Ord Int
let map = insert (1 :: Int) 1 mempty
-- This uses a different, incompatible instance on the same map!
$ insert (2 :: Int) 2 map withLocalOrd reverseOrdInt
If you think about this for a bit, you may come up with a solution: You can store the instance in the map. This is how most map implementations in other languages work after all.
data Map k v = MkMap (Dict (Ord k)) (ActualMapImplementation k v)
And now the type of insert doesn’t mention Ord
at all
anymore so everything is sunshine and roses
insert :: k -> v -> Map k v -> Map k v
…until it isn’t! Consider the type of union
now
union :: Map k v -> Map k v -> Map k v
What happens if both maps use different Ord k
instances?
We have no way to statically ensure that they don’t. The way to solve
this in practice would be to just reinsert every value from one map in
the other, but that means that union is now drastically slower than it
would be if we could assume that both maps use the same instance (It’s
now at least linear in the size of one map vs. logarithmic in that of
the smaller one).
Instance Types
Now, what we really need here is a way to statically ensure that both
maps use the same Ord
implementation. In a dependently
typed language, one could just carry the instance dictionary in the
type
data Map k v (inst :: Ord k) = ...
union :: Map k v inst -> Map k v inst -> Map k v inst
But we cannot do that in Haskell (yet?), so we need a few more tricks. What we absolutely need to do is to somehow move the used instance to the type level, so let’s start there. While we cannot directly depend on the instance value, we can define a dummy type as a stand-in.
With this new type, we can write a version of Ord
that
carries its concrete instance in the type. Every function with an
Ord
constraint should now be polymorphic over this instance
parameter.
class OrdI inst a | inst -> a where
compareI :: a -> a -> Ordering
data RegularOrdInt
instance OrdI RegularOrdInt Int where
= compare
compareI
data ReverseOrdInt
instance OrdI ReverseOrdInt Int where
= case compare x y of
compareI x y LT -> GT
EQ -> EQ
GT -> LT
And now we can finally parameterize Map over the used
Ord
instance!
data IMap inst k v = ...
insert :: OrdI inst k => k -> v -> IMap inst k v -> IMap inst k v
union :: OrdI inst k => IMap inst k v -> IMap inst k v -> IMap inst k v
union
statically ensures that both maps agree on their
instances.
The only ‘issue’ here is that we lose some inference. This is more or
less unavoidable if we want multiple instances to coexist (well, more on
that later). We need to tell Haskell what instance to use at some point,
but it’s not that bad, since at least for IMap
s, which
carry the instance in their type, we only need to do this once.
let map1 = insert 1 1 $ insert 2 2 $ empty @RegularOrdInt
let map2 = insert @ReverseOrdInt 3 3 $ insert 4 4 $ empty
-- Type error: Couldn't match type 'RegularOrdInt' with `ReverseOrdInt` union map1 map2
How do local instances fit into this?
Quite well as it turns out! We can combine two of my favorite Haskell tricks to implement them in current Haskell (You can stop pretending that they exist. We are going to implement them for real now)
Local instances might differ between different executions of the same
code, so we need to make sure that the instance types can only be used
locally and cannot escape. If this sounds familiar that is because there
is a decent chance that you have used something similar before: runST
.
ST
is a monad that is used for local mutability inside
pure code. You can use it to create STRef
s and mutate them,
just like you would in your favorite imperative language2,
except that Haskell, by virtue of being a pure language, needs to make
absolutely sure that you never ever return an STRef
from
runST
. Doing so would allow effects in one usage of the
supposedly pure runST
to affect the result of another
one.
How does ST
do this? It uses higher-rank types!
newSTRef :: a -> ST s (STRef s a)
runST :: (forall s. ST s a) -> a
If you have never seen this before, you’re probably quite confused
right now. The type forall s. ST s a
specifies that the
argument to runST
needs to be an ST
value that
works for any possible instantiation of s
. Crucially for
this case, this means that the s
variable is, unlike
a
, not a type parameter of runST
itself, but actually one of the arguments to runST. This is
much clearer if we write runST
with an explicit outer
forall
.
runST :: forall a. (forall s. ST s a) -> a
Now, the reason this ensures that no STRefs escape is that the
s
parameter of the STRef
is the same as that
of the containing ST
monad. If you were able to
return an STRef
from the argument to runST
,
e.g. by passing something of type
forall s. ST s (STRef s Int)
, what type would the result
have? Blindly substituting would yield STRef s Int
, but
what is s
now? Previously, s
was bound by the
forall
in the type of the argument to runST
,
but that forall
does not exist anymore! This is why GHC
will not accept this code and complain about an ‘escaping skolem’.
We can use exactly the same trick for local type class instances to invent an instance type that only exists locally! If we can implement a function of the following type, then this will ensure that instance types cannot possibly escape.
withOrdI :: forall a b. (a -> a -> Ordering) -> (forall inst. OrdI inst a => b) -> b
Now, how do we implement this function? If you read the first post in this series, you will probably know the answer already.
At runtime, GHC represents type classes via a technique called
dictionary passing. This means that a function with an Ord
constraint like Ord a => a -> a -> Ordering
will
be turned into a function that takes an implementation of that type
class as an argument (Ord a -> a -> a
). This
implementation, called a dictionary, is just a regular record-like data
type that contains an implementation for every method.
Using GADT
s, we can capture this dictionary in a
value.
data OrdIDict a where
OrdIDict :: Ord a => OrdIDict a
If we are now able to replicate the structure of this
OrdIDict
exactly, but with a custom record of methods, we
can use unsafeCoerce
to convert it to a functional
OrdIDict a
. By pattern matching on the result, we can
release the dictionary back into a regular instance, which now contains
our handwritten type class instance!
Instinctively, your definition of OrdIDict
would
probably look like this
data OrdIDict inst a where
OrdIDict :: OrdI inst a => OrdIDict inst a
This is not going to work though. We need to invent a new type for
inst
, so OrdIDict
cannot take it as a
parameter. Thanks to Haskell’s support for existential types, this is
not actually an issue, since we can just leave it off3.
data OrdIDict a where
OrdIDict :: OrdI inst a => OrdIDict a
OrdI
only has a single method, so it will be represented
by the equivalent of a newtype record at runtime. This is erased
entirely, so we do not need to build a record around our
a -> a -> Ordering
function.
The OrdIDict
wrapper adds some indirection though, so we
need to replicate that.
data FakeDict a = FakeDict a
All preparations are complete. We are ready for the magic4!
withOrdI :: forall a b. (a -> a -> Ordering) -> (forall inst. OrdI inst a => b) -> b
=
withOrdI dict body case unsafeCoerce (FakeDict dict) :: OrdIDict a of
OrdIDict @inst) -> body @inst (
And that’s… it!
Well, not quite. If you paid very close attention, you may notice
that we have no way to bind the inst
type variable in the
(forall inst. OrdI inst a => b)
argument. This means if
we put a lambda there, we cannot actually mention inst
anywhere. This is easy enough to solve by introducing a Proxy
value.
withOrdIProxy :: forall a b. (a -> a -> Ordering) -> (forall inst. OrdI inst a => Proxy inst -> b) -> b
=
withOrdIProxy dict body case unsafeCoerce (FakeDict dict) :: OrdIDict a of
OrdIDict @inst) -> body @inst (Proxy @inst) (
Let’s try it out!
We can start with some concrete instances
= insert 1 1 (empty @Int)
map1 = insert 2 2 (empty @Int)
map2
= insert 3 3 (empty @ReverseIntOrd)
map3 = insert 4 4 (empty @ReverseIntOrd)
map4
= union map1 map2
fine1 = union map3 map4
fine2
= union map1 map3 -- fails! notFine
Looking good. Now let’s try local instances
= withOrdIProxy compare \(Proxy @inst) -> do
local let localMap1 :: IMap inst Int Int = insert @inst 1 1 (empty @inst)
()
$ ghc-9.2 insttypes.hs
insttypes.hs:80:42: error:
• Could not deduce (OrdI (*) Int) arising from a use of ‘insert’
from the context: OrdI inst a0
...
Huh. That is… strange. It looks like for some strange reason, GHC
defaults the inst
parameter to… (*)
5? Even if we add as many type
annotations as physically possible?
If this smells like a bug in GHC, that is because it is! Or well, was.
If you run the same example with GHC 9.4, it will compile without complaining.
Now, let’s try that again
= withOrdIProxy compare \(Proxy @inst) -> do
local let localMap1 = insert 1 1 (empty @inst)
let localMap2 = insert 2 2 (empty @inst)
let perfectlyFine = union localMap1 localMap2
let notFine :: IMap RegularIntOrd Int Int = insert @inst 2 2 (empty @inst) -- fails!
let notFine2 = union localMap1 map1 -- also fails!
-- fails! localMap1
Perfect!
Do we have to annotate the instance every time?
In this specific example with IMap
, the number of type
annotations required was quite manageable, but in code that uses
OrdI
like regular Ord
, this is much less
pleasant
min3 :: IOrd inst a => a -> a -> a -> a
= case compare @inst x y of -- needs a type application!
min3 x y z GT -> case compare @inst y z of -- this one as well
GT -> z
-> y
_ -> case compare @inst x z of -- same here
_ GT -> case compare @inst y z of -- also here
GT -> z
-> y
_ -> x _
Used directly like this, this technique is really just Scrap your type classes with extra steps.
This is frustrating because we are just running up against GHC’s
stubbornness here. There is only a single possible instance for
IOrd inst a
in scope, but GHC doesn’t want to choose it.
Usually, this behavior might make sense, but it’s really not helpful for
us.
Fortunately for us though, we are not the first to run into this
issue! Polysemy, the
popular effect system library, had the exact same problem. Polysemy
makes it possible to define effects of the form
Member Effect r
over a set of effects r
. The
issue here is that, unlike mtl, Polysemy allows
duplicate effects, so GHC again doesn’t trust that you really wanted to
use the constraint from the signature and not another, currently
unwritten one.
How did Polysemy solve that? They wrote a type checker plugin that disambiguates Polysemy constraints whenever there is exactly one relevant instance in scope.
This is exactly what we want, so we could probably copy most of it.
But that is a topic for a future blog post.
Conclusion
We covered quite a bit of ground there.
- We parameterized type classes over a type representing the instance
- This made it possible to statically ensure that maps can safely be merged in logarithmic time, even in the presence of local instances
- We managed to implement local type class instances that are actually safe, even in the presence of optimizations
- We discovered a bug in GHC 9.2
- There is probably a way to avoid unnecessary type applications with a GHC plugin
This is the first time in this series that I can write a conclusion
without begging you to ‘please never ever use this anywhere near
production’. This technique still relies on GHC’s internal type class
representation (and you should always be careful around
unsafeCoerce
) but so does the popular reflection
package, so you should be fine.
If you want to try this for yourself, you can get the code in a gist. Just make sure to use GHC 9.4 or above.
You don’t need to have read any of the previous posts to understand this, don’t worry. On the other hand, if you like this one, you will probably enjoy the others as well ;)↩︎
If your favorite imperative language is OCaml or Standard ML that is.↩︎
In fact, pattern matching on a value containing an existentially quantified type variable will turn that variable into a ‘skolem’, just like the higher rank type in the
runST
example, so we are truly conjuring a new type from thin air with this.↩︎This is an OCaml joke. Don’t worry about it.↩︎
(*)
is another way of writingType
, the type of types. The only reason this is even valid is that modern GHC Haskell has dependent kinds andType : Type
.↩︎