Fast Map Union and Local Instances Through Instance Types
26 February 2023In part 31 of my crusade against GHC’s coherence guarantees, I have actually done it! This time we will end up with a way to generate local type class instances without any asterisks about code breaking with optimizations. On the way, we are going to end up solving the dreaded Fast Map Union Problem, combining two of my favorite Haskell tricks, and discovering a bug in a previous version of GHC.
But before we get to that, let’s start at the beginning
What even is this ‘fast Map union problem’?
Let’s pretend that Haskell did have consistent locally
overridable type class instances. In that case, the interface for
Map would be completely broken.
Look at the type of Data.Map.insert
insert :: Ord k => k -> v -> Map k v -> Map k vSee the issue? Map is some kind of ordered tree
internally, so it depends on the Ord instance being
consistent across different operations, but with local instances, we
don’t have any guarantees like that.
-- This uses the regular instance for Ord Int
let map = insert (1 :: Int) 1 mempty
-- This uses a different, incompatible instance on the same map!
withLocalOrd reverseOrdInt $ insert (2 :: Int) 2 map If you think about this for a bit, you may come up with a solution: You can store the instance in the map. This is how most map implementations in other languages work after all.
data Map k v = MkMap (Dict (Ord k)) (ActualMapImplementation k v)And now the type of insert doesn’t mention Ord at all
anymore so everything is sunshine and roses
insert :: k -> v -> Map k v -> Map k v…until it isn’t! Consider the type of union now
union :: Map k v -> Map k v -> Map k vWhat happens if both maps use different Ord k instances?
We have no way to statically ensure that they don’t. The way to solve
this in practice would be to just reinsert every value from one map in
the other, but that means that union is now drastically slower than it
would be if we could assume that both maps use the same instance (It’s
now at least linear in the size of one map vs. logarithmic in that of
the smaller one).
Instance Types
Now, what we really need here is a way to statically ensure that both
maps use the same Ord implementation. In a dependently
typed language, one could just carry the instance dictionary in the
type
data Map k v (inst :: Ord k) = ...
union :: Map k v inst -> Map k v inst -> Map k v instBut we cannot do that in Haskell (yet?), so we need a few more tricks. What we absolutely need to do is to somehow move the used instance to the type level, so let’s start there. While we cannot directly depend on the instance value, we can define a dummy type as a stand-in.
With this new type, we can write a version of Ord that
carries its concrete instance in the type. Every function with an
Ord constraint should now be polymorphic over this instance
parameter.
class OrdI inst a | inst -> a where
compareI :: a -> a -> Ordering
data RegularOrdInt
instance OrdI RegularOrdInt Int where
compareI = compare
data ReverseOrdInt
instance OrdI ReverseOrdInt Int where
compareI x y = case compare x y of
LT -> GT
EQ -> EQ
GT -> LTAnd now we can finally parameterize Map over the used
Ord instance!
data IMap inst k v = ...
insert :: OrdI inst k => k -> v -> IMap inst k v -> IMap inst k v
union :: OrdI inst k => IMap inst k v -> IMap inst k v -> IMap inst k vunion statically ensures that both maps agree on their
instances.
The only ‘issue’ here is that we lose some inference. This is more or
less unavoidable if we want multiple instances to coexist (well, more on
that later). We need to tell Haskell what instance to use at some point,
but it’s not that bad, since at least for IMaps, which
carry the instance in their type, we only need to do this once.
let map1 = insert 1 1 $ insert 2 2 $ empty @RegularOrdInt
let map2 = insert @ReverseOrdInt 3 3 $ insert 4 4 $ empty
union map1 map2 -- Type error: Couldn't match type 'RegularOrdInt' with `ReverseOrdInt`How do local instances fit into this?
Quite well as it turns out! We can combine two of my favorite Haskell tricks to implement them in current Haskell (You can stop pretending that they exist. We are going to implement them for real now)
Local instances might differ between different executions of the same
code, so we need to make sure that the instance types can only be used
locally and cannot escape. If this sounds familiar that is because there
is a decent chance that you have used something similar before: runST.
ST is a monad that is used for local mutability inside
pure code. You can use it to create STRefs and mutate them,
just like you would in your favorite imperative language2,
except that Haskell, by virtue of being a pure language, needs to make
absolutely sure that you never ever return an STRef from
runST. Doing so would allow effects in one usage of the
supposedly pure runST to affect the result of another
one.
How does ST do this? It uses higher-rank types!
newSTRef :: a -> ST s (STRef s a)
runST :: (forall s. ST s a) -> aIf you have never seen this before, you’re probably quite confused
right now. The type forall s. ST s a specifies that the
argument to runST needs to be an ST value that
works for any possible instantiation of s. Crucially for
this case, this means that the s variable is, unlike
a, not a type parameter of runST
itself, but actually one of the arguments to runST. This is
much clearer if we write runST with an explicit outer
forall.
runST :: forall a. (forall s. ST s a) -> aNow, the reason this ensures that no STRefs escape is that the
s parameter of the STRef is the same as that
of the containing ST monad. If you were able to
return an STRef from the argument to runST,
e.g. by passing something of type
forall s. ST s (STRef s Int), what type would the result
have? Blindly substituting would yield STRef s Int, but
what is s now? Previously, s was bound by the
forall in the type of the argument to runST,
but that forall does not exist anymore! This is why GHC
will not accept this code and complain about an ‘escaping skolem’.
We can use exactly the same trick for local type class instances to invent an instance type that only exists locally! If we can implement a function of the following type, then this will ensure that instance types cannot possibly escape.
withOrdI :: forall a b. (a -> a -> Ordering) -> (forall inst. OrdI inst a => b) -> bNow, how do we implement this function? If you read the first post in this series, you will probably know the answer already.
At runtime, GHC represents type classes via a technique called
dictionary passing. This means that a function with an Ord
constraint like Ord a => a -> a -> Ordering will
be turned into a function that takes an implementation of that type
class as an argument (Ord a -> a -> a). This
implementation, called a dictionary, is just a regular record-like data
type that contains an implementation for every method.
Using GADTs, we can capture this dictionary in a
value.
data OrdIDict a where
OrdIDict :: Ord a => OrdIDict aIf we are now able to replicate the structure of this
OrdIDict exactly, but with a custom record of methods, we
can use unsafeCoerce to convert it to a functional
OrdIDict a. By pattern matching on the result, we can
release the dictionary back into a regular instance, which now contains
our handwritten type class instance!
Instinctively, your definition of OrdIDict would
probably look like this
data OrdIDict inst a where
OrdIDict :: OrdI inst a => OrdIDict inst aThis is not going to work though. We need to invent a new type for
inst, so OrdIDict cannot take it as a
parameter. Thanks to Haskell’s support for existential types, this is
not actually an issue, since we can just leave it off3.
data OrdIDict a where
OrdIDict :: OrdI inst a => OrdIDict aOrdI only has a single method, so it will be represented
by the equivalent of a newtype record at runtime. This is erased
entirely, so we do not need to build a record around our
a -> a -> Ordering function.
The OrdIDict wrapper adds some indirection though, so we
need to replicate that.
data FakeDict a = FakeDict aAll preparations are complete. We are ready for the magic4!
withOrdI :: forall a b. (a -> a -> Ordering) -> (forall inst. OrdI inst a => b) -> b
withOrdI dict body =
case unsafeCoerce (FakeDict dict) :: OrdIDict a of
(OrdIDict @inst) -> body @instAnd that’s… it!
Well, not quite. If you paid very close attention, you may notice
that we have no way to bind the inst type variable in the
(forall inst. OrdI inst a => b) argument. This means if
we put a lambda there, we cannot actually mention inst
anywhere. This is easy enough to solve by introducing a Proxy
value.
withOrdIProxy :: forall a b. (a -> a -> Ordering) -> (forall inst. OrdI inst a => Proxy inst -> b) -> b
withOrdIProxy dict body =
case unsafeCoerce (FakeDict dict) :: OrdIDict a of
(OrdIDict @inst) -> body @inst (Proxy @inst)Let’s try it out!
We can start with some concrete instances
map1 = insert 1 1 (empty @Int)
map2 = insert 2 2 (empty @Int)
map3 = insert 3 3 (empty @ReverseIntOrd)
map4 = insert 4 4 (empty @ReverseIntOrd)
fine1 = union map1 map2
fine2 = union map3 map4
notFine = union map1 map3 -- fails!Looking good. Now let’s try local instances
local = withOrdIProxy compare \(Proxy @inst) -> do
let localMap1 :: IMap inst Int Int = insert @inst 1 1 (empty @inst)
()$ ghc-9.2 insttypes.hs
insttypes.hs:80:42: error:
• Could not deduce (OrdI (*) Int) arising from a use of ‘insert’
from the context: OrdI inst a0
...Huh. That is… strange. It looks like for some strange reason, GHC
defaults the inst parameter to… (*)5? Even if we add as many type
annotations as physically possible?
If this smells like a bug in GHC, that is because it is! Or well, was.
If you run the same example with GHC 9.4, it will compile without complaining.
Now, let’s try that again
local = withOrdIProxy compare \(Proxy @inst) -> do
let localMap1 = insert 1 1 (empty @inst)
let localMap2 = insert 2 2 (empty @inst)
let perfectlyFine = union localMap1 localMap2
let notFine :: IMap RegularIntOrd Int Int = insert @inst 2 2 (empty @inst) -- fails!
let notFine2 = union localMap1 map1 -- also fails!
localMap1 -- fails!Perfect!
Do we have to annotate the instance every time?
In this specific example with IMap, the number of type
annotations required was quite manageable, but in code that uses
OrdI like regular Ord, this is much less
pleasant
min3 :: IOrd inst a => a -> a -> a -> a
min3 x y z = case compare @inst x y of -- needs a type application!
GT -> case compare @inst y z of -- this one as well
GT -> z
_ -> y
_ -> case compare @inst x z of -- same here
GT -> case compare @inst y z of -- also here
GT -> z
_ -> y
_ -> xUsed directly like this, this technique is really just Scrap your type classes with extra steps.
This is frustrating because we are just running up against GHC’s
stubbornness here. There is only a single possible instance for
IOrd inst a in scope, but GHC doesn’t want to choose it.
Usually, this behavior might make sense, but it’s really not helpful for
us.
Fortunately for us though, we are not the first to run into this
issue! Polysemy, the
popular effect system library, had the exact same problem. Polysemy
makes it possible to define effects of the form
Member Effect r over a set of effects r. The
issue here is that, unlike mtl, Polysemy allows
duplicate effects, so GHC again doesn’t trust that you really wanted to
use the constraint from the signature and not another, currently
unwritten one.
How did Polysemy solve that? They wrote a type checker plugin that disambiguates Polysemy constraints whenever there is exactly one relevant instance in scope.
This is exactly what we want, so we could probably copy most of it.
But that is a topic for a future blog post.
Conclusion
We covered quite a bit of ground there.
- We parameterized type classes over a type representing the instance
- This made it possible to statically ensure that maps can safely be merged in logarithmic time, even in the presence of local instances
- We managed to implement local type class instances that are actually safe, even in the presence of optimizations
- We discovered a bug in GHC 9.2
- There is probably a way to avoid unnecessary type applications with a GHC plugin
This is the first time in this series that I can write a conclusion
without begging you to ‘please never ever use this anywhere near
production’. This technique still relies on GHC’s internal type class
representation (and you should always be careful around
unsafeCoerce) but so does the popular reflection
package, so you should be fine.
If you want to try this for yourself, you can get the code in a gist. Just make sure to use GHC 9.4 or above.
You don’t need to have read any of the previous posts to understand this, don’t worry. On the other hand, if you like this one, you will probably enjoy the others as well ;)↩︎
If your favorite imperative language is OCaml or Standard ML that is.↩︎
In fact, pattern matching on a value containing an existentially quantified type variable will turn that variable into a ‘skolem’, just like the higher rank type in the
runSTexample, so we are truly conjuring a new type from thin air with this.↩︎This is an OCaml joke. Don’t worry about it.↩︎
(*)is another way of writingType, the type of types. The only reason this is even valid is that modern GHC Haskell has dependent kinds andType : Type.↩︎