Is Haskell Missing A Non-Instantiating Parametric Case?

The problem

Consider a GADT for handling abstract “stream processors”:

{-# LANGUAGE GADTs, RankNTypes #-}

-- "StreamProcessor prim a b" is the type of a stream-processing element
-- with input type "a", output type "b", and in which atomic elements
-- with input type "x" and output type "y" have type "prim x y".
data StreamProcessor prim a b
  Prim :: prim x y ->
          StreamProcessor prim x y
  Id   :: StreamProcessor prim a a
  Comp :: StreamProcessor prim a b ->
          StreamProcessor prim b c ->
          StreamProcessor prim a c

Now, suppose we want to define a recursive function on this type. Suppose also that the function relies on its argument being polymorphic in prim (the example below doesn't need this, but there are lots that do in Washburn+Weirich's “Boxes Go Bananas” and Chlipala's PHOAS):

foo :: (forall prim . StreamProcessor prim a b) -> 
       (forall prim . StreamProcessor prim a b)
foo Id         = Id
foo (Comp f g) = Comp (foo f) (foo g)

It turns out to be impossible to define a function like foo; the code above gives:

  Couldn't match type `prim0' with `prim1'
    because type variable `prim1' would escape its scope
  This (rigid, skolem) type variable is bound by
    a type expected by the context: StreamProcessor prim1 a b1
  The following variables have types that mention prim0
    g :: StreamProcessor prim0 b b1 (bound at Gripe.hs:25:13)
    f :: StreamProcessor prim0 a b (bound at Gripe.hs:25:11)
  In the first argument of `foo', namely `f'
  In the first argument of `Comp', namely `(foo f)'
  In the expression: Comp (foo f) (foo g)

The issue here is that in order to case-match on its argument foo must instantiate the quantifier over prim. Unfortunately once this happens you've no longer got a polymorphic value, so you've lost the ability to make the recursive call.


In situations that don't use existential types you can often hack around this using inexhaustive pattern matching. For example, you would try something like the following code (which doesn't actually work in this situation):

foo q = case q of
         Comp f g -> let
                       f' :: forall prim . StreamProcessor prim a x
                       f' = case q of { Comp f' _ -> f' }
                       g' :: forall prim . StreamProcessor prim x b
                       g' = case q of { Comp _ g' -> g' }
                     in undefined

By the time we've taken this branch of the case we know here that q must be Comp f g, so the two case matches inside the let cannot fail. Because of parametricity the choice of prim cannot affect whether q is a Comp, an Id, or a Prim.

Unfortunately because an existential type is involved (the “output” of f and “input” of g) this won't work – any attempt to get your hands on a polymorphic version of f and a polymorphic version of g at the same time will involve an existential type variable escaping its scope. Even if you work around that, you still wind up in a situation where the typechecker has “forgotten” that the output type of f must match the input type of g, so after the recursive call you won't be able to Comp the results back together.

I posted a highly-simplified version of this problem on StackExchange.

Is Haskell Missing A Parametric Case?

I think the answer might be that Haskell is missing a non-instantiating “parametric case”. This language construct would make the type system aware of its own parametricity to some extent.

In System FC, a case expression is defined with respect to a particular type constructor (tycon). Each tycon has zero or more data constructors (datacon). The case may have at most one branch per datacon, and in each branch the arguments of the corresponding datacon are bound as variables.

For example, since StreamProcessor (a tycon) has Comp as a datacon, the case is allowed to have a branch for Comp; two variables are bound within this branch because the Case datacon takes two arguments.

Now, a case may be applied only to a value whose type is an instance of the tycon. For example, you may do this:

sp :: StreamProcessor prim x y
case sp of ...

but you may not do this in raw GHC Core:

sp :: forall prim . StreamProcessor prim x y
case sp of ...

The program above will work in the surface syntax because type application is implicit in Haskell, so the compiler elaborates to:

sp :: forall prim . StreamProcessor prim x y
case sp {| GHC.Any |} of ...

Where {|t|} is ad-hoc syntax for type application. We want a version of case that doesn't do this.

Possible Solution

I suspect that Haskell (and System FC) is “missing” a non-instantiating polymorphic version of case, in the sense that it is useful and adding it does not compromise the soundness of the type system in any way. I will write the proposed “parametric case” as pcase. The differences between case and pcase are:

  • The kind of pcase's discriminant is expected to be k->* for some k, so elaboration of the surface syntax into Core will not cook up a type application.

  • pcase v of … is well-typed only if v::forall x . T x for some algebraic data type T.

  • Within the branches of pcase, any variables bound in that arm are bound with types forall x . Q x rather than Q GHC.Any.

  • Any existential type variables on the branch in question are skolemized.

Regarding the last point, see the example below, where the kind of b goes from * to *->*; it becomes a Skolem function).

foo :: (forall prim . StreamProcessor prim x z) -> ...
foo sp = pcase sp of
           Comp f g ->
                 -- this branch is typed in an environment where:
                 --   fresh tyvar "y :: * -> *" is introduced
                 --   f :: forall prim . StreamProcessor prim x (y prim)
                 --   g :: forall prim . StreamProcessor prim (y prim) z

We can't have y::* inside the case branches because it's possible that the existential tyvar depends on prim (hence the skolemization). We don't want to create two separate fresh tyvars because this would “forget” the important fact that for any choice of prim the output type of f matches the input type of g.

To be completely specific about the skolemization, any of the datacon's existential tyvars of kind k2 become fresh tyvars of kind k1->k2 where k1 is the kind of the quantified type variable (prim in the example above).