Feb 4th, 2012

Monoidal instances for pipes

In this post, I’m going to introduce a new class of combinators for pipes, with an interesting categorical interpretation. I will be using the pipe implementation of my previous post.

{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Blog.Pipes.MonoidalInstances where

import Blog.Pipes.Guarded hiding (groupBy)
import qualified Control.Arrow as A
import Control.Category
import Control.Categorical.Bifunctor
import Control.Category.Associative
import Control.Category.Braided
import Control.Category.Monoidal
import Control.Monad (forever)
import Control.Monad.Free
import Data.Maybe
import Data.Void
import Prelude hiding ((.), id, filter, until)

When pipes were first released, some people noticed the lack of an Arrow instance. In fact, it is not hard to show that, even identifying pipes modulo some sort of observational equality, there is no Arrow instance that satisfies the arrow laws.

The problem, of course, is with first, because we already have a simple implementation of arr. If we try to implement first we immediately discover that there’s a problem with the Yield case:

first (Yield x c) = yield (x, ???) >> first c

Since ??? can be of any type, the only possible value is bottom, which of course we don’t want to introduce. Alternative definitions of first that alter the structure of a yielding pipe are not possible if we want to satisfy the law:

first p >+> pipe fst == pipe fst >+> p

Concretely, the problem is that the cartesian product in the type of first forces a sort of “synchronization point” that doesn’t necessarily exist. This is better understood if we look at the type of (***), of which first can be thought of as a special case:

(***) :: Arrow k => k a b -> k a' b' -> k (a, a') (b, b')

first = (*** id)

If the two input pipes yield at different times, there is no way to faithfully match their yielded values into a pair. There are hacks around that, but they don’t behave well compositionally, and exhibit either arbitrarily large space leaks or data loss.

This has been addressed before: stream processors, like those of the Fudgets library, being very similar to Pipes, have the same problem, and some resolutions have been proposed, although not entirely satisfactory.

Arrows as monoidal categories

It is well known within the Haskell community that Arrows correspond to so called Freyd categories, i.e. premonoidal categories with some extra structures.

Using the Monoidal class by Edward Kmett (now in the categories package on Hackage), we can try to make this idea precise.

Unfortunately, we have to use a newtype to avoid overlapping instances in the case of the Hask category:

newtype ACat a b c = ACat { unACat :: a b c }
  deriving (Category, A.Arrow)

First, cartesian products are a bifunctor in the category determined by an Arrow.

instance A.Arrow a => PFunctor (,) (ACat a) (ACat a) where
  first = ACat . A.first . unACat
instance A.Arrow a => QFunctor (,) (ACat a) (ACat a) where
  second = ACat . A.second . unACat
instance A.Arrow a
      => Bifunctor (,) (ACat a) (ACat a) (ACat a) where
  bimap (ACat f) (ACat g) = ACat $ f A.*** g

Now we can say that products are associative, using the associativity of products in Hask:

instance A.Arrow a => Associative (ACat a) (,) where
  associate = ACat $ A.arr associate
instance A.Arrow a => Disassociative (ACat a) (,) where
  disassociate = ACat $ A.arr disassociate

Where we use the Disassociative instance to express the inverse of the associator. And finally, the Monoidal instance:

type instance Id (ACat a) (,) = ()
instance A.Arrow a => Monoidal (ACat a) (,) where
  idl = ACat $ A.arr idl
  idr = ACat $ A.arr idr
instance A.Arrow a => Comonoidal (ACat a) (,) where
  coidl = ACat $ A.arr coidl
  coidr = ACat $ A.arr coidr

Where, again, the duals are actually inverses. Also, products are symmetric:

instance A.Arrow a => Braided (ACat a) (,) where
  braid = ACat $ A.arr braid
instance A.Arrow a => Symmetric (ACat a) (,)

As you see, everything is trivially induced by the cartesian structure on Hask, since A.arr gives us an identity-on-objects functor. Note, however, that the Bifunctor instance is legitimate only if we assume a strong commutativity law for arrows:

first f >>> second g == second g >>> first f

which we will, for the sake of simplicity.

Replacing products with arbitrary monoidal structures

Once we express the Arrow concept in terms of monoidal categories, it is easy to generalize it to arbitrary monoidal structures on Hask.

In particular, coproducts work particularly well in the category of pipes:

instance Monad m
      => PFunctor Either (PipeC m r) (PipeC m r) where
  first = PipeC . firstP . unPipeC

firstP :: Monad m => Pipe a b m r
       -> Pipe (Either a c) (Either b c) m r
firstP (Pure r) = return r
firstP (Free (M m)) = lift m >>= firstP

Yielding a sum is now easy: just yield on the left component.

firstP (Free (Yield x c)) = yield (Left x) >> firstP c

Awaiting is a little bit more involved, but still easy enough: receive left and null values normally, and act like an identity on the right.

firstP (Free (Await k)) = go
        where
          go = tryAwait
           >>= maybe (firstP $ k Nothing)
                     (either (firstP . k . Just)
                             (\x -> yield (Right x) >> go))

And of course we have an analogous instance on the right:

instance Monad m
      => QFunctor Either (PipeC m r) (PipeC m r) where
  second = PipeC . secondP . unPipeC

secondP :: Monad m => Pipe a b m r
        -> Pipe (Either c a) (Either c b) m r
secondP (Pure r) = return r
secondP (Free (M m)) = lift m >>= secondP
secondP (Free (Yield x c)) = yield (Right x) >> secondP c
secondP (Free (Await k)) = go
        where
          go = tryAwait
           >>= maybe (secondP $ k Nothing)
                     (either (\x -> yield (Left x) >> go)
                             (secondP . k . Just))

And a bifunctor instance obtained by composing first and second in arbitrary order:

instance Monad m
      => Bifunctor Either (PipeC m r)
                   (PipeC m r) (PipeC m r) where
  bimap f g = first f >>> second g

At this point we can go ahead and define the remaining instances in terms of the identity-on-objects functor given by pipe:

instance Monad m => Associative (PipeC m r) Either where
  associate = PipeC $ pipe associate
instance Monad m => Disassociative (PipeC m r) Either where
  disassociate = PipeC $ pipe disassociate

type instance Id (PipeC m r) Either = Void
instance Monad m => Monoidal (PipeC m r) Either where
  idl = PipeC $ pipe idl
  idr = PipeC $ pipe idr
instance Monad m => Comonoidal (PipeC m r) Either where
  coidl = PipeC $ pipe coidl
  coidr = PipeC $ pipe coidr

instance Monad m => Braided (PipeC m r) Either where
  braid = PipeC $ pipe braid
instance Monad m => Symmetric (PipeC m r) Either

Multiplicative structures

There is still a little bit of extra structure that we might want to exploit. Since PipeC m r is a monoidal category, it induces a (pointwise) monoidal structure on its endofunctor category, so we can speak of monoid objects there. In particular, if the identity functor is a monoid, it means that we can define a “uniform” monoid structure for all the objects of our category, given in terms of natural transformations (i.e. polymorphic functions).

We can represent this specialized monoid structure with a type class (using kind polymorphism and appropriately generalized category-related type classes, it should be possible to unify this class with Monoid and even Monad, similarly to how it’s done here):

class Monoidal k p => Multiplicative k p where
  unit :: k (Id k p) a
  mult :: k (p a a) a

Dually, we can have a sort of uniform coalgebra:

class Comonoidal k p => Comultiplicative k p where
  counit :: k a (Id k p)
  comult :: k a (p a a)

The laws for those type classes are just the usual laws for a monoid in a (not necessarily strict) monoidal category:

first unit . mult == idl
second unit . mult == idr
mult . first mult == mult . second mult . associate

first counit . comult == coidl
second counit . comult == coidr
first diag . diag == disassociate . second diag . diag

Now, products have a comultiplicative structure on Hask (as in every category with finite products), given by the terminal object and diagonal natural transformation:

instance Comultiplicative (->) (,) where
  counit = const ()
  comult x = (x, x)

while coproducts have a multiplicative structure:

instance Multiplicative (->) Either where
  unit = absurd
  mult = either id id

that we can readily transport to PipeC m r using pipe:

instance Monad m => Multiplicative (PipeC m r) Either where
  unit = PipeC $ pipe absurd
  mult = PipeC $ pipe mult

Somewhat surprisingly, pipes also have a comultiplicative structure of their own:

instance Monad m => Comultiplicative (PipeC m r) Either where
  counit = PipeC discard
  comult = PipeC . forever $ do
    x <- await
    yield (Left x)
    yield (Right x)

Heterogeneous metaprogramming

All the combinators we defined can actually be used in practice, and the division in type classes certainly sheds some light on their structure and properties, but there’s actually something deeper going on here.

The fact that the standard Arrow class uses (,) as monoidal structure is not coincidental: Hask is a cartesian closed category, so to embed Haskell’s simply typed λ-calculus into some other category structure, we need at the very least a way to transport cartesian products, i.e. a premonoidal functor.

However, as long as our monoidal structure is comultiplicative and symmetric, we can always recover a first-order fragment of \(\lambda\)-calculus inside the “guest” category, and we don’t even need an identity-on-objects functor (see for example this paper).

The idea is that we can use the monoidal structure of the guest category to represent contexts, where weakening is given by counit, contraction by comult, and exchange by swap.

There is an experimental GHC branch with a preprocessor which is able to translate expressions written in an arbitrary guest language into Haskell, given instances of appropriate type classes , which correspond exactly to the ones we have defined above.

Examples

This exposition was pretty abstract, so we end with some examples.

We first need to define a few wrappers for our monoidal combinators, so we don’t have to deal with the PipeC newtype:

split :: Monad m => Pipe a (Either a a) m r
split = unPipeC comult

join :: Monad m => Pipe (Either a a) a m r
join = unPipeC mult

(*+*) :: Monad m => Pipe a b m r -> Pipe a' b' m r
      -> Pipe (Either a a') (Either b b') m r
f *+* g = unPipeC $ bimap (PipeC f) (PipeC g)

discardL :: Monad m => Pipe (Either Void a) a m r
discardL = unPipeC idl

discardR :: Monad m => Pipe (Either a Void) a m r
discardR = unPipeC idr

Now let’s write a tee combinator, similar to the tee command for shell pipes:

tee :: Monad m => Pipe a Void m r -> Pipe a a m r
tee p = split >+> firstP p >+> discardL

printer :: Show a => Pipe a Void IO r
printer = forever $ await >>= lift . print

ex6 :: IO ()
ex6 = do
  (sourceList [1..5] >+>
    tee printer >+>
    (fold (+) 0 >>= yield) $$
    printer)
  return ()
{- ex6 == mapM_ print [1,2,3,4,5,15] -}

Another interesting exercise is reimplementing the groupBy combinator of the previous post:

groupBy :: Monad m => (a -> a -> Bool) -> Pipe a [a] m r
groupBy p =
   -- split the stream in two
   split >+>

   -- yield Nothing whenever (not (p x y))
   -- for consecutive x y
  ((consec >+>
    filter (not . uncurry p) >+>
    pipe (const Nothing)) *+*
  
  -- at the same time, let everything pass through
  pipe Just) >+>

  -- now rejoin the two streams
  join >+>
  
  -- then accumulate results until a Nothing is hit
  forever (until isNothing >+>
           pipe fromJust >+>
           (consume >>= yield))

-- yield consecutive pairs of values
consec :: Monad m => Pipe a (a, a) m r
consec = await >>= go
  where
    go x = await >>= \y -> yield (x, y) >> go y

ex7 :: IO ()
ex7 = do (sourceList [1,1,2,2,2,3,4,4]
          >+> groupBy (==)
          >+> pipe head
           $$ printer)
         return ()
{- ex7 == mapM_ print [1,2,3,4] -}

Paolo Capriotti's blog

Type theory, category theory, functional programming