Pipes 2.0 vs pipes-core

With the release of pipes 2.0 by Gabriel Gonzalez, I feel it’s time to address the question of whether my fork will eventually be merged or not.

The short answer is no: I will continue to maintain my separate incarnation, pipes-core. In this post, I will discuss the reasoning behind this decision, and hopefully explain the various trade-offs that the two libraries make.

The issue with termination

pipes 1.0 can be quite accurately described as “composable monadic stream processors”. “Composable” alludes to horizontal composition (i.e. the Category instance), while “monadic” refers to vertical composition.

The existence of a Monad instance has a number of consequences, the most important being the fact that pipes can carry a “return value”, and, in particular, they can terminate.

The fact that pipes can terminate poses the greatest challenge when reasoning about the properties of (horizontal) composition, but termination is also one of the nicest features of pipes, so we want to deal with this difficulty appropriately.

Termination implies that any pipe has to deal somehow with the fact that its upstream pipe can exit before yielding a value, which basically means that an await can fail.

Gabriel’s pipes address this issue by simply “propagating termination downstream”. A pipe awaiting on a terminated pipe is forcibly terminated itself, and the upstream return value is returned.

My guarded pipes idea (later integrated into pipes-core) proposes a new primitive

tryAwait :: Pipe a b m (Maybe a)

that returns Nothing when upstream terminates before providing a value.

Using tryAwait, a pipe can then handle a failure caused by termination, and either return a value, or use the upstream value (the latter can be accomplished by simply awaiting again).

Exception handling

Once you realize that pipes should be able to handle failure on await, it becomes very natural to extend the idea to other kinds of failure.

That’s exactly the rationale behind pipes-core. It introduces slightly more involved primitives that take into account the fact that actions in the base monad, as well as pipes themselves, can throw an exception at any time.

One very interesting consequence of built-in exception handling is that the “guarded pipes” concept can be integrated seamlessly by introducing a special BrokenPipe exception.

The exception handling implementation in pipes-core works in any monad, and deals with asynchronous exceptions correctly. Of course, actual exceptions thrown from Haskell code can only be caught when the base monad is IO.

What about finalization?

Since all the finalization primitives in Control.Exception are implemented on top of exception handling primitives like catch and mask, I initially believed that finalization would follow automatically from exception handling capabilities in pipes.

Unfortunately, there is a fundamental operational difference between IO and Pipe, which makes exception handling alone insufficient to guarantee finalization of resources.

The problem is that some of the pipes in a pipeline are not guaranteed to be executed at all. In fact, a pipe only plays a role in pipeline execution if its downstream pipe awaits at some point (or if it is the last one).

The same applies to “portions” of pipes, so a pipe can execute partially, and then be completely forgotten, even if no exceptional condition occurs.

After a number of failed attempts (including the broken 0.0.1 release of pipes-core), I realized that Gabriel’s finalizer passing idea was the right one, and used it to replace my flawed ensure primitive.

Balancing safety and dynamicity

The question remains of how to guarantee that a pipe never awaits again after its upstream terminated.

My solution is dynamic: if upstream terminated because of an exception (that has been handled), just throw the exception again on await; if upstream terminated normally, throw a BrokenPipe exception.

Gabriel’s solution is static: a pipe is not allowed to await again after termination, and the invariant is enforced by the types.

The static solution has obvious advantages, but, on closer inspection, it reveals a number of downsides:

  1. It prevents Pipe from forming a Monad; the solution implemented in pipes 2.0 is to separate the Monad instance from the Category instance, and to suggest that the Monad instance should actually be replaced with an indexed monad.
  2. It doesn’t provide any exception handling mechanism, and doesn’t guarantee that finalizers will be called in case any exception occurs. I imagine that some sort of exception support could be layered on top of the current solution, but I’m guessing it’s not going to be straightforward.
  3. Folds are not compositional. This can be clearly seen in the tutorial, where strict is not defined in terms of toList. With pipes-core, you would simply have:
strict = consume >>= mapM_ yield
-- note that toList is called consume in pipes-core

What’s next for pipes-core

The current version of pipes-core already provides exception handling and guaranteed finalization in the face of asynchronous exceptions. Things that could be improved in its finalization support are:

  1. Finalization is currently guaranteed, but not always prompt. When an exception handler is provided, upstream finalization gets delayed unnecessarily.
  2. It is not possible to prematurely force finalization. I haven’t yet seen an example where this would be useful, but it would be nice to have it for completeness.

I think I know how these points can be addressed, and hopefully they will make it into the next release.

For future releases, I’d like to focus on performance. Aside from micro-optimizations, I can see two main areas that would benefit from improvements: the Monad instance and the Category instance.

The current monadic bind unfortunately exhibits quadratic behavior, since it basically works like a naive list concatenation function: a left-nested chain of binds traverses the same structure over and over. The Codensity transformation should address that.
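To make the point about bind concrete, here is a minimal sketch of the Codensity transformation applied to a toy free monad. It is not the pipes-core implementation: Free, Tick, tick, countTicks, liftC, lowerC and leftNested are all hypothetical names introduced for illustration, and Tick stands in for the real pipe functor.

```haskell
{-# LANGUAGE RankNTypes #-}

-- A hand-rolled free monad, standing in for Pipe (an assumption of this sketch).
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a) = Pure (g a)
  fmap g (Free x) = Free (fmap (fmap g) x)

instance Functor f => Applicative (Free f) where
  pure = Pure
  mf <*> mx = mf >>= \g -> fmap g mx

instance Functor f => Monad (Free f) where
  Pure a >>= k = k a
  Free x >>= k = Free (fmap (>>= k) x)  -- re-traverses the whole left operand

-- Hypothetical one-instruction functor.
newtype Tick x = Tick x

instance Functor Tick where
  fmap g (Tick x) = Tick (g x)

tick :: Free Tick ()
tick = Free (Tick (Pure ()))

-- Count the instructions in a program.
countTicks :: Free Tick a -> Int
countTicks = go 0
  where
    go n (Pure _)        = n
    go n (Free (Tick k)) = let n' = n + 1 in n' `seq` go n' k

-- Codensity represents a computation by its continuation-passing form,
-- so that binds are reassociated to the right when the result is lowered.
newtype Codensity m a = Codensity
  { runCodensity :: forall b. (a -> m b) -> m b }

instance Functor (Codensity m) where
  fmap g (Codensity c) = Codensity (\k -> c (k . g))

instance Applicative (Codensity m) where
  pure a = Codensity (\k -> k a)
  Codensity cf <*> Codensity ca = Codensity (\k -> cf (\g -> ca (k . g)))

instance Monad (Codensity m) where
  Codensity c >>= f = Codensity (\k -> c (\a -> runCodensity (f a) k))

liftC :: Monad m => m a -> Codensity m a
liftC m = Codensity (m >>=)

lowerC :: Monad m => Codensity m a -> m a
lowerC c = runCodensity c return

-- A deliberately left-nested chain: ((return () >> s) >> s) >> ...
leftNested :: Monad m => m () -> Int -> m ()
leftNested step n = foldl (>>) (return ()) (replicate n step)

slow, fast :: Int -> Int
slow n = countTicks (leftNested tick n)                  -- binds run directly on Free
fast n = countTicks (lowerC (leftNested (liftC tick) n)) -- binds run in Codensity
```

Both slow and fast build the same program, so they return the same count; the difference is only in how much work the binds do along the way, which should be quadratic in n for slow and linear for fast.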

For the Category instance, it would be interesting to explore whether it is possible to achieve some form of fusion of intermediate data structures, similarly to classic stream fusion for lists.

This is probably going to be more of a challenge, and will likely require some significant restructuring, but the prospective benefits are enormous. There is some research on this topic and an initial attempt I plan to draw ideas from.

My last point is about the absence of an unawait primitive for Pipe. There has been quite a lot of discussion on this topic, but I remain unconvinced that having builtin parsing capabilities is a good thing.

Whenever there is a need to chain unconsumed input, there are a few viable options already:

  1. Return leftover data, and add some manual wiring so that it’s passed to the “next” pipe.
  2. Use PutbackPipe from pipes-extra.
  3. Use an actual parser library and convert the parser to a Pipe (see pipes-attoparsec).

In all the examples I have seen, however, pipes are composable enough that all the special logic to deal with boundaries of chunked streams can be implemented in a single “filter” pipe, and the rest of the pipeline can ignore the issue altogether.


Applicative option parser

There are quite a few option parsing libraries on Hackage already, but they either depend on Template Haskell, or require some boilerplate. Although I have nothing against the use of Template Haskell in general, I’ve always found its use in this case particularly unsatisfactory, and I’m convinced that a more idiomatic solution should exist.

In this post, I present a proof of concept implementation of a library that allows you to define type-safe option parsers in Applicative style.

The only extension that we actually need is GADTs, since, as will be clear in a moment, our definition of Parser requires existential quantification.

{-# LANGUAGE GADTs #-}
import Control.Applicative

Let’s start by defining the Option type, corresponding to a concrete parser for a single option:

data Option a = Option
  { optName :: String
  , optParser :: String -> Maybe a
  }

instance Functor Option where
  fmap f (Option name p) = Option name (fmap f . p)

optMatches :: Option a -> String -> Bool
optMatches opt s = s == '-' : '-' : optName opt

For simplicity, we only support “long” options with exactly 1 argument. The optMatches function checks if an option matches a string given on the command line.

We can now define the main Parser type:

data Parser a where
  NilP :: a -> Parser a
  ConsP :: Option (a -> b)
        -> Parser a -> Parser b

instance Functor Parser where
  fmap f (NilP x) = NilP (f x)
  fmap f (ConsP opt rest) = ConsP (fmap (f.) opt) rest

instance Applicative Parser where
  pure = NilP
  NilP f <*> p = fmap f p
  ConsP opt rest <*> p =
    ConsP (fmap uncurry opt) ((,) <$> rest <*> p)

The Parser GADT resembles a heterogeneous list, with two constructors.

The NilP r constructor represents a “null” parser that doesn’t consume any arguments, and always returns r as a result.

The ConsP constructor is the combination of an Option returning a function, and an arbitrary parser returning an argument for that function. The combined parser applies the function to the argument and returns a result.

The definition of (<*>) probably needs some clarification. The variables involved have types:

opt :: Option (a -> b -> c)
rest :: Parser a
p :: Parser b

and we want to obtain a parser of type Parser c. So we uncurry the option, obtaining:

fmap uncurry opt :: Option ((a, b) -> c)

and compose it with a parser for the (a, b) pair, obtained by applying the (<*>) operator recursively:

(,) <$> rest <*> p :: Parser (a, b)

This is already enough to define some example parsers. Let’s first add a couple of convenience functions to help us create basic parsers:

option :: String -> (String -> Maybe a) -> Parser a
option name p = ConsP (fmap const (Option name p)) (NilP ())
optionR :: Read a => String -> Parser a
optionR name = option name p
  where
    p arg = case reads arg of
      [(r, "")] -> Just r
      _       -> Nothing

And a record to contain the result of our parser:

data User = User
  { userName :: String
  , userId :: Integer
  } deriving Show

A parser for User is easily defined in applicative style:

parser :: Parser User
parser = User <$> option "name" Just <*> optionR "id"

To be able to actually use this parser, we need a “run” function:

runParser :: Parser a -> [String] -> Maybe (a, [String])
runParser (NilP x) args = Just (x, args)
runParser (ConsP _ _) [] = Nothing
runParser p (arg : args) =
  case stepParser p arg args of
    Nothing -> Nothing
    Just (p', args') -> runParser p' args'

stepParser :: Parser a -> String -> [String] -> Maybe (Parser a, [String])
stepParser p arg args = case p of
  NilP _ -> Nothing
  ConsP opt rest
    | optMatches opt arg -> case args of
        [] -> Nothing
        (value : args') -> do
          f <- optParser opt value
          return (fmap f rest, args')
    | otherwise -> do
        (rest', args') <- stepParser rest arg args
        return (ConsP opt rest', args')

The idea is very simple: we take the first argument and go over each option of the parser, checking if it matches. If it does, we parse its value, remove the option from the parser by applying the resulting function to the rest of the parser with fmap, consume the option and its value from the argument list, then call runParser recursively.

Here is an example of runParser in action:

ex1 :: Maybe User
ex1 = fst <$> runParser parser ["--name", "fry", "--id", "1"]
{- Just (User {userName = "fry", userId = 1}) -}

The order of arguments doesn’t matter:

ex2 :: Maybe User
ex2 = fst <$> runParser parser ["--id", "2", "--name", "bender"]
{- Just (User {userName = "bender", userId = 2}) -}

Missing arguments will result in a parse error (i.e. Nothing). We don’t support default values but they are pretty easy to add.

ex3 :: Maybe User
ex3 = fst <$> runParser parser ["--name", "leela"]
{- Nothing -}
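The default values mentioned above could be added roughly as follows. This is only a sketch of one possible design, not part of the proof of concept: the optDefault field, optionWithDefault and evalDefaults are hypothetical names, and the rest of the code is repeated from above so that the example stands on its own.

```haskell
{-# LANGUAGE GADTs #-}

data Option a = Option
  { optName    :: String
  , optParser  :: String -> Maybe a
  , optDefault :: Maybe a           -- hypothetical: value used when the option is absent
  }

instance Functor Option where
  fmap f (Option name p d) = Option name (fmap f . p) (fmap f d)

data Parser a where
  NilP  :: a -> Parser a
  ConsP :: Option (a -> b) -> Parser a -> Parser b

instance Functor Parser where
  fmap f (NilP x)         = NilP (f x)
  fmap f (ConsP opt rest) = ConsP (fmap (f .) opt) rest

instance Applicative Parser where
  pure = NilP
  NilP f         <*> p = fmap f p
  ConsP opt rest <*> p = ConsP (fmap uncurry opt) ((,) <$> rest <*> p)

-- Like option, but with an optional fallback value.
optionWithDefault :: String -> (String -> Maybe a) -> Maybe a -> Parser a
optionWithDefault name p d = ConsP (fmap const (Option name p d)) (NilP ())

-- The result of a parser when no arguments are left:
-- every remaining option must have a default.
evalDefaults :: Parser a -> Maybe a
evalDefaults (NilP x)         = Just x
evalDefaults (ConsP opt rest) = optDefault opt <*> evalDefaults rest

runParser :: Parser a -> [String] -> Maybe (a, [String])
runParser (NilP x) args  = Just (x, args)
runParser p []           = (\x -> (x, [])) <$> evalDefaults p
runParser p (arg : args) = do
  (p', args') <- stepParser p arg args
  runParser p' args'

stepParser :: Parser a -> String -> [String] -> Maybe (Parser a, [String])
stepParser (NilP _) _ _ = Nothing
stepParser (ConsP opt rest) arg args
  | arg == "--" ++ optName opt = case args of
      []              -> Nothing
      (value : args') -> do
        f <- optParser opt value
        return (fmap f rest, args')
  | otherwise = do
      (rest', args') <- stepParser rest arg args
      return (ConsP opt rest', args')

data User = User
  { userName :: String
  , userId   :: Integer
  } deriving (Show, Eq)

-- The name is mandatory, the id falls back to 0.
parser :: Parser User
parser = User <$> optionWithDefault "name" Just Nothing
              <*> optionWithDefault "id" readId (Just 0)
  where
    readId s = case reads s of
      [(r, "")] -> Just r
      _         -> Nothing

ex3' :: Maybe User
ex3' = fst <$> runParser parser ["--name", "leela"]
{- Just (User {userName = "leela", userId = 0}) -}
```

The only real change is in runParser: when the argument list is exhausted, the remaining options are resolved through evalDefaults instead of failing immediately. An option without a default still produces Nothing, so mandatory options keep working as before.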

I think the above Parser type represents a pretty clean and elegant solution to the option parsing problem. To make it actually usable, I would need to add a few more features (boolean flags, default values, a help generator) and improve error handling and performance (right now parsing a single option is quadratic in the size of the Parser), but it looks like a fun project.

Does anyone think it’s worth adding yet another option parser to Hackage?


Monoidal instances for pipes

In this post, I’m going to introduce a new class of combinators for pipes, with an interesting categorical interpretation. I will be using the pipe implementation of my previous post.

{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Blog.Pipes.MonoidalInstances where

import Blog.Pipes.Guarded hiding (groupBy)
import qualified Control.Arrow as A
import Control.Category
import Control.Categorical.Bifunctor
import Control.Category.Associative
import Control.Category.Braided
import Control.Category.Monoidal
import Control.Monad (forever)
import Control.Monad.Free
import Data.Maybe
import Data.Void
import Prelude hiding ((.), id, filter, until)

When pipes were first released, some people noticed the lack of an Arrow instance. In fact, it is not hard to show that, even identifying pipes modulo some sort of observational equality, there is no Arrow instance that satisfies the arrow laws.

The problem, of course, is with first, because we already have a simple implementation of arr. If we try to implement first we immediately discover that there’s a problem with the Yield case:

first (Yield x c) = yield (x, ???) >> first c

Since ??? can be of any type, the only possible value is bottom, which of course we don’t want to introduce. Alternative definitions of first that alter the structure of a yielding pipe are not possible if we want to satisfy the law:

first p >+> pipe fst == pipe fst >+> p

Concretely, the problem is that the cartesian product in the type of first forces a sort of “synchronization point” that doesn’t necessarily exist. This is better understood if we look at the type of (***), of which first can be thought of as a special case:

(***) :: Arrow k => k a b -> k a' b' -> k (a, a') (b, b')

first = (*** id)

If the two input pipes yield at different times, there is no way to faithfully match their yielded values into a pair. There are hacks around that, but they don’t behave well compositionally, and exhibit either arbitrarily large space leaks or data loss.

This problem has been noticed before: stream processors, like those of the Fudgets library, are very similar to pipes and have the same problem. Some resolutions have been proposed, although none is entirely satisfactory.

Arrows as monoidal categories

It is well known within the Haskell community that Arrows correspond to so-called Freyd categories, i.e. premonoidal categories with some extra structure.

Using the Monoidal class by Edward Kmett (now in the categories package on Hackage), we can try to make this idea precise.

Unfortunately, we have to use a newtype to avoid overlapping instances in the case of the Hask category:

newtype ACat a b c = ACat { unACat :: a b c }
  deriving (Category, A.Arrow)

First, cartesian products are a bifunctor in the category determined by an Arrow.

instance A.Arrow a => PFunctor (,) (ACat a) (ACat a) where
  first = ACat . A.first . unACat
instance A.Arrow a => QFunctor (,) (ACat a) (ACat a) where
  second = ACat . A.second . unACat
instance A.Arrow a
      => Bifunctor (,) (ACat a) (ACat a) (ACat a) where
  bimap (ACat f) (ACat g) = ACat $ f A.*** g

Now we can say that products are associative, using the associativity of products in Hask:

instance A.Arrow a => Associative (ACat a) (,) where
  associate = ACat $ A.arr associate
instance A.Arrow a => Disassociative (ACat a) (,) where
  disassociate = ACat $ A.arr disassociate

Here we use the Disassociative instance to express the inverse of the associator. And finally, the Monoidal instance:

type instance Id (ACat a) (,) = ()
instance A.Arrow a => Monoidal (ACat a) (,) where
  idl = ACat $ A.arr idl
  idr = ACat $ A.arr idr
instance A.Arrow a => Comonoidal (ACat a) (,) where
  coidl = ACat $ A.arr coidl
  coidr = ACat $ A.arr coidr

Again, the duals are actually inverses. Also, products are symmetric:

instance A.Arrow a => Braided (ACat a) (,) where
  braid = ACat $ A.arr braid
instance A.Arrow a => Symmetric (ACat a) (,)

As you see, everything is trivially induced by the cartesian structure on Hask, since A.arr gives us an identity-on-objects functor. Note, however, that the Bifunctor instance is legitimate only if we assume a strong commutativity law for arrows:

first f >>> second g == second g >>> first f

which we will, for the sake of simplicity.
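To see that this law really is an assumption, here is a small sketch using only Control.Arrow from base. Plain functions satisfy it, while Kleisli arrows over an effectful monad generally do not. The names step, pureLhs, pureRhs, effLhs and effRhs are made up for this example, and the monad is the writer-like ((,) String) from base.

```haskell
import Control.Arrow

-- For plain functions the two orders always agree.
pureLhs, pureRhs :: (Int, Int) -> (Int, Int)
pureLhs = first (+ 1) >>> second (* 2)
pureRhs = second (* 2) >>> first (+ 1)

-- A Kleisli arrow that logs a tag and increments its input.
step :: String -> Kleisli ((,) String) Int Int
step tag = Kleisli $ \x -> (tag, x + 1)

-- With effects, the order of the two components becomes observable.
effLhs, effRhs :: Kleisli ((,) String) (Int, Int) (Int, Int)
effLhs = first (step "f") >>> second (step "g")
effRhs = second (step "g") >>> first (step "f")

{- runKleisli effLhs (0, 0) == ("fg", (1, 1))
   runKleisli effRhs (0, 0) == ("gf", (1, 1)) -}
```

The results agree but the logs do not, so for such an arrow bimap would depend on an arbitrary choice of evaluation order.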

Replacing products with arbitrary monoidal structures

Once we express the Arrow concept in terms of monoidal categories, it is easy to generalize it to arbitrary monoidal structures on Hask.

In particular, coproducts work particularly well in the category of pipes:

instance Monad m
      => PFunctor Either (PipeC m r) (PipeC m r) where
  first = PipeC . firstP . unPipeC

firstP :: Monad m => Pipe a b m r
       -> Pipe (Either a c) (Either b c) m r
firstP (Pure r) = return r
firstP (Free (M m)) = lift m >>= firstP

Yielding a sum is now easy: just yield on the left component.

firstP (Free (Yield x c)) = yield (Left x) >> firstP c

Awaiting is a little bit more involved, but still easy enough: receive left and null values normally, and act like an identity on the right.

firstP (Free (Await k)) = go
        where
          go = tryAwait
           >>= maybe (firstP $ k Nothing)
                     (either (firstP . k . Just)
                             (\x -> yield (Right x) >> go))

And of course we have an analogous instance on the right:

instance Monad m
      => QFunctor Either (PipeC m r) (PipeC m r) where
  second = PipeC . secondP . unPipeC

secondP :: Monad m => Pipe a b m r
        -> Pipe (Either c a) (Either c b) m r
secondP (Pure r) = return r
secondP (Free (M m)) = lift m >>= secondP
secondP (Free (Yield x c)) = yield (Right x) >> secondP c
secondP (Free (Await k)) = go
        where
          go = tryAwait
           >>= maybe (secondP $ k Nothing)
                     (either (\x -> yield (Left x) >> go)
                             (secondP . k . Just))

And a bifunctor instance obtained by composing first and second in arbitrary order:

instance Monad m
      => Bifunctor Either (PipeC m r)
                   (PipeC m r) (PipeC m r) where
  bimap f g = first f >>> second g

At this point we can go ahead and define the remaining instances in terms of the identity-on-objects functor given by pipe:

instance Monad m => Associative (PipeC m r) Either where
  associate = PipeC $ pipe associate
instance Monad m => Disassociative (PipeC m r) Either where
  disassociate = PipeC $ pipe disassociate

type instance Id (PipeC m r) Either = Void
instance Monad m => Monoidal (PipeC m r) Either where
  idl = PipeC $ pipe idl
  idr = PipeC $ pipe idr
instance Monad m => Comonoidal (PipeC m r) Either where
  coidl = PipeC $ pipe coidl
  coidr = PipeC $ pipe coidr

instance Monad m => Braided (PipeC m r) Either where
  braid = PipeC $ pipe braid
instance Monad m => Symmetric (PipeC m r) Either

Multiplicative structures

There is still a little bit of extra structure that we might want to exploit. Since PipeC m r is a monoidal category, it induces a (pointwise) monoidal structure on its endofunctor category, so we can speak of monoid objects there. In particular, if the identity functor is a monoid, it means that we can define a “uniform” monoid structure for all the objects of our category, given in terms of natural transformations (i.e. polymorphic functions).

We can represent this specialized monoid structure with a type class (using kind polymorphism and appropriately generalized category-related type classes, it should be possible to unify this class with Monoid and even Monad, similarly to how it’s done here):

class Monoidal k p => Multiplicative k p where
  unit :: k (Id k p) a
  mult :: k (p a a) a

Dually, we can have a sort of uniform coalgebra:

class Comonoidal k p => Comultiplicative k p where
  counit :: k a (Id k p)
  comult :: k a (p a a)

The laws for those type classes are just the usual laws for a monoid in a (not necessarily strict) monoidal category:

mult . first unit == idl
mult . second unit == idr
mult . first mult == mult . second mult . associate

first counit . comult == coidl
second counit . comult == coidr
first comult . comult == disassociate . second comult . comult
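As a sanity check, the laws can be tested pointwise on plain functions, with Either for the monoid and (,) for the comonoid. The sketch below does not use the categories package: unit, mult, counit, comult and the structural isomorphisms are redefined by hand as ordinary functions, first and second come from Data.Bifunctor, and composition is the usual right-to-left (.). The law-checking functions are hypothetical helpers.

```haskell
import Data.Bifunctor (first, second)
import Data.Void (Void, absurd)

-- Multiplicative structure of Either on plain functions.
unit :: Void -> a
unit = absurd

mult :: Either a a -> a
mult = either id id

-- Structural isomorphisms for Either, written out by hand.
idl :: Either Void a -> a
idl = either absurd id

idr :: Either a Void -> a
idr = either id absurd

associate :: Either (Either a b) c -> Either a (Either b c)
associate (Left (Left a))  = Left a
associate (Left (Right b)) = Right (Left b)
associate (Right c)        = Right (Right c)

-- Monoid laws, checked at a single point.
leftUnitLaw :: Either Void Int -> Bool
leftUnitLaw x = (mult . first unit) x == idl x

rightUnitLaw :: Either Int Void -> Bool
rightUnitLaw x = (mult . second unit) x == idr x

assocLaw :: Either (Either Int Int) Int -> Bool
assocLaw x = (mult . first mult) x == (mult . second mult . associate) x

-- Comultiplicative structure of (,) on plain functions.
counit :: a -> ()
counit = const ()

comult :: a -> (a, a)
comult x = (x, x)

coidl :: a -> ((), a)
coidl x = ((), x)

disassociate :: (a, (b, c)) -> ((a, b), c)
disassociate (a, (b, c)) = ((a, b), c)

-- Comonoid laws, checked at a single point.
leftCounitLaw :: Int -> Bool
leftCounitLaw x = (first counit . comult) x == coidl x

coassocLaw :: Int -> Bool
coassocLaw x =
  (first comult . comult) x == (disassociate . second comult . comult) x
```

Checking a few sample points is of course not a proof, but it is a quick way to catch a law written with the composition in the wrong order.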

Now, products have a comultiplicative structure on Hask (as in every category with finite products), given by the terminal object and diagonal natural transformation:

instance Comultiplicative (->) (,) where
  counit = const ()
  comult x = (x, x)

while coproducts have a multiplicative structure:

instance Multiplicative (->) Either where
  unit = absurd
  mult = either id id

that we can readily transport to PipeC m r using pipe:

instance Monad m => Multiplicative (PipeC m r) Either where
  unit = PipeC $ pipe absurd
  mult = PipeC $ pipe mult

Somewhat surprisingly, pipes also have a comultiplicative structure of their own:

instance Monad m => Comultiplicative (PipeC m r) Either where
  counit = PipeC discard
  comult = PipeC . forever $ do
    x <- await
    yield (Left x)
    yield (Right x)

Heterogeneous metaprogramming

All the combinators we defined can actually be used in practice, and the division in type classes certainly sheds some light on their structure and properties, but there’s actually something deeper going on here.

The fact that the standard Arrow class uses (,) as monoidal structure is not coincidental: Hask is a cartesian closed category, so to embed Haskell’s simply typed λ-calculus into some other category structure, we need at the very least a way to transport cartesian products, i.e. a premonoidal functor.

However, as long as our monoidal structure is comultiplicative and symmetric, we can always recover a first-order fragment of λ-calculus inside the “guest” category, and we don’t even need an identity-on-objects functor (see for example this paper).

The idea is that we can use the monoidal structure of the guest category to represent contexts, where weakening is given by counit, contraction by comult, and exchange by swap.
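A tiny illustration of this reading of contexts, written for plain functions and (,) rather than for pipes, so it only shows the shape of the translation. All the names below are defined locally for the example (swap is the one from Data.Tuple), and each term is compiled by hand into combinators.

```haskell
import Data.Bifunctor (second)
import Data.Tuple (swap)

counit :: a -> ()
counit = const ()

comult :: a -> (a, a)
comult x = (x, x)

-- Right unit isomorphism for products.
idr :: (a, ()) -> a
idr (x, ()) = x

-- Weakening: the term "x" in context (x, y) never mentions y,
-- so y is sent to the unit and then dropped.
useFirst :: (a, b) -> a
useFirst = idr . second counit

-- Contraction: the term "x + x" in context (x) mentions x twice,
-- so the context is duplicated first.
double :: Int -> Int
double = uncurry (+) . comult

-- Exchange: the term "y - x" in context (x, y) needs its
-- variables in the opposite order.
flippedSub :: (Int, Int) -> Int
flippedSub = uncurry (-) . swap
```

None of these definitions uses a lambda on the right-hand side of the composition: the variable plumbing is entirely expressed through counit, comult and swap, which is what makes the same translation available in a guest category.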

There is an experimental GHC branch with a preprocessor which is able to translate expressions written in an arbitrary guest language into Haskell, given instances of appropriate type classes, which correspond exactly to the ones we have defined above.

Examples

This exposition was pretty abstract, so we end with some examples.

We first need to define a few wrappers for our monoidal combinators, so we don’t have to deal with the PipeC newtype:

split :: Monad m => Pipe a (Either a a) m r
split = unPipeC comult

join :: Monad m => Pipe (Either a a) a m r
join = unPipeC mult

(*+*) :: Monad m => Pipe a b m r -> Pipe a' b' m r
      -> Pipe (Either a a') (Either b b') m r
f *+* g = unPipeC $ bimap (PipeC f) (PipeC g)

discardL :: Monad m => Pipe (Either Void a) a m r
discardL = unPipeC idl

discardR :: Monad m => Pipe (Either a Void) a m r
discardR = unPipeC idr

Now let’s write a tee combinator, similar to the tee command for shell pipes:

tee :: Monad m => Pipe a Void m r -> Pipe a a m r
tee p = split >+> firstP p >+> discardL

printer :: Show a => Pipe a Void IO r
printer = forever $ await >>= lift . print

ex6 :: IO ()
ex6 = do
  (sourceList [1..5] >+>
    tee printer >+>
    (fold (+) 0 >>= yield) $$
    printer)
  return ()
{- ex6 == mapM_ print [1,2,3,4,5,15] -}

Another interesting exercise is reimplementing the groupBy combinator of the previous post:

groupBy :: Monad m => (a -> a -> Bool) -> Pipe a [a] m r
groupBy p =
   -- split the stream in two
   split >+>

   -- yield Nothing whenever (not (p x y))
   -- for consecutive x y
  ((consec >+>
    filter (not . uncurry p) >+>
    pipe (const Nothing)) *+*
  
  -- at the same time, let everything pass through
  pipe Just) >+>

  -- now rejoin the two streams
  join >+>
  
  -- then accumulate results until a Nothing is hit
  forever (until isNothing >+>
           pipe fromJust >+>
           (consume >>= yield))

-- yield consecutive pairs of values
consec :: Monad m => Pipe a (a, a) m r
consec = await >>= go
  where
    go x = await >>= \y -> yield (x, y) >> go y

ex7 :: IO ()
ex7 = do (sourceList [1,1,2,2,2,3,4,4]
          >+> groupBy (==)
          >+> pipe head
           $$ printer)
         return ()
{- ex7 == mapM_ print [1,2,3,4] -}

An introduction to guarded pipes

Pipes are a very simple but powerful abstraction which can be used to implement stream-based IO, in a very similar fashion to iteratees and friends, or conduits. In this post, I introduce guarded pipes: a slight generalization of pipes which makes it possible to implement a larger class of combinators.

{-# LANGUAGE NoMonomorphismRestriction #-}
module Blog.Pipes.Guarded where

import Control.Category
import Control.Monad.Free
import Control.Monad.Identity
import Data.Maybe
import Data.Void
import Prelude hiding (id, (.), until, filter)

The idea behind pipes is straightforward: fix a base monad m, then construct the free monad over a specific PipeF functor:

data PipeF a b m x = M (m x)
                   | Yield b x
                   | Await (Maybe a -> x)

instance Monad m => Functor (PipeF a b m) where
  fmap f (M m) = M $ liftM f m
  fmap f (Yield x c) = Yield x (f c)
  fmap f (Await k) = Await (f . k)

type Pipe a b m r = Free (PipeF a b m) r

Generally speaking, a free monad can be thought of as an embedded language in CPS style: every summand of the base functor (PipeF in this case) is a primitive operation, while the x parameter represents the continuation at each step.

In the case of pipes, M corresponds to an effect in the base monad, Yield produces an output value, and Await blocks until it receives an input value, then passes it to its continuation. You can see that the Await continuation takes a Maybe a type: this is the only thing that distinguishes guarded pipes from regular pipes (as implemented in the pipes package on Hackage). The idea is that Await will receive Nothing whenever the pipe runs out of input values. That will give it a chance to do some cleanup or yield extra outputs. Any additional Await after that point will terminate the pipe immediately.

We can write a simplistic list-based (strict) interpreter formalizing the semantics I just described:

evalPipe :: Monad m => Pipe a b m r -> [a] -> m [b]
evalPipe p xs = go False xs [] p

The boolean parameter is going to be set to True as soon as we execute an Await with an empty input list.

A Pure value means that the pipe has terminated spontaneously, so we return the accumulated output list:

  where
    go _ _ ys (Pure r) = return (reverse ys)

Execute inner monadic effects:

    go t xs ys (Free (M m)) = m >>= go t xs ys

Save yielded values into the accumulator:

    go t xs ys (Free (Yield y c)) = go t xs (y : ys) c

If we still have values in the input list, feed one to the continuation of an Await statement.

    go t (x:xs) ys (Free (Await k)) = go t xs ys $ k (Just x)

If we run out of inputs, pass Nothing to the Await continuation…

    go False [] ys (Free (Await k)) = go True [] ys (k Nothing)

… but only the first time. If the pipe awaits again, terminate it.

    go True [] ys (Free (Await _)) = return (reverse ys)

To simplify the implementation of actual pipes, we define the following basic combinators:

tryAwait :: Monad m => Pipe a b m (Maybe a)
tryAwait = wrap $ Await return

yield :: Monad m => b -> Pipe a b m ()
yield x = wrap $ Yield x (return ())

lift :: Monad m => m r -> Pipe a b m r
lift = wrap . M . liftM return

and a couple of secondary combinators, very useful in practice. First, a pipe that consumes all input and never produces output:

discard :: Monad m => Pipe a b m r
discard = forever tryAwait

then a simplified await primitive, that dies as soon as we stop feeding values to it.

await :: Monad m => Pipe a b m a
await = tryAwait >>= maybe discard return

now we can write a very simple pipe that sums consecutive pairs of numbers:

sumPairs :: (Monad m, Num a) => Pipe a a m ()
sumPairs = forever $ do
  x <- await
  y <- await
  yield $ x + y

we get:

ex1 :: [Int]
ex1 = runIdentity $ evalPipe sumPairs [1,2,3,4]
{- ex1 == [3, 7] -}
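With an odd number of inputs, sumPairs silently drops the last element, because the second await never returns. Using tryAwait directly, a pipe can react to the end of input and yield the leftover value instead. The following is a self-contained sketch of that idea: it inlines a minimal free monad in place of Control.Monad.Free, and sumPairsFlush is a hypothetical name.

```haskell
import Data.Functor.Identity (runIdentity)

-- Minimal free monad, inlined so that the example needs only base.
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a) = Pure (g a)
  fmap g (Free x) = Free (fmap (fmap g) x)

instance Functor f => Applicative (Free f) where
  pure = Pure
  mf <*> mx = mf >>= \g -> fmap g mx

instance Functor f => Monad (Free f) where
  Pure a >>= k = k a
  Free x >>= k = Free (fmap (>>= k) x)

data PipeF a b m x = M (m x)
                   | Yield b x
                   | Await (Maybe a -> x)

instance Functor m => Functor (PipeF a b m) where
  fmap f (M m)       = M (fmap f m)
  fmap f (Yield x c) = Yield x (f c)
  fmap f (Await k)   = Await (f . k)

type Pipe a b m r = Free (PipeF a b m) r

tryAwait :: Monad m => Pipe a b m (Maybe a)
tryAwait = Free (Await Pure)

yield :: Monad m => b -> Pipe a b m ()
yield x = Free (Yield x (Pure ()))

-- The list-based interpreter described above.
evalPipe :: Monad m => Pipe a b m r -> [a] -> m [b]
evalPipe p xs = go False xs [] p
  where
    go _ _ ys (Pure _)                = return (reverse ys)
    go t is ys (Free (M m))           = m >>= go t is ys
    go t is ys (Free (Yield y c))     = go t is (y : ys) c
    go t (i : is) ys (Free (Await k)) = go t is ys (k (Just i))
    go False [] ys (Free (Await k))   = go True [] ys (k Nothing)
    go True [] ys (Free (Await _))    = return (reverse ys)

-- Sum consecutive pairs; if the input ends in the middle of a pair,
-- yield the unpaired element instead of dropping it.
sumPairsFlush :: (Monad m, Num a) => Pipe a a m ()
sumPairsFlush = loop
  where
    loop = tryAwait >>= maybe (return ()) withFirst
    withFirst x =
      tryAwait >>= maybe (yield x) (\y -> yield (x + y) >> loop)

ex1' :: [Int]
ex1' = runIdentity $ evalPipe sumPairsFlush [1,2,3,4,5]
{- ex1' == [3, 7, 5] -}
```

Note that sumPairsFlush returns as soon as it sees Nothing, so it never awaits again after the end of input and the forced termination described above is never triggered.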

Composing pipes

The usefulness of pipes, however, is not limited to being able to express list transformations as monadic computations using the await and yield primitives. In fact, it turns out that two pipes can be composed sequentially to create a new pipe.

infixl 9 >+>
(>+>) :: Monad m => Pipe a b m r -> Pipe b c m r -> Pipe a c m r
(>+>) = go False False
  where

When implementing evalPipe, we needed a boolean parameter to signal upstream input exhaustion. This time, we need two boolean parameters, one for the input of the upstream pipe, and one for its output, i.e. the input of the downstream pipe. First, if downstream does anything other than waiting, we just let the composite pipe execute the same action:

    go _ _ p1 (Pure r) = return r
    go t1 t2 p1 (Free (Yield x c)) = yield x >> go t1 t2 p1 c
    go t1 t2 p1 (Free (M m)) = lift m >>= \p2 -> go t1 t2 p1 p2

then, if upstream is yielding and downstream is waiting, we can feed the yielded value to the downstream pipe and continue from there:

    go t1 t2 (Free (Yield x c)) (Free (Await k)) =
      go t1 t2 c $ k (Just x)

if downstream is waiting and upstream is running a monadic computation, just let upstream run and keep downstream waiting:

    go t1 t2 (Free (M m)) p2@(Free (Await _)) =
      lift m >>= \p1 -> go t1 t2 p1 p2

if upstream terminates while downstream is waiting, finalize downstream:

    go t1 False p1@(Pure _) (Free (Await k)) =
      go t1 True p1 (k Nothing)

but if downstream awaits again, terminate the whole composite pipe:

    go _ True (Pure r) (Free (Await _)) = return r

now, if both pipes are waiting, we keep the second pipe waiting and we feed whatever input we get to the first pipe. If the input is Nothing, we set the first boolean flag, so that next time the first pipe awaits, we can finalize the downstream pipe.

    go False t2 (Free (Await k)) p2@(Free (Await _)) =
      tryAwait >>= \x -> go (isNothing x) t2 (k x) p2
    go True False p1@(Free (Await _)) (Free (Await k)) =
      go True True p1 (k Nothing)
    go True True p1@(Free (Await _)) p2@(Free (Await _)) =
      tryAwait >>= \_ -> {- unreachable -} go True True p1 p2

This composition can be shown to be associative (in a rather strong sense), with identity given by:

idP :: Monad m => Pipe a a m r
idP = forever $ await >>= yield

So we can define a Category instance:

newtype PipeC m r a b = PipeC { unPipeC :: Pipe a b m r }

instance Monad m => Category (PipeC m r) where
  id = PipeC idP
  (PipeC p2) . (PipeC p1) = PipeC $ p1 >+> p2

Running pipes

A runnable pipe, also called Pipeline, is a pipe that doesn’t yield any value and doesn’t wait for any input. We can formalize this in the types as follows:

type Pipeline m r = Pipe () Void m r

Disregarding bottom, calling await on such a pipe does not return any useful value, and yielding is impossible. Another way to think of Pipeline is as an arrow (in PipeC) from the terminal object to the initial object of Hask.

Running a pipeline is straightforward:

runPipe :: Monad m => Pipeline m r -> m r
runPipe (Pure r) = return r
runPipe (Free (M m)) = m >>= runPipe
runPipe (Free (Await k)) = runPipe $ k (Just ())
runPipe (Free (Yield x c)) = absurd x

where the impossibility of the last case is guaranteed by the types, unless of course the pipe introduced a bottom value at some point.

The three primitive operations tryAwait, yield and lift, together with pipe composition and the runPipe function above, are basically all we need to define most pipes and pipe combinators. For example, the simple pipe interpreter evalPipe can be easily rewritten in terms of these primitives:

evalPipe' :: Monad m => Pipe a b m r -> [a] -> m [b]
evalPipe' p xs = runPipe $
  (mapM_ yield xs >> return []) >+>
  (p >> discard) >+>
  collect id
  where
    collect xs =
      tryAwait >>= maybe (return $ xs [])
                         (\x -> collect (xs . (x:)))

Note that we use the discard pipe to turn the original pipe into an infinite one, so that the final return value will be taken from the final pipe.

Extra combinators

The rich structure on pipes (category and monad) makes it really easy to define new higher-level combinators. For example, here are implementations of some of the combinators in Data.Conduit.List, translated to pipes:

sourceList = mapM_ yield
sourceNull = return ()
fold f z = go z
  where
    go x = tryAwait >>= maybe (return x) (go . f x)
consume = fold (\xs x -> xs . (x:)) id >>= \xs -> return (xs [])
sinkNull = discard
take n = (isolate n >> return []) >+> consume
drop n = replicateM_ n await >> idP
pipe f = forever $ await >>= yield . f -- called map in conduit
concatMap f = forever $ await >>= mapM_ yield . f
until p = go
  where
    go = await >>= \x -> if p x then return () else yield x >> go
groupBy (~=) = p >+>
  forever (until isNothing >+>
           pipe fromJust >+>
           (consume >>= yield))
  where 
    -- the pipe p yields Nothing whenever the current item y
    -- and the previous one x do not satisfy x ~= y, and behaves
    -- like idP otherwise
    p = await >>= \x -> yield (Just x) >> go x
    go x = do
      y <- await
      unless (x ~= y) $ yield Nothing
      yield $ Just y
      go y
isolate n = replicateM_ n $ await >>= yield
filter p = forever $ until (not . p)

To work with the equivalent of sinks, it is useful to define a source to sink composition operator:

infixr 2 $$
($$) :: Monad m => Pipe () a m r' -> Pipe a Void m r -> m (Maybe r)
p1 $$ p2 = runPipe $ (p1 >> return Nothing) >+> liftM Just p2

which ignores the source return type, and just returns the sink return value, or Nothing if the source happens to terminate first. So we have, for example:

ex2 :: Maybe [Int]
ex2 = runIdentity $ sourceList [1..10] >+> isolate 4 $$ consume
{- ex2 == Just [1,2,3,4] -}

ex3 :: Maybe [Int]
ex3 = runIdentity $ sourceList [1..10] $$ discard
{- ex3 == Nothing -}

ex4 :: Maybe Int
ex4 = runIdentity $ sourceList [1,1,2,2,2,3,4,4]
                >+> groupBy (==)
                >+> pipe head
                 $$ fold (+) 0
{- ex4 == Just 10 -}

ex5 :: Maybe [Int]
ex5 = runIdentity $ sourceList [1..10]
                >+> filter (\x -> x `mod` 3 == 0)
                 $$ consume
{- ex5 == Just [3, 6, 9] -}
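
The examples above can be checked against a self-contained sketch of guarded pipes. This is not the code from my fork: it replaces Free with a plain recursive data type so that it only depends on base, and the definitions of await and discard, the fixity of >+>, and the composition clauses that do not appear in this section are assumptions chosen to match the termination rule described earlier (a pipe that awaits again after receiving Nothing is terminated with the upstream return value). The name exSum is made up for this sketch.

```haskell
import Control.Monad (ap, forever, liftM, replicateM_)
import Data.Functor.Identity (runIdentity)
import Data.Maybe (isNothing)
import Data.Void (Void, absurd)

-- A guarded pipe as a plain recursive type (the text builds it with Free).
data Pipe a b m r
  = Pure r
  | M (m (Pipe a b m r))
  | Await (Maybe a -> Pipe a b m r)
  | Yield b (Pipe a b m r)

instance Monad m => Functor (Pipe a b m) where fmap = liftM
instance Monad m => Applicative (Pipe a b m) where
  pure = Pure
  (<*>) = ap
instance Monad m => Monad (Pipe a b m) where
  Pure r    >>= f = f r
  M m       >>= f = M (liftM (>>= f) m)
  Await k   >>= f = Await (\x -> k x >>= f)
  Yield x c >>= f = Yield x (c >>= f)

tryAwait :: Monad m => Pipe a b m (Maybe a)
tryAwait = Await Pure

yield :: Monad m => b -> Pipe a b m ()
yield x = Yield x (Pure ())

-- Assumed definitions: awaiting again after Nothing ends the composite pipe.
discard :: Monad m => Pipe a b m r
discard = forever tryAwait

await :: Monad m => Pipe a b m a
await = tryAwait >>= maybe discard return

infixl 9 >+>
(>+>) :: Monad m => Pipe a b m r -> Pipe b c m r -> Pipe a c m r
(>+>) = go False False
  where
    -- Downstream runs first: it is finished, yielding, or performing an effect.
    go _ _ _ (Pure r) = Pure r
    go t1 t2 p1 (Yield x c) = Yield x (go t1 t2 p1 c)
    go t1 t2 p1 (M m) = M (liftM (go t1 t2 p1) m)
    -- From here on downstream is awaiting, so upstream gets to run.
    go t1 t2 (M m) p2 = M (liftM (\p1 -> go t1 t2 p1 p2) m)
    go t1 t2 (Yield x c) (Await k) = go t1 t2 c (k (Just x))
    go t1 False p1@(Pure _) (Await k) = go t1 True p1 (k Nothing)
    go _ True (Pure r) (Await _) = Pure r
    go False t2 (Await k) p2@(Await _) =
      tryAwait >>= \x -> go (isNothing x) t2 (k x) p2
    go True False p1@(Await _) (Await k) = go True True p1 (k Nothing)
    go True True p1@(Await _) p2@(Await _) =
      tryAwait >>= \_ -> go True True p1 p2

runPipe :: Monad m => Pipe () Void m r -> m r
runPipe (Pure r) = return r
runPipe (M m) = m >>= runPipe
runPipe (Await k) = runPipe (k (Just ()))
runPipe (Yield x _) = absurd x

infixr 2 $$
($$) :: Monad m => Pipe () a m r' -> Pipe a Void m r -> m (Maybe r)
p1 $$ p2 = runPipe $ (p1 >> return Nothing) >+> liftM Just p2

sourceList :: Monad m => [b] -> Pipe a b m ()
sourceList = mapM_ yield

isolate :: Monad m => Int -> Pipe a a m ()
isolate n = replicateM_ n (await >>= yield)

fold :: Monad m => (r -> a -> r) -> r -> Pipe a b m r
fold f = go
  where
    go x = tryAwait >>= maybe (return x) (go . f x)

consume :: Monad m => Pipe a b m [a]
consume = fold (\xs x -> xs . (x :)) id >>= \xs -> return (xs [])

ex2 :: Maybe [Int]
ex2 = runIdentity $ sourceList [1 .. 10] >+> isolate 4 $$ consume

ex3 :: Maybe [Int]
ex3 = runIdentity $ sourceList [1 .. 10 :: Int] $$ discard

exSum :: Maybe Int
exSum = runIdentity $ sourceList [1 .. 4] $$ fold (+) 0
```

Loaded in GHCi, ex2 evaluates to Just [1,2,3,4], ex3 to Nothing and exSum to Just 10. In ex3 the sink awaits again after being told that the source is exhausted, so the composite pipe returns the Nothing coming from the source side.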

Pipes in practice

You can find an implementation of guarded pipes in my fork of pipes. There is also a pipes-extra repository where you can find some pipes to deal with chunked ByteStreams and utilities to convert conduits to pipes.

I hope to be able to merge this into the original pipes package once the guarded pipe concept has proven its worth. Without the tryAwait primitive, combinators like fold and consume cannot be implemented, nor even a simple stateful pipe like one to split a chunked input into lines. So I think there are enough benefits to justify a little extra complexity in the definition of composition.


  1. In reality, Hask doesn’t have an initial object, and the terminal object is actually Void, because of non-strict semantics.


Reinversion of control with continuations

In my last post I mentioned how it is possible to achieve a form of “reinversion of control” by using (green) threads. Some commenters noted how this is effectively a solved problem, as demonstrated for example by Erlang, as well as the numerous variations on CSP currently gaining a lot of popularity.

I don’t disagree with that, but it’s just not the point of this series of posts. This is about understanding the computational structure of event-driven code, and seeing how it’s possible to transform it into a less awkward form without introducing concurrency (or at least not in the traditional sense of the term).

Using threads to solve what is essentially a control flow problem is cheating. And you pay in terms of increased complexity, and code which is harder to reason about, since you introduced a whole lot of interleaving opportunities and possible race conditions. Using a non-preemptive concurrency abstraction with manual yield directives (like my Python gist does) will solve that, but then you’d have to think of how to schedule your coroutines, so that is also not a complete solution.

Programmable semicolons

To find an alternative to the multitask-based approach, let’s focus on two particular lines of the last example:

reply = start_request();
get_data(reply)

where I added an explicit semicolon at the end of the first line. A semicolon is an important component of an imperative program, even though, syntactically, it is often omitted in languages like Python. It corresponds to the sequencing operator: execute the instruction on the left side, then pass the result to the right side and execute that.

If the instruction on the left side corresponds to an asynchronous operation, we want to alter the meaning of sequencing. Given a sequence of statements of the form

x = A(); B(x)

we want to interpret that as: call A, then return control back to the main loop; when A is finished, bind its result to x, then run B.

So what we want is to be able to override the sequencing operator: we want programmable semicolons.

The continuation monad

Since it is often really useful to look at the types of functions to understand how exactly they fit together, we’ll leave Python and start focusing on Haskell for our running example.

We can make a very important observation immediately by looking at the type of the callback registration function that our framework offers, and trying to interpret it in the context of controlled side effects (i.e. the IO monad). For Qt, it could look something like:

connect :: Object -> String -> (a -> IO ()) -> IO ()

to be used, for example, like this:

connect httpReply "finished()" $ \_ -> do
    putStrLn "request finished"

so the first argument is the object, the second is the C++ signature of the signal, and the third is a callback that will be invoked by the framework whenever the specified signal is emitted. Now, we can get rid of all the noise of actually connecting to a signal, and define a type representing just the act of registering a callback.

newtype Event a = Event { on :: (a -> IO ()) -> IO () }

Doesn’t that look familiar? It is exactly the continuation monad transformer applied to the IO monad! The usual monad instance for ContT perfectly captures the semantics we are looking for:

instance Monad Event where
  return x = Event $ \k -> k x
  e >>= f = Event $ \k ->
    on e $ \x ->
      on (f x) k

The return function simply calls the callback immediately with the provided value; no actual connection is performed. The bind operator represents our custom semicolon: we connect to the first event, and when that fires, we take the value it yielded, apply f to it, and connect to the resulting event.
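
On a current GHC, a Monad instance also needs Functor and Applicative instances. The following is a minimal self-contained sketch of the Event monad with those instances added. The helpers fireNow and collect, and the example sumOfTwo, are hypothetical names introduced only to exercise the bind operator without a real framework.

```haskell
import Control.Monad (ap, liftM)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Same type as in the text: an event is the act of registering a callback.
newtype Event a = Event { on :: (a -> IO ()) -> IO () }

-- Superclass instances required by current GHC, derived from the Monad one.
instance Functor Event where
  fmap = liftM

instance Applicative Event where
  pure x = Event $ \k -> k x
  (<*>) = ap

instance Monad Event where
  e >>= f = Event $ \k ->
    on e $ \x ->
      on (f x) k

-- Hypothetical event for illustration: it logs its name and then fires
-- immediately, exactly once, with the given value.
fireNow :: String -> a -> Event a
fireNow name x = Event $ \k -> putStrLn ("fired: " ++ name) >> k x

-- Two events sequenced with the programmable semicolon (do-notation).
sumOfTwo :: Event Int
sumOfTwo = do
  a <- fireNow "first" 1
  b <- fireNow "second" 2
  return (a + b)

-- Hypothetical helper: register a callback that records every delivered value.
collect :: Event a -> IO [a]
collect e = do
  ref <- newIORef []
  on e $ \x -> modifyIORef ref (x :)
  reverse <$> readIORef ref
```

Running collect sumOfTwo in GHCi prints the two "fired" lines in order and returns [3], since the second event is only connected once the first one has delivered its value.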

Now we can actually translate the Python code of the previous example to Haskell:

ex :: Event ()
ex = forever $ do
  result <- untilRight . replicate 2 $ do
    reply <- startRequest
    either (return . Left) (liftM Right . getData) reply
  either handleError displayData result

untilRight :: Monad m => [m (Either a b)] -> m (Either a b)
untilRight [m] = m
untilRight (m : ms) = m >>= either (const (untilRight ms)) (return . Right)

Again, this could be cleaned up by adding some error reporting functionality into the monad stack.
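
Since untilRight works in any monad, its retry behaviour can be observed in plain IO, without any event machinery. In this sketch, attempt and retryDemo are hypothetical helpers: attempt counts how often it has been run and only succeeds from a given attempt number onwards.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Definition from the text. Note that it is partial: there is no case for [].
untilRight :: Monad m => [m (Either a b)] -> m (Either a b)
untilRight [m] = m
untilRight (m : ms) = m >>= either (const (untilRight ms)) (return . Right)

-- Hypothetical action: bumps a counter and fails until the counter
-- reaches succeedOn.
attempt :: IORef Int -> Int -> IO (Either String Int)
attempt ref succeedOn = do
  modifyIORef ref (+ 1)
  n <- readIORef ref
  return $
    if n >= succeedOn
      then Right n
      else Left ("attempt " ++ show n ++ " failed")

-- Try the action up to `tries` times; return the outcome together with
-- the number of attempts that were actually made.
retryDemo :: Int -> Int -> IO (Either String Int, Int)
retryDemo tries succeedOn = do
  ref <- newIORef 0
  r <- untilRight (replicate tries (attempt ref succeedOn))
  n <- readIORef ref
  return (r, n)
```

Here retryDemo 2 1 gives (Right 1, 1), because the second action is never run, while retryDemo 2 3 gives (Left "attempt 2 failed", 2): only the error of the last attempt is kept.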

Implementing the missing functions in terms of connect is straightforward. For example, startRequest will look something like this:

startRequest :: Event (Either String Reply)
startRequest = Event $ \k -> do
  reply <- AccessManager.get "http://example.net"
  connect reply "finished()" $ \_ -> k (Right reply)
  connect reply "error(QString)" $ \e -> k (Left e)

where I took the liberty of glossing over some irrelevant API details.

How do we run such a monad? Well, the standard runContT does the job:

runEvent :: Event () -> IO ()
runEvent e = on e $ \_ -> return ()

so

runEvent ex

will run until the first connection, return control to the main loop, resume when an event occurs, and so on.
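
This hand-over of control can be simulated without a GUI framework. The sketch below uses ContT () IO from the transformers package directly in place of the Event newtype, so on becomes runContT. The main loop is a hypothetical stand-in (a queue of pending callbacks), and the names Loop, later, runLoop and demo are made up for this illustration.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Cont (ContT (..))
import Data.IORef (IORef, modifyIORef, newIORef, readIORef, writeIORef)

-- Event from the text, expressed as the continuation monad transformer.
type Event = ContT () IO

-- runEvent is runContT with a final callback that does nothing.
runEvent :: Event () -> IO ()
runEvent e = runContT e (\_ -> return ())

-- Hypothetical stand-in for a framework main loop: a queue of callbacks.
type Loop = IORef [IO ()]

-- Hypothetical one-shot event: deliver x on a later iteration of the loop.
later :: Loop -> a -> Event a
later loop x = ContT $ \k -> modifyIORef loop (++ [k x])

-- Run queued callbacks, one per iteration, until none are left.
runLoop :: Loop -> IO ()
runLoop loop = do
  pending <- readIORef loop
  case pending of
    [] -> return ()
    (cb : rest) -> writeIORef loop rest >> cb >> runLoop loop

-- Record the order in which the pieces of the computation run.
demo :: IO [String]
demo = do
  loop <- newIORef []
  logRef <- newIORef []
  let say s = modifyIORef logRef (s :)
  runEvent $ do
    liftIO (say "request started")
    reply <- later loop "reply"
    liftIO (say ("got " ++ reply))
    dat <- later loop "data"
    liftIO (say ("got " ++ dat))
  say "back in main loop"
  runLoop loop
  reverse <$> readIORef logRef
```

The log returned by demo is ["request started", "back in main loop", "got reply", "got data"]: runEvent returns as soon as the first callback has been registered, and the rest of the do block only runs when the loop fires it.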

Conclusion

I love the simplicity and elegance of this approach, but unfortunately, it is far from a complete solution. So far we have only dealt with “one-shot” events, but what happens when an event fires multiple times? Also, as this is still very imperative in nature, can we do better? Is it possible to employ a more functional style, with emphasis on composability?

I’ll leave the (necessarily partial) answers to those questions for a future post.
