In which I introduce the power of monadic programming without discussing functors, category theory, and other things that probably aren't real.
Mark Cohen, July 8, 2018
Monadic programming is something I’ve run across often but never understood. And
really, how could anyone? Just take a look at
the Haskell Wiki’s description. Haskell’s
Control.Monad class is easily the
most widespread use of monadic programming, and yet this wiki page is impossible
to understand if you don’t already understand it. The Haskell Wiki maintainers
even admit as much right in the page:
Monads are known for being deeply confusing to lots of people, so there are plenty of tutorials specifically related to monads. Each takes a different approach to Monads, and hopefully everyone will find something useful.
In this post, we’ll build on the interpreter discussed in my previous post. Make sure to read that one first if you haven’t yet. As before, I assume no knowledge on your part, aside from general programming experience.
In my previous post, I outlined several places where errors should occur. For example, input that doesn’t match the concrete syntax would raise a scan error. What I didn’t discuss is how exactly those errors should be raised! This is the primary topic of today’s post.
Let’s discuss some easy methods, taking stock of the advantages and disadvantages of each.
The first and simplest approach is to just raise exceptions whenever errors
occur. This has the advantage of not affecting any other parts of our program:
in all sensible languages, exceptions are implemented as a sort of “chameleon”
type, which makes them valid return values for any function. That is to say, we
don’t have to modify the types of any of our functions. Furthermore, downstream
parts of the interpreter don’t have to worry about upstream errors. To better
illustrate this point, let’s recall the
fun interpret program = let val tokens = scan program val ast = parse tokens val checked-ast = typecheck ast val result = eval checked-ast in result end
Now, let’s suppose that we call
interpret on the program
which is undefined in our concrete syntax. So,
scan should raise a scan error.
But the downstream functions (
eval) don’t have to
worry about this case: unhandled exceptions immediately terminate the
interpreter. Thus, the raise-exceptions approach allows us to write each phase
of the interpreter under the assumption that the previous phase succeeded.
We’ve actually already discussed the main disadvantage of this approach; it’s the fact that unhandled exceptions immediately terminate the interpreter. However, with this approach, it’s impossible to do things like print out extra context, suggest possible remedies, or underline the relevant piece of source code when an error occurs. This is a pretty big drawback; an interpreter that is unable to provide useful output is pretty unusable for any kind of serious development.
A small modification to the above approach is to handle any exceptions in the
interpret function. This solves the problem of being unable to do anything
once an exception is raised, but now,
interpret suddenly becomes really messy:
fun interpret program = let val tokens = (scan program) handle (ScanError msg) => ...
I’ll let you imagine what the rest of the function might look like, but hopefully it’s clear that this approach heavily pollutes the top level of the interpreter.
Let’s take a minute to conceptualize exactly what we’re trying to accomplish with our error handling. We want a way for a given phase of the interpreter to either succeed or fail. In the success case, we want to transparently pass the result to the next phase; in the failure case, we want to gracefully print out a message. Let’s define an abstract type that satisfies this model.
Consider the following Standard ML type definition:
datatype 'a attempt = Success of 'a | Failure of string
First, let’s unpack a bit of the Standard ML syntax here. Read aloud,
“alpha”, and it represents a generic type variable. Standard ML writes its
composed types in the reverse order of most languages, so this definition gives
us types like:
int list attempt
bool attempt list
Then, let’s discuss the anatomy of an
attempt. For a given type, say
we can have either a
Success of bool or a
Failure of string. So, the
following are valid
Success (not (not false))
Failure "another error!"
So let’s return to the first part of the approach we’re trying to model:
We want a way for a given phase of the interpreter to either succeed or fail.
We can now completely solve this by simply modifying the type signatures of each phase. Let’s recall the old type signatures from the previous post:
scan: string -> token list parse: token list -> ast typecheck: ast -> ast eval: ast -> ast
Now, all we need to do is stick
attempt on the end:
scan: string -> token list attempt parse: token list -> ast attempt typecheck: ast -> ast attempt eval: ast -> ast attempt
So, for example, a successful call to
scan might return something like
Success [LParen, Succ, Zero, RParen], whereas a scan error might return
Failure "Scan error: invalid identifier 198asldfk".
So we’ve taken care of the first part of the model: allowing a given phase of the interpreter to succeed or fail. The real magic comes when we compose these functions.
Let’s recall the second part of the model:
In the success case, we want to transparently pass the result to the next phase; in the failure case, we want to gracefully print out a message.
First, we could of course do something like the following:
fun interpret program = let val tokens? = scan program val ast? = (case tokens? of Success tokens => parse tokens | Failure msg => Failure msg) ...
But, as you may have noticed, this approach becomes messy really fast. However, we can actually neatly abstract this approach into something much more powerful.
Let’s shift gears a little bit and the model again. We want a function that allows us to compose the result of one phase with the next phase. What would the type signature of this function be? I encourage you to pause here and try working it out on your own - struggling through this type signature was 2⁄3 of the battle for me.
So, without further ado:
fun continueIfPossible (att: 'a attempt) (f: 'a -> 'b attempt) : 'b attempt
Let’s discuss this signature a bit. This function takes two arguments: an
attempt, and a function that takes an
'a (read: “alpha”) and returns a
attempt (read: “beta attempt”). Then, its return type is
'b attempt, the same
as the function it takes in. The crucial connection here is that each phase of
our interpreter satisfies the type signature of the function argument
'a -> 'b attempt. Notice, in the case of
token list. Then, in the case of
token list and
Now, let’s just pull out the messy part of our above attempt (no pun intended)
interpret to make the body of
fun continueIfPossible att f = (case att of Success x => f x | Failure msg => Failure msg)
This function satisfies our whole model. The
attempt type allows each phase of
the interpreter to succeed or fail, and the
case statement supports continuing
the computation in the success case or gracefully printing an error message in
the failure case (by simply propagating that error message along). Let’s define
a bit of syntactic sugar…
infix >>= fun att >>= f = continueIfPossible att f
…and watch the magic happen.
fun compile program = (Success program) >>= scan >>= parse >>= typecheck >>= eval
First, regarding the above example, I encourage you to check on your own that the types work out here; that exercise was the last step for me in truly understanding monadic programming.
Now, while I’ve gone through this whole post without discussing what a monad is,
I haven’t neglected to use them. In functional programming parlance, the
attempt type, together with the
continueIfPossible function is a monad (one
would say that together, the two are “the attempt monad”). Strictly speaking, a
monad is any one of an infinite number of similar type definitions and
associated composition rules. I find this definition completely unhelpful; even
now, articles and wiki pages on monads are extremely dense to me. I think it’s
more helpful to think instead of monadic programming, which is merely the
style of programming that builds computations in a similar manner to what we’ve
Okay, you got me there. I can’t say that any of these definitions are directly useful; pure functional languages (let alone Standard ML) are rare enough as it is. However, I’ve found the pattern of thought that goes into designing a monadic program to be extremely useful, even outside of pure-functional contexts. Hopefully you’ll be able to use this knowledge to similar ends.
Discuss on lobste.rs. Thanks for reading!