While implementing Haskell functions with deeply nested logic, I’ve discovered a particular trick that allows writing more modular code by solving annoying indentation errors at the same time.
The described technique uses the CPS transformation and Haskell layout parsing rules. It sounds scary but in its core it’s just a refactoring using Higher-Order Functions, no fancy features involved.
Problem
To understand the solution, let’s first look at the problem.
📜 DISCLAIMER: I’m using one single example throughout the entire blog post. This example introduces a specific problem and solves it using the described technique. It serves only in educational purposes of showcasing the particular refactoring methods. You could’ve have a similar problem and solved it differently. You could’ve even avoid having this problem at all by designing your software differently. I’m not arguing here that the proposed solution is the best solution. So, please, don’t argue with me by saying I don’t understand what I’m doing because you didn’t have a similar problem.
Let’s say we’re working on the backend part of a web-application. We want to write a function that associates a user in our system with with their email if the user doesn’t already have an email. To be more specific, this function should do the following:
- Take a user session token and get the user ID from it.
- Check if the given user already has an email.
- Check if someone else already associated this email with them.
- If all good, associate the user with an email by inserting the data in the database.
In a simplified case, let’s say that we already have the following types and functions:
data UserSession = ...
data UserId = ...
data Email = ...
data ID = ...
data InternalDbError = ...
validateUserSession :: UserSession -> IO (Maybe UserId)
getEmailByUserId :: UserId -> IO (Maybe Email)
getUserIdByEmail :: Email -> IO (Maybe UserId)
insertUserEmail :: UserId -> Email -> IO (Either InternalDbError ID)And now we want to glue our API functions together into the following function and report all possible error cases as well:
data AppError
= UserSessionIsInvalid
| UserAlreadyHasEmail
| UserHasDifferentEmail
| EmailIsTaken
| DbError InternalDbError
associateEmail :: UserSession -> Email -> IO (Either AppError ID)Baseline
The simplest and straighforward implementation of the desired function would be to use all the API methods directly and pattern-match on their results as we go. This is demonstrated by the following code snippet:
associateEmail :: UserSession -> Email -> IO (Either AppError ID)
associateEmail userSession email =
validateUserSession userSession >>= \case
Nothing -> pure $ Left UserSessionIsInvalid
Just userId -> getEmailByUserId userId >>= \case
Just otherEmail
| email == otherEmail -> pure $ Left UserAlreadyHasEmail
| otherwise -> pure $ Left UserHasDifferentEmail
Nothing -> getUserIdByEmail email >>= \case
Just otherUserId -> pure $ Left EmailIsTaken
Nothing -> insertUserEmail userId email >>= \case
Left dbErr -> pure $ Left $ DbError dbErr
Right id' -> pure $ Right id'👩🔬 The above code snippet uses the LambdaCase Haskell feature.
This simple implementation doesn’t use any advanced features and is easy to read by Haskell beginners. However, it has several problems that we would like to solve:
- The code is deeply nested. It slowly turns into a weird staircase as we add more steps.
- It mixes business logic with pesky implementation details. The high-level description of this function has only four clear steps but it’s extremely difficult to extract them quickly from this particular implementation. The problem becomes more severe as you add more steps or start introducing more complicated logic (e.g. logging, monitoring, etc.).
Either-isation
A one possible approach to improving the situation would be to
introduce a separate function for each individual step. Those functions
would return Either AppError smth and will hide the
implementation details. The code will look like this:
checkUserSession :: UserSession -> IO (Either AppError UserId)
checkUserSession userSession = validateUserSession userSession >>= \case
Nothing -> pure $ Left UserSessionIsInvalid
Just userId -> pure $ Right userId
checkUserEmail :: UserId -> Email -> IO (Either AppError ())
checkUserEmail userId email = getEmailByUserId userId >>= \case
Just otherEmail
| email == otherEmail -> pure $ Left UserAlreadyHasEmail
| otherwise -> pure $ Left UserHasDifferentEmail
Nothing -> pure $ Right ()
checkOtherUserEmail :: Email -> IO (Either AppError ())
checkOtherUserEmail email = getUserIdByEmail email >>= \case
Just otherUserId -> pure $ Left EmailIsTaken
Nothing -> pure $ Right ()
checkEmailInsert :: UserId -> Email -> IO (Either AppError ID)
checkEmailInsert userId email = insertUserEmail userId email >>= \case
Left dbErr -> pure $ Left $ DbError dbErr
Right id' -> pure $ Right id'
associateEmail
:: UserSession
-> Email
-> IO (Either AppError ID)
associateEmail userSession email =
checkUserSession userSession >>= \case
Left err -> pure $ Left err
Right userId -> checkUserEmail userId email >>= \case
Left err -> pure $ Left err
Right () -> checkOtherUserEmail email >>= \case
Left err -> pure $ Left err
Right () -> checkEmailInsert userId email👩🔬 Here we explicitly prefer to return
Either AppError ()instead ofMaybe AppErroror something else for consistency. This will also become handy as we explore other approaches.
This refactoring is simple and pretty straightforward but it doesn’t
solved any of two problems we have above despite writing much more code.
Though, the code of associateEmail became slightly more
understandable due to uniform handling of errors and we’re now able to
add more implementation details to each step without polluting the
implementation of associateEmail.
ExceptT
The problem with out-of-control nested indentation annoyed thousands
of Haskell developers even before you and me. In this particular case,
where all your functions return
IO (Either TheSameErrorType anything), you can use the ExceptT
monad transformer.
I won’t go into much details about ExcepT. This blog
post is not a monad transformer tutorial. But I’m going to mention this
approach as it’s quite common and still pretty basic in terms of
language features.
Once you performed Either-isation of your functions, you can wrap
each function into the ExceptT type. This change is quite
mechanical in our case:
-checkUserSession :: UserSession -> IO (Either AppError UserId)
+checkUserSession :: UserSession -> ExceptT AppError IO UserId
-checkUserSession userSession = validateUserSession userSession >>= \case
+checkUserSession userSession = ExceptT $ validateUserSession userSession >>= \caseAfter applying this refactoring to each function, we can update the
implementation of associateEmail:
associateEmail
:: UserSession
-> Email
-> IO (Either AppError ID)
associateEmail userSession email = runExceptT $ do
userId <- checkUserSession userSession
checkUserEmail userId email
checkOtherUserEmail email
checkEmailInsert userId emailNow we finally solved our both problems! However, at what cost 😢
We’ve introduced a rather complicated concept of monad transformers to
the codebase which makes the code less beginner-friendly. On top of
that, ExceptT brings different problems such as broken behaviour when
used with concurrency.
CPS transformation
The ExceptT solution allows us to achieve the desired
shape of the associateEmail function but this approach has
several drawbacks. Let’s use a different approach — Continuation-Passing
Style (CPS) transformation.
The term CPS itself is not modern. It exists since 1975. It’s usually mentioned a lot in compiler development or in relation to callback hell. However, we’re going to put a new perspective on this good old trick.
The main idea behind the CPS transformation is simple: instead of returning a value from a function, we take a callback (also called “continutation”) as an argument and run it on the result instead of returning the result.
For example, here is what we had before:
checkUserSession :: UserSession -> IO (Either AppError UserId)
checkUserSession userSession = validateUserSession userSession >>= \case
Nothing -> pure $ Left UserSessionIsInvalid
Just userId -> pure $ Right userIdAnd here is checkUserSession after applying the CPS
transformation:
withUserSession
:: UserSession
-> (UserId -> IO (Either AppError a))
-> IO (Either AppError a)
withUserSession userSession next = validateUserSession userSession >>= \case
Nothing -> pure $ Left UserSessionIsInvalid
Just userId -> next userIdIn this transformation, I performed several things:
- Renamed the function from
checkUserSessiontowithUserSessionto convey the intention behind this function: it takes an action to be performed “with user session”. - Added extra argument — continuation — a function of type
UserId -> IO (Either AppError a). SowithUserSessionis a Higher-Order Function now. ⚠️ Important: the return type of our continuation is polymorphic (as well as the return type of our function) to make usage more flexible. - Split type signature into multiple lines as it’s became longer here.
- Returned
next userIdinstead ofpure $ Right userIdin the end.
Our transformed withUserSession function now either
short-circuits with an error or runs a given continuation with
userId in case of success. It takes a continuation and
passes the result to it, hence the name — Continuation-Passing
Style.
Now, we apply the CPS transformation to every function:
withCheckedUserEmail
:: UserId
-> Email
-> IO (Either AppError a)
-> IO (Either AppError a)
withCheckedUserEmail userId email next = getEmailByUserId userId >>= \case
Just otherEmail
| email == otherEmail -> pure $ Left UserAlreadyHasEmail
| otherwise -> pure $ Left UserHasDifferentEmail
Nothing -> next
withCheckedOtherUserEmail
:: Email
-> IO (Either AppError a)
-> IO (Either AppError a)
withCheckedOtherUserEmail email next = getUserIdByEmail email >>= \case
Just otherUserId -> pure $ Left EmailIsTaken
Nothing -> next
withEmailInsert
:: UserId
-> Email
-> (ID -> IO (Either AppError a))
-> IO (Either AppError a)
withEmailInsert userId email next = insertUserEmail userId email >>= \case
Left dbErr -> pure $ Left $ DbError dbErr
Right id' -> next id'A few notes on the implementation:
- When we don’t have a meaningful argument to pass to continuation
(e.g. the function returns a value of the unit type
()), we can simply pass an action instead of a function. withEmailInsertwill be called last but we still apply the CPS transformation to it. This approach is better for composability in case we want to use this function in the middle of another function and not necessarily in the end.
After we’ve performed this refactoring, we can now use our
CPS-transformed functions in associateEmail. We’re going to
use the chain of nested continuations:
associateEmail
:: UserSession
-> Email
-> IO (Either AppError ID)
associateEmail userSession email =
withUserSession userSession $ \userId ->
withCheckedUserEmail userId email $
withCheckedOtherUserEmail email $
withEmailInsert userId email $ \id' ->
pure $ Right id'What will happen is that withUserSession will validate
user session and call the \userId -> ... lambda if
everything is good. The lambda starts with the call to
withCheckedUserEmail. This function will then check if the
user already has an email and call the next action (starting with
withCheckedOtherUserEmail) after the validation passes. And
so on.
Wait a minute! You can say that the implementation of
associateEmail is still deeply nested and you’ll be
absolutely right. We see the four steps clearly but we still have this
annoying alignment requirement.
The solution to this problem is simply to not indent the code at all:
associateEmail
:: UserSession
-> Email
-> IO (Either AppError ID)
associateEmail userSession email =
withUserSession userSession $ \userId ->
withCheckedUserEmail userId email $
withCheckedOtherUserEmail email $
withEmailInsert userId email $ \id' ->
pure $ Right id'
This works because Haskell parsing rules around layouts are quite
flexible in this case. Semantically, everything “to the right of” lambda
belongs to this lambda. So we don’t need this extra indentation here.
This doesn’t work for case-of expressions and
do-notation blocks but surprisingly works for lambdas and
arguments after $ outside do blocks.
You can go one step further in making this code look cleaner. I’m not a fan of the BlockArguments extension in Haskell but if you like it, you can write the final function with slightly less operator noise:
associateEmail
:: UserSession
-> Email
-> IO (Either AppError ID)
associateEmail userSession email =
withUserSession userSession \userId ->
withCheckedUserEmail userId email $
withCheckedOtherUserEmail email $
withEmailInsert userId email \id' ->
pure $ Right id'You can check the entire code after applying the CPS transformation here:
Conclusion
In this blog post I’ve described how you can make code cleaner by using the fundamental Haskell feature Higher-Order Functions with combination of 50-years old technique. Sometimes, you don’t need fancy tools to improve your life when fundamentals are solid and already powerful enough.
If you liked this blog post, consider supporting my work on GitHub Sponsors, or following me on the Internet: