Using a 50-years old technique for solving modern issues

Created: June 30, 2022

While implementing Haskell functions with deeply nested logic, I’ve discovered a particular trick that allows writing more modular code by solving annoying indentation errors at the same time.

The described technique uses the CPS transformation and Haskell layout parsing rules. It sounds scary but in its core it’s just a refactoring using Higher-Order Functions, no fancy features involved.

Problem

To understand the solution, let’s first look at the problem.

📜 DISCLAIMER: I’m using one single example throughout the entire blog post. This example introduces a specific problem and solves it using the described technique. It serves only in educational purposes of showcasing the particular refactoring methods. You could’ve have a similar problem and solved it differently. You could’ve even avoid having this problem at all by designing your software differently. I’m not arguing here that the proposed solution is the best solution. So, please, don’t argue with me by saying I don’t understand what I’m doing because you didn’t have a similar problem.

Let’s say we’re working on the backend part of a web-application. We want to write a function that associates a user in our system with with their email if the user doesn’t already have an email. To be more specific, this function should do the following:

  1. Take a user session token and get the user ID from it.
  2. Check if the given user already has an email.
  3. Check if someone else already associated this email with them.
  4. If all good, associate the user with an email by inserting the data in the database.

In a simplified case, let’s say that we already have the following types and functions:

data UserSession = ...
data UserId = ...
data Email = ...
data ID = ...
data InternalDbError = ...

validateUserSession :: UserSession -> IO (Maybe UserId)

getEmailByUserId :: UserId -> IO (Maybe Email)

getUserIdByEmail :: Email -> IO (Maybe UserId)

insertUserEmail :: UserId -> Email -> IO (Either InternalDbError ID)

And now we want to glue our API functions together into the following function and report all possible error cases as well:

data AppError
    = UserSessionIsInvalid
    | UserAlreadyHasEmail
    | UserHasDifferentEmail
    | EmailIsTaken
    | DbError InternalDbError

associateEmail :: UserSession -> Email -> IO (Either AppError ID)

Baseline

The simplest and straighforward implementation of the desired function would be to use all the API methods directly and pattern-match on their results as we go. This is demonstrated by the following code snippet:

associateEmail :: UserSession -> Email -> IO (Either AppError ID)
associateEmail userSession email =
    validateUserSession userSession >>= \case
        Nothing -> pure $ Left UserSessionIsInvalid

        Just userId -> getEmailByUserId userId >>= \case
            Just otherEmail
                | email == otherEmail -> pure $ Left UserAlreadyHasEmail
                | otherwise -> pure $ Left UserHasDifferentEmail

            Nothing -> getUserIdByEmail email >>= \case
                Just otherUserId -> pure $ Left EmailIsTaken
                Nothing -> insertUserEmail userId email >>= \case
                    Left dbErr -> pure $ Left $ DbError dbErr
                    Right id' -> pure $ Right id'

👩‍🔬 The above code snippet uses the LambdaCase Haskell feature.

This simple implementation doesn’t use any advanced features and is easy to read by Haskell beginners. However, it has several problems that we would like to solve:

  • The code is deeply nested. It slowly turns into a weird staircase as we add more steps.
  • It mixes business logic with pesky implementation details. The high-level description of this function has only four clear steps but it’s extremely difficult to extract them quickly from this particular implementation. The problem becomes more severe as you add more steps or start introducing more complicated logic (e.g. logging, monitoring, etc.).

Either-isation

A one possible approach to improving the situation would be to introduce a separate function for each individual step. Those functions would return Either AppError smth and will hide the implementation details. The code will look like this:

checkUserSession :: UserSession -> IO (Either AppError UserId)
checkUserSession userSession = validateUserSession userSession >>= \case
    Nothing -> pure $ Left UserSessionIsInvalid
    Just userId -> pure $ Right userId

checkUserEmail :: UserId -> Email -> IO (Either AppError ())
checkUserEmail userId email = getEmailByUserId userId >>= \case
    Just otherEmail
        | email == otherEmail -> pure $ Left UserAlreadyHasEmail
        | otherwise -> pure $ Left UserHasDifferentEmail
    Nothing -> pure $ Right ()

checkOtherUserEmail :: Email -> IO (Either AppError ())
checkOtherUserEmail email = getUserIdByEmail email >>= \case
    Just otherUserId -> pure $ Left EmailIsTaken
    Nothing -> pure $ Right ()

checkEmailInsert :: UserId -> Email -> IO (Either AppError ID)
checkEmailInsert userId email = insertUserEmail userId email >>= \case
    Left dbErr -> pure $ Left $ DbError dbErr
    Right id'  -> pure $ Right id'

associateEmail
    :: UserSession
    -> Email
    -> IO (Either AppError ID)
associateEmail userSession email =
    checkUserSession userSession >>= \case
        Left err -> pure $ Left err
        Right userId -> checkUserEmail userId email >>= \case
            Left err -> pure $ Left err
            Right () -> checkOtherUserEmail email >>= \case
                Left err -> pure $ Left err
                Right () -> checkEmailInsert userId email

👩‍🔬 Here we explicitly prefer to return Either AppError () instead of Maybe AppError or something else for consistency. This will also become handy as we explore other approaches.

This refactoring is simple and pretty straightforward but it doesn’t solved any of two problems we have above despite writing much more code. Though, the code of associateEmail became slightly more understandable due to uniform handling of errors and we’re now able to add more implementation details to each step without polluting the implementation of associateEmail.

ExceptT

The problem with out-of-control nested indentation annoyed thousands of Haskell developers even before you and me. In this particular case, where all your functions return IO (Either TheSameErrorType anything), you can use the ExceptT monad transformer.

I won’t go into much details about ExcepT. This blog post is not a monad transformer tutorial. But I’m going to mention this approach as it’s quite common and still pretty basic in terms of language features.

Once you performed Either-isation of your functions, you can wrap each function into the ExceptT type. This change is quite mechanical in our case:

-checkUserSession :: UserSession -> IO (Either AppError UserId)
+checkUserSession :: UserSession -> ExceptT AppError IO UserId
-checkUserSession userSession = validateUserSession userSession >>= \case
+checkUserSession userSession = ExceptT $ validateUserSession userSession >>= \case

After applying this refactoring to each function, we can update the implementation of associateEmail:

associateEmail
    :: UserSession
    -> Email
    -> IO (Either AppError ID)
associateEmail userSession email = runExceptT $ do
    userId <- checkUserSession userSession
    checkUserEmail userId email
    checkOtherUserEmail email
    checkEmailInsert userId email

Now we finally solved our both problems! However, at what cost 😢 We’ve introduced a rather complicated concept of monad transformers to the codebase which makes the code less beginner-friendly. On top of that, ExceptT brings different problems such as broken behaviour when used with concurrency.

CPS transformation

The ExceptT solution allows us to achieve the desired shape of the associateEmail function but this approach has several drawbacks. Let’s use a different approach — Continuation-Passing Style (CPS) transformation.

The term CPS itself is not modern. It exists since 1975. It’s usually mentioned a lot in compiler development or in relation to callback hell. However, we’re going to put a new perspective on this good old trick.

The main idea behind the CPS transformation is simple: instead of returning a value from a function, we take a callback (also called “continutation”) as an argument and run it on the result instead of returning the result.

For example, here is what we had before:

checkUserSession :: UserSession -> IO (Either AppError UserId)
checkUserSession userSession = validateUserSession userSession >>= \case
    Nothing -> pure $ Left UserSessionIsInvalid
    Just userId -> pure $ Right userId

And here is checkUserSession after applying the CPS transformation:

withUserSession
    :: UserSession
    -> (UserId -> IO (Either AppError a))
    -> IO (Either AppError a)
withUserSession userSession next = validateUserSession userSession >>= \case
    Nothing -> pure $ Left UserSessionIsInvalid
    Just userId -> next userId

In this transformation, I performed several things:

  1. Renamed the function from checkUserSession to withUserSession to convey the intention behind this function: it takes an action to be performed “with user session”.
  2. Added extra argument — continuation — a function of type UserId -> IO (Either AppError a). So withUserSession is a Higher-Order Function now. ⚠️ Important: the return type of our continuation is polymorphic (as well as the return type of our function) to make usage more flexible.
  3. Split type signature into multiple lines as it’s became longer here.
  4. Returned next userId instead of pure $ Right userId in the end.

Our transformed withUserSession function now either short-circuits with an error or runs a given continuation with userId in case of success. It takes a continuation and passes the result to it, hence the name — Continuation-Passing Style.

Now, we apply the CPS transformation to every function:

withCheckedUserEmail
    :: UserId
    -> Email
    -> IO (Either AppError a)
    -> IO (Either AppError a)
withCheckedUserEmail userId email next = getEmailByUserId userId >>= \case
    Just otherEmail
        | email == otherEmail -> pure $ Left UserAlreadyHasEmail
        | otherwise -> pure $ Left UserHasDifferentEmail
    Nothing -> next

withCheckedOtherUserEmail
    :: Email
    -> IO (Either AppError a)
    -> IO (Either AppError a)
withCheckedOtherUserEmail email next = getUserIdByEmail email >>= \case
    Just otherUserId -> pure $ Left EmailIsTaken
    Nothing -> next

withEmailInsert
    :: UserId
    -> Email
    -> (ID -> IO (Either AppError a))
    -> IO (Either AppError a)
withEmailInsert userId email next = insertUserEmail userId email >>= \case
    Left dbErr -> pure $ Left $ DbError dbErr
    Right id'  -> next id'

A few notes on the implementation:

  • When we don’t have a meaningful argument to pass to continuation (e.g. the function returns a value of the unit type ()), we can simply pass an action instead of a function.
  • withEmailInsert will be called last but we still apply the CPS transformation to it. This approach is better for composability in case we want to use this function in the middle of another function and not necessarily in the end.

After we’ve performed this refactoring, we can now use our CPS-transformed functions in associateEmail. We’re going to use the chain of nested continuations:

associateEmail
    :: UserSession
    -> Email
    -> IO (Either AppError ID)
associateEmail userSession email =
    withUserSession userSession $ \userId ->
        withCheckedUserEmail userId email $
            withCheckedOtherUserEmail email $
                withEmailInsert userId email $ \id' ->
                    pure $ Right id'

What will happen is that withUserSession will validate user session and call the \userId -> ... lambda if everything is good. The lambda starts with the call to withCheckedUserEmail. This function will then check if the user already has an email and call the next action (starting with withCheckedOtherUserEmail) after the validation passes. And so on.

Wait a minute! You can say that the implementation of associateEmail is still deeply nested and you’ll be absolutely right. We see the four steps clearly but we still have this annoying alignment requirement.

The solution to this problem is simply to not indent the code at all:

associateEmail
    :: UserSession
    -> Email
    -> IO (Either AppError ID)
associateEmail userSession email =
    withUserSession userSession $ \userId ->
    withCheckedUserEmail userId email $
    withCheckedOtherUserEmail email $
    withEmailInsert userId email $ \id' ->
    pure $ Right id'
Haskell indentation: The Neat part

This works because Haskell parsing rules around layouts are quite flexible in this case. Semantically, everything “to the right of” lambda belongs to this lambda. So we don’t need this extra indentation here. This doesn’t work for case-of expressions and do-notation blocks but surprisingly works for lambdas and arguments after $ outside do blocks.

You can go one step further in making this code look cleaner. I’m not a fan of the BlockArguments extension in Haskell but if you like it, you can write the final function with slightly less operator noise:

associateEmail
    :: UserSession
    -> Email
    -> IO (Either AppError ID)
associateEmail userSession email =
    withUserSession userSession \userId ->
    withCheckedUserEmail userId email $
    withCheckedOtherUserEmail email $
    withEmailInsert userId email \id' ->
    pure $ Right id'

You can check the entire code after applying the CPS transformation here:

Conclusion

In this blog post I’ve described how you can make code cleaner by using the fundamental Haskell feature Higher-Order Functions with combination of 50-years old technique. Sometimes, you don’t need fancy tools to improve your life when fundamentals are solid and already powerful enough.