This is a series of articles. All parts:
You’ll notice the following Functional Programming pattern many times:
- You define trivial fundamental blocks.
- You define trivial ways to compose blocks.
- Suddenly, you end up with something extremely powerful.
I don’t know how it works but it works every time. Trust the process.
Thus, it’s important to develop the skill of noticing a particular pattern in diverse situations. We need to pump that Tetris Effect of yours.
To develop an even further understanding of the Semigroup concept, we will explore more Semigroup examples and learn a composition of Semigroups to solve a real-world problem.
⚠️ CONTENT WARNING ⚠️ This section may contain some advanced OCaml code. I’ll do my best to explain the concepts in a beginner-friendly way though.
All code snippets can be found on GitHub:
Minimum and Maximum
We learned about two ways of appending numbers:
- Adding
- Multiplying
What if I told you, there are more ways? In fact, getting the minimum of two numbers is also a valid Semigroup that satisfies associativity!
In OCaml, this looks similar to what we had before:
module IntMin = struct
type t = int
let append x y = if x <= y then x else y
end
Verifying in utop
that things work:
5 3 ;;
utop # IntMin.append int = 3
- : 7 10 ;;
utop # IntMin.append int = 7 - :
Reasonably, if the minimum operation is a Semigroup, maximum should be a Semigroup too:
module IntMax = struct
type t = int
let append x y = if x >= y then x else y
end
Again, verifying in utop
that things work:
5 3 ;;
utop # IntMax.append int = 5
- : 7 10 ;;
utop # IntMax.append int = 10 - :
Congratulations! You learned two more examples of Semigroups!
Generalising Min and Max
This all sounds cute and nice but this series is called “Pragmatic Category Theory”. And nothing can be further from being pragmatic as implementing a gazillion min and max modules for every single type.
Indeed, let’s take this one step further here. Instead of saying
int
with the minimum operation is a Semigroup
We want to be able to say (in OCaml):
- Any type that supports comparison is a Semigroup with the minimum operation
Fortunately, the OCaml’s module system is powerful enough to express this idea in the code.
Other mainstream languages will use their specific features (generics and interfaces in Java, traits in Rust, typeclasses in Haskell, and in Python… just a vibe check on the value).
First, how can we say that values of a given type can be compared? We’re going to use a module again! Similar to the SEMIGROUP
module type signature, we’ll have the COMPARABLE
module type signature:
module type COMPARABLE = sig
type t
val compare : t -> t -> int
end
Here
compare
returnsint
following the popular convention. The result ofcompare
has the following meaning: -compare x y = 0
meansx = y
-compare x y < 0
meansx < y
-compare x y > 0
meansx > y
Now we need a way to express that the implementation of the minimum Semigroup depends on the implementation of COMPARABLE
. This is relatively straightforward to do in OCaml using module functors:
👩🔬 Fun fact! Functor is another popular and extremely useful concept from Category Theory. We’ll get to it later but module functors are called functors for a reason.
module Min(C: COMPARABLE) = struct (* 1 *)
type t = C.t (* 2 *)
let append x y = if C.compare x y <= 0 then x else y (* 3 *)
end
Let’s digest this piece of code. Here we say the following:
- Line 1: We define a new module
Min
. - Line 1: The
Min
module needs another moduleC
that implementsCOMPARABLE
. - Line 2: The type
t
inside theMin
module is the same as insideCOMPARABLE
. This makes sense because we take the minimum between the values of the type we can compare. - Line 3: We use the
compare
function from the moduleC
to compare values and return the smallest one.
Normally, now we would have to implement the COMPARABLE
module signature for all types where we want to get the minimum. Fortunately, due to how modules work, every existing module that defines type t
and val compare
with needed signatures will do!
And, surprisingly, all common modules implement the desired interface. Did we get lucky or was it intentional? 😉
Let’s see how the usage looks in utop
:
module IntMin = Min(Int) ;;
utop # 3 5 ;;
utop # IntMin.append int = 3
- :
module FloatMin = Min(Float) ;;
utop # 5.2 3.1 ;;
utop # FloatMin.append float = 3.1 - :
The main motivation for this section is that we want to build reusable abstractions that we can leverage in diverse scenarios. If we have to redefine lots of trivial stuff from scratch every time, this quickly gets tiresome. Fortunately, we can avoid this in many cases.
Besides, we can see how such abstractions play well with the rest of the language ecosystem.
First and Last
We learned about four ways of appending numbers:
- Adding
- Multiplying
- Minimum
- Maximum
What if I told you, there are more ways?? In fact, getting the first of two numbers is also a valid Semigroup that satisfies associativity!
In OCaml, this looks similar to what we had before:
module IntFirst = struct
type t = int
let append x _ = x
end
Verifying in utop
that things work:
3 5 ;;
utop # IntFirst.append int = 3
- : 5 3 ;;
utop # IntFirst.append int = 5 - :
Similarly, getting the last element of two is also a valid Semigroup! Hmm, it’s almost like every example has a twin…
module IntLast = struct
type t = int
let append _ y = y
end
Again, verifying in utop
that things work:
3 5 ;;
utop # IntLast.append int = 5
- : 5 3 ;;
utop # IntLast.append int = 3 - :
An observant eye may notice, that there’s nothing special about int
that allows one to take the first element. There’s nothing special needed at all! You can take the first of any value, you don’t need anything!
The definitions of IntFirst
and IntLast
can be generalised to work with every type similarly to minimum and maximum. I won’t go into the details here but you can expand the following section to read the code:
Generalised First and Last
module First(T : sig type t end) = struct
type t = T.t
let append x _ = x
end
module Last(T : sig type t end) = struct
type t = T.t
let append _ y = y
end
Taking a step back
We learned about six ways of appending numbers:
- Adding
- Multiplying
- Minimum
- Maximum
- First
- Last
What if I told you, there are more ways???
But let’s take a step back. You might start thinking, “Aren’t we stretching the definition of append too much?”
Indeed, I can follow this train of thought:
- Concatenating two strings is kinda like append, ok.
- Adding two numbbers — fine, append too.
- Minimum and Maximum — well, we’re not really appending, we’re choosing.
- Last and First — we can’t be further from appending! We’re, in fact, discarding values!
You can see that English doesn’t precisely describe the concept. Math is strict. It doesn’t try to be fancy. It just says that a Semigroup has a binary associative operation. It doesn’t name it but we need names to communicate ideas with each other.
If we think about types as sets (e.g. bool
is a set that has only two elements: true
and false
; int
is a set that has 2^64 elements, etc.), we can view Semigroup is a function that picks two elements from a set and returns another element. This element can be one of the given two or completely different.
This concept visualised:
This is not the true Category Theory definition of a Semigroup but it’s equivalent and helps build the intuition.
You can see that when presented like this, the binary associative operation in Semigroup is not always the strict append in the English meaning. For this reason, such an operation doesn’t usually have a name in OCaml or other FP languages.
OCaml can define custom operators. So, another common definition of SEMIGROUP
is the following:
module type SEMIGROUP = sig
type t
val (<+>) : t -> t -> t
end
But let’s stick with append
for now. Abusing operators can get out of hand pretty quickly.
List
Let’s look at another example of a Semigroup. We know that we can concatenate strings and this is a valid Semigroup. But why stop here? We can also concatenate lists, arrays, vectors, sequences, trees, and so on.
So List is also a valid Semigroup with append
being list concatenation (the @
operator in OCaml).
If we fix the list element type to something like int
, we can define the list Semigroup trivially:
module IntList = struct
type t = int list
let append = ( @ )
end
To go one step further, and define a single List
Semigroup for all lists, we need to parametrise our List
module with a module that has just type and this type will be our list element:
module List(T : sig type t end) = struct
type t = T.t list
let append = ( @ )
end
I won’t go into the details of what’s going on but because we can’t say type t = 'a list
, we need to bring the type of a list element externally, so we’ll depend on an anonymous module that has only type t
inside.
The implementation looks scary but the usage in utop
is no different from our Min
module from before:
module IntList = List(Int) ;;
utop # 3; 1; 2] [4; 5] ;;
utop # IntList.append [int list = [3; 1; 2; 4; 5] - :
Pair
You’ll be laughing to learn that all of the content before was just a preamble to this section. The title of this part is “Composing Semigroups”, so let’s finally learn how to compose Semigroups.
So, okay, we can concatenate two strings. We also can append two numbers. What if I want to concatenate two strings AND append two numbers AT THE SAME TIME?
If I can append things, it’s natural to desire to append multiple different things simultaneously.
In other words, if I have a pair of things, I want to append two pairs, where the first elements of a pair are appended, and the second elements of the pairs are appended correspondingly.
So, to rephrase: if a type a
is a Semigroup and type b
is a Semigroup then a pair of types a * b
is naturally a Semigroup.
In OCaml, this can be implemented straightforwardly using module functors again.
module Pair(S1: SEMIGROUP) (S2: SEMIGROUP) = struct (* 1 *)
type t = S1.t * S2.t (* 2 *)
(* 3 *)
let append (a1, b1) (a2, b2) = (* 4 *)
(* 5 *)
(S1.append a1 a2, S2.append b1 b2) end
What it says:
- Line 1: We define a Semigroup called
Pair
. - Line 1: It depends on two other Semigroups called
S1
andS2
respectively. - Line 2: Our type is a pair of types
S1.t
andS2.t
. So we just create a pair of two given types. - Line 4: Our
append
takes two pairs, so we pattern match on them immediately. - Line 5: When we append two pairs, we append the first elements using the
append
operation fromS1
and the second elements usingappend
fromS2
.
⚠️ SPOILER ALERT: Did I mention that two components of a pair are appended independently, meaning that they can be appended in parallel for a performance increase? 🤫
This may look scary but I hope that the usage example in utop
clarifies things:
module PairStringInt = Pair(String)(IntAdd) ;;
utop # "foo", 3) ("bar", 5) ;;
utop # PairStringInt.append (string * int = ("foobar", 8) - :
Another cute usage is finding the minimum and maximum among multiple numbers simultaneously.
We can easily find the minimum and maximum among three numbers using a composition of the Min
and Max
semigroups:
module MinMax = Pair(Min(Int))(Max(Int)) ;;
utop # 3, 3) (MinMax.append (7, 7) (5, 5)) ;;
utop # MinMax.append (int * int = (3, 7) - :
This may look intimidating, so let’s debug this code using the famous FP technique called equational reasoning. We will just apply functions step-by-step to arrive at our result.
3, 3) (MinMax.append (7, 7) (5, 5))
MinMax.append (3, 3) (5, 7)
= MinMax.append (3, 7) = (
The min operation knows nothing about max, and max knows nothing about min. They work on different parts of a pair independently. And by composing them, we’re able to calculate both operations at the same time.
Generalising this usage from three pairs to a list, we can find the minimum, maximum, sum, product, first, and last element of the list in ONE SINGLE TRAVERSAL! It becomes just a matter of composing the needed Semigroups, and we’ll look into this example in detail in one of the future parts.
This is getting “chunky”
Now, let’s look at a promised real-world example.
I’m developing a GitHub TUI in OCaml. The TUI rendering might get complex since at the end of the day, everything needs to be printed as lines to the terminal. But a single line might contain different parts formatted differently (some are bold, some are not; some are coloured, some are not);
So I defined a chunk
type to represent a part of the string with formatting:
(* --- chunk.mli --- *)
type t = {
styles : Style.t;string : string;
}
And so my line of text is a list of chunks. But here’s the catch.
During rendering, I need to know the length of strings, so I can do padding and alignment properly. Traversing the entire list of chunks to calculate its length every time is quite expensive. So I’m just storing the length of the line alongside the list of chunks. And when I’m appending two lines, I’m appending their lengths respectively.
In the code, it looks like this:
(* --- line.mli --- *)
type t
(** Append two lines into a single line. *)
val append : t -> t -> t
(* --- line.ml --- *)
type t = {
list;
chunks : Chunk.t int;
length :
}
let append line1 line2 =
let chunks = line1.chunks @ line2.chunks in
let length = line1.length + line2.length in
{ chunks; length }
Essentially, I created a Pair Semigroup by composing the Int Add Semigroup and List Semigroup.
This series is called “Pragmatic Category Theory”, and it’s more pragmatic to create a custom record with the custom append
operation rather than using the Pair
module functors machinery directly. So an important lesson here:
- Ideas > Implementations
If you have enough programming experience, you can come up with this simple line
type on your own without knowing the concept of Semigroup. After all, then something is really good, you naturally tend to use it more, even if you’re not aware of all the underlying concepts.
However, in this rendering example, the associativity property of a Semigroup becomes extremely crucial. And since this part is reaching its limit, we’ll look into associativity closely in the next part.
Conclusion
I hope this section was still interesting! Maybe some things look not so useful, or too trivial while others look too complex.
Semigroup is a deep concept, so I hope that a smooth introduction will help to demystify it even if unloading the entire context takes a while.
In the next section, we’ll finally learn why associativity matters. And there’ll be many more pragmatic examples. Be patient!
Acknowledgement
Many thanks to people who proofread the early draft of this article and shared their invaluable feedback: _____C sverien
If you liked this blog post, consider supporting my work on GitHub Sponsors, or following me on the Internet: