Intro
With great power comes great responsibility!
Record data types are vital for developing libraries and applications. However, there is a popular opinion that records in Haskell are not well-designed. The Haskell ecosystem has multiple approaches to deal with record pitfalls: a bunch of language extensions, multiple lens libraries, best practices and naming conventions. But there is still no consensus on the best way to use records.
RecordWildCards is one of the language extensions that improve the situation with records. However, it’s also one of the most controversial extensions. Some people suggest avoiding this extension no matter what. Some prefer to use it everywhere. In this blog post, I’m going to review this extension from every possible angle and tell you when to use it and when not to.
What is RecordWildCards?
Let’s start with talking about how records are implemented in Haskell. When you define the following data type:
data User = User
    { name :: Text
    , age  :: Int
    }
In Haskell it’s actually syntax sugar for the following code:
data User = User Text Int

name :: User -> Text
name (User n _) = n

age :: User -> Int
age (User _ a) = a
NOTE: in addition to generated functions each record also allows you to use record update syntax.
As you can see, getter functions are generated with the same names and types as the corresponding fields. And you can operate with them as ordinary functions when you write code:
canBuyVodka :: User -> Bool
canBuyVodka user = age user >= 18
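To make the getters and the record update syntax from the note above concrete, here is a minimal sketch. It assumes String instead of Text so that it needs only base, and the birthday helper is a hypothetical name introduced purely for illustration.

```haskell
module Main where

-- Assumption: String stands in for Text so the sketch depends only on base.
data User = User
    { name :: String
    , age  :: Int
    } deriving (Show)

-- Field names double as ordinary getter functions.
canBuyVodka :: User -> Bool
canBuyVodka user = age user >= 18

-- Hypothetical helper: record update syntax copies the value and
-- replaces only the listed fields.
birthday :: User -> User
birthday user = user { age = age user + 1 }

main :: IO ()
main = do
    let ivan = User { name = "Ivan", age = 17 }
    print (canBuyVodka ivan)            -- False
    print (canBuyVodka (birthday ivan)) -- True
    print (map name [ivan, birthday ivan])
```

Because name and age are plain functions, they can be passed to map or composed like any other function.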
Deconstruction
The first thing RecordWildCards allows you to do is pattern-match on the constructor in a special way, bringing all its fields into scope not as functions but as values. So, using this extension, we can rewrite the code above in the following way:
canBuyVodka :: User -> Bool
canBuyVodka User{..} = age >= 18
In the snippet above, age is the value taken from User, and it has type Int. It’s hard to see the benefits in this small example. However, when you have a lot of fields and use them multiple times inside a single function, this extension becomes really handy.
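The benefit is easier to see with more fields. The sketch below assumes a wider User record (the city and country fields are made up for illustration, and String replaces Text) and writes the same function twice, once with getters and once with the wildcard pattern.

```haskell
{-# LANGUAGE RecordWildCards #-}

module Main where

-- Assumption: city and country are hypothetical extra fields.
data User = User
    { name    :: String
    , age     :: Int
    , city    :: String
    , country :: String
    }

-- Without the extension: every field access repeats the argument.
describeGetters :: User -> String
describeGetters user =
    name user <> " (" <> show (age user) <> "), " <> city user <> ", " <> country user

-- With RecordWildCards: User{..} binds all four fields as plain values.
describeWildcard :: User -> String
describeWildcard User{..} =
    name <> " (" <> show age <> "), " <> city <> ", " <> country

main :: IO ()
main = do
    let user = User { name = "Ivan", age = 30, city = "Riga", country = "Latvia" }
    putStrLn (describeWildcard user)
    -- Both versions produce the same text.
    print (describeGetters user == describeWildcard user)
```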
Construction
The second feature of RecordWildCards is the ability to construct values of the record type from identifiers in scope. Like this:
readUser :: IO User
readUser = do
    name <- getLine
    age <- readLn
    pure User{..}
The values name and age are used as the corresponding fields of the User constructor. This helps to avoid code duplication and eliminates the need to come up with different variable names.
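Construction works with any local bindings, not only with variables bound in do-notation. Here is a small pure sketch, assuming String fields and a hypothetical parseUser helper that reads input of the form "name,age"; the names bound by pattern matching are picked up by User{..}.

```haskell
{-# LANGUAGE RecordWildCards #-}

module Main where

-- Assumption: String stands in for Text so the sketch depends only on base.
data User = User
    { name :: String
    , age  :: Int
    } deriving (Eq, Show)

-- Hypothetical helper: parses input of the form "name,age".
parseUser :: String -> Maybe User
parseUser input = case break (== ',') input of
    (name, ',' : rest) -> case reads rest of
        -- name and age are local bindings, so User{..} uses them as fields.
        [(age, "")] -> Just User{..}
        _           -> Nothing
    _ -> Nothing

main :: IO ()
main = do
    print (parseUser "Ivan,30" == Just (User "Ivan" 30))
    print (parseUser "Ivan" == Nothing)
```

Only local bindings are picked up this way; the top-level getter functions themselves are never used to fill in a field.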
In the following sections, I’m going to highlight common concerns about this extension and recommend best-practices.
Implicit scope
One of the reasons why some people don’t like RecordWildCards is that it’s not clear where the identifiers come from. Consider the following code:
nameOnCard :: User -> Job -> Text
nameOnCard User{..} Job{..} = name <> " | " <> title
The problem with this code is that it’s not obvious which data types these fields come from: is name a field of User or Job? It’s hard to tell without looking at the definitions of the corresponding types. This makes code hard to read and maintain.
One of the possible solutions some people recommend is to use the NamedFieldPuns extension. When this extension is enabled, you can write the following code instead:
nameOnCard :: User -> Job -> Text
nameOnCard User{name} Job{title} = name <> " | " <> title
NamedFieldPuns is similar to RecordWildCards but it forces you to specify explicitly what fields you are using. In this particular case, the extension solves the problem of figuring out where the variables come from. However, it has its own drawbacks:
- When your records have a lot of fields and you use most of them, usage of this extension increases the size of your code significantly.
- It introduces code duplication. You write field names twice: on the pattern-matching side and on the call side.
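For reference, a complete module using NamedFieldPuns might look like the sketch below. The Job type is not defined in this post, so its fields here (title and salary) are assumptions, and String is used instead of Text to keep the example dependent only on base.

```haskell
{-# LANGUAGE NamedFieldPuns #-}

module Main where

data User = User
    { name :: String
    , age  :: Int
    }

-- Assumption: the shape of Job is made up for this example.
data Job = Job
    { title  :: String
    , salary :: Int
    }

-- Each pattern names exactly the fields it uses, so their origin is explicit.
nameOnCard :: User -> Job -> String
nameOnCard User{name} Job{title} = name <> " | " <> title

main :: IO ()
main = putStrLn (nameOnCard (User "Ivan" 30) (Job "Engineer" 1000))
```

The field names appear both in the patterns and in the body, which is the duplication mentioned in the list above.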
Let’s see how all these problems can be solved with RecordWildCards. Because record fields are top-level functions and because there is no function overloading in Haskell, you can’t have two data types with the same field names in scope (though see the section about DuplicateRecordFields). One of the popular solutions to this difficulty is to prefix field names with the data type name, or with its abbreviation if the data type name is too long. It turns out that this approach also solves the above problem with RecordWildCards. This naming convention is so common that JSON and lens libraries provide options to strip prefixes automatically. If we define our data type like this:
data User = User
    { userName :: Text
    , userAge  :: Int
    }
Then the function from our example becomes more readable!
nameOnCard :: User -> Job -> Text
nameOnCard User{..} Job{..} = userName <> " | " <> jobTitle
Conclusion: prefix field names with the type name to solve two problems at the same time.
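Here is a self-contained sketch of the prefixed version. The Job fields and the dropFieldPrefix helper are assumptions made for illustration: the helper only shows the kind of String -> String function that prefix-stripping options in JSON libraries (for example, aeson's fieldLabelModifier) expect, and it is not taken from any library.

```haskell
{-# LANGUAGE RecordWildCards #-}

module Main where

import Data.Char (toLower)
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

data User = User
    { userName :: String
    , userAge  :: Int
    }

-- Assumption: the shape of Job is made up for this example.
data Job = Job
    { jobTitle  :: String
    , jobSalary :: Int
    }

-- The prefixes make it obvious which record each value comes from.
nameOnCard :: User -> Job -> String
nameOnCard User{..} Job{..} = userName <> " | " <> jobTitle

-- Hypothetical helper: drops a type prefix and lowercases the next letter,
-- e.g. "userName" becomes "name". Unprefixed names are left unchanged.
dropFieldPrefix :: String -> String -> String
dropFieldPrefix prefix field = maybe field lowerFirst (stripPrefix prefix field)
  where
    lowerFirst (c : cs) = toLower c : cs
    lowerFirst []       = []

main :: IO ()
main = do
    putStrLn (nameOnCard (User "Ivan" 30) (Job "Engineer" 1000))
    print (map (dropFieldPrefix "user") ["userName", "userAge"])
```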
Strict construction
If you construct values using RecordWildCards, you might forget to specify all fields, like in the code below:
defaultUser :: User
defaultUser =
    let userName = "Ivan"
    in User{..}
When GHC sees similar code, it outputs a warning that not all fields are initialised. But it’s very easy to miss this warning and get a runtime error later. The answer to this problem is to mark every field of your data type with the strict annotation:
data User = User
    { userName :: !Text
    , userAge  :: !Int
    }
NOTE: you can make all your types strict by default by enabling the StrictData language extension.
If you add ! in front of each type, then all fields become strict and you will see a compiler error instead of a warning when you forget to initialise some fields. Adding bangs is also considered one of the best practices to avoid space leaks. It’s very rare to want lazy fields in records.
NOTE: you can add {-# OPTIONS_GHC -Werror=missing-fields #-} to get a compile time error on uninitialised lazy fields.
Conclusion: mark fields as strict to have more compile time checks and to avoid potential performance problems.
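As a sketch of the fixed version, the module below initialises both strict fields, so it compiles. The age of 18 is an arbitrary value chosen for the example, and String replaces Text to keep the module dependent only on base.

```haskell
{-# LANGUAGE RecordWildCards #-}

module Main where

data User = User
    { userName :: !String
    , userAge  :: !Int
    } deriving (Eq, Show)

defaultUser :: User
defaultUser =
    let userName = "Ivan"
        userAge  = 18  -- assumption: arbitrary default for the example
    in User{..}

-- Removing the userAge binding above turns this module into a compile
-- time error, because userAge is a strict field and would be left
-- uninitialised by User{..}.

main :: IO ()
main = print (defaultUser == User "Ivan" 18)
```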
Compileless
Another popular concern about RecordWildCards is that you lose compile time checks during pattern-matching when you add more fields. For example, we want to implement a ToJSON instance from the aeson library for our User data type:
instance ToJSON User where
    toJSON User{..} = object ["name" .= userName, "age" .= userAge]
Now, if we add one more field to the User type, GHC won’t warn us that we need to update this instance. If we want to see a compile time error, we need to write this instance in a different way:
instance ToJSON User where
    toJSON (User name age) = object ["name" .= name, "age" .= age]
But let’s look at this problem more closely. This is a case where we want to use each field of the constructor. However, not all functions are like that. In our nameOnCard function from the previous section, we don’t want to use all fields; we’re interested only in a subset of them. And we don’t want to update that function when we change the definitions of the User or Job types. However, in the ToJSON instance, we want to use all fields. So, the problem is not actually in RecordWildCards. We need to know where to apply this extension, though even here you can use RecordWildCards to make your life easier, and here is why:
- If you also define a FromJSON instance, you should implement roundtrip property-based tests to make sure that your FromJSON and ToJSON instances satisfy this property. It’s not possible to skip a FromJSON instance update because you will see a compile time error if you don’t initialise all fields of the type. Thus, if you forget to update the ToJSON instance, you will observe a test failure.
- If your FromJSON/ToJSON instances are trivial, you can use generics or TemplateHaskell to derive these instances automatically.
- If your ToJSON instance is a part of your exposed API, then you probably should care about not changing it accidentally. And for this, you need to provide golden tests.
Forgetting to add a field is not actually the scariest problem. A scarier problem is that you can change the type of some field, your roundtrip tests still pass, but consumers of your JSON API observe errors. So RecordWildCards is not the most dangerous thing you should worry about here.
You must avoid RecordWildCards only when you really need compile time guarantees that all fields of the type are used and when tests are not good enough. For example, when implementing binary serialisation: if you convert your data type to a sequence of 0s and 1s, then failed test output won’t help you much to find where the problem is.
Conclusion: not using RecordWildCards doesn’t help you to avoid all your problems, so implement tests to protect your code from accidental breakages.
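The roundtrip idea can be sketched without any JSON library. In the example below, encodeUser and decodeUser are hypothetical stand-ins for the ToJSON and FromJSON instances, working on simple key-value pairs, and the check runs over a few hand-picked values instead of a real property-based testing library.

```haskell
{-# LANGUAGE RecordWildCards #-}

module Main where

import Text.Read (readMaybe)

data User = User
    { userName :: !String
    , userAge  :: !Int
    } deriving (Eq, Show)

-- Stand-in for a ToJSON instance: nothing forces it to mention every field.
encodeUser :: User -> [(String, String)]
encodeUser User{..} = [("name", userName), ("age", show userAge)]

-- Stand-in for a FromJSON instance: with strict fields, User{..} fails to
-- compile if a binding for some field is missing.
decodeUser :: [(String, String)] -> Maybe User
decodeUser fields = do
    userName <- lookup "name" fields
    userAge  <- lookup "age" fields >>= readMaybe
    pure User{..}

-- The roundtrip property: decoding an encoded value gives the value back.
roundtrips :: User -> Bool
roundtrips user = decodeUser (encodeUser user) == Just user

main :: IO ()
main = print (all roundtrips [User "Ivan" 18, User "" 0, User "Anna" 99])
```

If a new strict field is added to User, decodeUser stops compiling until it is updated, and a forgotten update to encodeUser then shows up as a failing roundtrip check.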
ApplicativeDo
We talked about concerns with RecordWildCards, but let’s talk about its advantages. It turns out that RecordWildCards plays nicely with another language extension: ApplicativeDo.
Let’s say we want to build a CLI for a tool that allows you to query some data and filter it by from and to entries. A terminal command for this tool may look like this:
my-tool query --from 3 --to 42
We can use the optparse-applicative library to implement a parser for these options easily. Let’s start with creating our data type for the options:
data Options = Options
    { optionsFrom :: !Int
    , optionsTo   :: !Int
    }
optparse-applicative is built around Applicative functors. So in order to implement a parser for the Options data type, you need to write code like this:
fromP, toP :: Parser Int
...

optionsP :: Parser Options
optionsP = Options
    <$> fromP
    <*> toP
One problem with writing code in this style is that it’s very easy to use the wrong order of the fromP and toP parsers when defining a parser for Options, and this can lead to bugs. In a CLI you can write either --from 3 --to 42 or --to 42 --from 3 and both work correctly. But in code, Options <$> fromP <*> toP is not the same as Options <$> toP <*> fromP. This mismatch between how the tool behaves in the real world and what the code expresses can lead to unexpected bugs.
This is true in general for such applicative-style code, but it’s more important with regard to a CLI, because it’s not that easy to test a CLI and, to my knowledge, not many people really write automatic tests for their CLIs. So in this area of our code, we want to be more careful not to introduce extra bugs.
One of the solutions to the described problem is to introduce newtypes. But it might be too tedious to deal with lots of newtypes. Fortunately, we can use RecordWildCards and the ApplicativeDo extension to solve this problem more easily!
optionsP :: Parser Options
optionsP = do
    optionsFrom <- fromP
    optionsTo <- toP
    pure Options{..}
Now, even if you change the order of the optionsFrom and optionsTo variables, the code still works.
Conclusion: RecordWildCards combined with ApplicativeDo allows you to write type-safe and maintainable code.
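To try this combination without installing optparse-applicative, here is a sketch built on a tiny hypothetical ArgParser type. It is not the library's Parser: it merely has Functor and Applicative instances and deliberately no Monad instance, so the do-block compiles only thanks to ApplicativeDo. The intOption helper and the key-value representation of arguments are also assumptions.

```haskell
{-# LANGUAGE ApplicativeDo   #-}
{-# LANGUAGE RecordWildCards #-}

module Main where

import Text.Read (readMaybe)

-- Hypothetical stand-in for a CLI parser: Applicative, but not a Monad.
newtype ArgParser a = ArgParser
    { runArgParser :: [(String, String)] -> Maybe a }

instance Functor ArgParser where
    fmap f (ArgParser run) = ArgParser (fmap f . run)

instance Applicative ArgParser where
    pure x = ArgParser (const (Just x))
    ArgParser runF <*> ArgParser runX =
        ArgParser (\args -> runF args <*> runX args)

-- Hypothetical helper: looks up a flag and reads its value as an Int.
intOption :: String -> ArgParser Int
intOption flag = ArgParser (\args -> lookup flag args >>= readMaybe)

data Options = Options
    { optionsFrom :: !Int
    , optionsTo   :: !Int
    } deriving (Eq, Show)

-- The statements are written in the "wrong" order on purpose: each value
-- is bound to the name of its field, so the order does not matter.
optionsP :: ArgParser Options
optionsP = do
    optionsTo   <- intOption "to"
    optionsFrom <- intOption "from"
    pure Options{..}

main :: IO ()
main = do
    let expected = Just (Options { optionsFrom = 3, optionsTo = 42 })
    print (runArgParser optionsP [("from", "3"), ("to", "42")] == expected)
    print (runArgParser optionsP [("to", "42"), ("from", "3")] == expected)
```

Swapping the two statements in optionsP changes nothing in the result, whereas swapping the arguments in the positional Options <$> ... <*> ... style would silently exchange the two numbers.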
DuplicateRecordFields
Due to the records implementation details, it’s not possible to have data types with the same field names in scope in standard Haskell code (as per Haskell2010). However, if you enable the DuplicateRecordFields extension, it becomes possible. You can leverage this extension to convert between data types easily:
data Man = Man { name :: !Text }
data Cat = Cat { name :: !Text }
evilMagic :: Man -> Cat
evilMagic Man{..} = Cat{..}
However, such automatic conversion works only if the fields of different types have exactly the same names. So, if the data types have different prefixes, you need to write a mapping between fields explicitly. But if you decide not to add prefixes to the field names, some pieces of your code that do something other than mere conversion between data types can become less readable if you use RecordWildCards in them.
Conclusion: if you convert between data types more often than you use them, you can leverage the combination of the RecordWildCards and DuplicateRecordFields extensions.
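A complete module for the conversion might look like the following sketch. It assumes String instead of Text, and it reads the result back with a pattern match because the plain name selector would be ambiguous between the two types.

```haskell
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE RecordWildCards       #-}

module Main where

data Man = Man { name :: !String }
data Cat = Cat { name :: !String }

-- Man{..} binds name as a local value; Cat{..} then uses that binding.
evilMagic :: Man -> Cat
evilMagic Man{..} = Cat{..}

-- Hypothetical helper: the constructor in the pattern tells GHC which
-- name field is meant.
catName :: Cat -> String
catName Cat{ name = n } = n

main :: IO ()
main = putStrLn (catName (evilMagic (Man { name = "Tom" })))
```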
Summary
RecordWildCards is a very useful and convenient extension. It can be used in the wrong way. However, if you follow best practices, this extension can become your best friend in writing elegant and maintainable code.