Intro
With great power comes great responsibility!
Record data types are vital for developing libraries and applications. However, there is a popular opinion that records in Haskell are not well-designed. The Haskell ecosystem has multiple approaches to dealing with record pitfalls: a bunch of language extensions, multiple lens libraries, best practices and naming conventions. But there is still no consensus on the best way to use records.
RecordWildCards is one of the language extensions that improve the situation with records. However, it’s also one of the most controversial extensions. Some people suggest avoiding this extension no matter what. Some prefer to use it everywhere. In this blog post, I’m going to review this extension from every angle and tell you when to use it and when not to.
What is RecordWildCards?
Let’s start with talking about how records are implemented in Haskell. When you define the following data type:
data User = User
    { name :: Text
    , age  :: Int
    }
In Haskell it’s actually syntax sugar for the following code:
data User = User Text Int

name :: User -> Text
name (User n _) = n

age :: User -> Int
age (User _ a) = a
NOTE: in addition to generated functions each record also allows you to use record update syntax.
As you can see, getter functions are generated with the same names and types as the corresponding fields. You can use them as ordinary functions when you write code:
canBuyVodka :: User -> Bool
canBuyVodka user = age user >= 18
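To make the two points above concrete, here is a minimal sketch showing getters passed around as ordinary functions, plus the record update syntax mentioned in the note. The sample users and the birthday helper are hypothetical names invented for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

data User = User
    { name :: Text
    , age  :: Int
    } deriving (Show, Eq)

-- Hypothetical sample data, not taken from the post.
users :: [User]
users = [User "Ivan" 17, User "Anna" 30]

canBuyVodka :: User -> Bool
canBuyVodka user = age user >= 18

-- Record update syntax: copy the record, replacing only the listed field.
birthday :: User -> User
birthday user = user { age = age user + 1 }

main :: IO ()
main = do
    -- 'name' is an ordinary function of type User -> Text, so it can be mapped.
    print (map name users)
    print (map canBuyVodka users)
    print (map (canBuyVodka . birthday) users)
```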
Deconstruction
The first thing RecordWildCards lets you do is pattern-match on the constructor in a special way, bringing all its fields into scope not as functions but as values. So, using this extension we can rewrite the code above in the following way:
canBuyVodka :: User -> Bool
canBuyVodka User{..} = age >= 18
In the snippet above, age is the value taken from User and it has type Int. It’s hard to see the benefits in this small example. However, when you have a lot of fields and use them multiple times inside a single function, this extension becomes really handy.
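As a rough illustration of the "lot of fields" case, the sketch below uses a hypothetical Server record (not from this post) and compares a function written with plain getters against the same function written with RecordWildCards.

```haskell
{-# LANGUAGE RecordWildCards #-}

-- Hypothetical record, invented for this example.
data Server = Server
    { host   :: String
    , port   :: Int
    , useTls :: Bool
    , path   :: String
    }

-- Without the extension every field access repeats the argument name.
serverUrlPlain :: Server -> String
serverUrlPlain s =
    (if useTls s then "https" else "http")
        ++ "://" ++ host s ++ ":" ++ show (port s) ++ path s

-- With RecordWildCards all fields are in scope as plain values,
-- including inside the 'where' clause.
serverUrl :: Server -> String
serverUrl Server{..} = scheme ++ "://" ++ host ++ ":" ++ show port ++ path
  where
    scheme = if useTls then "https" else "http"

main :: IO ()
main = putStrLn (serverUrl (Server "example.org" 8080 True "/api"))
```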
Construction
The second feature of RecordWildCards is the ability to construct values of the record type from identifiers in scope. Like this:
readUser :: IO User
readUser = do
    name <- getLine
    age <- readLn
    pure User{..}
Values name and age are used as the corresponding fields of the User constructor. This helps to avoid code duplication and eliminates the need to come up with different variable names.
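Construction works the same way in pure code. The sketch below is only an illustration: parseUser and guest are hypothetical helpers, and String is used instead of Text to keep the example dependent on base alone. It also shows that explicit fields can be mixed with the wildcard, in which case only the remaining fields are taken from the local scope.

```haskell
{-# LANGUAGE RecordWildCards #-}

import Text.Read (readMaybe)

data User = User
    { name :: String
    , age  :: Int
    } deriving (Show, Eq)

-- Hypothetical parser for input of the form "<name> <age>".
parseUser :: String -> Maybe User
parseUser input = case words input of
    [name, ageStr] -> do
        age <- readMaybe ageStr
        -- 'name' and 'age' are local variables that fill the fields.
        pure User{..}
    _ -> Nothing

-- Explicit fields can be combined with the wildcard:
-- 'age' is given directly, 'name' comes from the let binding.
guest :: User
guest =
    let name = "guest"
    in User{age = 0, ..}

main :: IO ()
main = do
    print (parseUser "Ivan 42")
    print (parseUser "Ivan forty-two")
    print guest
```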
In the following sections, I’m going to highlight common concerns about this extension and recommend best-practices.
Implicit scope
One of the reasons why some people don’t like RecordWildCards is that it’s not clear where the identifiers come from. Consider the following code:
nameOnCard :: User -> Job -> Text
nameOnCard User{..} Job{..} = name <> " | " <> title
The problem with this code is that it’s not obvious which data types these fields come from: is name a field of User or Job? It’s hard to tell without looking at the definitions of the corresponding types. This makes the code hard to read and maintain.
One of the possible solutions some people recommend is to use the NamedFieldPuns extension. When this extension is enabled, you can write the following code instead:
nameOnCard :: User -> Job -> Text
nameOnCard User{name} Job{title} = name <> " | " <> title
NamedFieldPuns is similar to RecordWildCards but it forces you to specify explicitly which fields you are using. In this particular case, the extension solves the problem of figuring out where the variables come from. However, it has its own drawbacks:
- When your records have a lot of fields and you use most of them, usage of this extension increases the size of your code significantly.
- It introduces code duplication. You write field names twice: on the pattern-matching side and on the call side.
Let’s see how all these problems can be solved with RecordWildCards. Because record fields are top-level functions and because there is no function overloading in Haskell, you can’t have two data types with the same field names in scope (though see the section about DuplicateRecordFields). One of the popular solutions to this difficulty is to prefix field names with the data type name, or its abbreviation if the data type name is too long. It turns out that this approach also solves the above problem with RecordWildCards. This naming convention is so common that JSON and lens libraries provide options to strip prefixes automatically. If we define our data type like this:
data User = User
    { userName :: Text
    , userAge  :: Int
    }
Then the function from our example becomes more readable!
nameOnCard :: User -> Job -> Text
nameOnCard User{..} Job{..} = userName <> " | " <> jobTitle
Conclusion: prefix field names with the type name to solve two problems at the same time.
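Here is a complete version of the prefixed example as a minimal sketch. The post never shows the definition of Job, so the type below (including its jobSalary field) is an assumption made only to get something that compiles.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}

import Data.Text (Text)
import qualified Data.Text.IO as TIO

data User = User
    { userName :: Text
    , userAge  :: Int
    }

-- Assumed definition: the post does not show the Job type.
data Job = Job
    { jobTitle  :: Text
    , jobSalary :: Int
    }

-- The prefixes tell the reader which record each value comes from,
-- even though both constructors are matched with a wildcard.
nameOnCard :: User -> Job -> Text
nameOnCard User{..} Job{..} = userName <> " | " <> jobTitle

main :: IO ()
main = TIO.putStrLn (nameOnCard (User "Ivan" 42) (Job "Engineer" 100))
```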
Strict construction
If you construct values using RecordWildCards, you might forget to specify all fields, like in the code below:
defaultUser :: User
defaultUser =
    let userName = "Ivan"
    in User{..}
When GHC sees similar code, it outputs a warning that not all fields are initialised. But it’s very easy to miss this warning and get a runtime error later. The answer to this problem is to mark every field of your data type with the strict annotation:
data User = User
    { userName :: !Text
    , userAge  :: !Int
    }
NOTE: you can make all your types strict by default by enabling the StrictData language extension.
If you add ! in front of each type, then all fields will become strict and you will see a compiler error instead of a warning when you forget to initialise some fields. Adding bangs is also considered one of the best practices to avoid space leaks. You rarely want record fields to be lazy.
NOTE: you can add {-# OPTIONS_GHC -Werror=missing-fields #-} to get a compile time error on uninitialised lazy fields.
Conclusion: mark fields as strict to have more compile time checks and to avoid potential performance problems.
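A minimal sketch of both safety nets in one module, using String instead of Text to stay within base. Either pragma on its own should be enough to turn a forgotten field into a compile time error; they are combined here only for illustration.

```haskell
{-# LANGUAGE RecordWildCards #-}
{-# LANGUAGE StrictData #-}
-- Promotes the missing-fields warning to an error, which also covers lazy fields.
{-# OPTIONS_GHC -Werror=missing-fields #-}

-- With StrictData both fields are strict without explicit bangs.
data User = User
    { userName :: String
    , userAge  :: Int
    } deriving (Show, Eq)

defaultUser :: User
defaultUser =
    let userName = "Ivan"
        -- Removing this binding should make GHC reject the module
        -- instead of only printing a warning.
        userAge = 42
    in User{..}

main :: IO ()
main = print defaultUser
```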
Compileless
Another popular concern about RecordWildCards is that you lose compile time checks during pattern-matching when you add more fields. For example, we want to implement a ToJSON instance from the aeson library for our User data type:
instance ToJSON User where
    toJSON User{..} = object ["name" .= userName, "age" .= userAge]
Now, if we add one more field to the User type, GHC won’t warn us that we need to update this instance. If we want to see a compile time error, we need to write this instance in a different way:
instance ToJSON User where
    toJSON (User name age) = object ["name" .= name, "age" .= age]
But let’s look at this problem more closely. This is a case where we want to use each field of the constructor. However, not all functions are like that. In our nameOnCard function from the previous section, we don’t want to use all fields, we’re interested only in a subset of them. And we don’t want to update that function when we change the definitions of the User or Job types. In the ToJSON instance, on the other hand, we want to use all fields. So, the problem is not actually in RecordWildCards. We need to know where to apply this extension, though even here you can use RecordWildCards to make your life easier, and here is why:
- If you also define a FromJSON instance, you should implement roundtrip property-based tests to make sure that your FromJSON and ToJSON instances satisfy the roundtrip property. It’s not possible to skip a FromJSON instance update because you will see a compile time error if you don’t initialise all fields of the type. Thus, if you forget to update the ToJSON instance, you will observe a test failure.
- If your FromJSON/ToJSON instances are trivial, you can use generics or TemplateHaskell to derive these instances automatically.
- If your ToJSON instance is a part of your exposed API then you probably should care about not changing it accidentally. And for this, you need to provide golden tests.
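The roundtrip idea from the first point can be sketched as follows. This is only an illustration, assuming the aeson and text packages are available. A real property-based test would generate users with a library such as QuickCheck or hedgehog, while this sketch checks a couple of hand-picked, hypothetical samples. The JSON keys "name" and "age" follow the ToJSON instance above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}

import Data.Aeson
    (FromJSON (..), ToJSON (..), decode, encode, object, withObject, (.:), (.=))
import Data.Text (Text)

data User = User
    { userName :: !Text
    , userAge  :: !Int
    } deriving (Show, Eq)

instance ToJSON User where
    toJSON User{..} = object ["name" .= userName, "age" .= userAge]

-- Fields are strict, so forgetting one here is a compile time error.
instance FromJSON User where
    parseJSON = withObject "User" $ \o -> do
        userName <- o .: "name"
        userAge  <- o .: "age"
        pure User{..}

-- The roundtrip property: decoding an encoded value gives the value back.
roundtrips :: User -> Bool
roundtrips user = decode (encode user) == Just user

main :: IO ()
main = print (all roundtrips [User "Ivan" 42, User "" 0])
```

If a new strict field is added to User, the FromJSON instance stops compiling until it is updated, and a ToJSON instance that was not updated then fails this check because the new key is missing from the encoded object.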
Forgetting to add a field is not the scariest problem actually. A scarier problem is that you can change the type of some field, your roundtrip tests are still passing, but consumers of your JSON API will observe errors. So RecordWildCards is not the most dangerous thing you should worry about here.
You need to avoid RecordWildCards only when you really need a compile time guarantee that all fields of the type are used and tests are a poor substitute. For example, when implementing binary serialisation: if you convert your data type to a sequence of 0s and 1s, the output of a failed test won’t help you much in finding where the problem is.
Conclusion: avoiding RecordWildCards doesn’t protect you from all problems, so implement tests to guard your code against accidental breakage.
ApplicativeDo
We talked about concerns with RecordWildCards but let’s talk about its advantages. It turns out that RecordWildCards plays nicely with another language extension: ApplicativeDo.
Let’s say we want to build a CLI for a tool that lets you query some data and filter it by from and to entries. A terminal command for this tool may look like this:
my-tool query --from 3 --to 42
We can use optparse-applicative library to implement a parser for these options easily. Let’s start with creating our data type for the options:
data Options = Options
    { optionsFrom :: !Int
    , optionsTo   :: !Int
    }
optparse-applicative is built around Applicative functors. So in order to implement a parser for the Options data type you need to write code like this:
fromP, toP :: Parser Int
...

optionsP :: Parser Options
optionsP = Options
    <$> fromP
    <*> toP
One problem with writing code in this style is that it’s very easy to use the wrong order of the fromP and toP parsers when defining a parser for Options, and this can lead to bugs. In a CLI you can write either --from 3 --to 42 or --to 42 --from 3 and both work correctly. But in code Options <$> fromP <*> toP is not the same as Options <$> toP <*> fromP. This mismatch between how the CLI behaves and what the code expresses can lead to unexpected bugs.
This is true in general for such applicative-style code but it’s more important with regard to a CLI, because it’s not that easy to test a CLI and, to my knowledge, not many people really write automatic tests for their CLIs. So in this area of our code, we want to be more careful not to introduce extra bugs.
One of the solutions to the described problem is to introduce newtypes. But it might be too tedious to deal with lots of newtypes. Fortunately, we can use RecordWildCards and the ApplicativeDo extension to solve this problem more easily!
optionsP :: Parser Options
optionsP = do
    optionsFrom <- fromP
    optionsTo <- toP
    pure Options{..}
Now, even if you change the order of the optionsFrom and optionsTo variables, the code still works.
Conclusion: RecordWildCards combined with ApplicativeDo allows you to write type-safe and maintainable code.
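For completeness, here is a self-contained sketch of such a parser, assuming the optparse-applicative package is available. The definitions of fromP and toP are not given in this post, so the ones below (including the help texts) are assumptions, the query subcommand is left out for brevity, and parseArgs is a hypothetical helper that runs the parser on a list of arguments without touching the real command line.

```haskell
{-# LANGUAGE ApplicativeDo #-}
{-# LANGUAGE RecordWildCards #-}

import Options.Applicative

data Options = Options
    { optionsFrom :: !Int
    , optionsTo   :: !Int
    } deriving (Show, Eq)

-- Assumed definitions of the two option parsers.
fromP, toP :: Parser Int
fromP = option auto (long "from" <> metavar "INT" <> help "Lower bound")
toP   = option auto (long "to" <> metavar "INT" <> help "Upper bound")

-- Parser is not a Monad, so this do-block relies on ApplicativeDo.
-- The statements are deliberately listed in the "wrong" order:
-- each value is bound by name, so the result is still correct.
optionsP :: Parser Options
optionsP = do
    optionsTo   <- toP
    optionsFrom <- fromP
    pure Options{..}

-- Hypothetical helper: run the parser purely on the given arguments.
parseArgs :: [String] -> Maybe Options
parseArgs = getParseResult . execParserPure defaultPrefs (info optionsP fullDesc)

main :: IO ()
main = do
    print (parseArgs ["--from", "3", "--to", "42"])
    print (parseArgs ["--to", "42", "--from", "3"])
```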
DuplicateRecordFields
Due to the records implementation details, it’s not possible to have data types with the same field names in scope in standard Haskell code (as per Haskell2010). However, if you enable the DuplicateRecordFields extension, it becomes possible. You can leverage this extension to convert between data types easily:
data Man = Man { name :: !Text }
data Cat = Cat { name :: !Text }
evilMagic :: Man -> Cat
evilMagic Man{..} = Cat{..}
However, such automatic conversion works only if the fields of the different types have exactly the same names. So, if the data types have different prefixes, you need to write a mapping between fields explicitly. But if you decide not to add prefixes to the field names, the pieces of your code that do something besides mere conversion between data types can become less readable if you use RecordWildCards in them.
Conclusion: if you convert between data types more often than you use them, you can leverage the combination of the RecordWildCards and DuplicateRecordFields extensions.
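The conversion trick as a complete module, in the form of a minimal sketch. The deriving clauses and the sample value are additions made only so that the result can be printed and compared.

```haskell
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RecordWildCards #-}

import Data.Text (Text)

data Man = Man { name :: !Text } deriving (Show, Eq)
data Cat = Cat { name :: !Text } deriving (Show, Eq)

-- Man{..} brings 'name' into scope as a value,
-- and Cat{..} picks it up because the field has the same name.
evilMagic :: Man -> Cat
evilMagic Man{..} = Cat{..}

main :: IO ()
main = print (evilMagic (Man "Tom"))
```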
Summary
RecordWildCards is a very useful and convenient extension. It can be used in the wrong way. However, if you follow best practices, this extension can become your best friend in writing elegant and maintainable code.