Haskell Refactoring

Posted 2018-04-03 by Rowan Bohde

When we write Haskell, we love to talk about the correctness of our code. The focus on equational reasoning lets us reason about our code. This is an amazing thing to have in our toolbox when we have a piece of code we want to analyze, but what about when we want to change some piece of code?

We often use the term refactoring loosely to mean "changing a program", but its original definition requires that the new code is functionally equivalent to the old code. If we were to model this in Haskell, we'd get the following:

data Program = Program
 
run :: Program -> Input -> Output
 
type Refactoring = (Program -> Program)
 
-- A given Refactoring r must satisfy the following property
-- run program input ≡ run (r program) input

If we require our refactorings to have this property, we know that we did not accidentally introduce a bug (or accidentally fix a bug!) whenever we apply one. This means it's always safe to apply a refactoring, and we don't need to think about its correctness, only how it impacts the design.
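To make the property concrete, here is a minimal sketch that fills in the abstract Program with a toy arithmetic language. The constructors, the dropAddZero refactoring, and the preservesBehaviour helper are all hypothetical names invented for illustration, and the check only samples a finite list of inputs, so it is evidence rather than proof.

```haskell
-- A toy stand-in for the article's abstract Program: arithmetic over one input.
data Program
  = Lit Int
  | Var                    -- the program's single input
  | Add Program Program
  | Mul Program Program
  deriving (Eq, Show)

type Input  = Int
type Output = Int

run :: Program -> Input -> Output
run (Lit n)   _ = n
run Var       x = x
run (Add a b) x = run a x + run b x
run (Mul a b) x = run a x * run b x

type Refactoring = Program -> Program

-- Hypothetical refactoring: drop additions of zero, everywhere in the tree.
dropAddZero :: Refactoring
dropAddZero (Add a b) =
  case (dropAddZero a, dropAddZero b) of
    (Lit 0, b') -> b'
    (a', Lit 0) -> a'
    (a', b')    -> Add a' b'
dropAddZero (Mul a b) = Mul (dropAddZero a) (dropAddZero b)
dropAddZero p         = p

-- The property from the article, checked on a finite sample of inputs.
preservesBehaviour :: Refactoring -> Program -> [Input] -> Bool
preservesBehaviour r program = all (\i -> run program i == run (r program) i)

-- Represents (x * (2 + 0)) + 0.
example :: Program
example = Add (Mul Var (Add (Lit 2) (Lit 0))) (Lit 0)
```

Here dropAddZero example simplifies the tree to Mul Var (Lit 2), and preservesBehaviour dropAddZero example [-5 .. 5] evaluates to True. A property-testing library could generate the programs and inputs instead of a hand-written sample.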

Refactorings also compose:

a :: Refactoring
b :: Refactoring
 
c :: Refactoring
c program = b (a program)
-- equivalent to
-- c = b . a
 
-- We still satisfy the property
-- run program input ≡ run (a program) input ≡ run (b (a program)) input

Because of this, we know we can apply any number of refactorings to our program, and we'll never change its correctness. Feasibly, we could make huge changes to our program safely by defining a sequence of refactorings that gets us to the design we want.
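As a rough illustration of composition, the sketch below models a program as a list of stack-machine instructions and chains two small refactorings. The instruction set, removeNops, foldConstants, and applyAll are assumptions made up for this example. Neither step changes what run produces, but foldConstants only finds its pattern once removeNops has run, which is why ordering a sequence of safe steps can still matter for the design.

```haskell
data Instr = Push Int | AddTop | Nop
  deriving (Eq, Show)

type Program = [Instr]
type Input   = [Int]   -- initial stack
type Output  = [Int]   -- final stack

run :: Program -> Input -> Output
run []            st           = st
run (Push n : is) st           = run is (n : st)
run (AddTop : is) (x : y : st) = run is (x + y : st)
run (AddTop : is) st           = run is st   -- too few operands: treated as a no-op
run (Nop : is)    st           = run is st

type Refactoring = Program -> Program

-- Hypothetical refactoring a: delete instructions that do nothing.
removeNops :: Refactoring
removeNops = filter (/= Nop)

-- Hypothetical refactoring b: a single pass that replaces
-- "push a, push b, add" with "push (a + b)".
foldConstants :: Refactoring
foldConstants (Push a : Push b : AddTop : rest) = Push (a + b) : foldConstants rest
foldConstants (i : rest)                        = i : foldConstants rest
foldConstants []                                = []

-- Apply refactorings left to right; applyAll [a, b] behaves like b . a.
applyAll :: [Refactoring] -> Refactoring
applyAll rs program = foldl (\p r -> r p) program rs

cleanUp :: Refactoring
cleanUp = applyAll [removeNops, foldConstants]

sample :: Program
sample = [Push 1, Nop, Push 2, AddTop, Nop, AddTop]
```

With these definitions, cleanUp sample is [Push 3, AddTop], and running either version on the same starting stack gives the same final stack.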

Types of Refactorings

Most catalogued refactorings were defined for languages like Java or Ruby, but they really aren't language specific. Almost all of the refactoring catalog applies to Haskell with some slight modifications (e.g. use functions instead of methods and typeclasses instead of interfaces).

What makes Haskell really interesting from a refactoring standpoint is that every equivalence we have is also a valid refactoring. For instance, in the documentation for id, there's the following line:

id x = x

This is the definition of id, but it is also an equivalence. I know that whenever I see id x, I can replace it with x. I also know that I can replace any arbitrary expression x with id x and it will still work.

Many libraries document equivalences like this for their own functions, and they are useful to have on hand when refactoring.
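One documented equivalence of this kind is the Functor composition law, fmap (f . g) = fmap f . fmap g, which for lists reads map (f . g) = map f . map g. The sketch below applies it as a refactoring; the function names before, after, and wrapped are made up for illustration.

```haskell
-- Before: two passes over the list.
before :: [Int] -> [Int]
before = map (+ 1) . map (* 2)

-- After rewriting  map f . map g  to  map (f . g).
after :: [Int] -> [Int]
after = map ((+ 1) . (* 2))

-- The id equivalence read right to left: any expression can be wrapped in id.
wrapped :: [Int] -> [Int]
wrapped = id . after
```

All three functions return the same list for the same input, so any one can replace another wherever it appears.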

An Example

Let's consider a small module that we want to refactor to use lens.

module User (
  User(..), hasValidEmail
) where
 
import qualified Email
 
data User = User { email :: Email.Email }
 
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . email

lens documents many of these properties in the comments on its functions. For example, the documentation for view gives the equivalence view . to = id

Let's use this to refactor hasValidEmail to use lens:

-- id x = x
 
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . (id email)
 
-- view . to = id
 
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . ((view . to) email)
 
-- (f (g a)) = (f . g) a
 
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . (view (to email))
 
-- Extract Function
 
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . (view emailGetter)
  where
    emailGetter = to email
 
-- `view` only requires a `Getting`, so we can
-- use a `Lens` instead of a `Getter`
-- https://hackage.haskell.org/package/lens-4.16.1/docs/Control-Lens-Getter.html#t:Getting
 
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . (view emailGetter)
  where
    emailGetter = lens email (\u e -> u{email=e})
 
-- since `emailGetter` is no longer a `Getter`, rename it
-- Rename Function
 
hasValidEmail :: User -> Bool
hasValidEmail = Email.isValid . (view emailLens)
  where
    emailLens = lens email (\u e -> u{email=e})

That was a lot of work for a relatively small change! But, since each change was just applying a refactoring, we know that we never broke the code.
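If we want more than the argument that each step was safe, we can also compare the starting point and the end point directly. The sketch below is an approximation under stated assumptions: Email and isValid are hypothetical stand-ins for the Email module, and view, to, and lens are simplified hand-rolled versions built on base so the file runs without the lens package. The real library's types are more general.

```haskell
import Data.Functor.Const (Const (..))

-- Hypothetical stand-in for the article's Email module.
newtype Email = Email String

isValid :: Email -> Bool
isValid (Email s) = '@' `elem` s

data User = User { email :: Email }

-- Simplified versions of three lens functions (assumption: not the real API).
type Getting r s a = (a -> Const r a) -> s -> Const r s

view :: Getting a s a -> s -> a
view l = getConst . l Const

to :: (s -> a) -> Getting r s a
to f g = Const . getConst . g . f

lens :: Functor f => (s -> a) -> (s -> a -> s) -> (a -> f a) -> s -> f s
lens getter setter f s = setter s <$> f (getter s)

-- Starting point of the walkthrough.
hasValidEmailBefore :: User -> Bool
hasValidEmailBefore = isValid . email

-- Midpoint, after the `view . to = id` steps.
hasValidEmailGetter :: User -> Bool
hasValidEmailGetter = isValid . view (to email)

-- End point of the walkthrough.
hasValidEmailAfter :: User -> Bool
hasValidEmailAfter = isValid . view emailLens
  where
    emailLens = lens email (\u e -> u { email = e })
```

All three versions should agree on every User, which is exactly what the chain of refactorings promises.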

Combining Refactorings

Doing that every time is really tedious, but we don't need to work at that level every time. Once we have a higher-level refactoring that combines multiple steps, we can use it as a refactoring itself.

If we wanted to make our entire program use lens, here are the five steps we'd follow for each record field:

1. Add a lens for the record field email

emailLens = lens email (\r a -> r{email=a})

2. Change every use of the email field as a function to use the lens

email = view emailLens

3. Change every assignment to the email field to use the lens

r { email = v } = set emailLens v r

4. At this point, there should be a single consumer of the record field: the lens. We can rename it to _email

emailLens = lens _email (\r a -> r{_email=a})

5. With the email identifier no longer in use, we can rename our lens.

email = lens _email (\r a -> r{_email=a})

With that, our code is as if we designed it as a lens from the beginning, and we have a high level process to do this in other places.
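The end state of the five steps can be sketched as follows. As before, this is a hedged approximation: lens, view, and set are simplified stand-ins written against base rather than the real lens package, and Email, currentEmail, and changeEmail are hypothetical names used only to show the rewritten call sites from steps 2 and 3.

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- Hypothetical stand-in for the article's Email type.
newtype Email = Email String
  deriving (Eq, Show)

-- Step 4: the record field is now called _email.
data User = User { _email :: Email }
  deriving (Eq, Show)

-- Simplified stand-ins for lens, view, and set (assumption: not the real API).
lens :: Functor f => (s -> a) -> (s -> a -> s) -> (a -> f a) -> s -> f s
lens getter setter f s = setter s <$> f (getter s)

view :: ((a -> Const a a) -> s -> Const a s) -> s -> a
view l = getConst . l Const

set :: ((a -> Identity b) -> s -> Identity t) -> b -> s -> t
set l v = runIdentity . l (const (Identity v))

-- Step 5: the lens has taken over the name `email`.
email :: Functor f => (Email -> f Email) -> User -> f User
email = lens _email (\r a -> r { _email = a })

-- Step 2: what used to be `email user`.
currentEmail :: User -> Email
currentEmail = view email

-- Step 3: what used to be `user { email = new }`.
changeEmail :: Email -> User -> User
changeEmail = set email
```

Reading a value back after writing it, as in currentEmail (changeEmail new user), should return new, which matches how the plain record field behaved before the change.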
