Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

What Makes Haskell Unique, Part 2

DZone's Guide to

What Makes Haskell Unique, Part 2

In Part 2 of this three-part series on Haskell, we look at several aspects of immutability and purity, as well as software transactional memory, in the language.

· Web Dev Zone ·
Free Resource

Deploying code to production can be filled with uncertainty. Reduce the risks, and deploy earlier and more often. Download this free guide to learn more. Brought to you in partnership with Rollbar.

Welcome back! If yo missed Part 1, check it out here.

Immutability and Purity

Most programming languages out there default to mutability: a variable or field in a data structure can be changed at any time. Haskell is different in two ways:

  1. Values are immutable by default, and mutability must be explicitly indicated with a variable type.
  2. Mutating a mutable variable is considered a side effect, and that mutable is tracked by the type system.

For example, the following Haskell-like code is impossible:

let mut total = 0
    loop i =
      if i > 1000000
        then total
        else total += i; loop (i + 1)
 in loop 1

From pure code, we cannot create, read, or modify a mutable variable. We also need to say what kind of mutable variable we want:

total <- newIORef 0
let loop i =
      if i > 1000000
        then readIORef total
        else do
          modifyIORef total (+ i)
          loop (i + 1)
loop 1

This is a lot of ceremony for a simple algorithm. Of course, the recommended Haskell way of doing this would be to avoid mutable variables, and use a more natural functional style.

let loop i total =
      if i > 1000000
        then total
        else loop (i + 1) (total + i)
 in loop 1 0

Besides pushing us towards this supposedly better functional approach, why is immutable, pure code such a nice thing?

Reasoning About Code

You'll often hear Haskellers throw around a phrase, "reasoning about code." Personally, I think the phrase is used to mean too many different things. But let me give you an example that I think is accurate. Let's look at some pseudocode:

// scores.txt
Alice,32
Bob,55
Charlie,22

func main() {
  results := readResultsFromFile("results.txt")
  printScoreRange(results)
  print("First result was by: " + results[0].name)
}

func printScoreRange(results: Vector<TestResult>) {
  ...
}

If you look at the code above, what do you expect the output to be? I think it would be reasonable to guess something like:

Lowest: 22
Highest: 55
First result was by: Alice

However, now let's throw in another piece of information: the definition of printScoreRange:

func printScoreRange(results: Vector<TestResult>) {
  results.sortBy(|result| => result.score)
  print("Lowest: " + results[0].score)
  print("Highest: " + results[results.len() - 1].score)
}

Suddenly our assumptions change. We can see that this function mutates the results value passed to it. If we're passing mutable references to vectors in this made up language, then our output is going to look more like:

Lowest: 22
Highest: 55
First result was by: Charlie

Since the original results value in our main function has been modified. This is what I mean by hurting our ability to reason about the code: it's no longer sufficient to look at just the main function to understand what will be happening. Instead, we're required to understand what may possibly be occurring in the rest of our program to mutate our variables.

In Haskell, the code would instead look like:

main :: IO ()
main = do
  results <- readResultsFromFile "results.txt"
  printScoreRange results
  putStrLn $ "First result was by: " ++ name (head results)

printScoreRange :: [TestResult] -> IO ()
printScoreRange results = do
  let results' = sortBy score results
  putStrLn $ "Lowest: " ++ show (score (head results'))
  putStrLn $ "Highest: " ++ show (score (last results'))

We know that it's impossible for printScoreRange to modify the results value we have in themain function. Looking at only this bit of code in main is sufficient to know what will happen with the results value.

Data Races

Even more powerful than the single threaded case is how immutability affects multithreaded applications. Ignoring the insanity of multiple threads trying to output to the console at the same time, we can easily parallelize our code:

main :: IO ()
main = do
  results <- readResultsFromFile "results.txt"
  concurrently_ printFirstResult printScoreRange

printFirstResult results =
  putStrLn $ "First result was by: " ++ name (head results)

printScoreRange results = do
  let results' = sortBy score results
  putStrLn $ "Lowest: " ++ show (score (head results'))
  putStrLn $ "Highest: " ++ show (score (last results'))

There's no need to worry about concurrent access to data structures. It's impossible for the other threads to alter our data. If you do want other threads to affect your local data, you'll need to be more explicit about it, which we'll get back to.

Mutability When Needed

One thing you may be worried about is how this affects performance. For example, it's much more efficient to sort a vector using mutable access instead of only pure operations. Haskell has two tricks for that. The first is the ability to explicitly create mutable data structures, and mutate them in place. This breaks all of the guarantees I already mentioned, but if you need the performance, it's available. And unlike mutable-by-default approaches, you now know exactly which pieces of data you need to handle with care when coding to avoid tripping yourself up.

The other approach is to create a mutable copy of the original data, perform your mutable algorithm on it, and then freeze the new copy into an immutable version. With sorting, this looks something like:

sortMutable :: MutableVector a -> ST (MutableVector a)
sortMutable = ... -- normal sorting algorithm

sortImmutable :: Vector a -> Vector a
sortImmutable orig = runST $ do
  mutable <- newMutableVector (length orig)
  copyValues orig mutable
  sort mutable
  freeze mutable

ST is something we use to have temporary and local mutable effects. Because of how it's implemented, we know that none of the effects can be visible from outside of our function, and that, for the same input, the sortImmutable function will always have the same output. While this approach requires an extra memory buffer and an extra copy of the elements in the vector, it avoids completely the worries of your data being changed behind your back.

Summary: Immutability and Purity

Advantages

  • Easier to reason about code.
  • Avoid many cases of data races.
  • Functions are more reliable, returning the same output for the same input.

Disadvantages

  • Lots of ceremony if you actually want mutation.
  • Some runtime performance hit for mutable algorithms.

Software Transactional Memory

Let's say you actually need to be able to mutate some values. And for fun, let's say you want to do this from multiple threads. A common example of this is a bank. Let's again play with some pseudocode:

runServer (|request| => {
  from := accounts.lookup(request.from)
  to := accounts.lookup(request.to)
  accounts.set(request.from, from - request.amt)
  accounts.set(request.to, to + request.amt)
})

This looks reasonable, except that if two requests come in at the same time for the same account, we can end up with a race condition. Consider something like this:

Thread 1: receive request: Alice gives $25
Thread 2: receive request: Alice receives $25
Thread 1: lookup that Alice has $50
Thread 2: lookup that Alice has $50
Thread 1: set Alice's account to $25
Thread 2: set Alice's account to $75

We know that we want Alice to end up with $50, but because of our data race, Alice ends up with $75. Or, if the threads ran differently, it could be $25. Neither of these is correct. In order to avoid this, we would typically deal with some kind of locking:

runServer (|request| => {
  accounts.lock(request.from)
  accounts.lock(request.to)
  // same code as before
  accounts.unlock(request.from)
  accounts.unlock(request.to)
})

Unfortunately, this leads to deadlocks! Consider this scenario:

Thread 1: receive request: $50 from Alice to Bob
Thread 2: receive request: $50 from Bob to Alice
Thread 1: lock Alice
Thread 2: lock Bob
Thread 1: try to lock Bob, but can't, so wait
Thread 2: try to lock Alice, but can't, so wait
...

This kind of problem is the bane of many concurrent programs. Let me show you another approach. As you may guess, here's some Haskell:

runServer $ \request -> atomically $ do
  let fromVar = lookup (from request) accounts
      toVar = lookup (to request) accounts
  origFrom <- readTVar fromVar
  writeTVar fromVar (origFrom - amt request)
  origTo <- readTVar toVar
  writeTVar toVar (origTo + amt request)

There are helper functions to make this shorter, but I wanted to do this the long way to prove a point. This looks like exactly the kind of race condition I described before. However, thatatomically function is vital here. It ensures that only a complete transaction is ever committed. If any of the variables we touch are mutated by another thread before our transaction is complete, all of our changes are rolled back, and the transaction is retried. No need for explicit locking, and therefore many less worries about data races and deadlocks.

TVar is a "transactional variable." It's an alternative to the IORef that I mentioned earlier. There are other kinds of mutable variables in Haskell, including channels and MVars which are like mutexes. This is what I meant when I said you need to be explicit about what kind of mutation you want in Haskell.

Purity's Role

What do you think will happen with this program?:

atomically $ do
  buyBitcoins 3 -- side effects on my bank account

  modifyTVar myBitcoinCount (+ 3)

Here, buyBitcoins is going off to some exchange and buying about $100,000 in Bitcoin (or whatever ridiculous amount they're selling for now). I said before that if the variables are modified while running, the transaction will be retried. It seems like this function is very dangerous, as it may result in me going about $10,000,000 into debt buying Bitcoins!

This is where purity steps in. Inside atomically, you are not allowed to perform any side effects outside of STM itself. That means you can modify TVars, but you cannot read or write files, print to the console, fire the missiles, or place multi-million dollar currency purchases. This may feel like a limitation, but the tradeoff is that it's perfectly safe for the runtime system to retry your transactions as many times as it wants.

Summary of STM

Advantages

  • Makes concurrent data modification much easier.
  • Bypass many race conditions and deadlocks.

Disadvantages

  • Depends on purity to work at all.
  • Not really a disadvantage, you're already stuck with purity in Haskell.
  • Not really any other disadvantages, so just use it!

That's all for Part 2. Tune in tomorrow when we'll cover laziness and a few miscellaneous topics. 

Deploying code to production can be filled with uncertainty. Reduce the risks, and deploy earlier and more often. Download this free guide to learn more. Brought to you in partnership with Rollbar.

Topics:
web dev ,haskell ,tutorial ,immutability ,software transactional memory

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}