Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

DZone's Guide to

# What Makes Haskell Unique, Part 2

### In Part 2 of this three-part series on Haskell, we look at several aspects of immutability and purity, as well as software transactional memory, in the language.

· Web Dev Zone ·
Free Resource

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Access over 20 APIs and mobile SDKs, up to 250k transactions free with no credit card required

Welcome back! If yo missed Part 1, check it out here.

## Immutability and Purity

Most programming languages out there default to mutability: a variable or field in a data structure can be changed at any time. Haskell is different in two ways:

1. Values are immutable by default, and mutability must be explicitly indicated with a variable type.
2. Mutating a mutable variable is considered a side effect, and that mutable is tracked by the type system.

For example, the following Haskell-like code is impossible:

``````let mut total = 0
loop i =
if i > 1000000
then total
else total += i; loop (i + 1)
in loop 1``````

From pure code, we cannot create, read, or modify a mutable variable. We also need to say what kind of mutable variable we want:

``````total <- newIORef 0
let loop i =
if i > 1000000
else do
modifyIORef total (+ i)
loop (i + 1)
loop 1``````

This is a lot of ceremony for a simple algorithm. Of course, the recommended Haskell way of doing this would be to avoid mutable variables, and use a more natural functional style.

``````let loop i total =
if i > 1000000
then total
else loop (i + 1) (total + i)
in loop 1 0``````

Besides pushing us towards this supposedly better functional approach, why is immutable, pure code such a nice thing?

You'll often hear Haskellers throw around a phrase, "reasoning about code." Personally, I think the phrase is used to mean too many different things. But let me give you an example that I think is accurate. Let's look at some pseudocode:

``````// scores.txt
Alice,32
Bob,55
Charlie,22

func main() {
printScoreRange(results)
print("First result was by: " + results[0].name)
}

func printScoreRange(results: Vector<TestResult>) {
...
}``````

If you look at the code above, what do you expect the output to be? I think it would be reasonable to guess something like:

``````Lowest: 22
Highest: 55
First result was by: Alice``````

However, now let's throw in another piece of information: the definition of `printScoreRange`:

``````func printScoreRange(results: Vector<TestResult>) {
results.sortBy(|result| => result.score)
print("Lowest: " + results[0].score)
print("Highest: " + results[results.len() - 1].score)
}``````

Suddenly our assumptions change. We can see that this function mutates the `results` value passed to it. If we're passing mutable references to vectors in this made up language, then our output is going to look more like:

``````Lowest: 22
Highest: 55
First result was by: Charlie``````

Since the original `results` value in our `main` function has been modified. This is what I mean by hurting our ability to reason about the code: it's no longer sufficient to look at just the `main` function to understand what will be happening. Instead, we're required to understand what may possibly be occurring in the rest of our program to mutate our variables.

``````main :: IO ()
main = do
printScoreRange results
putStrLn \$ "First result was by: " ++ name (head results)

printScoreRange :: [TestResult] -> IO ()
printScoreRange results = do
let results' = sortBy score results
putStrLn \$ "Lowest: " ++ show (score (head results'))
putStrLn \$ "Highest: " ++ show (score (last results'))``````

We know that it's impossible for `printScoreRange` to modify the `results` value we have in the`main` function. Looking at only this bit of code in `main` is sufficient to know what will happen with the `results` value.

### Data Races

Even more powerful than the single threaded case is how immutability affects multithreaded applications. Ignoring the insanity of multiple threads trying to output to the console at the same time, we can easily parallelize our code:

``````main :: IO ()
main = do
concurrently_ printFirstResult printScoreRange

printFirstResult results =
putStrLn \$ "First result was by: " ++ name (head results)

printScoreRange results = do
let results' = sortBy score results
putStrLn \$ "Lowest: " ++ show (score (head results'))
putStrLn \$ "Highest: " ++ show (score (last results'))``````

There's no need to worry about concurrent access to data structures. It's impossible for the other threads to alter our data. If you do want other threads to affect your local data, you'll need to be more explicit about it, which we'll get back to.

### Mutability When Needed

One thing you may be worried about is how this affects performance. For example, it's much more efficient to sort a vector using mutable access instead of only pure operations. Haskell has two tricks for that. The first is the ability to explicitly create mutable data structures, and mutate them in place. This breaks all of the guarantees I already mentioned, but if you need the performance, it's available. And unlike mutable-by-default approaches, you now know exactly which pieces of data you need to handle with care when coding to avoid tripping yourself up.

The other approach is to create a mutable copy of the original data, perform your mutable algorithm on it, and then freeze the new copy into an immutable version. With sorting, this looks something like:

``````sortMutable :: MutableVector a -> ST (MutableVector a)
sortMutable = ... -- normal sorting algorithm

sortImmutable :: Vector a -> Vector a
sortImmutable orig = runST \$ do
mutable <- newMutableVector (length orig)
copyValues orig mutable
sort mutable
freeze mutable``````

`ST` is something we use to have temporary and local mutable effects. Because of how it's implemented, we know that none of the effects can be visible from outside of our function, and that, for the same input, the `sortImmutable` function will always have the same output. While this approach requires an extra memory buffer and an extra copy of the elements in the vector, it avoids completely the worries of your data being changed behind your back.

### Summary: Immutability and Purity

• Easier to reason about code.
• Avoid many cases of data races.
• Functions are more reliable, returning the same output for the same input.

• Lots of ceremony if you actually want mutation.
• Some runtime performance hit for mutable algorithms.

## Software Transactional Memory

Let's say you actually need to be able to mutate some values. And for fun, let's say you want to do this from multiple threads. A common example of this is a bank. Let's again play with some pseudocode:

``````runServer (|request| => {
from := accounts.lookup(request.from)
to := accounts.lookup(request.to)
accounts.set(request.from, from - request.amt)
accounts.set(request.to, to + request.amt)
})``````

This looks reasonable, except that if two requests come in at the same time for the same account, we can end up with a race condition. Consider something like this:

``````Thread 1: receive request: Alice gives \$25
Thread 1: lookup that Alice has \$50
Thread 2: lookup that Alice has \$50
Thread 1: set Alice's account to \$25
Thread 2: set Alice's account to \$75``````

We know that we want Alice to end up with \$50, but because of our data race, Alice ends up with \$75. Or, if the threads ran differently, it could be \$25. Neither of these is correct. In order to avoid this, we would typically deal with some kind of locking:

``````runServer (|request| => {
accounts.lock(request.from)
accounts.lock(request.to)
// same code as before
accounts.unlock(request.from)
accounts.unlock(request.to)
})``````

``````Thread 1: receive request: \$50 from Alice to Bob
Thread 1: try to lock Bob, but can't, so wait
Thread 2: try to lock Alice, but can't, so wait
...``````

This kind of problem is the bane of many concurrent programs. Let me show you another approach. As you may guess, here's some Haskell:

``````runServer \$ \request -> atomically \$ do
let fromVar = lookup (from request) accounts
toVar = lookup (to request) accounts
writeTVar fromVar (origFrom - amt request)
writeTVar toVar (origTo + amt request)``````

There are helper functions to make this shorter, but I wanted to do this the long way to prove a point. This looks like exactly the kind of race condition I described before. However, that`atomically` function is vital here. It ensures that only a complete transaction is ever committed. If any of the variables we touch are mutated by another thread before our transaction is complete, all of our changes are rolled back, and the transaction is retried. No need for explicit locking, and therefore many less worries about data races and deadlocks.

`TVar` is a "transactional variable." It's an alternative to the `IORef` that I mentioned earlier. There are other kinds of mutable variables in Haskell, including channels and `MVar`s which are like mutexes. This is what I meant when I said you need to be explicit about what kind of mutation you want in Haskell.

### Purity's Role

What do you think will happen with this program?:

``````atomically \$ do
buyBitcoins 3 -- side effects on my bank account

modifyTVar myBitcoinCount (+ 3)``````

Here, `buyBitcoins` is going off to some exchange and buying about \$100,000 in Bitcoin (or whatever ridiculous amount they're selling for now). I said before that if the variables are modified while running, the transaction will be retried. It seems like this function is very dangerous, as it may result in me going about \$10,000,000 into debt buying Bitcoins!

This is where purity steps in. Inside `atomically`, you are not allowed to perform any side effects outside of STM itself. That means you can modify `TVar`s, but you cannot read or write files, print to the console, fire the missiles, or place multi-million dollar currency purchases. This may feel like a limitation, but the tradeoff is that it's perfectly safe for the runtime system to retry your transactions as many times as it wants.

### Summary of STM

• Makes concurrent data modification much easier.
• Bypass many race conditions and deadlocks.

• Depends on purity to work at all.
• Not really any other disadvantages, so just use it!

That's all for Part 2. Tune in tomorrow when we'll cover laziness and a few miscellaneous topics.

#1 for location developers in quality, price and choice, switch to HERE.

Topics:
web dev ,haskell ,tutorial ,immutability ,software transactional memory

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.