Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

What Makes Haskell Unique, Part 3

DZone's Guide to

What Makes Haskell Unique, Part 3

In the final part of this introduction to Haskell, a developer explores laziness in code (and all that entails) and why Haskell works great for laziness.

· Web Dev Zone ·
Free Resource

Deploying code to production can be filled with uncertainty. Reduce the risks, and deploy earlier and more often. Download this free guide to learn more. Brought to you in partnership with Rollbar.

Welcome back! If you missed the first two article, you can fine Part 1 here and Part 2 here

Laziness

It's a little cheeky of me to get this far into a talk about unique features of Haskell and ignore one of its most notable features: laziness. Laziness is much more of a double-edged sword than the other features I've talked about, and let me prove that by revisiting one of our previous examples.

let loop i total =
      if i > 1000000
        then total
        else loop (i + 1) (total + i)
 in loop 1 0

I didn't describe it before, but this function will sum up the numbers from 1 to 1,000,000. There are two problems with this function:

  1. There's a major performance bug in it.
  2. It's much more cumbersome than it should be.

Space Leaks

The bane of laziness is space leaks, something you've probably heard about if you've read at all about Haskell. To understand this, let's look at how laziness is implemented. When you say something like:

let foo = 1 + 2

foo doesn't actually contain 3 right now. Instead, it contains an instruction to apply the operator+ to the values 1 and 2. This kind of instruction is called a thunk. And as you might guess, storing the thunk is a lot more expensive than storing a simple integer. We'll see why this helps in a bit, but for now we just care about why it sucks. Let's look at what happens in our loop function:

let loop i total =
      if i > 1000000
        then total
        else loop (i + 1) (total + i)
 in loop 1 0

Each time we step through the loop, we have to compare i to the number 1,000,000. Therefore, we are forced to evaluate it, which means turning it into a simple integer. But we never look at the value of total. Instead of storing a simple integer, which would be cheap, we end up building a huge tree that looks like "add 1 to the result of add 2 to the result of ... to 1,000,000." This is really bad: it uses more memory and more CPU than we'd like.

We can work around this in Haskell by being explicit about which values should be evaluated. There are a few ways to do this, but in our case, the easiest is:

let loop i !total =
      if i > 1000000
        then total
        else loop (i + 1) (total + i)
 in loop 1 0

All I've done is added an exclamation point in front of the total argument. This is known as a bang pattern, and says "make sure this is evaluated before running the rest of this function." The need to do this in some cases is definitely a downside to Haskell's laziness. On the other hand, as we'll see shortly, you often don't need to bother if you use the right kinds of functions.

Laziness Is Awesome

Let's go back to pseudocode and rewrite our summation:

total := 0
for(i := 1; i <= 1000000; i++) {
  total += i
}

Pretty simple. But now let's modify this to only sum up the even numbers:

total := 0
for(i := 1; i <= 1000000; i++) {
  if (isEven(i)) {
    total += i
  }
}

OK, that's fine. But now, let's sum up the indices modulus 13 (for some weird reason):

total := 0
for(i := 1; i <= 1000000; i++) {
  if (isEven(i)) {
    total += i % 13
  }
}

Each of these modifications is fine on its own, but at this point it's getting harder to see the forest for the trees. And fortunately each of these transformations was relatively simple. If some of the requirements were more complicated, fitting it into the for loop may be more challenging.

Let's go back to the beginning with Haskell. We saw how we could do it with a loop, but let's see the real way to sum the numbers from 1 to 1,000,000:

-- Bad
let loop i !total =
      if i > 1000000
        then total
        else loop (i + 1) (total + i)
 in loop 1 0

-- Awesome!
sum [1..1000000]

We use list range syntax to create a list with one million numbers in it. On its face, this looks terrible: we need to allocate about 8mb of data to hold onto these integers, when this should run in constant space. But this is exactly where laziness kicks in: instead of allocating all of these values immediately, we allocate a thunk. Each time we step through the list, our thunk generates one new integer and a new thunk for the rest of the list. We're never using more than a few machine words.

There are also other optimizations in GHC to avoid even allocating those thunks, but that's not something I'm going to cover today.

Anyway, let's continue. We can easily tweak this to only add up the even numbers:

sum (filter even [1..1000000])

This uses the filter higher order function, and likewise avoids allocating an entire list at once. And doing the silly modulus 13 trick:

sum (map (`mod` 13) (filter even [1..1000000]))

Laziness is definitely a mixed bag, but combined with the functional style of Haskell in general, it allows you to write higher level, declarative code, while keeping great performance.

Short-Circuiting for Free

Lots of languages define && and || operators which stop evaluation early, e.g.:

foo() && bar()

bar is only called if foo returns true. Haskell works the same way, but these operators aren't special; they just use laziness!

False && _ = False
True && x = x

True || _ = True
False || x = x

This even scales up to functions working on lists of values, such as andorall, and any.

Other Downsides

There's one other downside to laziness, and a historical artifact. Laziness means that exceptions can be hiding inside any thunk. This is also known as partial values and partial functions. For example, what does this mean?

head []

Generally speaking, partiality is frowned upon, and you should use total functions in Haskell.

The historical artifact is that many bad functions are still easily available, and they should be avoided. head is arguably an example of that. Another is the lazy left fold function, foldl. In virtually all cases, you should replace it with a strict left fold foldl'.

Summary of Laziness

Advantages

  • More composable code.
  • Get efficient results from combining high level functions.
  • Short-circuiting like && and || is no longer a special case.

Disadvantages

  • Need to worry about space leaks.
  • Exceptions can be hiding in many places.
  • Unfortunately some bad functions like foldl still hanging around.

Side note: There's a major overlap with Python generators or Rust iterators, but laziness in Haskell is far more pervasive than these other approaches.

Others

Due to time constraints, I'm not going to be able to go into detail on a bunch of other examples I wanted to talk about. Let me just throw out some quick thoughts on them.

Parser (and Other) DSLs

  • Operator overloading!
  • Abstract type classes like Applicative and Alternative a natural fit, e.g.:parseXMLElement <|> parseXMLText.
  • Able to reuse huge number of existing library functions, e.g. optionalmany
  • General purpose do-notation is great.
data Time = Time Hour Minutes Seconds (Maybe AmPm)
data AmPm = Am | Pm

parseAmPm :: Parser Time
parseAmPm = Time
  <$> decimal
  <*> (":" *> decimal)
  <*> (":" *> decimal)
  <*> optional (("AM" $> Am) <|> ("PM" $> Pm))

c/o @queertypes

Advanced Techniques

  • Free monads.
  • Monad transformer stacks.
  • Lens, conduit, pipes, ...
  • Lots of ways to do things in Haskell!
  • It's a plus and a minus.
  • Recommendation: choose a useful subset of Haskell and its libraries, and define some best practices.

Conclusion

  • Haskell combines a lot of uncommon features.
  • Very few of those features are unique.
  • Combining those features allows you to write code very differently than in other languages.
  • If you want readable, robust, easy to maintain code, I think it's a great choice.
  • Be aware of the sharp edges, they do exist!

Deploying code to production can be filled with uncertainty. Reduce the risks, and deploy earlier and more often. Download this free guide to learn more. Brought to you in partnership with Rollbar.

Topics:
web dev ,tutorial ,haskell ,laziness ,parsing

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}