
Weird Ruby Part 4: Code Pods (Blocks, Procs, and Lambdas)


[This article was written by Jonan Scheffler]

This is part 4 of a series on Weird Ruby. Don’t miss Weird Ruby Part 1: The Beginning of the End, Weird Ruby Part 2: Exceptional Ensurance, and Weird Ruby Part 3: Fun with the Flip-Flop Phenom.

Hello again, nerds! Welcome back for the fourth installment in the Weird Ruby series. This time we’re going to talk about closures in Ruby, specifically when and why you would want to select a specific type of closure for your magical code adventures.

Let’s start with this simple closure definition on loan from Wikipedia:

“In programming languages, a closure (also lexical closure or function closure) is a function or reference to a function together with a referencing environment – a table storing a reference to each of the non-local variables (also called free variables or upvalues) of that function. A closure – unlike a plain function pointer – enables a function to access those non-local variables even when invoked outside its immediate lexical scope.”

I’m sure there are some number of you that had no trouble grokking that explanation on the first try; unfortunately I can’t count myself amongst your esteemed membership. This may be one of those cases where an honest effort to be as accurate as possible results in an overly dense description. A much simpler way to look at closures is simply as portable code—tiny code pods that you can pass about like any other object and execute at will.
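To make the "code pod" idea concrete, here is a minimal sketch of a closure that keeps using a local variable after the method that defined it has finished. The names make_counter and count are made up for this illustration.

```ruby
# Hypothetical helper: returns a code pod that remembers `count`.
def make_counter
  count = 0              # local variable captured by the closure
  lambda { count += 1 }  # the pod carries a reference to `count` with it
end

counter = make_counter
puts counter.call  # prints 1
puts counter.call  # prints 2; `count` outlived the call to make_counter
```

Each call to make_counter builds a fresh pod with its own count, so two counters never interfere with each other.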

Three varieties of closures

Closures in Ruby are so important that we have 3 varieties: blocks, procs, and lambdas.

&block
Proc.new
lambda

All three of these constructs could be called “closures,” though some of them don’t fit the technical definition perfectly (pedants put your hands down, nobody is going to call on you). I’m going to step through each of these in order and discuss some of their differences to give you a better idea of how closures work in Ruby, and hopefully we’ll find a few things that surprise you.

Blocks

Blocks can be created a couple different ways in Ruby:

do 
  #this is a block
end

and

{
  #this is a block
}

These two versions function in exactly the same way (except when they don’t, details shortly). They even share the same bytecode:

bytecode generated by running this line in irb:
RubyVM::InstructionSequence.compile("10.times do; 1337807; end").disassemble

== disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>========== 
== catch table 
| catch type: break st: 0002 ed: 0006 sp: 0000 cont: 0006 
|----------------------------------------------------------------------- 
0000 trace 1 ( 1) 
0002 putobject 10 
0004 send <callinfo!mid:times, argc:0, block:block in <compiled>> 
0006 leave 
== disasm: <RubyVM::InstructionSequence:block in <compiled>@<compiled>>= 
== catch table 
| catch type: redo st: 0002 ed: 0006 sp: 0000 cont: 0002 
| catch type: next st: 0002 ed: 0006 sp: 0000 cont: 0006 
|----------------------------------------------------------------------- 
0000 trace 256 ( 1) 
0002 trace 1 
0004 putobject 1337807 
0006 trace 512 
0008 leave

bytecode generated by running this line in irb: 
RubyVM::InstructionSequence.compile("10.times { 1337807 }").disassemble

== disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>========== 
== catch table 
| catch type: break st: 0002 ed: 0006 sp: 0000 cont: 0006 
|----------------------------------------------------------------------- 
0000 trace 1 ( 1) 
0002 putobject 10 
0004 send <callinfo!mid:times, argc:0, block:block in <compiled>> 
0006 leave 
== disasm: <RubyVM::InstructionSequence:block in <compiled>@<compiled>>= 
== catch table 
| catch type: redo st: 0002 ed: 0006 sp: 0000 cont: 0002 
| catch type: next st: 0002 ed: 0006 sp: 0000 cont: 0006 
|----------------------------------------------------------------------- 
0000 trace 256 ( 1) 
0002 trace 1 
0004 putobject 1337807 
0006 trace 512
0008 leave

In the Ruby community, multiline block bodies are conventionally contained within a do-end block and the {} syntax is reserved for single-line bodies. An alternative to this convention was proposed by the late Jim Weirich (we love you Jim, thank you for everything) and perhaps others before and after him, but I know it as Jim’s block convention.

Jim proposed that a block with a return value you planned to use (a functional block) should use the curly bracket syntax, and that a block that produced only output or side effects (a procedural block) should use the do-end syntax. This allows you to infer the purpose of a block at a glance, which certainly sounds useful.

While this convention is not used as often as the previously mentioned multiline convention, I find it quite compelling, though it initially seemed less intuitive. Often when I see something that I feel is unusual or counterintuitive I eventually discover that the concept was difficult to see clearly because I was gazing uphill.

As I showed earlier, these two syntaxes are identical with one exception: their precedence.

If one method call is passed as an argument to another without parentheses and a block follows, the block will attach to either the first or the second method, depending on the syntax.

def first(arg = nil)
  yield('first') if block_given?
end
def second(arg = nil)
  yield('second') if block_given?
end
# This block will be passed to first
first second do |method_name|
  puts method_name
end
# => first
# This block will be passed to second
first second { |method_name| puts method_name }
# => second

If you’re going to chain a series of methods together and pass one of them a block, be aware that the block may not end up where you think it will. Even if you get this code to do what you want, you’re making things unnecessarily complex for yourself and future programmers; do what you can to reduce the number of rage faces you add to the world and break this out into multiple calls with an intention-revealing local variable.
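One way that advice might look in practice is sketched below. The two methods are variations on first and second from the example above, changed so that each returns whatever its block returns; the variable names second_result and final_result are invented for the illustration.

```ruby
# Same shape as the methods above, but each returns its block's value.
def first(arg = nil)
  block_given? ? yield('first') : arg
end

def second(arg = nil)
  block_given? ? yield('second') : arg
end

# Explicit parentheses plus a named local variable leave no doubt
# about which method receives which block.
second_result = second { |name| name.upcase }   # block goes to second
final_result  = first(second_result) do |name|  # block goes to first
  "#{name} received #{second_result}"
end

puts final_result  # prints first received SECOND
```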

Passing arguments to blocks

When you pass an argument to a block you wrap the incoming arguments in | like this:

launch { |pod| pod.release }

Now you can call yield with an argument inside the launch method and it will be passed to the block as an argument called pod:

def launch   
  yield(Pod.new) 
end
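The two snippets above can be combined into something runnable. The article never defines Pod, so the class below is a hypothetical stand-in with just enough behavior to show the object travelling from yield into the block.

```ruby
# Hypothetical Pod class, invented for this sketch.
class Pod
  def initialize
    @released = false
  end

  def release
    @released = true
    self # return the pod so the caller can inspect it
  end

  def released?
    @released
  end
end

def launch
  yield(Pod.new) # the new Pod arrives in the block as `pod`
end

# launch returns whatever the block returns, here the released pod.
launched = launch { |pod| pod.release }
puts launched.released? # prints true
```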

You can include multiple arguments just like you would when calling a method, or you can omit the arguments entirely, as we commonly do with Integer#times:

10.times do
  puts "Preparing to launch..."
end

The block above does in fact have an argument, the counter for our loop, as you can see in the underlying C code:

for (i=0; i<end; i++) {     
    rb_yield(LONG2FIX(i)); 
}

If you want a reference to that counter inside of Integer#times you can reference the argument like this:

10.times do |count|
  puts "Preparing to launch in #{10 - count}"
end
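For readers who would rather not squint at C, the loop shown earlier translates roughly into the plain Ruby below. The method name times_sketch is made up, and this is only an approximation of what Integer#times does, not its real implementation.

```ruby
# Hypothetical stand-in for Integer#times, written in plain Ruby.
def times_sketch(limit)
  i = 0
  while i < limit
    yield i   # plays the role of rb_yield(LONG2FIX(i)) in the C loop
    i += 1
  end
  limit       # Integer#times returns the receiver
end

# The counter is always yielded; the block decides whether to accept it.
times_sketch(3) { puts "Preparing to launch..." }
times_sketch(3) { |count| puts "Preparing to launch in #{3 - count}" }
```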

This leads us to an important point about blocks and procs: arguments are optional. If you create a proc expecting to have arguments and you omit them from your actual call, the proc will execute normally:

pod = Proc.new do |missiles, lasers|
  puts "Missiles loaded" if missiles
  puts "Lasers charged" if lasers
end
pod.call()

The call() prints nothing and does not raise an error, even though we have agreed to pass two arguments and instead pass none.
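The same leniency applies in the other direction. The sketch below uses a proc that simply returns its two parameters so the effect of each call is visible; it is an illustration written for this walkthrough rather than part of the original example.

```ruby
pod = Proc.new { |missiles, lasers| [missiles, lasers] }

p pod.call                      # [nil, nil]   missing arguments become nil
p pod.call(true)                # [true, nil]
p pod.call(true, true, :extra)  # [true, true] surplus arguments are dropped
p pod.call([:armed, :charged])  # [:armed, :charged] a lone array is spread out
```

That last line is worth remembering: a proc with more than one parameter quietly unpacks a single array argument, which a lambda would never do.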

Blocks and procs share this fast and loose argument behavior, and all other behaviors as well. Blocks and procs are identical constructs except that a proc is an actual Ruby object and a block is just part of the syntax of a method invocation.
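The &block item in the list at the top is the bridge between the two. Here is a small sketch, assuming a made-up method named capture, of how a block becomes a real Proc object and how a proc can be handed back to a method as a block.

```ruby
# Hypothetical method: the & prefix turns the attached block into a Proc.
def capture(&block)
  block
end

pod = capture { |name| "Launching #{name}" }

puts pod.class            # prints Proc
puts pod.lambda?          # prints false; a captured block keeps proc behavior
puts pod.call("pod one")  # prints Launching pod one

# Going the other way, & passes a proc to a method as if it were a literal block.
p %w[alpha beta].map(&pod)  # ["Launching alpha", "Launching beta"]
```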

Like blocks, a proc can be created in a couple of ways:

Proc.new { surprise_nuke } 
proc { surprise_nuke }

These examples are functionally identical on modern versions of Ruby, but the proc syntax in versions of Ruby prior to 1.9 actually created a lambda instead, just to ensure that no one on Earth could ever possibly keep any of this straight in their heads. If you occasionally work in older Ruby versions, you might be wise to avoid the bare proc syntax and just use Proc.new.

As I showed earlier, procs are not strict about arity (number of arguments) and neither are blocks. In contrast, our other flavor of code pod, the lambda, cares very much about the arity:

codepod = lambda { |promise| puts "I #{promise}, I'll never die." }
codepod.call()
ArgumentError: wrong number of arguments (0 for 1)

Our code pod raises an error when we call it without fulfilling our promise to provide one argument. It will raise a similar error if we pass too many arguments:

codepod = lambda { |promise| puts "I #{promise}, I'll never die." }
codepod.call(true, false) 
ArgumentError: wrong number of arguments (2 for 1)

If you tell your lambda that it takes a specific number of arguments it expects you to follow through on that promise. None of this “take any arguments I want” sort of proc tomfoolery.

Astute viewers may also have noticed that lambdas share a class with their rebellious siblings the procs; Lambda is not a class of object in Ruby.
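If you would like to check that claim yourself, the short sketch below asks each kind of pod for its class and uses Proc#lambda? to tell them apart. The variable names strict and relaxed are just labels chosen for the illustration.

```ruby
strict  = lambda { |promise| promise }
relaxed = Proc.new { |promise| promise }

puts strict.class     # prints Proc
puts relaxed.class    # prints Proc
puts strict.lambda?   # prints true
puts relaxed.lambda?  # prints false
```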

Once again the C code gives us a hint as to how this works:

proc_new(VALUE klass, int8_t is_lambda)

All of the methods that generate procs and lambdas use this function: proc_new. When you want your proc to have lambda behavior you pass TRUE to proc_new and you get a lambda. If you want one of those crazy procs that don’t seem to care what you do, just pass FALSE. The int8_t is how C programmers type Boolean, both because they like to look 1337 and because C had no Boolean type of its own until C99 added _Bool. TRUE and FALSE in this case are actually just constants defined to be 1 and 0, and int8_t is a single byte representation of those values.

There is one other very important difference introduced by setting is_lambda to TRUE here: the resulting lambda will return from its own context. Procs will return from the surrounding context instead.

Returning from pods

The easiest way to illustrate this difference in return behavior is by creating each of them without any surrounding context:

labamba = lambda { return } 
labamba.call 
=> nil
proctor = Proc.new { return } 
proctor.call 
=> LocalJumpError: unexpected return

A proc binds its return to the context in which the proc itself was created, so calling return within a proc is essentially the same as typing return alone in that context and trying to run it.

return
=> LocalJumpError: unexpected return

Returns need friends, they need something to return from. A return inside a method will return from that method, so a proc created inside that method will also return from that method.

def lawful_good
  paladin = Proc.new { return }
  chaotic_evil(paladin)
  puts "Lawful good!"
end
def chaotic_evil(recruit)
  recruit.call
  puts "Chaotic evil!"
end
lawful_good 
=> nil

The lawful_good method never runs to completion here, so we never print anything from either method. When paladin is created in the lawful_good method its return is bound forever to returning from that method. No matter where paladin then travels in the world it will immediately fly back to the lawful_good method and execute the return when it’s called. In this example, paladin was called in the middle of the chaotic_evil method in an unsuccessful attempt to convert our noble paladin. The chaotic_evil method stopped executing and so did the lawful_good method, immediately returning nil.
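There is one catch worth knowing: the paladin can only fly home while home is still there. The sketch below, using made-up method names train_paladin and summon, shows what I would expect when a proc is called after the method that created it has already finished.

```ruby
# Hypothetical method that hands back a proc and then finishes.
def train_paladin
  Proc.new { return :home }
end

# Hypothetical helper that reports what happens when the orphaned proc runs.
def summon(pod)
  pod.call
rescue LocalJumpError => error
  error.class
end

puts summon(train_paladin)  # prints LocalJumpError
```

The return has nowhere left to return from, so Ruby raises the same LocalJumpError we saw when calling a proc with no surrounding method at all.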

A return inside of a lambda will have a very different result:

def lawful_good
  paladin = lambda { return }
  chaotic_evil(paladin)
  puts "Lawful good!"
end
def chaotic_evil(recruit)
  recruit.call
  puts "Chaotic evil!"
end
lawful_good
Chaotic evil! 
Lawful good!

The lambda creates its own special snowflake of a context for its return, so calling paladin really only causes the lambda to return from itself. After the lambda returns nothing, the next line of the chaotic_evil method proclaims, “Chaotic evil!” Fortunately, the lawful_good method will continue executing normally as well, and our noble paladin is once again back with the good team. The lawful_good method proudly announces, “Lawful good!”
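The difference is easier to see when the pods return a value instead of printing. This is a minimal side-by-side sketch; the method names proc_pod_result and lambda_pod_result are invented for the comparison.

```ruby
def proc_pod_result
  pod = Proc.new { return :from_proc }
  pod.call
  :after_proc     # never reached: the proc returned from this method
end

def lambda_pod_result
  pod = lambda { return :from_lambda }
  pod.call        # the lambda returns only from itself
  :after_lambda   # so the method carries on to its last line
end

p proc_pod_result    # :from_proc
p lambda_pod_result  # :after_lambda
```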

In order to illustrate the need for procs to have this type of return behavior, let’s look at the Integer#times method again:

10.times do |count|
  puts "#{count} little monkey(s) jumping on the bed."
end

Remember that the do-end syntax here is creating a block, which is effectively a proc. Each iteration of the loop yields the count to that block and puts our string with the count until we run out of numbers.

Let’s say we have a huge bias against the seventh monkey. That seventh monkey always gets too rowdy and we’re worried one of the other monkeys is going to get hurt.

def monkeys!
  10.times do |count|
    return if count == 7
    puts "#{count} little monkey(s) jumping on the bed."
  end
end

When we get to that return, we expect to return from the monkeys! method that got all of this jumping started in the first place. The block that we created bound the return to the local context and it returns from there, so we will return from monkeys!, ending our shenanigans promptly.

Now let’s pretend that block we’ve created has lambda behavior instead. This is not really an option in Ruby, but that’s why it’s pretend. The return would bind to the context of the lambda we created, and when the count reached seven we would simply return from the lambda itself; we would return from that iteration of our loop. We would get to seven and skip that count moving straight to eight.

Not only is that behavior counterintuitive, it duplicates the behavior of another keyword in Ruby, the next keyword:

10.times do |count|
  next if count == 7
  puts "#{count} little monkeys jumping on the bed."
end

That’s why blocks have the return behavior that they do; so you can return from a method even if you’re in the middle of your loop. Sometimes you just need to eject, like when you see that seventh monkey coming.
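To compare the two keywords directly, the sketch below collects the monkeys that got to jump instead of printing them. Both method names are hypothetical and exist only for this comparison.

```ruby
# return inside the block leaves the whole method at the seventh monkey.
def monkeys_with_return
  jumped = []
  10.times do |count|
    return jumped if count == 7
    jumped << count
  end
  jumped
end

# next only skips the current iteration; the loop keeps going.
def monkeys_with_next
  jumped = []
  10.times do |count|
    next if count == 7
    jumped << count
  end
  jumped
end

p monkeys_with_return  # [0, 1, 2, 3, 4, 5, 6]
p monkeys_with_next    # [0, 1, 2, 3, 4, 5, 6, 8, 9]
```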

Creating closures

Each of the methods below is a valid means of constructing a closure in Ruby:

Proc.new {} 
proc {} 
lambda {} 
-> {}

The first two methods create procs, though the proc method has worked this way only since Ruby 1.9. Before that, proc created a lambda, which rightfully confused everyone who has ever used it.

The last two examples both create lambdas, and the last of these is probably the most popular. The -> syntax was introduced in Ruby 1.9 and is commonly referred to as the stabby lambda, quite an aggressive name for such a cuddly little code pod. I think we should call this the “baby rocket.”
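One detail the empty examples above hide is where the parameters go. With lambda they sit between pipes inside the braces, while the stabby form takes them in parentheses before the braces. A small sketch, with made-up parameter names fuel and crew:

```ruby
classic = lambda { |fuel, crew| "#{crew} aboard, #{fuel}% fuel" }
stabby  = ->(fuel, crew) { "#{crew} aboard, #{fuel}% fuel" }

puts classic.call(90, 3)  # prints 3 aboard, 90% fuel
puts stabby.call(90, 3)   # prints 3 aboard, 90% fuel
puts stabby.lambda?       # prints true
```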

Calling closures

There are a number of ways to call closures in Ruby, some of them clearer than others.

codepod = Proc.new {}
codepod.call
codepod.()
codepod[]
codepod.[]

Each of these methods for calling a closure is valid, but let’s just agree to stick to call, as this topic is confusing enough. Remember your code can only get so bad before future-you is driven to build a time machine to come back and beat you up.
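The calling styles become a little easier to compare once the pod takes an argument and returns something. The sketch below uses a hypothetical lambda named double and covers the three most common spellings.

```ruby
double = ->(n) { n * 2 }

results = [
  double.call(21),  # the explicit, readable form
  double.(21),      # shorthand for .call
  double[21]        # Proc#[] also invokes the pod
]

p results  # [42, 42, 42]
```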

Review

There are three types of closures in Ruby, but given that blocks and procs have similar behaviors, you can treat them as one. So we have procs and lambdas, differentiated by their arity and return behavior:

  • proc
    • arbitrary arguments
    • returns from home (the method where it was created)
  • lambda
    • strict arguments
    • returns from self

There are some very good reasons why we have these distinct types of closures, and understanding how to use each of them effectively will treat you well along your path to mastering Ruby.

If you have any questions or comments, feel free to reach out: jonan@newrelic.com. It’s always nice to know I’m not just typing into the void. Also, join the discussion about this topic over at the New Relic Community Forum.

Proc.new do
  puts "<3 Jonan"
end


Published at DZone with permission of Fredric Paul, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
