Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

How to Implement a Ruby Hash Like Grammar in Parslet, Part 2

DZone's Guide to

How to Implement a Ruby Hash Like Grammar in Parslet, Part 2

We wrap up this two-part series on Ruby, parslet, and regex by looking at transformers, repeated rules, and data structures in this application.

· Web Dev Zone ·
Free Resource

Code something amazing with the IBM library of open source blockchain patterns. Content provided by IBM.

Welcome back! If you missed Part 1, you can check it out here.

Transformers - Abstract Syntax Trees in Disguise

I mentioned transformers previously, but they're such an abstract concept that it helps to look at an example. Think of them as a way to reduce leaf nodes on our tree so that they make more sense. One way to represent our current parse tree is this:

- key_value
  - key: "Hello"
  - val:
    - string: "world"

You'll notice that the key and val results are not at the same level. Wouldn't it be great if there was a way for us to transform {:string=>"world"} into something more usable? With a transformer, we can. We will convert that leaf hash into a simple string. Here's our test.

def test_parses_a_key_value_pair
  input = %Q{ hello: "world" }
  parser = MyParser.new.key_value
  tree = parser.parse_with_debug(input)
  refute_equal nil, tree

  actual = MyTransformer.new.apply(tree)
  expected = {key_value: {key: "hello", val: "world"}}
  assert_equal expected, actual
end

Transformers match on a tree's hash key and value type and then modify the result. Here's the transformer that will allow this test to pass:

class MyTransformer < Parslet::Transform
  rule(string: simple(:st)) { st.to_s }
end

What is going on here? We are telling our transformer to match the key string when you see that, and a simple value, then make a match, and name that match st and call our block. What is a simple value? A value such as a string or an integer, and not something complex like a hash or an array.

Given this definition it will match {:string => "foo"} but not { :string => {:complex => "value"}}.

Once the transformer matches {:string => "foo"}, the simple(:st) is captured. In this case, that is a string of "foo". That value is then named st and passed to our block. We want to ensure it's a string so we call to_s on it and then it is returned. The entire hash of { :string => foo } would then become a simple string "foo"

So, a leaf node that looked like this:

{string: "world"}

Would now look like this:

"world"

There is one tricky point in this example. Not only was the value of "world" modified (we called to_s on it) but the entire hash went away, and was replaced by the value we returned. This is a subtlety that is kind of pointed out in the documentation for transformers, but not really that well illustrated.

Here's the example they give, but with more details. If you have a tree like this:

{
  dog: 'terrier',
  cat: 'suit'
}

You cannot match dog by itself; i.e., a rule like this makes no sense:

rule(:dog => 'terrier') { 'foo' } 

Because you're not only modifying the value, you're modifying the entire match (the key and the value). If this was a legal match it would produce something like this:

{
  'foo',
  cat: 'suit'
}

Which is not a legal Ruby object (the 'foo' in the example is not a key/value pair, and a hash can only hold key/value pairs). If you wanted to make this modification to a parse tree, you have to match the entire hash and then return a new hash object:

class MyTransformer < Parslet::Transform
  rule(dog: 'terrier', cat: simple(:cat_value)) {
    out = {}
    out[:dog] = 'foo'
    out[:cat] = cat_value
    out
  }
end

Here we're only matching when a dog key has a value of terrier, but matching when the cat key can be any simple object (such as a string).

Now a test like this would pass:

def test_dog_cat
  tree = { dog: 'terrier', cat: 'suit'}

  actual = MyTransformer.new.apply(tree)
  expected = { dog: 'foo', cat: 'suit' }
  assert_equal expected, actual
end

While the transformers demonstrated here aren't the most useful, they can save you work down the road when your tree starts to get extremely nested. We'll come back later.

Repeated Rules

At this point, we can parse a single key/value pair, but what about multiple pairs? Here's a test:

def test_parses_multiple_key_value_pairs
  input = %Q{ hello: "world", hi: "there" }
  parser = MyParser.new.key_value
  tree = parser.parse_with_debug(input)
  refute_equal nil, tree
end

In this case, we already have a key_value rule, and a comma rule, we either want to match a single key/value pair, or a key/value pair followed by a comma and another key/value. This latter pattern can repeat indefinitely. Here's how this looks as a parslet rule:

rule(:named_args) {
  spaces? >> (
    key_value >> (
      comma >> key_value
    ).repeat
  ).as(:named_args) >> spaces?
}

This rule gets the tests to pass, though what kind of tree do you think our test string (hello: "world", hi: "there") returns?

It's a hash, yes. The first layer only has one key, named :named_args. One key points to one value. How do you think the two key_value's are stored? If it was only one key_value, it could be represented as one hash:

{ :named_args =>
  { :key_value =>
     { :key  => "hello",  :val=>{:string=>"world"} }
  }
}

However, we have two elements with the same key, key_value. The only way to store them both is via an Array. The output of parsing hello: "world", hi: "there" would look like this:

{
  :named_args =>
    [
      { :key_value =>
        { :key => "hello", :val => {:string=>"world"}}
      },
      {
        :key_value =>
        { :key=>"hi", :val => {:string=>"there"}}
      }
    ]
}

This is a subtle but important point in parslet, depending on how many times an element is matched, the value of a hash might be a hash or an array of hashes.

Our First True Data Structure

Now that we have repeated key/value pairs, let's add a little syntactic sugar on them to make a Ruby style hash:

def test_parses_hash
  input = '{ hello: "world", hi: "there" }'
  parser = MyParser.new.hash_obj
  tree = parser.parse_with_debug(input)
  refute_equal nil, tree
end

How can we match this? Hopefully, you've got a reasonable guess. Here's my answer:

rule(:hash_obj) {
  spaces? >> (
    str('{') >> named_args >> str('}')
  ) >> spaces?
}

We're essentially letting the named_args do the heavy lifting. You might also notice that I chose not to match anything with an as method. In this case, nesting our key/value pair one level deeper in a hash key, doesn't really buy anything for us right now.

While this gets the tests to pass, I want to go one step further.

actual = MyTransformer.new.apply(tree)
expected = { hello: "world", hi: "there" }
assert_equal expected, actual

In this case, we're transforming a string representation of a hash into an actual hash. While it might not seem that exciting, considering we could have simply run eval to produce the same thing, keep in mind that this ability to implement a grammar and build a transformer to produce the objects we want allows us to implement virtually any language of our choosing, not just replicate Ruby features.

For now, this is only a failing test. Here's the tree we can manipulate with our transformer:

{ :named_args => [
      {:key_value => { :key => "hello", :val => {:string=>"world"}}},
      {:key_value => { :key => "hi", :val => {:string=>"there"}}}
    ]
}

In our transformer we need to match this entire hash, there is only one key in the top-level hashnamed_args, however, it's pointing at a pretty complex value, an array holding hashes that point to hashes.

We already have a transformer that will convert the {:string =>"there"} into simply "there".

Parslet provides the ability to match ANY values with the keyword subtree. This is dangerous because you are now responsible for handling any inconsistencies in the input. For example, remember we looked at how this rule could return either a single hash or an array of hashes, when you choose to match via subtree you're now responsible for knowing that and doing the right thing.

Let's make sure that we can match this hash, then we'll add logic.

class MyTransformer < Parslet::Transform
  # ...

  rule(:named_args => subtree(:na)) {
    puts na.inspect
  }
end

The output should be an array like this:

[
  {:key_value => { :key => "hello", :val => "world" }},
  {:key_value => { :key => "hi", :val => "there" }}
]

Notice that our val is simply a string here and not the complex hash object. This is because our prior transformation was already applied.

To transform this into a hash, we need to ensure the input is always consistent, then build a hash by looping through each element in our input and adding the keys and values to that hash. Here's my solution:

class MyTransformer < Parslet::Transform
  # ...

  rule(:named_args => subtree(:na)) {
    Array(na).each_with_object({}) { |element, output_hash|
      key = element[:key_value][:key].to_sym
      val = element[:key_value][:val]
      output_hash[key] = val
    }
  }
end

The Array() ensures we are dealing with an array. The each_with_object iterates over each element while yielding it and the input (in this case a hash) to the block, the return will be the input (the hash). We can then extract the key and value from each hash, and add them to our output hash.

This does the trick, and our test now passes!

You can run the code on GitHub.

Just the Beginning

This article is already really long, but we're just scratching the surface of what you can do with parslet. You'll eventually want to keep adding grammar rules until you're happy with your language. I mentioned, implementing arrays, or integers. Do you think you could do that now? You can also add a root node which is where your parser starts parsing by default, rather than calling an explicit parser rule like we've been doing in our tests.

If you followed this tutorial, you're mostly there already! I recommend also walking through the official parslet tutorial and looking at some of the other examples. If you get stuck, try going back to the basics, as well as writing more and smaller tests.

When constructing parsers, try to imagine what kind of a "tree" your desired grammar might produce, and try working, starting from the leaves. Once you've got the parser, try working in the same direction with your transformers, building from the leaves inward. While this will feel clunky at first, you'll get the hang of it over time.

When all else fails, write a blog post!

parslet Cheatsheet

Parser

Parser Syntax

  • str() matches an exact string.
  • match() matches a regex.
  • any matches anything, an alias for match('.').
  • .maybe is a method on a match object that allows it to be matched 0 times.
  • .repeat is a method on a match object that allows it to be matched a number of times. If passed arguments, they represent repeat(<min times>, <max times>).
  • .absent? is a method on a match object, it essentially inverts the match so thatstring("f").absent? will not match an "f".
  • >> can be used to chain together rules (the "and then" operator).
  • | can be used to match one of multiple rules in order (the "or" operator).
  • .as is a method on a match segment that allows you to capture values and name the capture.

Parser Notes

  • Parsers are defined using the rule keyword.
  • Matching with the same as value multiple times at the same level will result in an array instead of a hash.

Transformer Syntax

  • simple matches only non-nested values; i.e., numbers, strings, etc.
  • subtree matches EVERYTHING.
  • sequence is an in-between that matches nested objects, but not deeply nested objects.

Transformer Notes

  • Transformers are defined using the rule keyword.
  • The transformer rule expects a key => value input.
  • A transformer rule must match an entire hash (i.e., the key and the value, not just the key).
  • The result of the transformer replaces the whole hash.
  • There must be somewhere for that replacement to live (i.e., the dog/cat example).

Downloadable Example App

Here's the source code to my example parslet app.

Start coding something amazing with our library of open source Cloud code patterns. Content provided by IBM.

Topics:
web dev ,ruby ,regular expression ,documentation generation ,parslet

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}