How to Implement a Ruby Hash Like Grammar in Parslet, Part 2
How to Implement a Ruby Hash Like Grammar in Parslet, Part 2
We wrap up this two-part series on Ruby, parslet, and regex by looking at transformers, repeated rules, and data structures in this application.
Join the DZone community and get the full member experience.
Join For FreeJumpstart your Angular applications with Indigo.Design, a unified platform for visual design, UX prototyping, code generation, and app development.
Welcome back! If you missed Part 1, you can check it out here.
Transformers - Abstract Syntax Trees in Disguise
I mentioned transformers previously, but they're such an abstract concept that it helps to look at an example. Think of them as a way to reduce leaf nodes on our tree so that they make more sense. One way to represent our current parse tree is this:
- key_value
- key: "Hello"
- val:
- string: "world"
You'll notice that the key
and val
results are not at the same level. Wouldn't it be great if there was a way for us to transform {:string=>"world"}
into something more usable? With a transformer, we can. We will convert that leaf hash into a simple string. Here's our test.
def test_parses_a_key_value_pair
input = %Q{ hello: "world" }
parser = MyParser.new.key_value
tree = parser.parse_with_debug(input)
refute_equal nil, tree
actual = MyTransformer.new.apply(tree)
expected = {key_value: {key: "hello", val: "world"}}
assert_equal expected, actual
end
Transformers match on a tree's hash key and value type and then modify the result. Here's the transformer that will allow this test to pass:
class MyTransformer < Parslet::Transform
rule(string: simple(:st)) { st.to_s }
end
What is going on here? We are telling our transformer to match the key string
when you see that, and a simple
value, then make a match, and name that match st
and call our block. What is a simple
value? A value such as a string or an integer, and not something complex like a hash or an array.
Given this definition it will match {:string => "foo"}
but not { :string => {:complex => "value"}}
.
Once the transformer matches {:string => "foo"}
, the simple(:st)
is captured. In this case, that is a string of "foo"
. That value is then named st
and passed to our block. We want to ensure it's a string so we call to_s
on it and then it is returned. The entire hash of { :string => foo }
would then become a simple string "foo"
So, a leaf node that looked like this:
{string: "world"}
Would now look like this:
"world"
There is one tricky point in this example. Not only was the value of "world" modified (we called to_s
on it) but the entire hash went away, and was replaced by the value we returned. This is a subtlety that is kind of pointed out in the documentation for transformers, but not really that well illustrated.
Here's the example they give, but with more details. If you have a tree like this:
{
dog: 'terrier',
cat: 'suit'
}
You cannot match dog
by itself; i.e., a rule like this makes no sense:
rule(:dog => 'terrier') { 'foo' }
Because you're not only modifying the value, you're modifying the entire match (the key and the value). If this was a legal match it would produce something like this:
{
'foo',
cat: 'suit'
}
Which is not a legal Ruby object (the 'foo'
in the example is not a key/value pair, and a hash can only hold key/value pairs). If you wanted to make this modification to a parse tree, you have to match the entire hash and then return a new hash object:
class MyTransformer < Parslet::Transform
rule(dog: 'terrier', cat: simple(:cat_value)) {
out = {}
out[:dog] = 'foo'
out[:cat] = cat_value
out
}
end
Here we're only matching when a dog
key has a value of terrier
, but matching when the cat key can be any simple object (such as a string).
Now a test like this would pass:
def test_dog_cat
tree = { dog: 'terrier', cat: 'suit'}
actual = MyTransformer.new.apply(tree)
expected = { dog: 'foo', cat: 'suit' }
assert_equal expected, actual
end
While the transformers demonstrated here aren't the most useful, they can save you work down the road when your tree starts to get extremely nested. We'll come back later.
Repeated Rules
At this point, we can parse a single key/value pair, but what about multiple pairs? Here's a test:
def test_parses_multiple_key_value_pairs
input = %Q{ hello: "world", hi: "there" }
parser = MyParser.new.key_value
tree = parser.parse_with_debug(input)
refute_equal nil, tree
end
In this case, we already have a key_value
rule, and a comma
rule, we either want to match a single key/value pair, or a key/value pair followed by a comma and another key/value. This latter pattern can repeat indefinitely. Here's how this looks as a parslet rule:
rule(:named_args) {
spaces? >> (
key_value >> (
comma >> key_value
).repeat
).as(:named_args) >> spaces?
}
This rule gets the tests to pass, though what kind of tree do you think our test string (hello: "world", hi: "there"
) returns?
It's a hash, yes. The first layer only has one key, named :named_args
. One key points to one value. How do you think the two key_value
's are stored? If it was only one key_value
, it could be represented as one hash:
{ :named_args =>
{ :key_value =>
{ :key => "hello", :val=>{:string=>"world"} }
}
}
However, we have two elements with the same key, key_value
. The only way to store them both is via an Array. The output of parsing hello: "world", hi: "there"
would look like this:
{
:named_args =>
[
{ :key_value =>
{ :key => "hello", :val => {:string=>"world"}}
},
{
:key_value =>
{ :key=>"hi", :val => {:string=>"there"}}
}
]
}
This is a subtle but important point in parslet, depending on how many times an element is matched, the value of a hash might be a hash or an array of hashes.
Our First True Data Structure
Now that we have repeated key/value pairs, let's add a little syntactic sugar on them to make a Ruby style hash:
def test_parses_hash
input = '{ hello: "world", hi: "there" }'
parser = MyParser.new.hash_obj
tree = parser.parse_with_debug(input)
refute_equal nil, tree
end
How can we match this? Hopefully, you've got a reasonable guess. Here's my answer:
rule(:hash_obj) {
spaces? >> (
str('{') >> named_args >> str('}')
) >> spaces?
}
We're essentially letting the named_args
do the heavy lifting. You might also notice that I chose not to match anything with an as
method. In this case, nesting our key/value pair one level deeper in a hash
key, doesn't really buy anything for us right now.
While this gets the tests to pass, I want to go one step further.
actual = MyTransformer.new.apply(tree)
expected = { hello: "world", hi: "there" }
assert_equal expected, actual
In this case, we're transforming a string representation of a hash into an actual hash. While it might not seem that exciting, considering we could have simply run eval
to produce the same thing, keep in mind that this ability to implement a grammar and build a transformer to produce the objects we want allows us to implement virtually any language of our choosing, not just replicate Ruby features.
For now, this is only a failing test. Here's the tree we can manipulate with our transformer:
{ :named_args => [
{:key_value => { :key => "hello", :val => {:string=>"world"}}},
{:key_value => { :key => "hi", :val => {:string=>"there"}}}
]
}
In our transformer we need to match this entire hash, there is only one key in the top-level hashnamed_args
, however, it's pointing at a pretty complex value, an array holding hashes that point to hashes.
We already have a transformer that will convert the {:string =>"there"}
into simply "there"
.
Parslet provides the ability to match ANY values with the keyword subtree
. This is dangerous because you are now responsible for handling any inconsistencies in the input. For example, remember we looked at how this rule could return either a single hash or an array of hashes, when you choose to match via subtree
you're now responsible for knowing that and doing the right thing.
Let's make sure that we can match this hash, then we'll add logic.
class MyTransformer < Parslet::Transform
# ...
rule(:named_args => subtree(:na)) {
puts na.inspect
}
end
The output should be an array like this:
[
{:key_value => { :key => "hello", :val => "world" }},
{:key_value => { :key => "hi", :val => "there" }}
]
Notice that our val
is simply a string here and not the complex hash object. This is because our prior transformation was already applied.
To transform this into a hash, we need to ensure the input is always consistent, then build a hash by looping through each element in our input and adding the keys and values to that hash. Here's my solution:
class MyTransformer < Parslet::Transform
# ...
rule(:named_args => subtree(:na)) {
Array(na).each_with_object({}) { |element, output_hash|
key = element[:key_value][:key].to_sym
val = element[:key_value][:val]
output_hash[key] = val
}
}
end
The Array()
ensures we are dealing with an array. The each_with_object
iterates over each element while yielding it and the input (in this case a hash) to the block, the return will be the input (the hash). We can then extract the key and value from each hash, and add them to our output hash.
This does the trick, and our test now passes!
You can run the code on GitHub.
Just the Beginning
This article is already really long, but we're just scratching the surface of what you can do with parslet. You'll eventually want to keep adding grammar rules until you're happy with your language. I mentioned, implementing arrays, or integers. Do you think you could do that now? You can also add a root
node which is where your parser starts parsing by default, rather than calling an explicit parser rule like we've been doing in our tests.
If you followed this tutorial, you're mostly there already! I recommend also walking through the official parslet tutorial and looking at some of the other examples. If you get stuck, try going back to the basics, as well as writing more and smaller tests.
When constructing parsers, try to imagine what kind of a "tree" your desired grammar might produce, and try working, starting from the leaves. Once you've got the parser, try working in the same direction with your transformers, building from the leaves inward. While this will feel clunky at first, you'll get the hang of it over time.
When all else fails, write a blog post!
parslet Cheatsheet
Parser
Parser Syntax
str()
matches an exact string.match()
matches a regex.any
matches anything, an alias formatch('.')
..maybe
is a method on a match object that allows it to be matched 0 times..repeat
is a method on a match object that allows it to be matched a number of times. If passed arguments, they representrepeat(<min times>, <max times>)
..absent?
is a method on a match object, it essentially inverts the match so thatstring("f").absent?
will not match an"f"
.>>
can be used to chain together rules (the "and then" operator).|
can be used to match one of multiple rules in order (the "or" operator)..as
is a method on a match segment that allows you to capture values and name the capture.
Parser Notes
- Parsers are defined using the
rule
keyword. - Matching with the same
as
value multiple times at the same level will result in an array instead of a hash.
Transformer Syntax
simple
matches only non-nested values; i.e., numbers, strings, etc.subtree
matches EVERYTHING.sequence
is an in-between that matches nested objects, but not deeply nested objects.
Transformer Notes
- Transformers are defined using the
rule
keyword. - The transformer rule expects a
key => value
input. - A transformer rule must match an entire hash (i.e., the key and the value, not just the key).
- The result of the transformer replaces the whole hash.
- There must be somewhere for that replacement to live (i.e., the dog/cat example).
Downloadable Example App
Here's the source code to my example parslet app.
Take a look at an Indigo.Design sample application to learn more about how apps are created with design to code software.
Published at DZone with permission of Richard Schneeman , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
{{ parent.title || parent.header.title}}
{{ parent.tldr }}
{{ parent.linkDescription }}
{{ parent.urlSource.name }}