This is the first in a series of posts based on a presentation I just did last Friday at the Barcelona Ruby Conference. I’ll also post a link to the video recording when the Baruco team has time to edit and post it online.
Let me start with a question: How many of you have actually read 20,000 Leagues Under the Sea, either in the original French or in some translation? [ A few people raised their hands, but most people in the audience did not. ] Yes, I’m not surprised. 20,000 Leagues is one of those classic novels we have all heard of, but few of us take the time to read.
While thinking about this presentation over the Summer, I decided to actually read the novel – and I’m glad that I did! It blew my mind on a number of different levels. Today I’m going to take the time to tell you a bit about the novel itself as we go.
However, actually I’m here today to talk about ActiveRecord. I’d like to explore how ActiveRecord works internally; how it finds the information we ask for and returns it to us as a Ruby object.
But why talk about ActiveRecord? We all know how to use ActiveRecord; most of you understand exactly what this line of code does. You didn’t need to come to Barcelona to learn how to use ActiveRecord.
The reason why I want to discuss this is that ActiveRecord hooked me; it first got me excited about using Rails. I came across Rails back in 2008, about six years ago now. And I can still remember the moment when I first typed a line of code similar to this one into the Rails console or into a Rails app somewhere. It was amazing! Before that I was using PHP or Java – I’ve since blocked out all memory of that so I’m not quite sure which it was :) – and when I saw how easy it was to use Rails to query a database I was very impressed.
Somehow Rails could take a simple, beautiful and readable line of Ruby code and convert it into a SQL statement like this. How could this possibly work? It seemed like magic!
But it’s not magic. There are thousands of lines of code and years of computer science research behind this simple line of code, the hard work behind the scenes that makes ActiveRecord queries possible.
Just as Professor Aronnax went on an underwater adventure with Captain Nemo, I’d like to take you on an adventure inside of ActiveRecord to find out how it works, how ActiveRecord generates and executes SQL statements.
And why stop there? Later we’ll dive underneath ActiveRecord and inside an actual database server to find out how it works too. How does it understand the SQL statements we give it? How does it find the data we ask for and return it to us? Let’s find out!
Here’s our plan: First we’ll start with a look at ActiveRecord::Relation, the top, public API for ActiveRecord many of you use everyday. Then, we’ll look deeper inside of ActiveRecord to find out how it converts our Ruby queries into SQL statements.
Later, in the second half of the presentation we’ll dive even deeper and directly into an actual relational database server (RDBMS); today I’ll use PostgreSQL as an example. How does Postgres understand the SQL statement that ActiveRecord sent it? How does it find the data we want? Finally, if we have time we’ll look at the B-Tree Algorithm, which is part of the real magic that allows database servers to work.
Let’s use my line of code from earlier as an example – today we’ll be searching for Captain Nemo together. Therefore, we’ll start with a User class, a subclass of ActiveRecord::Base.
When I call where what happens? It turns out the where method is defined in the ActiveRecord::Querying module:
But as you can see, ActiveRecord delegates the where method over to another method called all, which returns a new instance of a different class called ActiveRecord::Relation. In fact, my call to User.where is entirely equivalent to calling User.all.where:
ActiveRecord actually implements the where method using the new instance of ActiveRecord::Relation. Internally, where is implemented by the ActiveRecord::QueryMethods module, included into ActiveRecord::Relation. Next, ActiveRecord::QueryMethods#where returns, in turn, a second new instance of ActiveRecord::Relation:
If you look at the right, you can see the second ActiveRecord::Relation object contains information about what record we’re looking for, that we want the records where the name is “Captain Nemo.”
Of course, we don’t want all of the Captain Nemo users; we just want the first one. Next, we call the first method on the new ActiveRecord::Relation:
The Internal Implementation of ActiveRecord::FinderMethods#first
Now you can see ActiveRecord creates a third instance of ActiveRecord::Relation – this time with even more information about the query we’d like to execute. But what are all those other values: order, limit and offset? Where do they come from? We can find out by taking a look at the implementation of first. The ActiveRecord::FinderMethods module implements this, which ActiveRecord includes into the ActiveRecord::Relation class.
Here you can see because we didn’t pass in a value for limit, ActiveRecord calls find_nth and passes in a value of 0, indicating we want the first record from the query result set. The second argument, offset_index, turns out to have a value of zero, meaning we want to count into a window of records at the beginning of the result set, not a window located somewhere farther along the result set.
ActiveRecord implements a series of similar methods that will return the second, third, fourth or even fifth record:
You can see the pattern here; the first argument is a zero-based index indicating which record we want. And just in case we want the forty second record from the result set, ActiveRecord implements this useful method… :)
It’s no joke! forty_two is actually in the ActiveRecord source code; you can try it for yourself. Replacing first with the equivalent call to find_nth, here’s our example again:
Following the code path through the ActiveRecord::FinderMethods module, we can see find_nth calls, in turn, find_nth_with_limit:
Now the arguments have been reversed; the first argument, 0, is now the offset and the second, 1, is the number of records we want, or the limit value.
Substituting one more time, let’s replace find_nth_with_limit with more detailed calls it makes to order and limit:
Now you can see where all of the values in the final ActiveRecord::Relation object come from. Each call to a scoping method saves a new piece of information about our query, and returns a new instance of the ActiveRecord::Relation class. (We’ll see what arel_table means in a minute.)
The Beauty of ActiveRecord::Relation
Taking a step back, we can see that our simple line of code, User.where(name: "Captain Nemo").first, is creating a series of ActiveRecord::Relation objects like this:
There are two interesting and beautiful things about this, I think. First, notice the pattern that ActiveRecord uses: each call to a method returns a new instance of the class that implemented that method. This is what allows us to easily chain together different method calls. We can add on as many different scopes as we wish; because each new object is also an ActiveRecord::Relation, it implements all of the same methods. You can use the same pattern in your own code. All you need to do is create a new instance of the class that implements each method, and return that. One reason to study internal code like this is to learn about and find new ideas that you can use in your own code.
The second beautiful thing about ActiveRecord::Relation is that it’s lazy. Using this chain of method calls we are building up metadata or information about our query, without actually executing the query itself. It’s almost as if we were using a functional programming language like Haskell or Lisp. Using this trick, ActiveRecord allows us to specify exactly the query we want, without having to worry about executing it until we’re ready.
It’s not until we call the to_a method, in other to convert the ActiveRecord::Relation object into an array and access the result set, that ActiveRecord executes the query:
You can see here that to_a calls load internally, which later calls the DatabaseStatements#select_all method. Note the find_nth_with_limit method calls to_a, so first, second and forty_two are not lazy and will all execute the query immediately. Because of this, these are known as “terminating methods.” To prevent the query from executing immediately – to keep it lazy – just use order and limit instead.
In the next few days I’ll post the second part of my presentation from Barcelona. We’ll look at what “Relational Algebra” means and how the Arel gem converts our ActiveRecord::Relation object into a string containing a SQL statement.