Gain a Better Understanding of MongoDB Master/Slave Configuration
Basics first: what is master/slave?
One database server (the “master”) is in charge and can do anything. A bunch of other database servers keep copies of all the data that’s been written to the master and can optionally be queried (these are the “slaves”). Slaves cannot be written to directly; they are just copies of the master database. Setting up a master and slaves allows you to scale reads nicely because you can just keep adding slaves to increase your read capacity. Slaves also make great backup machines. If your master explodes, you’ll have a copy of your data safe and sound on the slave.
A handy-dandy comparison chart between master database servers and slave database servers:
| | Master | Slave |
|---|---|---|
| # of servers | 1 | ∞ |
| permissions | read/write | read |
| used for | queries, inserts, updates, removes | queries |
So, how do you set up Mongo in a master/slave configuration? Assuming you’ve downloaded MongoDB from mongodb.org, you can start a master and slave by cutting and pasting the following lines into your shell:
```
$ mkdir -p ~/dbs/master ~/dbs/slave
$ ./mongod --master --dbpath ~/dbs/master >> ~/dbs/master.log &
$ ./mongod --slave --port 27018 --dbpath ~/dbs/slave --source localhost:27017 >> ~/dbs/slave.log &
```
(I’m assuming you’re running *NIX. The commands for Windows are similar, but I don’t want to encourage that sort of thing).
What are these lines doing?
- First, we’re making directories to keep the database in (~/dbs/master and ~/dbs/slave).
- Now we start the master, specifying that it should put its files in the ~/dbs/master directory and its log in the ~/dbs/master.log file. So, now we have a master running on localhost:27017.
- Next, we start the slave. It needs to listen on a different port than the master since they’re on the same machine, so we’ll choose 27018. It will store its files in ~/dbs/slave and its logs in ~/dbs/slave.log. The most important part is letting it know who’s boss: the --source localhost:27017 option lets it know that the master it should be reading from is at localhost:27017.
There are tons of possible master/slave configurations. Some examples:
- You could have a dozen slave boxes where you want to distribute the reads evenly across them all.
- You might have one wimpy little slave machine that you don’t want any reads to go to; you just use it for backup.
- You might have the most powerful server in the world as your master machine and you want it to handle both reads and writes… unless you’re getting more than 1,000 requests per second, in which case you want some of them to spill over to your slaves.
In short, Mongo can’t automatically configure your application to take advantage of your master-slave setup. Sorry. You’ll have to do this yourself. (Edit: the Python driver actually does handle case 1 for you, see Mike’s comment.)
However, it’s not too complicated, especially for what MG wants to do. MG is using 3 servers: a master and two slaves, so we need three connections: one to the master and one to each slave. Assuming he’s got the master at master.example.com and the slaves at slave1.example.com and slave2.example.com, he can create the connections with:
```php
$master = new Mongo("master.example.com:27017");
$slave1 = new Mongo("slave1.example.com:27017");
$slave2 = new Mongo("slave2.example.com:27017");
```
This next bit is a little nasty and it would be cool if someone made a framework to do it (hint hint). What we want to do is abstract the master-slave logic into a separate layer, so the application talks to the master-slave logic, which talks to the driver. I’m lazy, though, so I’ll just extend the MongoCollection class and add some master-slave logic. Then, if a person creates a MongoMSCollection from their $master connection, they can add their slaves and use the collection as though it were a normal MongoCollection. Meanwhile, MongoMSCollection will evenly distribute reads amongst the slaves.
```php
class MongoMSCollection extends MongoCollection {
    // index of the slave used for the most recent query
    public $currentSlave = -1;
    // MongoCollection objects, one per slave connection
    public $slaves = array();
    public $numSlaves = 0;

    // call this once to initialize the slaves
    public function addSlaves($slaves) {
        // extract the namespace for this collection: db name and collection name
        $db = $this->db->__toString();
        $c = $this->getName();

        // create an array of MongoCollections from the slave connections
        $this->slaves = array();
        foreach ($slaves as $slave) {
            $this->slaves[] = $slave->$db->$c;
        }
        $this->numSlaves = count($this->slaves);
    }

    public function find($query = array(), $fields = array()) {
        // with no slaves registered, fall back to the master
        if ($this->numSlaves == 0) {
            return parent::find($query, $fields);
        }

        // get the next slave in the array
        $this->currentSlave = ($this->currentSlave+1) % $this->numSlaves;

        // use a slave connection to do the query, passing the arguments through
        return $this->slaves[$this->currentSlave]->find($query, $fields);
    }
}
```
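The rotation inside find() can be tried out on its own, without a running MongoDB. The following is only a sketch: RoundRobin is a hypothetical helper (not part of the Mongo driver), and plain strings stand in for the slave connections. It reproduces the same index arithmetic, starting at -1 and wrapping around with the modulo operator.

```php
<?php
// Hypothetical helper, not part of the Mongo driver: it only reproduces the
// index arithmetic that MongoMSCollection::find uses to pick a slave.
class RoundRobin
{
    private $items;
    private $current = -1; // same starting value as $currentSlave

    public function __construct(array $items)
    {
        if (count($items) === 0) {
            throw new InvalidArgumentException("need at least one item");
        }
        $this->items = array_values($items);
    }

    // Advance to the next item, wrapping back to the first after the last.
    public function next()
    {
        $this->current = ($this->current + 1) % count($this->items);
        return $this->items[$this->current];
    }
}

// Stand-in strings instead of real connections.
$picker = new RoundRobin(array("slave1.example.com", "slave2.example.com"));
$picked = array();
for ($i = 0; $i < 4; $i++) {
    $picked[] = $picker->next();
}
// prints "slave1.example.com, slave2.example.com, slave1.example.com, slave2.example.com"
echo implode(", ", $picked), "\n";
```

Because the counter starts at -1, the very first query goes to the first slave in the array, and each slave then gets every second query.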
To use this class, we instantiate it with the master database and then add an array of slaves to it:
```php
$master = new Mongo("master.example.com:27017");
$slaves = array(new Mongo("slave1.example.com:27017"), new Mongo("slave2.example.com:27017"));

$c = new MongoMSCollection($master->foo, "bar");
$c->addSlaves($slaves);
```
Now we can use $c like a normal MongoCollection. MongoMSCollection::find will alternate between the two slaves and all of the other operations (inserts, updates, and removes) will be done on the master. If MG wants to have the master handle reads, too, he can just add it to the $slaves array (which might be better named the $reader array, now):
$slaves = array($master, new Mongo("slave1.example.com:27017"), new Mongo("slave2.example.com:27017"));
Alternatively, he could change the logic in the MongoMSCollection::find method.
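As one idea of what different logic might look like, here is a rough sketch of the third configuration from the list above: the master takes every read until traffic passes a threshold, and beyond that reads rotate across the master and the slaves so that some of them spill over. Everything in it is an assumption made for illustration: SpillOverPicker is a hypothetical class, the 1,000 default comes from the example above, strings stand in for connections, and measuring the request rate is left to the caller.

```php
<?php
// Hypothetical sketch: all names here are illustrative, not driver API.
class SpillOverPicker
{
    private $readers;   // master first, then the slaves
    private $threshold; // requests per second the master handles alone
    private $current = -1;

    public function __construct($master, array $slaves, $threshold = 1000)
    {
        $this->readers = array_merge(array($master), array_values($slaves));
        $this->threshold = $threshold;
    }

    // $requestsPerSecond is measured by the caller; how is out of scope here.
    public function pick($requestsPerSecond)
    {
        // At or below the threshold, the master takes the read.
        if ($requestsPerSecond <= $this->threshold) {
            return $this->readers[0];
        }
        // Above it, rotate through master and slaves so some reads spill over.
        $this->current = ($this->current + 1) % count($this->readers);
        return $this->readers[$this->current];
    }
}

$picker = new SpillOverPicker("master", array("slave1", "slave2"));
$quiet = $picker->pick(200);
$busy = array($picker->pick(1500), $picker->pick(1500), $picker->pick(1500));
// prints "master | master, slave1, slave2"
echo $quiet, " | ", implode(", ", $busy), "\n";
```

Inside MongoMSCollection::find, the same decision could replace the plain rotation, with the picked entry being a MongoCollection rather than a string.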
Edit: as of version 1.4.0, slaveOkay is not necessary for reading from slaves. slaveOkay should be used if you are using replica sets, not --master and --slave. Thus, the next section doesn’t really apply anymore to normal master/slave.
The only tricky thing about Mongo’s implementation of master/slave is that, by default, a slave isn’t even readable; it’s just a way of doing backup for the master database. If you actually want to read off of a slave, you have to set a flag on your query, called “slaveOkay”. Instead of saying:
$cursor = $slave->foo->bar->find();
we have:
$cursor = $slave->foo->bar->find()->slaveOkay();
Or, because this is a pain in the ass to set for every query (and almost impossible to do for findOnes unless you know the internals) you can set a static variable on MongoCursor that will hold for all of your queries:
MongoCursor::$slaveOkay = true;
And now you will be allowed to query your slave normally, without calling slaveOkay() on each cursor.