MongoDB is a document-based NoSQL database that bridges the gap between scalable key-value stores like Datastore and Memcache DB, and RDBMS's querying and robustness capabilities. Some of its main features include:
- Document-oriented storage - data is manipulated as JSON-like documents
- Querying - uses JavaScript and has APIs for submitting queries in every major programming language
- In-place updates - atomicity
- Indexing - any attribute in a document may be used for indexing and query optimization
- Auto-sharding - enables horizontal scalability
- Map/reduce - the MongoDB cluster may run smaller MapReduce jobs than a Hadoop cluster with significant cost and efficiency improvements
MongoDB implements its own file system to optimize I/O throughput by dividing larger objects into smaller chunks. Documents are stored in two separate collections: files containing the object meta-data, and chunks that form a larger document when combined with database accounting information. The MongoDB API provides functions for manipulating files, chunks, and indices directly. The administration tools enable GridFS maintenance.
![Figure5](/storage/rc-covers/14991-thumb.png)
Figure 5 - MongoDB Cluster
A MongoDB cluster consists of a master and a slave. The slave may become the master in a fail-over scenario, if necessary. Having a master/slave configuration (also known as Active/Passive or A/P cluster) helps ensure data integrity since only the master is allowed to commit changes to the store at any given time. A commit is successful only if the data is written to GridFS and replicated in the slave
![Hot Tip](/storage/rc-covers/14992-thumb.png)
MongoDB also supports a limited master/master configuration. It's useful only for inserts, queries, and deletions by specific object ID. It must not be used if updates of a single object may occur concurrently.
Caching
MongoDB has a built-in cache that runs directly in the cluster without external requirements. Any query is transparently cached in RAM to expedite data transfer rates and to reduce disk I/O.
Document Format
MongoDB handles documents in BSON format, a binary-encoded JSON representation. BSON is designed to be traversable, lightweight and efficient. Applications can map BSON/JSON documents using native representations like dictionaries, lists, and arrays, leaving the BSON translation to the native MongoDB driver specific to each programming language.
BSON Example
{
'name' : 'Tom',
'age' : 42
}
Language |
Representation |
Python |
{
'name' : 'Tom',
'age' : 42
}
|
Ruby |
{
"name" => "Tom",
"age" => 42
}
|
Java |
BasicDBObject d;
d = new BasicObject();
d.put("name", "Tom");
d.put("age", 42);
|
PHP |
array( "name" => "Tom",
"age" => 42);
|
Dynamic languages offer a closer object mapping to BSON/JSON than compiled languages.
The complete BSON specification is available from: http://bsonspec.org/
MongoDB Programming
Programming in MongoDB requires an active server running the Mongod and the Mongos database daemons (see Figure 5), and a client application that uses one of the languagespecific drivers.
![Hot Tip](/storage/rc-covers/14993-thumb.png)
All the examples in this Refcard are written in Python for conciseness.
Starting the Server
Log on to the master server and execute:
[servername:user] ./Mongod
The server will display its status messages to the console unless stdout is redirected elsewhere.
Programming Example
This example allocates a database if one doesn't already exist,instantiates a collection on the server, and runs a couple of queries.
The MongoDB Developer Manual is available from:
http://www.Mongodb.org/display/DOCS/Manual
#!/usr/bin/env jython
import pyMongo
from pyMongo import Connection
connection = Connection('servername', 27017)
db = connection['people_database']
peopleList = db['people_list']
person = {
'name' : 'Tom',
'age' : 42 }
peopleList.insert(person)
person = {
'name' : 'Nancy',
'age' : 69 }
peopleList.insert(person)
# find first entry:
person = peopleList.find_one()
# find a specific person:
person = peopleList.find_one({ 'name' : 'Joe'})
if person is None:
print "Joe isn't here!"
else:
print person['age']
# bulk inserts
persons = [{ 'name' : 'Joe' }, {'name' : 'Sue'}]
peopleList.insert(persons)
# queries with multiple results
for person in peopleList.find():
print person['name']
for person in peopleList.find({'age' : {'$ge' : 21}}).sort('name'):
print person['name']
# count:
nDrinkingAge = peopleList.find({'age' : {'$ge' : 21}}).count()
# indexing
from pyMongo import ASCENDING, DESCENDING
peopleList.create_index([('age', DESCENDING), ('name', ASCENDING)])
The PyMongo documentation is available at: http://api.Mongodb.org/python - guides for other languages are also available from this web site.
The code in the previous example performs these operations:
- Connect to the database server started in the pr evious section
- Attach a database; notice that the database is tr eated like an associative array
- Get a collection (loosely equivalent to a table in a relational database), treated like an associative array
- Insert one or mor e entities
- Query for one or mor e entites
Although MongoDB treats all these data as BSON internally, most of the APIs allow the use of dictionary-style objects to streamline the development process.
Object ID
A successful insertion into the database results in a valid Object ID. This is the unique identifier in the database for a given document. When querying the database, a return value will include this attribute:
{
"name" : "Tom",
"age" : 42,
"_id" : ObjectId('999999')
}
Users may override this Object ID with any ar gument as long as it's unique, or allow MongoDB to assign one automatically.
Common Use Cases
- Caching - more robust capabilities, plus persistence, when compared against a pure caching system
- High volume processing - RDBMS may be too expensive or slow to run in comparison
- JSON data and program objects storage - many RESTful web services provide JSON data; they can be stored in MongoDB without language serialization overhead (especially when compared against XML documents)
- Content management systems - JSON/BSON objects can represent any kind of document, including those with a binary representation
MongoDB Drawbacks
- No JOIN operations - each document is stand-alone
- Complex queries - some complex queries and indices are better suited for SQL
- No row-level locking - unsuitable for transactional data without error prone application-level assistance
If any of these is part of the functional requirements, a SQL database would be better suited for the application.
{{ parent.title || parent.header.title}}
{{ parent.tldr }}
{{ parent.linkDescription }}
{{ parent.urlSource.name }}