The Windows Azure ecosystem is being extended both by Microsoft and by third-party providers. This post focuses on one of these extensions – the MongoDB document database. It shows how MongoDB can be deployed in a Windows Azure hosted service and accessed from other roles in the hosted service using either traditional .NET or the recently released Node.js support.
During the last 30 years SQL databases have become the dominant data storage systems, with commercial offerings such as Oracle, Microsoft SQL Server and IBM DB2 achieving enormous commercial success. These data systems are characterized by their support for ACID semantics – atomicity, consistency, isolation and durability. These properties impose a certainty that is essential for many business processes dealing with valuable data. However, supporting ACID semantics becomes increasingly expensive as the volume of data increases.
The advent of Web 2.0 has led to an increasing interest in long-tail data whose value comes not from the value of a single piece of data but from the magnitude of the aggregate data. Web logs provide a paradigmatic example of this type of data. Because of the low value of an individual data item this type of data can be managed by data systems which do not need to support full ACID semantics.
Over the last few years a new type of data storage semantics has become fashionable: BASE (basically available, soft state, eventually consistent). While the name may be a bit forced to make the pun, the general idea is that a system adhering to BASE semantics, when implemented as a distributed system, should be able to survive network partitioning of the individual components of the system at the cost of offering a lower level of consistency than is available in traditional SQL systems.
In a strongly-consistent system, all reads following a write receive the written data. In an eventually-consistent system, reads immediately following a write are not guaranteed to return the newly written value. However, eventually all reads should return the written value.
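The difference between the two read behaviours can be sketched with a toy model. This is not MongoDB code: the class ToyReplicatedValue and its method names are invented for illustration, and replication is triggered by hand rather than by a real network.

```javascript
// Toy model only: one primary and one secondary copy of a single value.
// ToyReplicatedValue and its methods are hypothetical names, not a real API.
class ToyReplicatedValue {
  constructor(initial) {
    this.primary = initial;   // copy that accepts writes
    this.secondary = initial; // copy that lags behind the primary
    this.pending = [];        // writes not yet applied to the secondary
  }

  write(value) {
    this.primary = value;
    this.pending.push(value); // replication happens later
  }

  // Strongly consistent read: always served by the primary.
  readStrong() {
    return this.primary;
  }

  // Eventually consistent read: served by the secondary, which may be stale.
  readEventual() {
    return this.secondary;
  }

  // Apply all outstanding writes to the secondary, in order.
  replicate() {
    for (const value of this.pending) {
      this.secondary = value;
    }
    this.pending = [];
  }
}

const item = new ToyReplicatedValue("v1");
item.write("v2");
const strongRead = item.readStrong();      // sees the new value at once
const staleRead = item.readEventual();     // still the old value
item.replicate();
const convergedRead = item.readEventual(); // now the new value
console.log(strongRead, staleRead, convergedRead); // v2 v1 v2
```

The stale read in the middle is the window that an eventually-consistent system allows and a strongly-consistent system rules out.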
NoSQL (not only SQL) is a name used to classify data systems that do not use SQL and which typically implement BASE semantics instead of ACID semantics. NoSQL systems have become very popular and many such systems have been created – most of them distributed in an open source model. Another important feature of NoSQL systems is that, unlike SQL systems, they do not impose a schema on stored entities.
NoSQL systems can be classified by how data is stored, for example as key-value stores or document stores.
In a key-value store, an entity comprises a primary key and a set of properties with no associated schema so that each entity in a “table” can have a different set of properties. Apache Cassandra is a popular key-value store. A document store provides for the storage of semantically richer entities with an internal structure. MongoDB is a popular document store.
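The contrast between the two kinds of entity can be made concrete with plain JavaScript objects. This is only an illustration of the data shapes; the keys and property names are invented for the sketch and no database is involved.

```javascript
// Illustrative data shapes only; keys and property names are made up.

// Key-value style: each entity is a primary key plus a flat bag of properties.
// Two entities in the same "table" need not share the same properties.
const table = new Map();
table.set("movie:1", { title: "Alien", year: 1979 });
table.set("movie:2", { title: "Brazil", director: "Terry Gilliam" });

// Document style: one entity carries internal structure (arrays, sub-documents).
const movieDocument = {
  _id: 1,
  title: "Alien",
  year: 1979,
  cast: [
    { actor: "Sigourney Weaver", role: "Ripley" },
    { actor: "John Hurt", role: "Kane" },
  ],
};

// The property sets of the two key-value entities differ.
const keysOfFirst = Object.keys(table.get("movie:1")).sort();
const keysOfSecond = Object.keys(table.get("movie:2")).sort();
console.log(keysOfFirst, keysOfSecond);

// Nested fields in the document are reached by walking its structure.
const firstRole = movieDocument.cast[0].role;
console.log(firstRole); // Ripley
```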
Windows Azure Tables is a key-value NoSQL store provided in the Windows Azure Platform. Unlike other NoSQL stores, it supports strong consistency with no eventually-consistent option. The Windows Azure Storage team recently published a paper describing the implementation of Windows Azure Tables.
10gen, the company which maintains and supports MongoDB, has worked with Microsoft to make MongoDB available on Windows Azure. This provides Windows Azure with a document store NoSQL system to complement the key-value store provided by Windows Azure Tables.
MongoDB is a NoSQL document store in which individual entities are persisted as documents inside a collection hosted by a database. A single MongoDB installation can comprise many databases. MongoDB is schemaless so each document in a collection can have a different schema. Consistency is tunable from eventual consistency to strong consistency.
MongoDB uses memory-mapped files and performance is optimal when all the data and indexes fit in memory. It supports automated sharding allowing databases to scale past the limits of a single server. Data is stored in BSON format which can be thought of as a binary-encoded version of JSON.
High availability is supported in MongoDB through the concept of a replica set comprising one primary member and one or more secondary members, each of which contains a copy of all the data. Writes are performed against the primary member and are then copied asynchronously to the secondary members. A safe write can be invoked that returns only when the write is copied to a specified number of secondary members – thereby allowing the consistency level to be tuned as needed. Reads can be performed against secondary members to improve performance by reducing the load on the primary member.
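The effect of a safe write can be sketched with a toy replica set. This models the behaviour described above and is not the MongoDB driver API: ToyReplicaSet, its methods, and the parameter name w are assumptions made for illustration, and here w counts secondary members only, following the description in the text.

```javascript
// Toy model of a replica set; not the MongoDB driver API.
// ToyReplicaSet, its methods, and the "w" parameter are hypothetical.
class ToyReplicaSet {
  constructor(secondaryCount) {
    this.primary = []; // log of writes applied on the primary
    this.secondaries = Array.from({ length: secondaryCount }, () => []);
  }

  // Write to the primary, then wait for `w` secondaries to copy the write
  // before returning. With w = 0 the call returns right after the primary write.
  write(doc, w = 0) {
    if (w > this.secondaries.length) {
      throw new RangeError("w exceeds the number of secondary members");
    }
    this.primary.push(doc);
    for (let i = 0; i < w; i++) {
      this.catchUp(this.secondaries[i]);
    }
    return { acknowledgedBy: w };
  }

  // Copy any writes the secondary has not yet seen from the primary's log.
  catchUp(secondary) {
    for (let i = secondary.length; i < this.primary.length; i++) {
      secondary.push(this.primary[i]);
    }
  }

  // Background replication: bring every secondary up to date.
  replicateAll() {
    this.secondaries.forEach((s) => this.catchUp(s));
  }

  // Latest document on secondary `index`; undefined if nothing has arrived yet.
  readFromSecondary(index) {
    const log = this.secondaries[index];
    return log[log.length - 1];
  }
}

const replicaSet = new ToyReplicaSet(2);
replicaSet.write({ title: "first" });                      // no secondary has a copy yet
const beforeSafeWrite = replicaSet.readFromSecondary(0);   // undefined
replicaSet.write({ title: "second" }, 1);                  // waits for one secondary
const afterSafeWrite = replicaSet.readFromSecondary(0).title;
const laggingMember = replicaSet.readFromSecondary(1);     // still undefined
replicaSet.replicateAll();
const afterReplication = replicaSet.readFromSecondary(1).title;
console.log(beforeSafeWrite, afterSafeWrite, laggingMember, afterReplication);
```

A larger w narrows the window in which a read from a secondary can be stale, at the cost of a slower write.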
Kyle Banker, of 10gen, has written an excellent book called MongoDB in Action. I highly recommend it to anyone interested in MongoDB.
MongoDB on Windows Azure
David Makogon (@dmakogon) worked out how to deploy MongoDB onto worker role instances on Windows Azure. 10gen then worked with him and the Microsoft Interoperability team to develop an officially supported preview release of the MongoDB on Windows Azure wrapper which simplifies the task of deploying a MongoDB replica set onto worker role instances.
The wrapper deploys each member of the replica set to a separate instance of a worker role. The mongod.exe process for MongoDB is started in the OnStart() role entry point for the instance. The data for each member is persisted as a page blob in Windows Azure Blob storage that is mounted as an Azure Drive on the instance.
The MongoDB on Windows Azure wrapper can be downloaded from GitHub. The download comprises two directory trees: ReplicaSets, containing the core software as the MongoDBReplicaSet solution; and SampleApplications, containing an MVC Movies sample application named MongoDBReplicaSetMvcMovieSample. Each directory tree contains a PowerShell script, solutionsetup.ps1, that must be invoked to download the latest MongoDB binaries.
The MongoDBReplicaSetMvcMovieSample solution contains four projects:
- MongoDBAzureHelper – helper class to retrieve MongoDB configuration
- MongoDBReplicaSetSample – Windows Azure project
- MvcMovie – an ASP.NET MVC 3 application
- ReplicaSetRole – worker role to host the members of a MongoDB replica set.
The MvcMovie project is based on the Intro to ASP.NET MVC 3 sample on the asp.net website. It displays movie information retrieved from the MongoDB replica set hosted in the ReplicaSetRole instances. The ReplicaSetRole is launched as three medium instances, each of which hosts a replica set member. The MongoDBAzureHelper and ReplicaSetRole projects are from the MongoDBReplicaSet solution.
The MongoDBReplicaSetMvcMovieSample solution can be opened in Visual Studio, then built and deployed either to the local compute emulator or to a Windows Azure hosted service. The application has two pages: an About page displaying the status of the replica set; and a Movies page allowing movie information to be captured and displayed. It may take a minute or two for the replica set to come fully online and the status to be made available on the About page. When a movie is added to the database via the Movies page, it may occasionally require a refresh for the updated information to become visible on the page. This is because writes are copied asynchronously to the secondary members, so a read served by a secondary immediately after a write is only eventually consistent and may not yet include the new movie.
This example provides a general demonstration of how a MongoDB installation with replica sets is added to a Windows Azure project: add the MongoDBAzureHelper and ReplicaSetRole projects from the MongoDBReplicaSet solution and add the appropriate configuration to the ServiceDefinition.csdef and ServiceConfiguration.cscfg files.
The ServiceDefinition.csdef entries for ReplicaSetRole are:
<Endpoints>
  <InternalEndpoint name="MongodPort" protocol="tcp" port="27017" />
</Endpoints>
<ConfigurationSettings>
  <Setting name="MongoDBDataDir" />
  <Setting name="ReplicaSetName" />
  <Setting name="MongoDBDataDirSize" />
  <Setting name="MongoDBLogVerbosity" />
</ConfigurationSettings>
<LocalResources>
  <LocalStorage name="MongoDBLocalDataDir" cleanOnRoleRecycle="false" sizeInMB="1024" />
  <LocalStorage name="MongodLogDir" cleanOnRoleRecycle="false" sizeInMB="512" />
</LocalResources>
Port 27017 is the standard port for a MongoDB installation. None of the settings need be changed for the sample project.
The ServiceDefinition.csdef entries for MvcMovie are:
<ConfigurationSettings>
  <Setting name="ReplicaSetName" />
</ConfigurationSettings>
The ServiceConfiguration.cscfg settings for the ReplicaSetRole are:
<ConfigurationSettings>
  <Setting name="MongoDBDataDir" value="UseDevelopmentStorage=true" />
  <Setting name="ReplicaSetName" value="rs" />
  <Setting name="MongoDBDataDirSize" value="" />
  <Setting name="MongoDBLogVerbosity" value="-v" />
  <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString" value="UseDevelopmentStorage=true" />
</ConfigurationSettings>
The ServiceConfiguration.cscfg settings for the MvcMovie web role are:
<ConfigurationSettings>
  <Setting name="ReplicaSetName" value="rs" />
  <Setting name="Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString" value="UseDevelopmentStorage=true" />
</ConfigurationSettings>
It is critical that the value of ReplicaSetName be the same in both the MvcMovie web role and the ReplicaSetRole worker role.
The replica set can also be accessed from the MongoDB shell, mongo, once it is running in the compute emulator. This is a useful way to ensure that everything is working since it provides a convenient way of accessing the data and managing the replica set. The application data is stored in the movies collection in the movies database. As an example of managing the replica set, the rs.stepDown() command can be invoked on the primary member to demote it to a secondary; MongoDB then automatically selects one of the secondary members and promotes it to primary. Note that in the compute emulator, the three replica set members are hosted at port numbers 27017, 27018 and 27019 respectively.
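When connecting to the emulator-hosted members from a client, the three ports can be combined into a single MongoDB connection URI using the standard seed-list form with a replicaSet option. The following is a minimal sketch: the helper name buildReplicaSetUri and the host 127.0.0.1 are assumptions, while the ports, the replica set name rs and the movies database come from the sample. The sample itself obtains its connection details through the MongoDBAzureHelper project rather than a hard-coded URI.

```javascript
// Sketch: build a MongoDB connection URI for the three emulator-hosted members.
// buildReplicaSetUri and the host 127.0.0.1 are assumptions for illustration;
// the ports, replica set name, and database name come from the sample.
function buildReplicaSetUri(hosts, database, replicaSetName) {
  if (hosts.length === 0) {
    throw new Error("at least one host is required");
  }
  // Seed list: comma-separated host:port pairs.
  const seedList = hosts.map(({ host, port }) => `${host}:${port}`).join(",");
  return `mongodb://${seedList}/${database}?replicaSet=${encodeURIComponent(replicaSetName)}`;
}

const emulatorMembers = [27017, 27018, 27019].map((port) => ({
  host: "127.0.0.1",
  port,
}));
const uri = buildReplicaSetUri(emulatorMembers, "movies", "rs");
console.log(uri);
// mongodb://127.0.0.1:27017,127.0.0.1:27018,127.0.0.1:27019/movies?replicaSet=rs
```

Listing all three members means a client can still find the replica set after rs.stepDown() has moved the primary to a different port.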
This sample demonstrates the process of adding support for MongoDB to a Windows Azure solution with a web role.
1) Add the following MongoDB projects to the solution: MongoDBAzureHelper and ReplicaSetRole.
2) Add the MongoDB assemblies to the web role:
- MongoDB.Bson (copy local)
- MongoDB.Driver (copy local)
- MongoDBAzureHelper (from the added project)
3) Add the MongoDB settings described earlier to the Windows Azure service configuration.
MongoDB on Windows Azure with Node.js
The Windows Azure SDK for Node.js provides an extensive set of PowerShell scripts – such as Add-AzureNodeWebRole, Start-AzureEmulator, and Publish-AzureService – which simplify the lifecycle management of developing and deploying a Node.js web application. It contains several introductory tutorials including a Node.js Web Application with Storage on MongoDB. This tutorial shows how to add a replica set, implemented as described earlier, to a web application developed in Node.js.
The MongoDB on Windows Azure wrapper and the Windows Azure SDK for Node.js tutorial have made it very easy to try MongoDB out in a Windows Azure hosted service.