Today MongoDB saw a major new release, 2.4. Among the features added are hash-based sharding, capped arrays, and a brand-new Text Search feature.
Moreover, MongoDB will now have an Enterprise edition, which includes, for example, monitoring and alerting options; role-based authentication for cluster and server administrators; and support for the Kerberos protocol for authenticating connections.
I held a Q&A with Kelly Stirman, head of product marketing, whom I thank for the time he spent answering my questions about the new features. Here I report his answers verbatim, along with links to the relevant documentation for those of you who want to know more.
1. I have not used sharding yet (only replicas). I see that MongoDB already supports sharding on any shard key (with high cardinality). What does hash-based add to the existing support? Better write balancing due to the uniform distribution of the hash function?
Hash-based sharding builds on the existing sharding capabilities of MongoDB. Users still select a shard key, but MongoDB will now distribute documents based on a hash of the shard key, which results in a uniform distribution of documents across shards. Hash-based sharding should be easier for users to set up. In some cases range-based sharding is the best choice for an application. For example, queries that specify a range of shard keys will be routed to all shards with hash-based sharding, whereas with range-based sharding these queries will only be routed to the appropriate shard or shards.
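To make the trade-off in this answer concrete, here is a minimal Python sketch of my own (not code from 10gen). It assumes a hypothetical four-shard cluster and picks a shard as "hash modulo shard count", which is a simplification: MongoDB actually assigns ranges of hashed values to chunks. The function names and the range boundaries are invented for illustration.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size, chosen for illustration


def hashed_shard(key: int) -> int:
    """Route by a hash of the shard key (simplified to hash modulo shard count)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def ranged_shard(key: int, boundaries=(250, 500, 750)) -> int:
    """Route by contiguous key ranges: below 250, 250-499, 500-749, 750 and up."""
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)


def shards_for_range(low: int, high: int, route) -> set:
    """Return the set of shards a query on keys low..high (inclusive) must contact."""
    return {route(key) for key in range(low, high + 1)}


if __name__ == "__main__":
    # A range query stays on one shard with range-based routing.
    print(shards_for_range(100, 200, ranged_shard))  # {0}
    # The same query is scattered when keys are hashed.
    print(sorted(shards_for_range(100, 200, hashed_shard)))
    # Monotonically increasing keys all land on the last range-based shard.
    print({ranged_shard(key) for key in range(1000, 1100)})  # {3}
```

The last line shows the write-balancing side of the question: with ever-increasing keys, range-based routing sends every insert to the same shard, while hashing spreads the inserts out at the cost of scattering range queries.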
2. Is it correct to say that a use case for capped arrays is that of a document containing a list of top-k items according to some metric?
Yes, top-k item lists, such as leaderboards in gaming applications, or logs are good examples.
3. How does the API configure the cap? On insertion or at the collection level?
These are new options for the push operator.
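As I understand the 2.4 documentation, the options in question are `$each`, `$sort`, and `$slice`, given inside a `$push` update, with `$slice` taking a negative number that keeps the last N elements. The Python sketch below is my own illustration: it shows the shape of such an update document and emulates its effect in memory, without talking to a server. The `apply_capped_push` helper and the leaderboard data are hypothetical, and the emulation covers only this one update shape.

```python
def apply_capped_push(doc: dict, update: dict) -> dict:
    """Emulate $push with $each / $sort / $slice on an in-memory document.

    Only a sketch of the semantics: append, then sort, then keep the tail.
    """
    for field, spec in update["$push"].items():
        items = list(doc.get(field, []))
        items.extend(spec["$each"])
        # Sort by the least significant key first; Python's sort is stable.
        for key, direction in reversed(list(spec.get("$sort", {}).items())):
            items.sort(key=lambda entry: entry[key], reverse=(direction == -1))
        limit = spec.get("$slice")
        if limit is not None:
            # A negative slice keeps the last |limit| elements; zero empties the array.
            items = items[limit:] if limit < 0 else []
        doc[field] = items
    return doc


# Hypothetical leaderboard document holding the top 3 scores.
leaderboard = {
    "_id": "level-1",
    "scores": [
        {"player": "ann", "score": 70},
        {"player": "bob", "score": 95},
        {"player": "cy", "score": 60},
    ],
}

# The update document: push one entry, sort ascending by score, keep the last 3.
update = {
    "$push": {
        "scores": {
            "$each": [{"player": "dana", "score": 88}],
            "$sort": {"score": 1},
            "$slice": -3,
        }
    }
}

apply_capped_push(leaderboard, update)
print([entry["player"] for entry in leaderboard["scores"]])  # ['ann', 'dana', 'bob']
```

Sorting ascending and slicing from the end is what turns the array into a top-k list: the lowest score ("cy") falls off when the fourth entry arrives.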
4. Is Text Search eventually consistent, like the other indexing methods?
Indexes in MongoDB are fully consistent, not eventually consistent. This is also true for text indexes and one of the appeals of using MongoDB for applications that provide search functionality.
5. Stemming and tokenization are interesting. Why did 10gen decide they should reside at the database layer instead of in another infrastructure like Solr?
Text search is one of the most popular enhancements requested by the MongoDB community since the inception of the project. Stemming and tokenization are important parts of building a text index. For applications that integrate an external search engine there is always some degree of latency between updates to the data and search engine index. Furthermore, maintaining additional infrastructure for a search engine adds complexity and costs to an application. Finally, deploying a search engine in a fault-tolerant system across data centers is non-trivial, whereas with MongoDB this is relatively simple to do.
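For readers unfamiliar with the terms, the toy Python sketch below shows what tokenization and stemming do when a text index is built, and why updating the index in the same step as the write leaves no lag between data and search results. It is my own illustration, not MongoDB's implementation: the suffix-stripping stemmer is deliberately naive, and the `TinyCollection` class is hypothetical.

```python
import re
from collections import defaultdict

SUFFIXES = ("ing", "ed", "es", "s")  # deliberately naive suffix list


def stem(word: str) -> str:
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list:
    """Lowercase the text, split it into words, and stem each word."""
    return [stem(word) for word in re.findall(r"[a-z0-9]+", text.lower())]


class TinyCollection:
    """In-memory documents plus an inverted index updated on every insert."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)

    def insert(self, doc_id, text: str) -> None:
        # The document and its index entries are written in the same call,
        # so a search issued right after insert already sees the document.
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.index[term].add(doc_id)

    def search(self, query: str) -> list:
        """Return ids of documents matching any stemmed query term."""
        ids = set()
        for term in tokenize(query):
            ids |= self.index.get(term, set())
        return sorted(ids)


if __name__ == "__main__":
    posts = TinyCollection()
    posts.insert(1, "Sharding large collections")
    posts.insert(2, "Indexed documents")
    print(posts.search("shard"))     # [1]
    print(posts.search("indexing"))  # [2]
```

Because "sharding" and "shard", or "indexed" and "indexing", reduce to the same stem, a query matches documents that use a different form of the word.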
6. As part of MongoDB 2.4, replication times are improved. What changes with respect to the usual 30-second election?
Election of a new primary can take a few seconds to complete. Replication speed is improved in MongoDB 2.4 for initial sync of new replicas. MongoDB 2.4 is also more intelligent about determining when a primary fails so that network hiccups do not trigger unnecessary failovers.
7. The Enterprise version provides monitoring on 100 operational metrics. Can you tell us a bit more?
On-Premise Monitoring is included with the Enterprise subscription. This is based on the application that powers MongoDB Monitoring Service (MMS), in use by over 15,000 MongoDB systems today.