What’s New in MongoDB 3.4 (Part 2): Running Mission-Critical Applications
The latest edition of MongoDB has seen improvements in sharding and data distribution, security, and scalability. Get an in-depth look at what has changed.
Welcome to part 2 of our 3-part MongoDB 3.4 blog series.
- In part one, we demonstrated the extended multimodel capabilities of MongoDB 3.4, including native graph processing, faceted navigation, rich real-time analytics, and powerful connectors for BI and Apache Spark
- In part three, we’ll conclude with the modernized DBA and Ops tooling available in the new release
In this post, I’ll cover the enhanced capabilities for running mission-critical applications, including geo-distributed MongoDB zones, elastic clustering, tunable consistency, and enhanced security controls.
Remember, if you want to get the detail now on everything the new release offers, download the What's New in MongoDB 3.4 white paper.
MongoDB Zones: Data Distribution
MongoDB provides horizontal scale-out for databases by partitioning data across low-cost commodity hardware using a technique called sharding. While many NoSQL databases also offer scale-out designs, MongoDB uniquely supports multiple sharding policies that give administrators precise control over how data is distributed across a cluster. As a result, data can be sharded according to query patterns or environmental considerations, providing higher scalability over diverse workloads and deployment architectures:
- Range sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values close to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range-based queries.
- Hash sharding. Documents are distributed according to an MD5 hash of the shard key value. This approach spreads writes close to uniformly across shards, but is less well suited to range-based queries because adjacent key values tend to land on different shards.
- Zone sharding. Provides the ability for DBAs and operations teams to define specific rules governing data placement in a sharded cluster.
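To make the difference between range and hash placement concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not MongoDB's implementation: the four-shard cluster, the split points, and the modulo step in `hash_shard` are all hypothetical simplifications (MongoDB partitions the hashed key space into chunks rather than taking a modulo).

```python
import hashlib
from bisect import bisect_right

NUM_SHARDS = 4  # hypothetical cluster size


def range_shard(key, boundaries):
    """Range placement: pick the shard whose key range contains the key.

    `boundaries` is a sorted list of split points; a key equal to a split
    point belongs to the range that starts at that point.
    """
    return bisect_right(boundaries, key)


def hash_shard(key, num_shards=NUM_SHARDS):
    """Hash placement (simplified): MD5 the key, then reduce to a shard index."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


boundaries = [250, 500, 750]  # hypothetical split points for 4 ranges
neighbouring_keys = range(100, 110)

# Neighbouring keys stay together under range placement...
range_placement = {range_shard(k, boundaries) for k in neighbouring_keys}
# ...and are scattered under hash placement.
hash_placement = {hash_shard(k) for k in neighbouring_keys}

print("range:", sorted(range_placement))
print("hash: ", sorted(hash_placement))
```

A range query over keys 100 to 109 would touch a single shard in the first case, which is why range sharding suits range-based queries, while the hashed layout trades that locality for a more even write distribution.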
MongoDB zones (superseding tag-aware sharding in earlier MongoDB releases) allow precise control over where data is physically stored, accommodating a range of deployment scenarios – for example by geographic region, by hardware configuration, or by application feature. Administrators can continuously refine data placement rules by modifying shard key ranges, and MongoDB will automatically migrate the data to its new zone. MongoDB 3.4 adds new helper functions and additional options in Ops Manager and Cloud Manager to configure zones – essential for managing large deployments.
Figure 1: Configuration of geographically distributed MongoDB zones via Ops Manager GUI
The most popular use cases for MongoDB zones include the following, along with links to tutorials demonstrating how to configure zones for each deployment pattern:
- Geographically Distributed Clusters with MongoDB zones: Users can create zones in multiple geographic regions. Each zone is part of the same, single cluster and can be queried globally, but data resides in the correct location based on sovereignty and local access requirements. By associating data with shards based on user location, administrators are able to maintain low-latency access.
- Localized Writes in a Distributed Cluster with MongoDB zones: Provides a solution for the continuous availability of insert-only workloads such as the ingestion of sensor data in IoT applications. Zones can be used to create configurations specifically for localized writes in a distributed cluster, ensuring there is always a node available to accept inserts, even during a data center failure.
- Tiered Storage with MongoDB zones. By implementing a tiered storage pattern, the most recent data can be located on the highest-performance hardware with fast CPUs and SSDs, while aged data can be moved onto slower but less expensive instances based on conventional, high-capacity spinning disks.
- Application Affinity with MongoDB zones. Data for a specific application feature or customer can be associated with specific zones. For instance, a company offering Software-as-a-Service (SaaS) may assign users on its free usage tier to shards provisioned onto lower-specification hardware, while paying customers are allocated to richly configured, premium infrastructure.
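As a rough illustration of the tiered storage pattern, the sketch below builds the command documents for the `addShardToZone` and `updateZoneKeyRange` admin commands and adds a small local lookup that mirrors how a zone range (lower bound inclusive, upper bound exclusive) selects a zone. The namespace `sales.orders`, the shard key field `order_year`, the shard names, and the year ranges are all hypothetical, and nothing here talks to a server.

```python
def add_shard_to_zone(shard, zone):
    """Build the admin command that associates a shard with a zone."""
    return {"addShardToZone": shard, "zone": zone}


def update_zone_key_range(namespace, field, low, high, zone):
    """Build the admin command that assigns the shard key range [low, high) to a zone."""
    return {
        "updateZoneKeyRange": namespace,
        "min": {field: low},
        "max": {field: high},
        "zone": zone,
    }


# Hypothetical tiers: (inclusive lower bound, exclusive upper bound, zone name).
ZONE_RANGES = [
    (2000, 2016, "archive"),  # aged data on cheaper, high-capacity disks
    (2016, 2018, "recent"),   # recent data on fast SSD-backed shards
]


def zone_for(value, ranges=ZONE_RANGES):
    """Return the zone whose range contains `value`, or None if no zone applies."""
    for low, high, zone in ranges:
        if low <= value < high:
            return zone
    return None


commands = [
    add_shard_to_zone("shard0000", "recent"),   # hypothetical shard names
    add_shard_to_zone("shard0001", "archive"),
] + [
    update_zone_key_range("sales.orders", "order_year", low, high, zone)
    for low, high, zone in ZONE_RANGES
]

for command in commands:
    print(command)
print(zone_for(2015), zone_for(2016), zone_for(2018))
```

Moving the boundary between the two ranges is the kind of rule refinement described above: once the ranges change, the balancer migrates affected data to the shards in its new zone.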
Many modern workloads have unpredictable performance and capacity demands. Loads can quickly spike in response to specific events, and then stabilize to regular levels. Responding to such dynamic workloads requires elastic database clusters that allow capacity to be seamlessly added and removed on demand. This gives the business the scalability it needs, when it needs it, and lets it reduce resources when demand drops, avoiding over-provisioning and containing costs. MongoDB 3.4 adds a range of enhancements to further support elastic clusters.
Faster Cluster Balancing and Node Synchronization
The MongoDB sharded cluster balancer, responsible for evenly distributing data across the nodes of a cluster, has been improved, allowing users to scale capacity quickly and easily with minimal operational overhead.
The balancer process now supports parallel data migrations. Multiple node pairs can perform balancing migrations simultaneously, significantly improving balancing throughput as nodes are added to or removed from the cluster, or as data is redistributed across nodes. An individual node can be involved in at most one migration at a time, so the benefits of parallelized balancing will be observed in clusters with four or more shards. In addition, with WiredTiger as the default MongoDB storage engine, balancer throttling that was necessary with the earlier MMAPv1 engine is now off by default, which dramatically speeds migrations, by as much as 10x in some deployments.
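The "four or more shards" threshold follows from the one-migration-per-node rule: each migration occupies a donor and a recipient, so n shards allow at most n // 2 concurrent migrations. The sketch below illustrates that constraint with a simple greedy pairing. It is not MongoDB's balancer algorithm; the pairing strategy, the shard names, and the imbalance threshold of two chunks are assumptions made for the example.

```python
def plan_migration_round(chunk_counts, min_imbalance=2):
    """Pair overloaded shards with underloaded ones for one balancing round.

    Each shard appears in at most one (donor, recipient) pair, mirroring the
    rule that a node takes part in at most one migration at a time.
    """
    # Most-loaded shards first, least-loaded last.
    ordered = sorted(chunk_counts, key=chunk_counts.get, reverse=True)
    pairs = []
    left, right = 0, len(ordered) - 1
    while left < right:
        donor, recipient = ordered[left], ordered[right]
        # Stop once the remaining shards are already close to balanced.
        if chunk_counts[donor] - chunk_counts[recipient] < min_imbalance:
            break
        pairs.append((donor, recipient))
        left += 1
        right -= 1
    return pairs


three_shards = {"shardA": 10, "shardB": 5, "shardC": 0}
four_shards = {"shardA": 10, "shardB": 8, "shardC": 2, "shardD": 0}

print(plan_migration_round(three_shards))  # only one pair is possible
print(plan_migration_round(four_shards))   # two pairs can run in parallel
```

With three shards only one pair can be formed, so migrations still run one at a time; the fourth shard is what makes a second simultaneous migration possible.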
Adding new replica set members to a MongoDB cluster has also been improved with an optimized “initial sync” process. Initial sync is implemented by copying all data from an existing replica to the newly added replica member. Initial sync is typically used when adding nodes to a cluster, migrating to a new MongoDB storage engine, or restoring a node that has fallen too far behind the replication process. MongoDB 3.4 offers the following enhancements to initial sync:
- Indexes are now created as data is copied, rather than after copying is complete, which reduces IO overhead and improves overall initial sync times, especially when synchronizing large data sets between nodes.
- The initial sync retry logic has been updated to be highly resistant to transient network issues, which significantly reduces the need to restart the copying process.
Intra-Cluster Network Compression
As a distributed database, MongoDB relies on efficient network transport during query routing and inter-node replication. MongoDB 3.4 introduces a new option to compress the wire protocol used for intra-cluster communications. Using the snappy compression algorithm, this option can compress network traffic by up to 70%, providing major performance benefits in bandwidth-constrained environments and reducing networking costs. One early access tester was able to reduce its networking bill by over $20,000 a month after configuring network compression.
Compressing and decompressing network traffic requires CPU resources – typically imposing a low single digit percentage overhead. Compression is ideal for those environments where performance is bottlenecked by bandwidth, and sufficient CPU capacity is available.
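The bandwidth-for-CPU trade can be seen with a small experiment. The sketch below is only an analogy: snappy is not part of the Python standard library, so `zlib` is swapped in as a stand-in codec, and the sensor-style payload is made up. Real savings depend on the codec and on how repetitive the traffic is, so the figure printed here says nothing about what snappy achieves on MongoDB traffic.

```python
import json
import time
import zlib


def compression_savings(payload):
    """Return the fraction of bytes saved by compressing `payload` with zlib."""
    compressed = zlib.compress(payload)
    return 1 - len(compressed) / len(payload)


# Hypothetical batch of similar documents, as replication traffic often is.
docs = [{"sensor_id": i % 10, "status": "OK", "reading": 20.5} for i in range(500)]
payload = json.dumps(docs).encode("utf-8")

start = time.perf_counter()
savings = compression_savings(payload)
elapsed = time.perf_counter() - start

print(f"original bytes: {len(payload)}")
print(f"bytes saved:    {savings:.0%}")
print(f"CPU time spent: {elapsed * 1000:.2f} ms")
```

The time spent compressing is the CPU cost referred to above; whether it is worth paying depends on whether the link or the processor is the scarcer resource.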
MongoDB 3.4 adds a new readConcern level of “linearizable”. This option confirms the primary replica is still connected to a quorum (majority) of replica nodes before returning results to the client. When used to perform reads against a single document, linearizable read concern provides two guarantees:
- First, it guarantees that the returned data reflects only writes that are committed to a majority of nodes in the replica set, and therefore will not roll back in the future as a result of a replica set election.
- Second, it guarantees that the read is not stale. This means that the returned data reflects the last write operation to the document that successfully replicated to a majority of nodes. If a new primary replica is elected and a client writes to a document using that new primary – and that write propagates to a majority of nodes – a subsequent read by any client using linearizable read concern will be guaranteed to reflect that write or return an error, regardless of which node is used to service the read.
These extra guarantees come at a cost: the linearizable read concern level has a significant impact on read latency.
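As a sketch of what such a read looks like on the wire, the function below builds a `find` command document that requests the linearizable read concern for a single document. The collection name, the filter, and the ten-second `maxTimeMS` are hypothetical; the time limit is included on the assumption that you would rather get an error than wait indefinitely if a majority of nodes cannot be reached. Only the document is built here, so no server is needed to run it.

```python
def linearizable_find_command(collection, filter_doc, max_time_ms=10000):
    """Build a `find` command that reads one document with linearizable read concern."""
    return {
        "find": collection,                        # command name must come first
        "filter": filter_doc,
        "limit": 1,                                # single-document read
        "readConcern": {"level": "linearizable"},
        "maxTimeMS": max_time_ms,                  # bound the wait for a majority
    }


command = linearizable_find_command("accounts", {"_id": 12345})
print(command)

# With a driver such as PyMongo, a command document like this could be sent
# with `db.command(command)`; that step is omitted because it needs a
# running replica set.
```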
With the linearizable read concern, MongoDB offers among the strongest data consistency guarantees of any modern, distributed database. You can learn more by reviewing the linearizable read concern documentation.
Expanded Platform Support
As MongoDB adoption accelerates, there has been growing demand to run the database on a more diverse range of platforms to support a broader set of use-cases:
- MongoDB 3.4 has been ported to the 64-bit ARMv8 platform, supporting new generations of power-efficient servers being deployed into ultra-dense data center racks.
- MongoDB 3.4 has been ported to IBM’s POWER8 and zSeries platforms, providing a seamless migration for large enterprises modernizing legacy workloads as part of digital transformation initiatives. The port is available for the MongoDB Enterprise Server, available as part of MongoDB Enterprise Advanced.
All of these new ports are available from the MongoDB download center.
Enterprise-Grade Security for Regulatory Compliance
With widespread usage across financial services, healthcare, retail, and government, MongoDB offers some of the most extensive security controls available in modern databases. Robust access control, end-to-end encryption, and auditing for forensic analysis enable organizations to build regulatory compliant apps. MongoDB 3.4 further extends security protection with new LDAP authorization and read-only views.
MongoDB 3.4 extends existing support for authenticating users via LDAP to now include LDAP authorization as well. This enables existing user privileges stored in the LDAP server to be mapped to MongoDB roles, without users having to be recreated in MongoDB itself. When configured with an LDAP server for authorization, MongoDB 3.4 will allow user authentication via LDAP, Active Directory, Kerberos, or X.509 without requiring local user documents in the $external database. When a user successfully authenticates, MongoDB will perform a query against the LDAP server to retrieve all groups the LDAP user is a member of, and will transform those groups into their equivalent MongoDB roles.
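The group-to-role step can be pictured as a lookup keyed by LDAP group. The sketch below is a simplified illustration, not the server's code: it assumes roles are found by the group's distinguished name (DN), and the DNs and the role assignments in `ROLE_CATALOG` are invented for the example (the role names themselves are MongoDB built-in roles).

```python
# Hypothetical mapping from LDAP group DNs to MongoDB role names.
ROLE_CATALOG = {
    "cn=dba,ou=groups,dc=example,dc=com": ["dbAdminAnyDatabase", "clusterMonitor"],
    "cn=analysts,ou=groups,dc=example,dc=com": ["readAnyDatabase"],
}


def roles_for_groups(group_dns, role_catalog=ROLE_CATALOG):
    """Collect the roles granted by each LDAP group, without duplicates.

    Groups with no matching entry grant nothing.
    """
    roles = []
    for dn in group_dns:
        for role in role_catalog.get(dn, []):
            if role not in roles:
                roles.append(role)
    return roles


# Groups an LDAP query might return for one authenticated user (made up).
user_groups = [
    "cn=analysts,ou=groups,dc=example,dc=com",
    "cn=interns,ou=groups,dc=example,dc=com",  # no roles defined for this group
]
print(roles_for_groups(user_groups))
```

Because privileges follow group membership, removing a user from a group in the directory is enough to withdraw the corresponding roles; no per-user change is needed in the database.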
MongoDB 3.4 now leverages native platform libraries to integrate with LDAP. This removes the need for the external saslauthd dependencies and configuration required in earlier releases, while also adding support for LDAP when running MongoDB on Windows. In addition, LDAP authentication and authorization can now be configured in Ops Manager, rather than separately via the command line on each MongoDB node.
You can learn more from the documentation. MongoDB Enterprise Advanced is required to take advantage of LDAP integration.
New in MongoDB 3.4, DBAs can define non-materialized views that expose only a subset of data from an underlying collection. For example, a view can filter out specific fields, such as Personally Identifiable Information (PII) from sales data or health records, or filter out entire documents, such as customers who have opted out of marketing communications. As a result, risks of data exposure are dramatically reduced. DBAs can define a view that is generated from an aggregation over one or more collections or other views. Permissions granted against the view are specified separately from permissions granted to the underlying collection(s). This capability allows organizations to more easily meet compliance standards in regulated industries by restricting access to sensitive data, without creating the silos that emerge when data has to be broken apart to reflect different access privileges.
Views can also contain computed fields – for example summarizing total and average order value per region, without exposing underlying customer data. All of this can be done without impacting the structure or content of the original source collections. Developers and DBAs can modify the underlying collection’s schema without impacting applications using the view.
As views are non-materialized, the view data is generated dynamically by reading from the underlying collections when a user queries the view. This reduces data duplication in the database, and eliminates inconsistencies between the base data and view.
Views are defined using the standard MongoDB Query Language and aggregation pipeline. They allow the inclusion or exclusion of fields, masking of field values, filtering, schema transformation, grouping, sorting, limiting, and joining of data using $lookup and $graphLookup to another collection.
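To show how a view definition hides PII, the sketch below builds a `create` command document with `viewOn` and `pipeline` fields, then runs the same pipeline through a tiny local emulator. The collection, view, and field names are hypothetical, and the emulator is deliberately minimal: it understands only equality `$match` and exclusion `$project`, just enough to show the effect without a server.

```python
def create_view_command(view_name, source, pipeline):
    """Build the command document that defines a read-only view."""
    return {"create": view_name, "viewOn": source, "pipeline": pipeline}


def apply_pipeline_sketch(docs, pipeline):
    """Emulate equality $match and exclusion $project on a list of dicts."""
    for stage in pipeline:
        if "$match" in stage:
            conditions = stage["$match"]
            docs = [d for d in docs
                    if all(d.get(k) == v for k, v in conditions.items())]
        elif "$project" in stage:
            excluded = {k for k, v in stage["$project"].items() if v == 0}
            docs = [{k: v for k, v in d.items() if k not in excluded}
                    for d in docs]
        else:
            raise ValueError("stage not supported by this sketch")
    return docs


# Drop opted-out customers, then strip the PII fields from what remains.
pipeline = [
    {"$match": {"marketing_opt_out": False}},
    {"$project": {"name": 0, "email": 0}},
]
command = create_view_command("customers_marketing", "customers", pipeline)

customers = [
    {"_id": 1, "name": "Ann", "email": "ann@example.com",
     "region": "EMEA", "marketing_opt_out": False},
    {"_id": 2, "name": "Bob", "email": "bob@example.com",
     "region": "APAC", "marketing_opt_out": True},
]
print(command)
print(apply_pipeline_sketch(customers, pipeline))
```

Since the view is non-materialized, nothing is copied: each query against the view re-runs the pipeline over the source collection, which is what the emulator does on every call.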
You can learn more about MongoDB read-only views from the documentation.
That wraps up the second part of our three-part blog series. Remember, you can get the detail now on everything packed into the new release by downloading the What’s New in MongoDB 3.4 white paper.
Published at DZone with permission of Mat Keep, DZone MVB. See the original article here.