What’s New in TokuMX 1.4, Part 1: Primary Keys

We just released version 1.4.0 of TokuMX, our high-performance distribution of MongoDB. This version contains more improvements than any previous release (see the release notes). In this series of blog posts, we describe the most interesting changes and how they’ll affect users.

MongoDB doesn’t have a “primary key” the way SQL databases do. In vanilla MongoDB, all documents are stored in arbitrary order in a heap, and the “_id” index and all secondary indexes point into that heap. In TokuMX, the documents are clustered with the _id index, and all secondary indexes store a copy of the _id for each document so they can look up the full document in the _id index for non-covering queries.

In this way, TokuMX collections sort of have a primary key, but it’s always the _id index. It’s just a clustering index that non-clustering secondary indexes use to reference the full document.
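
As a rough illustration of that lookup path, here is a toy model in plain JavaScript. It is only a sketch: the names clusteredById, secondaryByName and findByName are hypothetical, and ordinary objects and arrays stand in for the real index structures.

```javascript
// Toy model only: plain JavaScript values standing in for TokuMX's indexes.
// clusteredById, secondaryByName and findByName are hypothetical names.

// Clustering _id index: the key maps directly to the full document.
var clusteredById = {
  1: { _id: 1, name: "ann", city: "Oslo" },
  2: { _id: 2, name: "bob", city: "Lima" }
};

// Non-clustering secondary index on {name: 1}: each entry holds the indexed
// value plus a copy of the document's _id, not the document itself.
var secondaryByName = [
  { name: "ann", _id: 1 },
  { name: "bob", _id: 2 }
];

// A non-covering query needs fields the secondary index does not hold, so it
// takes the _id from the index entry and fetches the full document from the
// clustering index.
function findByName(name) {
  for (var i = 0; i < secondaryByName.length; i++) {
    if (secondaryByName[i].name === name) {
      return clusteredById[secondaryByName[i]._id];
    }
  }
  return null;
}
```

Calling findByName("bob") goes through the secondary entry to _id 2 and returns the whole document, including the city field that the secondary index never stored.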

In TokuMX 1.4.0, we are introducing a new feature that makes the primary key user-definable, so it doesn’t always have to be the index on {_id: 1}. By setting the primaryKey field of a collection create command, you can define the primary key to be any compound key.

To ensure uniqueness, we require that the end of the key is “_id: 1”, and we automatically create an additional unique, non-clustering index on {_id: 1}. This way, the primary key will be clustering and unique, but we’ll only do the unique checks on the non-clustering _id index, where it will be inexpensive if you allow TokuMX to auto-generate values for _id.
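
The “must end in _id: 1” rule can be checked on the client before calling createCollection. The helper below is a hypothetical sketch, not part of TokuMX, and it assumes the key spec is an ordinary object whose field names are not integer-like (JavaScript would reorder those).

```javascript
// Hypothetical helper, not part of TokuMX: returns true when the last field
// of a primary key spec is _id with direction 1.
// Assumes field names are not integer-like strings, so Object.keys keeps
// them in the order they were written.
function endsWithIdAscending(primaryKey) {
  var fields = Object.keys(primaryKey);
  if (fields.length === 0) {
    return false;
  }
  var last = fields[fields.length - 1];
  return last === "_id" && primaryKey[last] === 1;
}
```

With this sketch, {a: 1, b: 1, _id: 1} passes, while {_id: 1, a: 1} and {a: 1} do not, because _id is not the final field.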

The primary key defines the default sort order (“$natural” order) of the collection, and essentially lets you have a clustering key without always having a second clustering index on _id, if you don’t need one and want to save on storage costs. Keep in mind, though, that the primary key for each document appears as a reference in every non-clustering secondary index, so if you insert documents with large fields in the primary key, all your secondary indexes will get bigger. Also, you won’t be able to save any documents where the fields in the primary key are arrays or regexes.
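
To make the ordering concrete, here is a small sketch of how a compound primary key would order documents. It is an illustration only: comparePrimaryKey is a hypothetical name, every key field is assumed to hold a number, and every direction is assumed to be 1. Real BSON comparison covers many more types.

```javascript
// Illustration only: compares two documents field by field in primary-key
// order. Assumes numeric key fields and ascending (1) directions throughout.
function comparePrimaryKey(primaryKey, x, y) {
  var fields = Object.keys(primaryKey);
  for (var i = 0; i < fields.length; i++) {
    var f = fields[i];
    if (x[f] < y[f]) return -1;
    if (x[f] > y[f]) return 1;
  }
  return 0;
}

var pk = { a: 1, b: 1, _id: 1 };
var docs = [
  { _id: 3, a: 2, b: 1 },
  { _id: 1, a: 1, b: 9 },
  { _id: 2, a: 1, b: 9 },
  { _id: 4, a: 1, b: 0 }
];

// Sort the sample documents the way the primary key would order them.
docs.sort(function (x, y) { return comparePrimaryKey(pk, x, y); });
// docs now lists _id values in the order 4, 1, 2, 3
```

The two documents that tie on a and b are separated by _id, which is why having _id at the end of the key is enough to make every primary key value distinct.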

To see it in action, simply run a command like

> db.createCollection("foo", {primaryKey: {a: 1, b: 1, _id: 1}})
{ "ok" : 1 }

and you can see it work:

> db.foo.getIndexes()
[
        {
                "key" : {
                        "a" : 1,
                        "b" : 1,
                        "_id" : 1
                },
                "unique" : true,
                "ns" : "test.foo",
                "name" : "primaryKey",
                "clustering" : true
        },
        {
                "key" : {
                        "_id" : 1
                },
                "unique" : true,
                "ns" : "test.foo",
                "name" : "_id_"
        }
]

Later this week, I’ll explain why we added this feature and what else it’s used for in TokuMX 1.4.

Want to check out the newest version of TokuMX? TokuMX 1.4.0 is available for download now.

Published at DZone with permission of Leif Walsh, DZone MVB. See the original article here.
