What’s New in TokuMX 1.4, Part 1: Primary Keys

We just released version 1.4.0 of TokuMX, our high-performance distribution of MongoDB. This version contains more improvements than any previous release (see the release notes). In this series of blog posts, we describe the most interesting changes and how they’ll affect users.

MongoDB doesn’t have a “primary key” the way SQL databases do. In vanilla MongoDB, all documents are stored in arbitrary order in a heap, and the “_id” index and all secondary indexes point into that heap. In TokuMX, the documents are clustered with the _id index, and all secondary indexes store a copy of the _id for each document so they can look up the full document in the _id index for non-covering queries.
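The lookup path for a non-covering query can be modeled with a small in-memory sketch. This is not TokuMX code: the two `Map` objects and the `findByName` helper are hypothetical stand-ins for the clustering _id index and a non-clustering secondary index, and the sketch assumes each name is unique to keep things short.

```javascript
// Toy model only: Maps stand in for TokuMX's on-disk indexes.
// The clustering _id index stores the full document under its _id.
const idIndex = new Map([
  [1, { _id: 1, name: "ada", age: 36 }],
  [2, { _id: 2, name: "bob", age: 41 }],
]);

// A non-clustering secondary index on "name" stores only a copy of the _id.
const nameIndex = new Map();
for (const doc of idIndex.values()) {
  nameIndex.set(doc.name, doc._id);
}

// Hypothetical helper: a non-covering query takes two steps.
function findByName(name) {
  const id = nameIndex.get(name); // step 1: the secondary index yields the _id
  if (id === undefined) return null;
  return idIndex.get(id); // step 2: the _id index yields the full document
}

console.log(findByName("bob"));
```

The second step is the extra lookup that a covering query avoids, since a covering query can be answered from the secondary index alone.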

In this sense, TokuMX collections already have a primary key of sorts, but it is always the _id index: a clustering index that non-clustering secondary indexes use to reference the full document.

In TokuMX 1.4.0, we are introducing a new feature that makes the primary key user-definable, so it doesn’t always have to be the index on {_id: 1}. By setting the primaryKey field of a collection create command, you can define the primary key to be any compound key.

To ensure uniqueness, we require that the key ends with “_id: 1”, and we automatically create an additional unique, non-clustering index on {_id: 1}. This way, the primary key is clustering and unique, but we only do the unique checks on the non-clustering _id index, where they are inexpensive if you let TokuMX auto-generate values for _id.
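If you build key specifications in application code, a quick client-side check of the “ends with _id: 1” rule might look like the sketch below. The `endsWithAscendingId` helper is hypothetical and not part of TokuMX or the mongo shell. The server performs its own validation, so this only illustrates the rule.

```javascript
// Hypothetical client-side check, not part of TokuMX or the mongo shell.
// Returns true when the last field of the key spec is _id with direction 1.
// Relies on Object.keys preserving insertion order for non-numeric names.
function endsWithAscendingId(primaryKey) {
  const fields = Object.keys(primaryKey);
  if (fields.length === 0) return false;
  const last = fields[fields.length - 1];
  return last === "_id" && primaryKey[last] === 1;
}

console.log(endsWithAscendingId({ a: 1, b: 1, _id: 1 })); // true
console.log(endsWithAscendingId({ _id: 1, a: 1 })); // false
```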

A user-defined primary key also sets the default sort order (“$natural” order) of the collection. It essentially gives you a clustering key without a second clustering index on _id, which saves storage if you don’t need that index. Keep in mind, though, that the primary key of each document appears as a reference in every non-clustering secondary index, so inserting documents with large values in the primary key fields makes all of your secondary indexes bigger. Also, you can’t save documents in which a primary key field is an array or a regex.
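To catch the array and regex restriction before an insert reaches the server, an application could run a check along these lines. This is a minimal sketch under stated assumptions: `invalidPrimaryKeyFields` is a hypothetical helper, not a TokuMX API, and it only looks at top-level fields (no dotted paths).

```javascript
// Hypothetical pre-insert check, not a TokuMX API.
// Lists the primary key fields whose values are arrays or regexes.
// Assumes top-level fields only; dotted paths are not resolved.
function invalidPrimaryKeyFields(doc, primaryKey) {
  return Object.keys(primaryKey).filter((field) => {
    const value = doc[field];
    return Array.isArray(value) || value instanceof RegExp;
  });
}

const primaryKey = { a: 1, b: 1, _id: 1 };
// No offending fields here.
console.log(invalidPrimaryKeyFields({ a: 1, b: "x" }, primaryKey));
// Both a (array) and b (regex) are reported.
console.log(invalidPrimaryKeyFields({ a: [1, 2], b: /x/ }, primaryKey));
```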

To see it in action, simply run a command like

> db.createCollection("foo", {primaryKey: {a: 1, b: 1, _id: 1}})
{ "ok" : 1 }

and you can see it work:

> db.foo.getIndexes()
[
        {
                "key" : {
                        "a" : 1,
                        "b" : 1,
                        "_id" : 1
                },
                "unique" : true,
                "ns" : "test.foo",
                "name" : "primaryKey",
                "clustering" : true
        },
        {
                "key" : {
                        "_id" : 1
                },
                "unique" : true,
                "ns" : "test.foo",
                "name" : "_id_"
        }
]

Later this week, I’ll explain why we added this feature and what else it’s used for in TokuMX 1.4.

Published at DZone with permission of
