MongoDB and Non-Existent Collections

The differences you don't expect because you don't think about them are the hardest ones to get over.

In this blog, I will discuss how I found that some of my basic SQL assumptions don't hold true when dealing with MongoDB and non-existent collections.

Coming from a MySQL background, I have some assumptions about databases that don't apply to MongoDB (or other kinds of databases that are neither SQL-based nor relationally inspired).

An example of this is the assumption that data is organized in rows that are part of a table, with all tables having a strict schema (i.e., a single type of row). When working with MongoDB, this assumption must be transformed into the idea that data is organized in documents that are part of a collection and have a flexible schema so that different types of documents can reside in the same collection.

That's an easy adjustment to make because a dynamic schema is one of the defining features of MongoDB. There are other less-obvious assumptions that need to be adjusted or redefined as you get familiar with a new product like MongoDB (for example, MySQL does not currently support built-in sharding, while MongoDB does).

There is a more fundamental kind of assumption, and by "fundamental," I mean an assumption that is deeply ingrained because you rely on it so often that it's automatic (i.e., unconscious). We're usually hit by these when changing programming languages, especially when moving to dynamic ones ("Will I be able to add a number to a string? If so, how will it behave?"). These can make it hard to adjust to a new database (or programming language, operating system, etc.) because we don't consciously think about them, and so we may forget to verify whether they hold in the new system. This can happen out in the world, too: Try going to a country where cars drive on the other side of the road from yours!

While working on a MongoDB benchmark recently, I was hit by one of these assumptions. I thought sharing my mistake might help others who are also coming to MongoDB from an SQL background.

One of my computing assumptions can be summarized as "reading from a non-existent source will fail with an error."

Sure enough, it seems to be true for my operating system:
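
As a minimal sketch in Python, assuming a path that is guaranteed not to exist (a made-up file name inside a fresh, empty temporary directory), the read fails immediately with an error:

```python
import os
import tempfile


def read_missing_file():
    """Try to read a file that does not exist and report what happens."""
    # Hypothetical path: a new temporary directory is empty,
    # so "no-such-file" cannot exist inside it.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "no-such-file")
        try:
            with open(path) as handle:
                return handle.read()
        except FileNotFoundError as exc:
            # The operating system refuses the read (ENOENT).
            return f"error: {type(exc).__name__}"


print(read_missing_file())  # prints "error: FileNotFoundError"
```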

And for MySQL:
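
MySQL reports a query against a missing table as error 1146 (SQLSTATE 42S02), "Table ... doesn't exist". The sketch below is only an illustration of the same failure mode: it uses Python's built-in `sqlite3` module as a stand-in SQL engine so that it runs without a server. That substitution and the table name `no_such_table` are assumptions, and SQLite's error wording differs from MySQL's.

```python
import sqlite3


def select_from_missing_table():
    """Query a table that was never created and report what happens."""
    # Assumption: an in-memory SQLite database stands in for a MySQL server.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("SELECT * FROM no_such_table")  # hypothetical table name
        return "ok"
    except sqlite3.OperationalError as exc:
        # The SQL engine rejects the statement instead of returning rows.
        return f"error: {exc}"
    finally:
        conn.close()


print(select_from_missing_table())  # prints "error: no such table: no_such_table"
```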

But what happens in MongoDB?
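
A rough way to try this from Python is sketched below. It assumes the `pymongo` driver and a `mongod` instance listening on localhost:27017; the database and collection names are made up. Against a real server, a read from a collection that does not exist should come back empty (`None` and a count of 0) rather than raising an error.

```python
def probe_collection(db, name):
    """Read from db[name] and report what came back.

    `db` is expected to behave like a pymongo Database object.
    """
    collection = db[name]
    return {
        "first_document": collection.find_one(),
        "count": collection.count_documents({}),
    }


def main():
    # Assumptions: pymongo is installed and mongod runs on localhost:27017.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    # Hypothetical names that were never written to.
    print(probe_collection(client["no_such_db"], "no_such_collection"))
```

Calling `main()` runs the probe against the local server; `probe_collection` can also be pointed at any existing database handle.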

As I said, I hit this while working on a benchmark. How? I was comparing the throughput for different engines and various configurations, and after preparing the graphs, they all showed the same behavior. While it's not impossible for this to be accurate, it was very unlikely given what I was trying to measure. Some investigation led me to discover a mistake in the preparation phase of my benchmarks. To save time and to use the same data baseline for all tests, I was only running sysbench prepare once, backing up the data directory for each engine, and then restoring this backup before each experiment. The error was that I was restoring one subdirectory below MongoDB's expected dbpath (i.e., to /data/db/db instead of /data/db), and so my scripts were reading from non-existent collections.

On a MySQL experiment, this would have immediately blown up in my face; with MongoDB, that is not the case.

On reflection, this behavior makes sense for MongoDB in that it is consistent with the write behavior. You don't need to explicitly create a new collection, or even a new database. It's enough that you write a document to it, and it gets created for you. If writing to a non-existent collection produces no errors, reading from one shouldn't either.

Still, sometimes an application needs to know if a collection exists. How can you do this? There are multiple ways, and I think the best approach is to verify that the collections it depends on exist during the application initialization stage. Here are a couple of examples:
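
One possible start-up check, sketched in Python under the assumption that the application uses the `pymongo` driver: compare the collections the application needs against `Database.list_collection_names()` and fail fast if any are absent. The helper names and the collection name `sbtest1` are hypothetical.

```python
def missing_collections(db, required):
    """Return the names in `required` that the database does not contain.

    `db` is expected to behave like a pymongo Database object.
    """
    existing = set(db.list_collection_names())
    return [name for name in required if name not in existing]


def require_collections(db, required):
    """Raise at start-up if any required collection is absent."""
    missing = missing_collections(db, required)
    if missing:
        raise RuntimeError("missing collections: " + ", ".join(missing))


def main():
    # Assumptions: pymongo is installed and mongod runs on localhost:27017.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    # Hypothetical database and collection names for a benchmark run.
    require_collections(client["sbtest"], ["sbtest1"])
```

Run before a benchmark, a check like this would turn a restore into the wrong directory into an immediate error instead of a set of suspiciously identical graphs.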

Topics:
mongodb, collections

Published at DZone with permission of Fernando Ipar. See the original article here.
