NoSQL Zone

A. Jesse Jiryu Davis 04/24/14
0 replies

Rewriting PyMongo's BSON Decoder: An Enlightening Failure

The author set out to rewrite PyMongo's BSON decoder. The decoder is written in C, and it's fast, but he had a radical idea for how to make it faster. That idea turned out to be wrong. Discovering he was wrong was the best way to learn, but the second-best way is by writing, so here is a story about his wrong idea.

Michael Hunger 04/24/14
0 replies

Importing Forests into Neo4j

Sometimes you don’t see the forest for the trees. But if you do, you probably use a graph database. Trees are among the simplest graph data structures, a special case of directed acyclic graphs (DAGs). For our example we use a time-tree that we want to import into the database.
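To make the time-tree idea concrete, here is a minimal sketch in plain Python. It assumes a year -> month -> day hierarchy under a single root; the function name `time_tree_edges` and the node labels are hypothetical and are not taken from the article, which targets Neo4j rather than in-memory Python.

```python
from datetime import date, timedelta


def time_tree_edges(start, end):
    """Return sorted (parent, child) edges for a year -> month -> day tree.

    Hypothetical helper for illustration only; not the article's import code.
    """
    edges = set()
    day = start
    while day <= end:
        year = f"{day.year}"
        month = f"{day.year}-{day.month:02d}"
        # A set keeps each edge once, however many days share a month or year.
        edges.add(("root", year))
        edges.add((year, month))
        edges.add((month, day.isoformat()))
        day += timedelta(days=1)
    return sorted(edges)


edges = time_tree_edges(date(2014, 4, 29), date(2014, 5, 2))
for parent, child in edges:
    print(parent, "->", child)
```

Every node other than the root appears exactly once as a child, which is what makes the structure a tree rather than a general DAG.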

Alec Noller 04/23/14
0 replies

MongoDB & Bitcoin: How NoSQL Design Flaws Brought Down Two Exchanges

On March 3rd, the Bitcoin exchange Flexcoin shut down after being hacked and catastrophically robbed. The hack was made possible, according to Emin Gün Sirer, by a concurrency problem brought on by the use of NoSQL databases. However, this wasn't just a fluke.
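The teaser does not spell out the flaw, so the following is only a generic sketch of the kind of concurrency problem that can affect balance updates: a check-then-act sequence that must be atomic. The `Account` class and `run_concurrent_withdrawals` helper are hypothetical, an in-memory stand-in rather than a reconstruction of the Flexcoin code or of any MongoDB API.

```python
import threading


class Account:
    """Hypothetical in-memory account standing in for a stored balance."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # The check and the update run under one lock, so no other thread
        # can read the balance between the comparison and the write.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def run_concurrent_withdrawals(account, amount, n_threads):
    """Fire n_threads simultaneous withdrawals and collect their outcomes."""
    results = []
    results_lock = threading.Lock()

    def worker():
        ok = account.withdraw(amount)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


account = Account(100)
results = run_concurrent_withdrawals(account, 100, 10)
print(sum(results), "withdrawal(s) succeeded; balance is", account.balance)
```

Without the lock, two requests could both pass the balance check before either subtracts, which is the general shape of a double-spend race.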

Zardosht Kasheff 04/23/14
0 replies

On TokuMX Oplog, Tailable Cursors, and Concurrency

In a post last week, the author described the difference in concurrency behavior between MongoDB’s oplog and TokuMX’s oplog. In this article, you'll find the key differences, an explanation of tailable cursors, and more.

Shane Johnson 04/23/14
0 replies

Modern Big Data > Hadoop

That's right. A modern big data solution requires more than Hadoop. Welcome to the data, it's all big and fast. Welcome to Big Data Central.

Mark Needham 04/22/14
0 replies

Remote Profiling Neo4j Using yourkit

yourkit is the author's favorite JVM profiling tool. Profiling a local JVM process is really easy, but sometimes he needs to profile a process on a remote machine. In that case, the remote JVM must first be started with a yourkit agent parameter passed as one of the arguments to the Java program.

Zardosht Kasheff 04/22/14
0 replies

On TokuMX (and MongoDB) Replication and Transactions

In the author's last post, he described the differences between a TokuMX oplog entry and a MongoDB oplog entry. In this post, he wants to elaborate on why multi-statement transactions cause changes to the oplog, and explain how they changed replication to support arbitrarily large transactions.

Ayende Rahien 04/22/14
0 replies

My Distributed Build System

Yes, the author knows that you are probably getting geared up to hear about some crazy setup, and in some manner, it is crazy. His distributed build system is this.

Ayende Rahien 04/21/14
0 replies

Reduce ^ 2 in RavenDB

How do you re-reduce the results of a map/reduce? It is a really nice feature on the surface, but it has a lot of implications. For example, when and how do you run the second reduce, can you chain only one time or multiple times, what happens when there are a lot of reduce results, and so on.
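As a rough illustration of what re-reducing means, the sketch below runs a second reduce over the output of a first one. It is plain Python with made-up order data and a hypothetical `reduce_sum` helper; it does not use or imitate the RavenDB API, and it ignores the scheduling and scaling questions raised above.

```python
from collections import defaultdict


def reduce_sum(pairs):
    """Group (key, count) pairs by key and sum the counts."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


# Made-up input: one (country, city) record per order.
orders = [("US", "Boston"), ("US", "Boston"), ("US", "Austin"), ("IL", "Haifa")]

# Map: emit one (key, 1) pair per order, keyed by (country, city).
mapped = [((country, city), 1) for country, city in orders]

# First reduce: order count per city.
per_city = reduce_sum(mapped)

# Second reduce: re-key the first-level results by country and reduce again.
per_country = reduce_sum(
    (country, count) for (country, _city), count in per_city.items()
)

print(per_city)
print(per_country)
```

The second step consumes only reduce results, never the raw orders, which is why the questions of when it runs and how large its input grows matter.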

Michael Hunger 04/21/14
0 replies

Sampling A Neo4j Database

After the author read the interesting blog post of his colleague Rik van Bruggen on “Media, Politics and Graphs,” he thought it would be really cool to render it as a GraphGist, especially since Rik had already shared all the queries as a GitHub Gist.

Alec Noller 04/20/14
0 replies

The Best of the Week (Apr. 11): NoSQL Zone

Make sure you didn't miss anything with this list of the Best of the Week in the NoSQL Zone. This week's best include lessons learned during a migration from MySQL to MongoDB, differences between RavenDB and MongoDB when it comes to map/reduce, MongoDB's "incremental" map/reduce, and more.

Chris Chang 04/18/14
0 replies

MongoDB Driver Tips & Tricks: Mongoose

Many of the support requests we get at MongoLab are questions about how to configure and use particular MongoDB drivers and client libraries. This post is the 2nd of a series where we are covering the popular MongoDB drivers in depth (we covered Mongoid last time). The driver we’re covering today is Mongoose.

Michael Hunger 04/18/14
0 replies

Quickly Create a 100k Neo4j Graph Data Model with Cypher Only

We want to run some test queries on an existing graph model but have no sample data at hand, and also no input files (CSV, GraphML) that would provide it. Why not quickly create it on our own using only Cypher?

Alec Noller 04/17/14
1 reply

Is Oracle's NoSQL Standards Body an Attempt to Hinder Progress?

You've probably heard the unexpected news that Oracle, the relational database giant, is planning to create a standards body for NoSQL databases. What does it mean, though? According to Andrew C. Oliver's "Beware of NoSQL standards in Oracle's clothing," the intentions are not good.

Don Pinto 04/17/14
0 replies

N1QL Querying for Shoppers and Merchants

N1QL is a next-generation query language for Couchbase Server. It goes beyond SQL and the relational model in several ways. Most importantly, attributes in N1QL can contain multiple values, which can be nested. In this article, the author explores N1QL queries commonly seen in e-commerce applications.

Nati Shalom 04/17/14
0 replies


In order to scale horizontally on cheap hardware (a key NoSQL attribute), several compromises had to be made, including abandoning some highly valuable characteristics of relational databases. In this article, you'll learn how to avoid having to make some of these compromises.

Kenny Bastani 04/16/14
0 replies

Neo4j 2.0.2 Maintenance Release

Today we released the 2.0.2 maintenance release of Neo4j. This release comes with some critical stability improvements as well as a few small but handy Cypher type conversion functions. All Neo4j users are strongly recommended to upgrade to this release.

Ayende Rahien 04/16/14
0 replies

“Incremental” Map/Reduce in MongoDB Isn’t

Rafal and Ben Foster commented on the author's previous post with some ideas on how to deal with incremental updates to map/reduce indexes. And while they look right, they actually can’t possibly work. In this post, the author explains why that is.

Kenny Bastani 04/15/14
0 replies

Setting up Neo4j 2.0.1 Linux VM on Windows Azure

Neo4j 2.0.1 has been released as a Linux distribution on the Windows Azure VM Depot. The distribution runs on the Ubuntu 12.04 LTS kernel. Follow the steps below to set up your Windows Azure VM.

Brian O'Neill 04/15/14
0 replies

Shark on Cassandra (w/ Cash): Interrogating Cached Data from C* Using HiveQL

Shark provides an integration layer between Hive and Spark, which allows us to articulate operations in HiveQL at the shark prompt. This enables a non-developer community to explore and analyze data in Cassandra.

Moshe Kaplan 04/14/14
0 replies

How to Migrate from MySQL to MongoDB

Over the last week, the author was working on a key project to migrate a BI platform from MySQL to MongoDB. His team chose MongoDB as the platform's data infrastructure to support a high data insert rate and to scale data analysis. In this article, the author shares some of the lessons he learned during the process.

Ayende Rahien 04/14/14
0 replies

Differences in Map/Reduce Between RavenDB & MongoDB

Ben Foster has a really cool article showing some of the similarities and differences between MongoDB & RavenDB with regard to their map/reduce implementations. However, there is a very important distinction that was missed.

Alec Noller 04/13/14
0 replies

The Best of the Week (Apr. 4): NoSQL Zone

Make sure you didn't miss anything with this list of the Best of the Week in the NoSQL Zone. This week's best include a TokuMX and MongoDB Oplog entry comparison, the release of MongoDB 2.6, announcing Couchbase Server 2.5.1, and more.

Alec Noller 04/12/14
0 replies

NoSQL Zone Link Roundup (Apr. 12)

For a look at what's been happening outside of the NoSQL Zone, we've assembled a collection of links from around the web, including a look at MapReduce performance on SSDs, Oracle's intention to create NoSQL standards, a new data structure for Redis, Cassandra at one million writes per second, and more.

Antoine Girbal 04/11/14
0 replies

Tips to Check and Improve Your Storage IO Performance with MongoDB

In most applications, disk IO will typically end up being your main bottleneck once all the other silly bottlenecks have been worked out (CPU, number of connections, etc.). And whether our competitors like it or not, write locks are rarely the bottleneck in a well-designed application.