What we all need is a generic way to run functions over data stored in Cassandra. Sure, you could go grab Hadoop, and be locked into articulating analytics/transformations as MapReduce constructs. But that just makes people sad. Instead, I'd recommend Spark. It makes people happy.
As you may already know, GridGain went open source last week. Going open source was a lot more involved than simply opening up their code. They put a significant amount of thought into simplifying their APIs and making their development process as community friendly as possible.
Voron is a key/value store that exposes a sorted tree abstraction. You can have as many trees as you would like, and the keys and values are both arbitrary byte strings. Given that, let us try to bring some order to the mix.
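To make the "sorted tree" idea concrete, here is a minimal in-memory sketch, assuming nothing about Voron's real API. The names `SortedTree`, `Store`, `put`, `get`, and `scan` are hypothetical and chosen only for illustration. It shows several named trees whose byte-string keys are kept in sorted order, so range scans are cheap.

```python
import bisect


class SortedTree:
    """Toy sorted map from bytes keys to bytes values (hypothetical, not Voron's API)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted (byte-wise) order
        self._values = {}  # key -> value lookup

    def put(self, key: bytes, value: bytes) -> None:
        # Insert the key at its sorted position only the first time we see it.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        return self._values.get(key)

    def scan(self, start: bytes = b""):
        # Yield (key, value) pairs in key order, beginning at the first key >= start.
        i = bisect.bisect_left(self._keys, start)
        for key in self._keys[i:]:
            yield key, self._values[key]


class Store:
    """Holds any number of named trees (hypothetical container)."""

    def __init__(self):
        self._trees = {}

    def tree(self, name: str) -> SortedTree:
        # Create the tree on first use, then keep returning the same one.
        return self._trees.setdefault(name, SortedTree())
```

A real storage engine would persist these trees on disk in a B+tree-like structure; the sketch only illustrates the ordering guarantee that the abstraction exposes.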
This release is another bug fix/stability release concentrating on improving the retry algorithm for Views and adding more refined logging to the client, along with a few other miscellaneous fixes.
Being able to handle replication at the storage level is a really nice feature to have. More than that, it is a feature that can be broadly applied. But… a database is a lot more than just storage. Being able to just move the data around between machines is nice, but there are other things we have to take into account.
If two distributed systems are equally effective, is the one with the simpler topology the one with the better architecture? This article compares the architecture of two document databases and two wide column stores by looking at their topologies.
So, after reaching the conclusion that replication is going to be hard, the author went back to the office, discussed those challenges, and was in general pretty annoyed by them. Then somebody made a really interesting suggestion: why not put it on Raft?
In the last blog post we managed to run Neo4j at Ludicrous Speed over HTTP using Undertow and get to about 8000 requests per second. But can we go faster on a single server without new hardware? Well… yes, if we’re willing to drop HTTP and switch to WebSockets.
One of the nice features in RavenDB 3.0 is optimizing the process of creating a new index. In particular, we want to optimize it when you create a new index on a small collection in a large database.
As well as being a key-value store, Redis offers a publish/subscribe messaging implementation.
Sometimes when you work with the Neo4j community you get a database handed to you that you don’t know anything about. It is then handy to get an idea of what’s in there: which kinds of node labels are used, which relationship types connect these nodes, and which properties are floating around.
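As an illustration of the publish/subscribe pattern itself, here is a minimal in-process sketch. It does not talk to Redis and does not use any Redis client library; the `Broker` class and its methods are hypothetical names. The one detail borrowed from Redis is that publishing returns the number of subscribers that received the message.

```python
from collections import defaultdict


class Broker:
    """Toy in-process message broker (hypothetical, not a Redis client)."""

    def __init__(self):
        # channel name -> list of callbacks registered for that channel
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Deliver to everyone currently subscribed; messages are not stored,
        # so a subscriber that joins later never sees earlier messages.
        callbacks = list(self._subscribers.get(channel, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


received = []
broker = Broker()
broker.subscribe("news", received.append)
delivered = broker.publish("news", "hello")
```

The publisher never references its subscribers directly, which is the decoupling that makes the pattern useful for messaging.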
Apache Hadoop is the definitive big data platform, a warehouse, and Apache Storm is the definitive stream processing platform, a conveyor belt. However, there is something missing: How do items placed on a conveyor belt end up in a warehouse?
Make sure you didn't miss anything with this list of the Best of the Week in the NoSQL Zone. This week's best include NoSQL job trends in February of 2014, tips on the Mongoid 3 driver, a discussion of the performance impact of MongoDB's embedded arrays, and more.
Syncing mobile data is an important issue, and implementations range from clean and reliable (Pocket, for example) to spotty and erratic (Facebook, for example). This recent article discusses Couchbase Mobile and the multi-master approach to mobile data syncing.
Last week we released the first milestone of Neo4j 2.1.0, and one of its features is a new function in Cypher – LOAD CSV – which aims to make it easier to get data into Neo4j. The author thought he'd try it out by importing the London tube graph – something that his colleague Rik wrote about a few months ago.
Recently GridGain released its 6.0 version under the Apache 2.0 open source license. Nikita Ivanov wrote about the new features and licensing in his blog here, so the author will not repeat them. Instead, he will briefly describe GridGain's vision behind In-Memory Computing and why the company made the move to open source.
Armed with the knowledge about replication strategies from my previous post, we can now consider this in the context of the time series database. We actually have two distinct pieces of data to track: the actual time data (timestamp and value that we keep track of) and the series information (tags, mostly).
MarkLogic is still the biggest NoSQL vendor by revenue, and has overtaken Cloudera in Big Data revenues. The latest update to the Wikibon chart is a bit hard to follow, so in this article, you'll find the NoSQL and Hadoop companies listed for your convenience.
What happens when you want to do an aggregation query over a very large data set? Let us say that you have 1 million data points within the range you want to query, and you want to get a rollup of all the data in the range of a week.
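A weekly rollup of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's implementation: points are assumed to be `(unix_timestamp_seconds, value)` pairs, weeks are assumed to be fixed 7-day windows aligned to the Unix epoch, and the function name `weekly_rollup` is hypothetical.

```python
WEEK_SECONDS = 7 * 24 * 60 * 60


def weekly_rollup(points):
    """Aggregate (timestamp, value) pairs into per-week count/sum/min/max.

    Assumption: weeks are fixed 7-day windows counted from the Unix epoch.
    """
    buckets = {}
    for ts, value in points:
        week_start = ts - ts % WEEK_SECONDS  # start of the window containing ts
        bucket = buckets.get(week_start)
        if bucket is None:
            buckets[week_start] = {
                "count": 1, "sum": value, "min": value, "max": value,
            }
        else:
            bucket["count"] += 1
            bucket["sum"] += value
            bucket["min"] = min(bucket["min"], value)
            bucket["max"] = max(bucket["max"], value)
    # Return the buckets ordered by week start.
    return dict(sorted(buckets.items()))


points = [(0, 1.0), (10, 3.0), (WEEK_SECONDS, 5.0)]
rollup = weekly_rollup(points)
```

The scan touches every point once, so a query over 1 million points does 1 million updates each time it runs. Storing partial aggregates such as these per-week buckets ahead of time is one common way to avoid repeating that work.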
In this post, the author would like to discuss some performance problems recently mentioned about MongoDB’s embedded arrays, and how TokuMX avoids these problems and delivers more consistent performance for MongoDB applications.
In this article, you'll learn how to get Couchbase deployed on Windows Azure virtual machines. It is a step-by-step guide to getting your environment running on Azure.
The author expects (at a bare minimum) to be able to do about 25,000 sustained writes per second on a single machine. He actually expects to be able to do significantly more. But let us go with that amount for now as the floor: if throughput drops below that value, we are in trouble.
So, it is a few days late but we finally have the NoSQL installment of the February job trends. For the NoSQL job trends, we continue to focus on Cassandra, Redis, Couchbase, SimpleDB, CouchDB, MongoDB, HBase, and Riak.
The author has been doing quite a few Intro to Neo4j sessions recently, and because they contain a lot of problems for the attendees to work on, he gets to see how first-time users of Cypher actually use it. A couple of hours in, the attendees wanted to write a query to find directors who acted in their own films.
It’s not a horde of zombies (a network partition) that I fear the most. It’s not even a zombie hidden behind a door. It’s the thought of someone in my group becoming a zombie (an unresponsive node) that I fear the most. The truth is, distributed systems such as NoSQL databases are terrified of unresponsive nodes.