Apache Hadoop is the definitive big data platform, a warehouse, and Apache Storm is the definitive stream processing platform, a conveyor belt. However, there is something missing: How do items placed on a conveyor belt end up in a warehouse?
Make sure you didn't miss anything with this list of the Best of the Week in the NoSQL Zone. This week's best include NoSQL job trends in February of 2014, tips on the Mongoid 3 driver, a discussion of the performance impact of MongoDB's embedded arrays, and more.
The syncing of mobile data is an important issue, ranging from clean and reliable (Pocket, for example) to spotty and erratic (Facebook, for example). This recent article discusses Couchbase Mobile and the multi-master approach to mobile data syncing.
Last week we released the first milestone of Neo4j 2.1.0, and one of its features is a new function in Cypher, LOAD CSV, which aims to make it easier to get data into Neo4j. The author thought he'd give it a try by importing the London tube graph, something that his colleague Rik wrote about a few months ago.
Recently GridGain released its 6.0 version under the Apache 2.0 open source license. Nikita Ivanov wrote about the new features and licensing in his blog here, so the author will not repeat them. Instead, he will briefly describe the vision behind In-Memory Computing and why GridGain made the move to open source.
Armed with the knowledge about replication strategies from my previous post, we can now consider this in the context of the time series database. We actually have two distinct pieces of data to track: the actual time data (timestamp and value that we keep track of) and the series information (tags, mostly).
MarkLogic is still the biggest NoSQL vendor by revenue, and has overtaken Cloudera in Big Data revenues. The latest update to the Wikibon chart is a bit hard to follow, so in this article, you'll find the NoSQL and Hadoop companies listed for your convenience.
What happens when you want to do an aggregation query over a very large data set? Let us say that you have 1 million data points within the range you want to query, and you want to get a rollup of all the data in the range of a week.
In this post, the author would like to discuss some performance problems recently mentioned about MongoDB’s embedded arrays, and how TokuMX avoids these problems and delivers more consistent performance for MongoDB applications.
In this article, you'll learn how to get Couchbase deployed on Windows Azure virtual machines. It is a step-by-step guide to getting your environment running on Azure.
The author expects (at a bare minimum) to be able to do about 25,000 sustained writes per second on a single machine. He actually expects to be able to do significantly more. But let us take that figure as the floor for now: if throughput drops below it, we are in trouble.
So, it is a few days late but we finally have the NoSQL installment of the February job trends. For the NoSQL job trends, we continue to focus on Cassandra, Redis, Couchbase, SimpleDB, CouchDB, MongoDB, HBase, and Riak.
The author has been doing quite a few Intro to Neo4j sessions recently, and because they contain a lot of problems for the attendees to work on, he gets to see how first-time users of Cypher actually use it. A couple of hours in, the attendees wanted to write a query to find directors who acted in their own films.
It’s not a horde of zombies (a network partition) that I fear the most. It’s not even a zombie hidden behind a door. It’s the thought of someone in my group becoming a zombie (an unresponsive node) that I fear the most. The truth is, distributed systems such as NoSQL databases are terrified of unresponsive nodes.
Make sure you didn't miss anything with this list of the Best of the Week in the NoSQL Zone. This week's best include when to use MongoDB rather than MySQL, a look at how to deal with application-level scenarios in Cassandra, a bughunt in MongoDB, and more.
This blog post is the first of a series where we plan to cover each of the major MongoDB drivers in depth. The driver we’ll be covering today is Mongoid, developed by Durran Jordan (@modetojoy).
Ruby developers working with Cassandra might be interested in Cequel, a Ruby ORM for Cassandra using CQL3. The documentation on Cequel's GitHub is fairly extensive, covering basic installation through more specific details of use, such as Rails integration, setting up models, schema synchronization, and more.
Neo4j exposes a lot of valuable information via JMX. Sometimes you want to gain some insight into some JMX beans when running Neo4j in embedded mode. For this it’s crucial to have the neo4j-jmx-[version].jar file on the classpath.
In the last blog post we saw how we could get about 1,250 requests per second (with a 10ms latency) using an Unmanaged Extension running inside the Neo4j server… but what if we wanted to go faster?
GridGain 6.0 includes a lot of new features and enhancements, including significantly reworked and simplified APIs, bi-directional WAN Data Center Replication across different geographies, massive improvements in the data grid, and more.
MongoDB has optimized count queries to be very fast. As a result, MongoDB developers design applications with this in mind. Prior to 1.4, TokuMX count queries were not optimized: they were only as fast as other non-counting queries, apart from not returning the results over the network. In 1.4, Tokutek addressed this.
The author has been thinking about this quite a lot in the past few days. He is trying to see if there is a common solution to replication in general that we can utilize across a number of solutions. If we can do that, we can provide a much better feature set for a wide variety of scenarios.
Recently the author attended a Meetup at MongoDB’s new Palo Alto office to hear the CTO, Eliot Horowitz, speak about the product roadmap. With a new production release right around the corner and MongoDB World in the not-so-distant future, the buzz and excitement around all things MongoDB is high.
The author was recently reminded of a Neo4j Cypher query that he wrote a couple of years ago to find the colleagues that he hadn’t worked with in the ThoughtWorks London office. In this article, you'll find a model to help explain how to write such a query.
Moving from MySQL to Cassandra can be beneficial for a number of reasons, particularly when it comes to spreading out failure scenarios. However, there are still challenges to be faced. According to this recent blog post on the transition, the Rackspace team encountered a number of hiccups in the process.