There's an in-depth post from DataStax about doing lightweight transactions in Cassandra. It's an interesting read for those looking into the tradeoffs between availability and consistency in their database system.
In this tutorial, I am going to use Spray-Client, the DataStax Cassandra driver, and Akka to build an application that downloads tweets and then stores their id, text, name, and date in a Cassandra table.
As a follow-up to his "Fun with Music, Neo4j and Talend" post a few weeks ago, Rik Van Bruggen also posted a gist of his Last.fm dataset.
Need an introduction to graphs, graph databases, and a NoSQL graph database like Neo4j, along with its graph query language, Cypher?
The LMDB codebase is a very dense piece of code, but at the same time, it is also quite interesting. In particular, B-Trees are pretty much The Answer for a lot of on-disk data structures, but they tend to be quite hard to handle properly.
Tungsten Replicator is a software package that allows replication to be established between MySQL and another database product. This blog post describes how to configure replication between MySQL and NuoDB.
Batch importing and processing are getting more and more popular, it seems. This video, featuring DZone MVB Mark Needham, gives a brief tutorial on how to use the Neo4j batch importer, a tool used to import large data sets.
When I was helping prepare for the CFSummit conference, we organized the sessions on Trello. The MongoDB aggregation framework is a relatively new addition to the platform. Using this framework, you can group, sort, calculate and handle information gathering in the aggregate sense. Here's how I did this for the Trello JSON data.
Flyclops, a game company, recently decided that MySQL wasn't cutting it for them anymore. They tried out a range of databases, including CouchDB, MongoDB, Cassandra, HBase, Neo4j, DynamoDB, PostgreSQL, and several more. What they finally settled on was...
If you're looking to learn some real-world usage scenarios for all four of these technologies in one sweep, then look no further.
Redis 2.8 hit release candidate 2 status this week with some of the major bugs finally getting fixed. The new release should be a welcome change given Redis' recent, and very public, incident with Twilio's billing system.
If you've tried Riak and didn't like it, or you're about to start using it, you'll want to take a look at this video to see if you were using the best practices when working with this database.
J Brisbin's work with NoSQL datastores over the last couple of years has given me some insight into the direction applications will inevitably take as NoSQL becomes the dominant data storage and retrieval method—at least for web and cloud-based applications.
A relatively new NoSQL data store on the scene is Aerospike. They're focused on pushing the limits of SSD/Flash in-memory data processing and providing "Storm speed" (perhaps they're referring to Twitter Storm?), which is about 10 million messages a second.
This article will show you how core MongoDB operations are performed using the MongoDB Java Driver version 2.11.1.
Although it's not yet in a code release, Cloudant and the rest of the Apache CouchDB community have finally finished merging BigCouch, an HA, fault-tolerant, clustered version of CouchDB, into the primary Apache CouchDB project. The code is now in the testing phase, which you can help with.
Over the past few years I’ve seen the emergence of polyglot persistence, i.e. using different data storage technologies for different data, and in most situations we work that out up front. The main downside to this approach is that we now have to keep two data sources in sync, but it’s interesting to think about whether this trade-off is worthwhile...
Here's a quick video on the war between SQL and NoSQL methodologies.
I've recently returned from a rather brilliant Couchbase trip to Israel. My colleague Tug Grall and I led the Couchbase Developer Day held at the LivePerson offices . . .
As mentioned, the data in LMDB is actually stored in pages. I am currently tracing through the process by which we add a new item to the database.
Graph ranking algorithms are all about mapping a complex graphical structure to a numeric vector. For a given algorithm, a single numeric value in the resultant vector corresponds . . .
Here's a discussion of how to manage large numbers of clusters using Chef roles, as well as how to automatically snapshot RAID arrays.
We are happy to announce that we have updated our CUBRID PHP and PDO drivers to roll out loads of improvements and bug fixes.
More than 110 million songs, albums and radio stations have been played 40 billion times through apps integrated with Facebook’s Open Graph.
This lecture from 2010 was presented by Jon Kleinberg of Cornell University. He focuses his discussion of cascading behavior on large social networks by . . .