NoSQL is a Stupid Name
- they don't use sql. who knew?
- there are different flavours. there's a graphy one and key-value things and... others...
- they're "scalable" (yes, yes, it's web scale).
- some/many/all(?) embrace the idea of eventual consistency
i was suspicious of the hype surrounding nosql, partly because it's associated with the meaningless marketing term "big data" and partly because i'm a cynic that sneers at things that get too popular. here's what i think when i hear the following terms:
- cloud - fire your systems people and ditch your comms room!
- big data - parse twitter in order to learn how to read your customers' minds!
- nosql - stop paying oracle!
- functional - we couldn't get good enough at mainstream programming languages so we switched to something more difficult!
i don't know if it's healthy to be this cynical, but i'm too old to jump on every bandwagon that comes along.
anyway. back to the people who now pay my bills.
in a traditional relational database you have tables, and relationships between those tables are achieved with foreign keys. i'm starting to think of these as something kind of grid-shaped with links between them:
[figure: series of database tables and their relationships. honest.]
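as a rough sketch of the tables-and-foreign-keys idea, here is a tiny example using python's built-in sqlite3 module. the table and column names (person, address, person_id) are made up for illustration and are not from any real schema.

```python
import sqlite3

# in-memory database; table and column names are illustrative assumptions
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite only enforces foreign keys when asked
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE address ("
    "id INTEGER PRIMARY KEY, "
    "person_id INTEGER REFERENCES person(id), "  # the foreign key linking the two tables
    "town TEXT)"
)

conn.execute("INSERT INTO person (id, name) VALUES (?, ?)", (63537, "trisha"))
conn.execute("INSERT INTO address (person_id, town) VALUES (?, ?)", (63537, "london"))

# following the relationship means joining the two grids on the key
rows = conn.execute(
    "SELECT person.name, address.town "
    "FROM person JOIN address ON address.person_id = person.id"
).fetchall()
print(rows)
```

the point is only that the link between the tables lives in a column value, and you join on it at query time.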
- column family
- key/value
- graph
- document
column family databases feel to me, as a newbie to the field, similar to key/value, which i'll come on to. i've mostly heard cassandra used as an example of this type of nosql database. i guess the way i think of this, and of course i could be wrong/over-simplifying, is a unique key linked to a set of key/values:
id: 63537
name: trisha
twitter: @trisha_gee
location: london
which i'm translating into groups of key/value pairs, with the id as a sort of header:
[figure: key/value pairs grouped by id]
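that mental model can be sketched in plain python as a dict of dicts. this is only an illustration of the shape of the data, not cassandra's actual api or storage model; the second row and the get_column helper are hypothetical.

```python
# a column family sketched as: row id -> that row's own set of key/values
people = {
    63537: {"name": "trisha", "twitter": "@trisha_gee", "location": "london"},
    # hypothetical second row: it does not have to carry the same keys
    63538: {"name": "someone else", "location": "paris"},
}


def get_column(family, row_id, column, default=None):
    """look up a single key for a single row id, tolerating missing rows and keys."""
    return family.get(row_id, {}).get(column, default)


print(get_column(people, 63537, "twitter"))
print(get_column(people, 63538, "twitter"))
```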
key/value
these types of nosql database (e.g. riak) are pretty much as schema-less as you get - just dump key-value pairs into them. to be honest, the best description i found was on dba.stackexchange.com, so i'm not going to re-write that with my (at this point) limited understanding.
[figure: never ending lists of key/values]
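a minimal sketch of the "just dump pairs in" idea, assuming the store treats every value as an opaque blob. the dict stands in for the database, and put/get are hypothetical helpers, not riak's client api.

```python
import json

store = {}  # stand-in for a key/value database


def put(key, value):
    # the store only ever sees bytes; any structure is the application's business
    store[key] = json.dumps(value).encode("utf-8")


def get(key):
    raw = store.get(key)
    return None if raw is None else json.loads(raw.decode("utf-8"))


put("user:63537", {"name": "trisha", "location": "london"})
print(get("user:63537"))
```

there is no schema anywhere in this sketch: the only way in or out is the key.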
graph
i came across graph databases when i stumbled across neo4j, chatting to some of the very smart guys there. a graph database lets you model your data as a series of nodes and relationships. and if i think about it, this is not a massive step from either relational models or object models. it doesn't just apply well to the social networking domain (where it's very easy to think in terms of users and their relationships); in actual fact, lots of things we design could be modelled this way. not having used it, i'm not sure just how much of a mental leap you need to take to start thinking that way, but it seems like it might be a good fit for many problems.
[figure: graph of nodes with annotated relationships]
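to make "nodes and relationships" concrete, here is a small plain-python sketch. the node names and relationship types are invented for illustration, and this is not how neo4j stores or queries anything.

```python
# nodes, plus (start, relationship type, end) triples connecting them
nodes = {"alice", "bob", "carol", "london"}
relationships = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("alice", "LIVES_IN", "london"),
]


def neighbours(node, rel_type=None):
    """nodes reachable from `node` by one outgoing relationship, optionally of one type."""
    return [
        end
        for start, rel, end in relationships
        if start == node and (rel_type is None or rel == rel_type)
    ]


print(neighbours("alice"))
print(neighbours("alice", "FOLLOWS"))
```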
document
now mongodb falls into category four, the document database. and as a nosql n00b, this is now the product and area i know most about, and am clearly going to be more excited about since 10gen are indoctrinating me in the mongodb way.
documents are a familiar structure for developers, especially if they've been working with json. so, a document might be:
{
  name: "trisha",
  twitter: "@trisha_gee",
  address: [
    {
      line1: "not telling",
      line2: "no really",
      town: "london"
    }
  ]
}
to me, this looks like it maps onto my domain-shaped object model more easily than a relational database, which always needs some sort of o-r mapping (whether you do this with hibernate or use spring to do it yourself, you're still mapping tables into objects and vice versa). what i like about the document format is the nested sub-documents for data that belongs together. in relational databases you often end up denormalising for performance anyway, so why not just accept that up front and have it as part of the thing you're storing?
this does have a cost, of course - nothing is without trade-offs. every time you request this document, you get the whole lot. you can't have the person without the address. so, you do need to understand the relationships (still) and whether you're usually going to want to get all that data at the same time or whether you might want to make two separate calls.
which brings me on to another thing which is familiar from relational days - foreign keys. a field in your document can be the id of another document, so you can follow the links through and retrieve other documents associated with the starting one. again, there are trade-offs here - each link you follow is a different request to the database. these database requests can be very quick, but if you wanted this data every time, you'd probably want it embedded in your first document to save the additional call. i guess it's a latency vs throughput question really - a single query which returns a chunky document, or multiple queries that return smaller ones.
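the embed-or-link trade-off can be sketched by counting lookups. here the collections are plain dicts keyed by id, and find is a hypothetical helper where each call stands in for one request to the database; none of this is mongodb's actual api.

```python
calls = {"count": 0}


def find(collection, doc_id):
    calls["count"] += 1  # each call stands in for one round trip to the database
    return collection[doc_id]


# option 1: the address is embedded, so one lookup returns the whole lot
people_embedded = {1: {"name": "trisha", "address": [{"town": "london"}]}}

# option 2: the person holds the id of another document, like a foreign key
people_linked = {1: {"name": "trisha", "address_id": 100}}
addresses = {100: {"town": "london"}}

person = find(people_embedded, 1)
town_embedded = person["address"][0]["town"]
embedded_cost = calls["count"]

calls["count"] = 0
person = find(people_linked, 1)
town_linked = find(addresses, person["address_id"])["town"]  # following the link costs a second call
linked_cost = calls["count"]

print(town_embedded, embedded_cost)
print(town_linked, linked_cost)
```

both routes end up with the same town; the difference is one bigger response versus two smaller requests.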
one of the advantages, it seems, of something like mongodb over some of the key/value databases is the ability to write ad-hoc queries and to tune for those queries. the data is structured (it's in a document) and it doesn't have to be in the same structure every time - not every document relating to a person needs all the fields that another person might have. but you can still query for people who have blue cars or people who live in london, or people whose surnames begin with g. if you find yourself doing the same query a number of times, you can add indexes to mongodb the same way you would a relational database.
seems like i'm getting into more of the nitty-gritty mongodb details, so i'll stop there and leave that for another time.
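as a plain-python illustration of ad-hoc queries over documents that don't all share the same fields, and of what an index buys you. the sample documents and the find/build_index helpers are hypothetical, and this is deliberately not mongodb's query or indexing api.

```python
from collections import defaultdict

# documents about people; not every one has every field
people = [
    {"name": "trisha", "surname": "gee", "town": "london"},
    {"name": "sam", "surname": "green", "car": {"colour": "blue"}},
    {"name": "alex", "town": "paris"},
]


def find(docs, predicate):
    """ad-hoc query: scan every document and keep the ones that match."""
    return [d for d in docs if predicate(d)]


londoners = find(people, lambda d: d.get("town") == "london")
blue_cars = find(people, lambda d: d.get("car", {}).get("colour") == "blue")
surnames_g = find(people, lambda d: d.get("surname", "").startswith("g"))


def build_index(docs, field):
    """precompute field value -> documents, so a repeated query skips the full scan."""
    index = defaultdict(list)
    for d in docs:
        if field in d:
            index[d[field]].append(d)
    return index


by_town = build_index(people, "town")
print([d["name"] for d in by_town["london"]])
```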
in summary
classing a whole swathe of products as "nosql" is misleading and confusing. the only thing they all have in common is that they are not traditional relational databases. other than that, some of them are as different from each other as they are from relational databases. i haven't even mentioned caching technologies - these products have functionality which overlaps with nosql databases as well. but even then, the purposes are somewhat different, and not even mutually exclusive.
as with anything, it's really important to understand the strengths and weaknesses of a technology, and the demands of your domain. these different ways of organising data, and different products, are going to perform really well in certain circumstances, and pretty poorly when used in others. getting an understanding of what those strengths and weaknesses are is going to be important in making the correct product/architecture/design decisions.
none of this information is new; there's a lot of material on the web about the different types of nosql databases. i'm writing it more for my own benefit than anything else, as my memory is notoriously shocking. for more in-depth (and probably more accurate) reading, there's:
- martin fowler's nosql distilled
- ...and his introduction to the subject
- tim berglund (@tlberglund) did a great overview of three types at jax london last week. there's a video of the same content (different conference) here.
- http://nosql-database.org/ appears to list all the products that fall under the massive umbrella, but isn't the most usable of sites.
- and yes, i used wikipedia. which is probably where i went wrong...
Published at DZone with permission of Trisha Gee, DZone MVB. See the original article here.