Why FoundationDB Might Be All It's Cracked Up To Be
When I first heard about FoundationDB, I couldn’t imagine how it could be anything but vaporware. It seemed like unicorns crapping happy rainbows to solve all your problems. As I learn more about it, though, I realize it could actually be something groundbreaking.
NoSQL: Let's Review…
So, I need to step back and explain one reason NoSQL databases have been revolutionary. In the days of yore, we used to normalize all our data across multiple tables on a single database living on a single machine. Unfortunately, Moore’s law eventually crapped out and maybe more importantly hard drive space stopped increasing massively. Our data and demands on it only kept growing. We needed to start trying to distribute our database across multiple machines.
Turns out, it’s hard to maintain transactionality in a distributed, heavily normalized SQL database. As such, a lot of NoSQL systems have emerged with simpler features, many promoting a model based around some kind of single row/document/value that can be looked up or inserted with a key. Transactionality for these systems is limited to a single key-value entry (a “row” in Cassandra/HBase or a “document” in Mongo/Couch; we’ll just call them rows here). Rows are easily stored on a single node, although we can replicate a row to multiple nodes. Even with replication, working transactionally with single rows in distributed NoSQL turns out to be easier than guaranteeing the transactionality of an SQL query that potentially visits many SQL tables.
There are deep design ramifications and limitations that follow from the transactional nature of rows. First, you always try to cram a lot of data related to the row’s key into a single row, ending up with massive rows of hierarchical or flat data that all relates to the row key. This lets you cover as much data as possible under the row-based transactionality guarantee. Second, as you only have a single key to use from the system, you must choose very wisely what your key will be. You may need to think hard about how your data will be looked up through its whole life, because it can be hard to go back. Additionally, if you need to look up on a secondary value, you had better hope that your database is friendly enough to have a secondary key feature; otherwise you’ll need to maintain a secondary row for storing the relationship. Then you have the problem of working across two rows, which doesn’t fit in the transactionality guarantee. Third, you might lose the ability to perform a join across multiple rows. In most NoSQL data stores, joining is discouraged and denormalization into large rows is the encouraged best practice.
FoundationDB Is Different
FoundationDB is a distributed, sorted key-value store with support for arbitrary transactions across multiple key-values — multiple “rows” — in the database.
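To make “sorted key-value store” a little more concrete, here is a minimal in-memory sketch in plain Python. It is not FoundationDB’s API: `SortedKV`, `set`, `get`, and `range_prefix` are hypothetical names made up for illustration, and Python’s native tuple ordering stands in for FoundationDB’s packed tuple keys. The point is only that keeping keys in sorted order lets you read a group of related “rows” with one range scan.

```python
import bisect


class SortedKV:
    """Toy in-memory sorted key-value store (hypothetical, not the fdb API)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def set(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def range_prefix(self, prefix):
        """Return (key, value) pairs for every tuple key starting with prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if key[:len(prefix)] != prefix:
                break  # keys are sorted, so we are past the prefix range
            result.append((key, self._values[key]))
        return result


kv = SortedKV()
kv.set(('class', 'cs101'), 30)
kv.set(('attends', 'alice', 'cs101'), '')
kv.set(('attends', 'alice', 'math201'), '')
kv.set(('attends', 'bob', 'cs101'), '')

# One range scan finds every class alice attends.
alice_classes = [key[2] for key, _ in kv.range_prefix(('attends', 'alice'))]
print(alice_classes)
```

Because the student comes before the class in the key, all of one student’s records sit next to each other in key order, which is what makes the range scan cheap.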
To understand the distinction, let me pilfer an example from their tutorial. Their tutorial models a university class signup system. You know, the same system every CS major has had to implement in their programming 101 class. Anyway, to demonstrate the potential power here, I just want to share a single function with you, the class signup function:
    import fdb

    fdb.api_version(710)  # assumption: use the API version your installed client supports

    def attendsKey(s, c):
        """Key for student (s) attending class (c)."""
        return fdb.tuple.pack(('attends', s, c))

    def attendsKeys(s):
        """Key range covering every 'attends' record for student (s)."""
        return fdb.tuple.range(('attends', s))

    def classKey(c):
        """Key for the number of available seats in class (c)."""
        return fdb.tuple.pack(('class', c))

    @fdb.transactional
    def signup(tr, s, c):
        rec = attendsKey(s, c)  # key recording whether this student attends this class
        if tr[rec].present():
            return  # already signed up
        seatsLeft = int(tr[classKey(c)])  # number of seats left in the class
        if not seatsLeft:
            raise Exception('no remaining seats')
        classes = tr[attendsKeys(s)]  # range read of this student's "attends" records
        if len(list(classes)) >= 5:
            raise Exception('too many classes')
        tr[classKey(c)] = str(seatsLeft - 1).encode()  # decrement the available seats
        tr[rec] = b''  # mark that this student attends this class
Okay, more than one function, but the other functions are just helpers to show you how keys are getting generated.
What matters here is that all the work is done through signup’s first argument, tr. This is the transaction object. First we check for the existence of a special key that indicates whether student s is attending class c. Then, in the same transaction, we work on a completely different “row”: the count of seats left in the class. If we are able to, we update that count and then create a row to store the fact that the student attends the class. More important than what is actually happening here, FoundationDB is able to attempt to perform this transaction atomically across the entire cluster.
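If “atomically across multiple rows” feels abstract, here is a toy, single-process sketch of the all-or-nothing behavior. It is not how FoundationDB is implemented and it ignores conflicts, retries, and distribution entirely. `ToyStore`, `ToyTransaction`, and this `transactional` decorator are hypothetical stand-ins I made up, and this simplified `signup` deliberately writes the attendance record before checking the seat count so you can see the write being thrown away.

```python
import functools


class ToyStore:
    """Hypothetical in-memory store; stands in for a real cluster."""

    def __init__(self):
        self.data = {}


class ToyTransaction:
    """Buffers writes and applies them to the store only on commit."""

    def __init__(self, store):
        self._store = store
        self._writes = {}

    def __getitem__(self, key):
        # Read our own buffered writes first, then committed data.
        if key in self._writes:
            return self._writes[key]
        return self._store.data.get(key)

    def __setitem__(self, key, value):
        self._writes[key] = value

    def commit(self):
        self._store.data.update(self._writes)


def transactional(func):
    """Run func(tr, ...) in a ToyTransaction; commit only if it returns."""
    @functools.wraps(func)
    def wrapper(store, *args):
        tr = ToyTransaction(store)
        result = func(tr, *args)  # an exception here skips the commit
        tr.commit()
        return result
    return wrapper


@transactional
def signup(tr, student, course):
    seats_left = tr[('class', course)]
    tr[('attends', student, course)] = ''      # write the first "row"
    if not seats_left:
        raise Exception('no remaining seats')  # nothing gets committed
    tr[('class', course)] = seats_left - 1     # write the second "row"


store = ToyStore()
store.data[('class', 'cs101')] = 1

signup(store, 'alice', 'cs101')    # succeeds: both rows change together
try:
    signup(store, 'bob', 'cs101')  # fails: neither row changes
except Exception as err:
    print('bob:', err)

print(store.data)
```

After this runs, alice’s attendance row and the decremented seat count are both in the store, and there is no trace of bob’s half-finished signup.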
If this were a more traditional NoSQL store, we would have to take a more awkward tack to do this atomically. We’d have to choose either the class or the student to be the row that we can work with atomically. Implicitly, our key would become either a lookup for a class or a lookup for a student. For the sake of discussion, let’s say we made our rows classes and simply stored the ids of all the students attending that class in that row. It’s trivial to work on classes to add or remove students. We simply look up a class and append the student id to sign them up.
Conceptually this model is pretty simple, but it’s lacking if we suddenly want to look up students in the database. What would that query look like? Can you do it atomically? You’ll need to have another type of row for students. Then you have two entities to work across, outside of the transactionality guarantees.
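Here is a rough sketch of that row-per-class model, with plain Python dicts standing in for a row store. The layout and the names `class_rows`, `signup_single_row`, and `classes_for_student` are hypothetical, not taken from any particular database.

```python
# Hypothetical row-per-class store: one "row" per class, keyed by class id.
class_rows = {
    'cs101':   {'seats_left': 1, 'students': ['alice']},
    'math201': {'seats_left': 5, 'students': ['alice', 'bob']},
}


def signup_single_row(rows, student, course):
    """Touches exactly one row, so a row-level guarantee covers it."""
    row = rows[course]
    if student in row['students']:
        return  # already signed up
    if not row['seats_left']:
        raise Exception('no remaining seats')
    row['students'].append(student)
    row['seats_left'] -= 1


def classes_for_student(rows, student):
    """No student key exists, so this has to scan every class row."""
    return sorted(c for c, row in rows.items() if student in row['students'])


signup_single_row(class_rows, 'carol', 'cs101')
print(classes_for_student(class_rows, 'alice'))
```

The signup is easy because it stays inside one row. The student lookup reads every class row, which a single-row guarantee does not cover, and a rule like “no more than five classes per student” would need either that scan or a second kind of row kept in sync by hand.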
FoundationDB == Unopinionated Transactions
A big reason that many NoSQL stores were simplified to the atomic row architecture is to get away from the forced large-scale transactionality (and performance hit) of SQL transactions. The solution was to go back to making everything a map and to make accesses to each entry/row/document a transaction. So we all bought into that and began working our schemas into that model.
However, at the end of the day, SQL and traditional NoSQL are both very opinionated about what a transaction should be. Despite the transaction manifesto, Foundation is completely unopinionated when it comes to how you define transactions. The same signup code above could easily be implemented as two or three transactions if that was truly what was called for.
This power is expressed in how you access Foundation. Foundation gets exposed more as a library for defining transactions on an arbitrary key-value store. This narrower aim lets you write code in your language, not constrained to a second query language or awkwardly fitting your code to an ORM. Instead, you write natural code expressing the transactions that you want to perform over the key-value store. Pretty exciting stuff.
Whoa Whoa Whoa, Slow Your Roll Sparky, Looks Cool And All But Prove This Isn’t A Giant Boondoggle?
Okay: Foundation is new and unproven. There are plenty of unanswered questions about it. How does it perform vs {HBase/Cassandra/Mongo/Couch/…}? What is the cost of this transactionality? At what point does its transactional architecture stop scaling? What are the trade-offs? Etc Etc
Yeah, yeah so don’t start rewriting all your database code to use Foundation, that would be pretty crazy. Nevertheless, the unopinionated, highly client-controlled notion of transactionality is ground-breaking, obviously useful, and I’m hopeful it can be successful.
Published at DZone with permission of Doug Turnbull, DZone MVB. See the original article here.