The Biggest Challenges of Moving to NoSQL
The Biggest Challenges of Moving to NoSQL
Speed and scalability and developer joy — these are the rich gains that we have found to be well worth the challenges of moving to NoSQL.
Join the DZone community and get the full member experience.Join For Free
Built by the engineers behind Netezza and the technology behind Amazon Redshift, AnzoGraph™ is a native, Massively Parallel Processing (MPP) distributed Graph OLAP (GOLAP) database that executes queries more than 100x faster than other vendors.
We are in the midst of a shift towards NoSQL data stores across the software world, especially in the web and mobile space. Many developers and enterprises are migrating or looking to do so, yet a great chasm exists between traditional SQL databases and NoSQL databases. What are the challenges facing developers moving from SQL to NoSQL?
Personally, I've experienced both worlds, having developed and maintained SQL and NoSQL apps and migrated several large apps to NoSQL. In this article, I pass that wisdom on to you, dear reader: pitfalls to avoid, riches to reap, new ways of thinking. My experience is primarily with RavenDB, a NoSQL database built on .NET Core, but the lessons here are general enough to apply to many NoSQL databases.
Let's dig in.
Breaking Out of the Relational-Only Mindset
A primary challenge when moving to NoSQL databases is, surprisingly, human rather than technical.
NoSQL changes the storage game, yet humans persist in old modes of thinking. Like a developer writing C++ using Visual Basic paradigms or attempting mutable object-oriented programming in functional Haskell, so also is the developer who exclusively uses relational database ideas in a NoSQL database. The outcome is often ugly and fails to realize the benefits of NoSQL stores.
Consider, for example, the shape of your objects. Developers coming from a relational database world invariably default into a model friendly to relational databases, even if that model is not friendly to the app itself. We have been bending to the will of the database rather than the database bending to our apps.
In NoSQL, models should be based on how the app interacts with the data rather than how the model can be stored as rows in one or more tables.
In one app that I migrated to NoSQL, we had a table
ManufacturingCompany and a table
ManufacturingCompany had one or more
Contacts; a one-to-many relationship via foreign keys. In our code, we had a
ManufacturingCompany class that had a list of contacts. Our Object-Relational Mapper (ORM) then populated that list via a SQL join.
We had to break out of the SQL mindset and realize we didn't need to shape our model this way. In our case, contacts were never updated outside the context of a company. We didn't need to store contacts separately from manufacturing companies; we only did so because SQL wouldn't let us store a list of complex objects inside another object.
In object-oriented programming, we have a term for this: encapsulation. NoSQL databases, especially document databases like RavenDB, MongoDB, and CouchDB, allow for encapsulation. We changed
ManufacturingCompany to contain a list of contacts, no need for foreign keys, relationships to other documents, JOINs, or ORM mapping.
Freedom! Store Arrays, View Models, and Complex Objects
In another part of our app, we had an entity
Graphic that had a list of
Rankings. A ranking was really just a number. But since we cannot easily store a list of numbers in SQL (it would require extra coding every time we loaded, queried, or updated a
Graphic), we took the path of least resistance and created a table
Ranking, with a foreign key back to the
Graphic table; once again, a one-to-many relationship.
But in NoSQL, we were able to simply store the rankings as an array of integers right on the
Graphic object; once again, simplifying our model and improving performance through eliminated joins.
In another instance, we had a complex stored procedure that pieced together data from over 20 tables on the fly to display a complex admin UI dashboard. Moving to NoSQL, we realized this was no longer needed. Firstly, we could store complex objects as a single entity, greatly reducing the number of joins. Secondly, we utilized RavenDB's Index support to author an index that pulled in the remaining pieces of data. This index is then pre-computed and automatically maintained as data is inserted into the database, giving us blazing fast queries over a much simpler model. The devs rejoiced.
Leveraging Non-Integer Keys
Keys (that is, unique IDs) in a NoSQL database are often strings rather than integers or GUIDs.
For example, in RavenDB, it's common to see documents with IDs like
manufacturingCompanies/32. These keys are strings, they are unique in the database, and they are assigned automatically when storing your object in the database.
This felt strange and foreign to our dev team in the beginning. But over time, we found advantages in string keys. In some scenarios, well-known names can be used as keys, resulting in fewer queries and faster lookup times.
In our case, we had users in our software system with unique emails. It's not possible to have two users with the same email address. Rather than store users as
users/1, we supply a well-known key of
email@example.com, utilizing the unique email address as part of the unique key.
When it comes time to log in a user, we don't have to execute a costly query looking for a user with that email address nor dedicate an index. Instead, we can simply load a user directly by its known key, a highly efficient and fast operation.
We used the same technique for session tokens for users, user roles, and more.
Maintaining Identity Within Non-Roots
Another hurdle when moving to NoSQL was that since not all objects are roots (that is, some objects exist merely as a child inside a parent object), how does one maintain identity within these non-roots?
In the previous case, we had moved
Contact objects as child objects of a parent
ManufacturingCompany. Since contacts are no longer root objects, they don’t get assigned a key or ID by the database.
Often, this is acceptable. We found many cases in which we thought we needed IDs on child objects but we didn’t actually need them.
But in a few scenarios, we did need to uniquely identify child objects. Perhaps we needed to send an email to a particular contact and do some logging off that contact. In such cases, we initially tried referring to the contact by index, i.e. the fourth contact inside
ManufacturingCompanies/32. But this turned out to be ugly and error-prone. We instead moved to a model where when we stored a
ManufacturingCompany, we manually assigned a GUID to
contact.Id — essentially a manual way of assigning identity. We found this sufficed for the few scenarios where nested objects needed to be identified and manipulated.
Yes, Relationships Are Permitted in NoSQL
Folks outside the NoSQL world often believe that moving to NoSQL means removing all relationships between entities. While this may be true of some NoSQL systems (particularly key-value stores), document databases like RavenDB not only permit relationships between entities but also make them simple to work with.
In RavenDB, our
Graphic object refers to another object,
Film. To link these documents, we put a
.FilmId property on the
Graphic. At query time, RavenDB provides an
.Include method to join the related
Film into the results.
In short, relational data works fine with NoSQL. Instead of forcing you into a particular model, NoSQL databases give you additional freedom in how you shape your objects — whether you're dealing with traditional relationships between distinct entities or child objects embedded inside a parent. Both are supported in NoSQL. Recognize this freedom for a smoother transition to NoSQL.
Many of the above issues are human and psychological; changing modes of thinking for new paradigms.
But there are technical challenges, as well, and we encountered several while migrating apps to NoSQL.
NoSQL databases are not a panacea. They make tradeoffs to achieve their high performance and high scalability. Usually, this comes in the form of sacrificing one or more staples of SQL databases: atomic, consistent, isolated, durable (ACID) guarantees are forfeited to gain performance or scalability.
For example, atomic writes (in which all changes take place, or none of them do) is a wonderful feature of traditional relational databases. However, it comes at the cost of performance and internal database complexity. Many NoSQL databases do not offer atomic writes for multiple documents, and in return give better performance.
And while consistency is another great feature of SQL databases, it impedes the ability to scale out a database across multiple nodes, so many NoSQL databases give up consistency.
This moves the onus for data correctness into your app. That sounds scary! But in practice, this was already the case in relational databases on a lesser level; ACID properties could never guarantee that the data was sound from a logical perspective but only guarantee correctness based on the keys, triggers, and constraints setup.
No Transactions Unless You’re Sophisticated
We found RavenDB to strike a nice balance between the safe, traditional world of relational databases and the Wild West that is often NoSQL. In RavenDB, transactions are supported, thus guaranteeing atomic writes. RavenDB’s API requires you to begin a transaction and end with committing that transaction; all writes in between are atomic.
Other NoSQL solutions, including MongoDB, don’t offer transactions or atomic multi-document writes. Moving to these NoSQL platforms requires consideration of additional points of failure: If you need two documents inserted together, what happens if the first succeeds and the second fails? If your NoSQL database doesn’t support transactions, you’ll either hope for the best (you’d be surprised how many developers do this!) or roll your own two-phase commit.
Instead of offering ACID properties, NoSQL databases typically offer BASE properties (basically available, soft state, eventually consistent). This is a fancy way of saying that NoSQL databases punt on all the super nice things that we get for free in a relational database.
Remember why? NoSQL punts on one or more of the ACID guarantees in order to provide better performance, scaling, and more. Tradeoffs.
RavenDB offers a compromise — writes are ACID. Reads by ID are ACID. But queries (that is, querying on something other than ID) are BASE.
The end result is that your data remains consistent with atomic, multi-document writes, and your queries remain fast, running against pre-computed BASE indexes. The tradeoff? The indexes are eventually consistent; they might return stale data. Updated an entity 1ms ago? Querying it now might return the old data.
Our team encountered this issue later in production, typically when our data was larger and our indexes took longer to update. We handled this using RavenDB-specific patterns. Raven offers a
.WaitForIndexesAfterSaveChanges(…) that instructs Raven to block when committing a transaction until the changes are updated in the necessary indexes. And for scenarios where up-to-the-millisecond data is needed, RavenDB offers various
Other NoSQL databases offer different kinds of querying guarantees. MongoDB indexes are updated synchronously, which means no stale data, but blocking writes. Each NoSQL database has its own guarantees, and a prudent developer will investigate and weigh the offerings.
Planning for Orphaned Data
Invariably, in NoSQL databases, you need to plan for data that may break business rules. In our case, sometimes, our related objects might no longer exist; i.e. a
Graphic object with a
films/42, but no such film exists.
While SQL databases would have foreign keys with consistency to ensure the database is always in a valid state, most NoSQL databases have no such guarantee.
This was a major concern for our team initially, but over time, we have discovered this to be only a minor issue. Ultimately, it reveals the existence of a logic bug in application code. We handle these issues with judicious defensive programming and occasional logging.
No Stored Procedures
Many NoSQL databases don’t have the concept of stored procedures. Our team again thought this would be a major issue. But instead, we found it liberating. Moving business logic into the app itself, rather than in the database, allowed us to have better control over the requirements and behavior. Typically, what we expressed in a few hundred lines of SQL could be concisely stated in a dozen lines of modern object-oriented app code.
And for the times we truly needed code in the database to be shared across queries, we found RavenDB’s indexes, transformers, and patches to fit the bill.
Lack of Triggers; Overcome With Listeners
A minor issue we encountered when moving to NoSQL was a lack of triggers. One app we ported from MySQL used triggers to update dates stored on entities as they were inserted. Another used triggers to fire off external processes that would mutate data in other tables.
Most NoSQL options provide a means to work around this, however. MongoDB, for example, allows users to utilize tail-able cursors to monitor for changes.
RavenDB has a built-in feature known as Subscriptions, which allows you to programmatically subscribe to changes to any document, collection, or query. We used this feature in our app to do geolocation lookups of addresses users stored in the system. We
.Subscribe to the
Company collection, and if a new or updated company address is inserted, we fire off an asynchronous geolocation API. While initially skeptical, we found this a cleaner and more powerful model than SQL triggers.
Patching and Model Schemas
NoSQL databases promise the flexibility of schema-less data. And it’s true; there are no constraints on the type or shape of data you store in the database.
Ultimately, though, your app intrinsically does have a schema. Your users have an
.EmailAddress, and so on. What happens when you need to evolve your objects? Does your database evolve with it? How will you handle old versions of your objects stored in the NoSQL database?
In the relational world, we typically handled this using database upgrade scripts that would modify tables, columns, and data in those tables. Object-relational mappers like Entity Framework would further abstraction over the same concept. Failure to update the database schema in the face of new objects results in runtime breakage.
In NoSQL, the story is quite a bit brighter. In our experience, NoSQL databases tend to “roll with what you have” — if you added a
.IsFoo property to your users, no action is needed to update your data. Querying for old users will return the default value in your programming language;
.IsFoo will be false.
For more advanced scenarios, NoSQL databases offer various means to patch existing documents in the database. Mongo offers an
this.IsFoo = true.
In our larger apps, we’ve found that it’s wise to have
DatabaseSchemaVersion document. Our app then executes patches based on the
DatabaseSchemaVersion value during startup.
Moving an Existing SQL App to NoSQL
The strategy that’s worked well for us on multiple apps is to first identify the new shape of the model.
As discussed above, a SQL model shouldn’t map directly to a NoSQL model. Do what makes sense for your app and model your objects accordingly. Should the
Album object embed a list of songs, or should the songs be stored as separate documents? Do what makes sense for your app.
Once a data model has been identified, migrating the data usually involves exporting to CSV or some other independent format. Most NoSQL databases support importing JSON or CSV dumps. We import these, and then run patches on them to move the data into our new, app-friendly NoSQL model.
Alternately, NoSQL databases typically provide some form of writing documents in bulk. RavenDB offers
BulkInsert, allowing high-performance batched import of data. We’ve found this to also be successful and easy to work with: writing a method in that reads in a JSON or CSV file dump from SQL, loading that data into our new NoSQL model, and storing it via bulk insert into the database.
Finally, a less disruptive means of adopting NoSQL can be as a supplement to, rather than a replacement to, relational databases. We’ve adopted this strategy in one app where the effort to replace SQL was too great; we adopted a NoSQL database to help when dealing with complex objects that didn’t fit well as rows in tables.
The biggest challenges in adopting NoSQL are usually human rather than technical. NoSQL is a new way of thinking about data storage, and too often, devs are spellbound to the relational mindset of the 1970s. New paradigms and new software realities demand new thinking about databases and data modeling.
There are technical challenges, as well. Most prominent are those dealing with the lack of ACID guarantees. Each NoSQL database makes different promises and tradeoffs regarding ACID; a wise developer will investigate and weigh them.
Despite all these challenges, I almost invariably turn to NoSQL; the riches are vast! Ease of development, rapid app evolution. No tables to set up, no foreign keys required, no object-relational mapping to configure. Just store your stuff! Even if your stuff is complex objects, view models, lists, encapsulation, polymorphism, and more, NoSQL works with modern apps.
But it’s more than developer joy; it’s performance and scalability. NoSQL databases deliberately punt on certain ACID properties and instead gain a powerful scale-out capability for modern cloud workloads, optimized for fast and efficient read-heavy apps. In the modern era, reads that take even a fraction of a second reduce user engagement and, as Amazon famously found, reduces revenue. Speed and scalability and developer joy — these are the rich gains that we have found to be well worth the challenges.
Opinions expressed by DZone contributors are their own.