Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

SQL Hegemony and Document Databases: Challenging Relational Assumptions

DZone's Guide to

SQL Hegemony and Document Databases: Challenging Relational Assumptions

A surpassingly strange question is this: "How do I get the data out of MongoDB into a spreadsheet?" I'm always perplexed by this question. It has a subtext that I find baffling. The subtext is this "all databases are relational, right?"

· Database Zone
Free Resource

Running out of memory? Learn how Redis Enterprise enables large dataset analysis with the highest throughput and lowest latency while reducing costs over 75%! 

A surpassingly strange question is this: "How do I get the data out of MongoDB into a spreadsheet?"

The variation is "How can we load the MongoDB data into a relational database?"

I'm always perplexed by this question. It has a subtext that I find baffling. The subtext is this "all databases are relational, right?"

In order to ask the question, one has to be laboring under the assumption that the only difference between MongoDB and a relational database is the clever sticker on your laptop. Mongo folks have a little green Mango leaf. Postgres has a blue/gray elephant.

This assumption is remarkably hard to overcome.

THEM: "How can we move this mongo data into a spreadsheet?"
ME: "What?"
THEM: "You know. Get a bulk CSV extract."
ME: "Of complex, nested documents?"
THEM: "Nested documents?"
ME: "Mongo database documents include arrays and—well—subdocuments. They're not in first normal form. They don't fit the spreadsheet data model."
THEM: "Whatever. Every database has a bulk unload into CSV. How do you do that in Mongo?"
ME: "You can't represent a mongo document in rows and columns."
THEM: (Thumping desk for emphasis) "Relational Theory is explicit. ALL DATA CAN BE REDUCED TO ROWS AND COLUMNS!"
ME: "Right. Through a process of normalization. The Mongo data you're looking at isn't normalized. You'd have to normalize it into a relational table model. Then you could write a customized extract focused on that relational model."
THEM: "That's absurd."

At this point, all we can do is give them the minimal PyMongo MongoClient code block. Hands-on queries seem to be the only way to make progress.

from pymongo import MongoClient
from pprint import pprint
with MongoClient("mongodb://somehost:27017") as mongo:
    collection = mongo.database.collection
    for document in collection.find():
         pprint(document)

Explanations seem to wind up in a weird circular pattern where they keep repeating their relational assumptions. Not much seems to work: diagrams, hand-waving, links to tutorials are all implicitly rejected because they don't confirm SQL bias.

A few days later they call asking how they are supposed to work with a document that has complex nested fields inside it.

This could be the beginning of wisdom. Or, it could be the beginning of a lengthy reiteration of SQL Hegemony talking points and desk thumping.

THEM: "The document has an array of values."
ME: "Correct."
THEM: "What's that mean?"
ME: "It means there are multiple occurrences of the child object within each parent object."
THEM: "I can see that. What does it mean?"
ME: (Rising inflection) "The parent is associated with multiple instances of the child."
THEM: "Don't patronize me! Stop using mongo mumbo-jumbo. Just a simple explanation is all I want."
ME: "One Parent. Many Children."
THEM: "That's stupid. One-to-many absolutely requires a foreign key. The children don't even have keys. Mongo must have hidden keys somewhere. How can I see the keys on the children in this so-called 'array' structure? How can expose the underlying implementation?"
THEM: "That's stupid. One-to-many absolutely requires a foreign key. The children don't even have keys. Mongo must have hidden keys somewhere. How can I see the keys on the children in this so-called 'array' structure? How can I expose the underlying implementation?"

The best I can do is show them an approach to normalizing some of the data in their collection.

from pymongo import MongoClient
from pprint import pprint
with MongoClient("mongodb://your_host:27017") as mongo:
    collection = mongo.your_database.your_collection
    for document in collection.find():
         for child in parent['child_array']:
              print( document['parent_field'], child['child_field'] )

This leads to endless confusion when some documents lack a particular field. The Python document.get('field') is an elegant way to handle optional fields. I like to warn them that they should not rely on this. Sometimes document['field'] is appropriate because the field really is mandatory. If it's missing, there are serious problems. Of course, the simple get() method doesn't work for optional nested documents. For this, we need document.get('field', {}). And, for optional arrays, we can usedocument.get('field', []).

Interestingly, we sometimes have confusion over {} for document and [] for array. I chalk that up to folks who are too used to very wordy SQL and Java. I save the questions for my next book on Python.

At some point, the "optional" items may be more significant than this. Perhaps an if statement is required to handle business rules that are reflected as different document structures in a single collection.

This leads to yet more desk-thumping. It's accompanied with the laughable claim that a "real" database doesn't rely on if statements to distinguish variant subentities that are persisted in a single table. The presence of SQL ifnull() functions, case expressions, and application code with if statements apparently don't exist. Or—when it is pointed out—isn't the same thing as writing an if statement to handle variant document subentities in a Mongo database.

It appears to take about two weeks to successfully challenge entrenched relational assumptions. Even then, we have to go over some of the basics of optional fields and arrays more than once.

Running out of memory? Never run out of memory with Redis Enterprise databaseStart your free trial today.

Topics:
software devlopement ,nosql ,mongodb ,python ,sql

Published at DZone with permission of Steven Lott, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}