Diving Deep Into Databases With Joe Karlsson
A SQL and NoSQL discussion.
NoSQL databases are known to provide agility and flexibility when compared to traditional SQL databases. They can be scaled across a myriad of servers, making them well suited for working with large sets of data. This equips organizations with the ability to embrace big data and analytics projects and to further their digital transformation efforts.
Joe Karlsson, a Developer Advocate at MongoDB, joins David Brown in this episode as he shares his insights on NoSQL databases, relational databases, and how SQL and NoSQL, in some cases, cross each other’s boundaries.
Kevin Montalbo: In this episode of Coding Over Cocktails, we discuss NoSQL databases with Joe Karlsson, developer advocate from MongoDB. Joe shares his insights on NoSQL and SQL databases, their similarities and differences, and which you should choose for your next project.
All right, Joe Karlsson, welcome to the show.
Joe Karlsson: Hi, welcome. Welcome. Thanks for having me.
KM: Thank you very much for being here. What an energetic start to our podcast. Let’s jump right in. Can you tell us what is a NoSQL database and why should developers consider it?
JK: NoSQL stands for "not only SQL." The term emerged just as we were beginning to look at other database models that aren't SQL or relational. NoSQL is a massive term, and it encompasses basically anything that isn't a relational database, including time-series databases, graph databases, key-value stores, and document-based databases. I'm sure I'm forgetting a couple, but there are a lot.
David Brown: What category does MongoDB fit?
JK: MongoDB is a document-based database. What that means is that instead of saving your data in a relational rows-and-columns table format, you're saving it in documents. The key differentiator there is that programmers are used to saving and working with data in JSON-like objects or dictionaries. We can save the data the way that we think about it without having to use an ORM to map it back and forth.
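To make that contrast concrete, here is a minimal Python sketch of the same user stored as relational-style rows and as a single document. The `users_table` and `addresses_table` names, the field names, and the sample values are all hypothetical, and no MongoDB driver is involved; plain dictionaries stand in for both storage styles.

```python
# Relational style: data is split across tables and linked by a foreign key.
users_table = [{"id": 1, "name": "Ada"}]
addresses_table = [
    {"id": 10, "user_id": 1, "city": "Minneapolis"},
    {"id": 11, "user_id": 1, "city": "Saint Paul"},
]


def load_user(user_id):
    """Reassemble one user from both tables, roughly what an ORM does."""
    user = next(u for u in users_table if u["id"] == user_id)
    cities = [a["city"] for a in addresses_table if a["user_id"] == user_id]
    return {"name": user["name"], "addresses": [{"city": c} for c in cities]}


# Document style: the same user saved as one JSON-like object,
# in the shape the application code already works with.
user_document = {
    "name": "Ada",
    "addresses": [{"city": "Minneapolis"}, {"city": "Saint Paul"}],
}
```

In the relational version, `load_user` is the mapping step; in the document version there is nothing to reassemble.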
DB: Why is it a good thing to lose the concept of tables and relationships and why would someone want to move towards a document-based database instead?
JK: You don't have to get rid of relationships. There are a lot of benefits to it. Being able to work directly with the database without an ORM removes a whole layer of abstraction, can increase CRUD performance, and makes it easier to work with the data. I can save the data the way that I'm thinking about it. I don't have to map it back and forth. There are some key performance gains to get from embedding your data directly in a document, as opposed to doing joins on a foreign key.
DB: Developers spend a lot of time thinking about entity designs. In a NoSQL database, you don’t need to worry about entity design anymore, they can basically build their contact entity on the fly and add extra fields as they need them, that sort of thing.
JK: Actually, quite the opposite. I think that's a common misconception with NoSQL databases and particularly document-based databases. I'm a developer advocate and software engineer at MongoDB, so I'll talk about it from a MongoDB document-based perspective. You still need to worry about it. Schema design, just like with SQL development, is one of the key parts of increasing query and write performance. I think it's one of the things that people don't give enough time and energy to when they're working on a NoSQL database. Actually, I just spoke at a conference today about that. It can hurt performance. When people complain about their MongoDB database not scaling well, nine times out of 10 it's a schema design problem, and they should reconsider their schema.
DB: How does that affect future modifications of your document design? If you do want to change the fields in your entities and that sort of stuff, should you rebuild your documents in order to maintain performance? How does that work?
JK: You could, but I honestly wouldn't recommend it. Data requirements and feature requirements are always changing. They're always growing. Even with an SQL database, rarely is the schema you launched with on day one still the exact schema you need six months to a year down the line. It's the same thing with a MongoDB database. It's like writing software: it never ends.
Software development never ends. It's always being updated, changed, expanded on. So schema design changes are just part of life. You can't avoid them, right? Even if on day one you have the perfect schema, which doesn't exist, it's going to change. Typically, what we do with an SQL database is run migrations to make changes to it, and you can do the same thing with a NoSQL database like MongoDB.
That's definitely not an anti-pattern. Yes, you could pause the database, dump it, and restart it again. I wouldn't recommend that most of the time; you would probably have some downtime, and it's not really necessary. I would probably just copy that data over to another dev database and start running some migration updates on all your data there, then run your queries against it.
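The "copy it to a dev database and run migration updates" idea can be sketched in plain Python. This is only an illustration under stated assumptions: the `production` list stands in for a collection, and the `schema_version` field, the `name` to `first_name`/`last_name` split, and the `migrate_to_v2` helper are hypothetical, not something described in the interview.

```python
import copy

# Stand-in for a production collection; one document is already migrated.
production = [
    {"_id": 1, "name": "Ada Lovelace"},
    {"_id": 2, "first_name": "Grace", "last_name": "Hopper", "schema_version": 2},
]


def migrate_to_v2(doc):
    """Split `name` into first and last name; leave v2 documents untouched."""
    if doc.get("schema_version", 1) >= 2:
        return doc
    first, _, last = doc["name"].partition(" ")
    migrated = {k: v for k, v in doc.items() if k != "name"}
    migrated.update(first_name=first, last_name=last, schema_version=2)
    return migrated


# Work on a copy so production stays as-is until the result is checked.
dev_copy = copy.deepcopy(production)
dev_copy = [migrate_to_v2(doc) for doc in dev_copy]
```

Tagging each document with a version number is one way to let old and new shapes coexist while a migration is in progress.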
DB: What should developers consider in terms of their schema design and optimizing it for performance? Is it the same considerations as a relational database, or is it different?
JK: It can be confusing, so I'll start from an SQL perspective, which people might be more familiar with. There are very prescribed and well-researched approaches to SQL schema design. We typically do that with normalization. Most developers normalize to third normal form. What that means is that with a relational database, your concern is not how the data is going to be used. It's what data you have. I'm not saying that's always true, but say I have users, some user data, and some professions. Maybe they have a class schedule, and I say, “Oh, I should just split that up, and we'll do some joins on that with a foreign key.” Normalization is typically what we're doing. With MongoDB and document-based schema design, there are no rules, there's no process, there's no algorithm.
The only thing that matters is designing a schema based on the needs of your application. A schema that works for you may be totally different from what works for someone else, even with a very similar application. I'll give you an example. I just recently built an IoT kitty litter box. It measures my cat's weight over time and how often he uses the bathroom. I designed the schema based on how I'm going to be using and reading that data. With IoT data, you chart sensor data over time. I'm designing my schema as a time-series schema that's optimized for reading onto a chart really quickly in real time. That made sense for my application. It doesn't make sense for everyone else. You can still map relationships and model things however you want.
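One common way to shape sensor data for fast chart reads is to group readings into one document per time bucket. The sketch below assumes hourly buckets; the field names (`sensor_id`, `hour`, `readings`, `weight_kg`), the sample values, and the `bucket_by_hour` helper are all hypothetical and are not taken from Joe's open-source project.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw samples: (ISO timestamp, weight in kilograms).
readings = [
    ("2020-10-01T08:05:00", 4.9),
    ("2020-10-01T08:40:00", 5.1),
    ("2020-10-01T09:10:00", 5.0),
]


def bucket_by_hour(samples, sensor_id):
    """Group samples into one document per sensor per hour."""
    buckets = defaultdict(list)
    for timestamp, weight_kg in samples:
        ts = datetime.fromisoformat(timestamp)
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append({"minute": ts.minute, "weight_kg": weight_kg})
    return [
        {
            "sensor_id": sensor_id,
            "hour": hour.isoformat(),
            "count": len(items),
            "readings": items,
        }
        for hour, items in sorted(buckets.items())
    ]


documents = bucket_by_hour(readings, "litter-box-1")
```

A chart covering one hour then needs a single document instead of one document per reading, which is the kind of read-driven trade-off described above.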
DB: The IoT device for kitty monitoring, is that a new career path you probably see in Amazon marketplace?
JK: I haven’t quit my job yet. I was just looking on Wired, there’s a robotic IoT litter box on the market today. Wired gave them 8 out of 10 and they’re selling for $500 a pop. If anyone wants to steal that idea from me, all the code is open source. You could totally go and take that and monetize it. I am not doing that. I just did it for fun. I talk about it at conferences.
DB: There would be plenty of people that would be interested in knowing that sort of stuff.
JK: I found that to be true too. It's been my most popular talk and blog post by far, my most popular open-source project.
DB: Let’s talk about relational database designs. Relational databases are good with relationships; that’s how they originated. When you have a database schema design which is highly relational, like a CRM system, or transactional systems like an order management system, SQL databases have typically been the database of choice. What can you say to those that argue that a relational database is better suited to data that is highly relational?
JK: The two biggest misconceptions about MongoDB are, one, that it does not support ACID transactions, which is false, and two, that it doesn't support relational joins, which is also false. In our aggregation pipeline, you can do a join, which we call a lookup, and you can join data from separate collections, no problem. Relationship building is not a problem. When you're designing a schema in MongoDB, there's a choice you have to make for every piece of data: either embed it directly in the document, or reference it using a foreign key, just like you would with a relational database. I will admit it's a newer feature, and it's something we've listened to the community on.
They'd been asking for years, so we built it. I think it's been out since like version 4.2. Any relationship you can model with a SQL database, you can totally model with a MongoDB database. You actually have additional flexibility because you can start embedding that data directly in the document, which increases performance if I don't have to do a lookup. Even in an SQL database, joins are really expensive. I don't know if you know how it works, but if I have data in two separate tables and I do a join on them, it basically pulls all those tables into memory. Then it runs an SQL query on that joined data set in memory. That's expensive time-wise and memory-wise, and it can become a blocking operation at scale. If you don't have to do that, that's a massive gain.
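The embed-or-reference choice can be sketched as follows. The `$lookup` stage uses the field names from MongoDB's aggregation syntax (`from`, `localField`, `foreignField`, `as`), but the `orders` and `customers` collections and their contents are hypothetical, and `simulate_lookup` is only a simplified pure-Python stand-in for what the stage does with scalar fields, not the server's implementation.

```python
# Referenced design: orders point at customers through a foreign key.
orders = [{"_id": 1, "customer_id": 7, "total": 30}]
customers = [{"_id": 7, "name": "Ada"}]

# An aggregation stage describing the join between the two collections.
lookup_stage = {
    "$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }
}


def simulate_lookup(docs, collections, stage):
    """Attach matching foreign documents to each input document as a list."""
    spec = stage["$lookup"]
    foreign = collections[spec["from"]]
    joined = []
    for doc in docs:
        matches = [
            f for f in foreign
            if f.get(spec["foreignField"]) == doc.get(spec["localField"])
        ]
        joined.append({**doc, spec["as"]: matches})
    return joined


# Embedded design: the customer data lives inside the order, so no join runs.
embedded_order = {"_id": 1, "total": 30, "customer": {"name": "Ada"}}
```

Reading `embedded_order` touches one document, while the referenced design pays for the matching step on every read; in exchange, the referenced design keeps a single copy of each customer.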
DB: People would be surprised that MongoDB supports ACID transactions as well.
JK: That is the number one misunderstanding about MongoDB, that it doesn't support ACID. I think as of, like, six months ago, you can even do ACID transactions on sharded data clusters. So, you have data distributed all around the world, and you can still run an ACID transaction on that. You can control the write concern, how many replicated shards it goes to; you can control all that.
DB: I noticed on one of your blog posts. You have a like for like in terms of terminology between MongoDB and SQL databases. It's called a join in SQL and something else in MongoDB and the like. So, it seems to be like that there is a like for like now in terms of MongoDB and SQL database, is that true?
JK: A hundred percent. We call ourselves a general purpose database, and I get asked all the time too, like, what do I use? Do I use this database or this database? And 90% or 99% of use cases would work just fine on a MongoDB database.
DB: So, this is probably the wrong question to ask an advocate for MongoDB. But where is the downside then? What is that 10%?
JK: Totally. It depends. If anyone tells you that a piece of tech is a silver bullet, they're lying, right? That doesn't exist. I hate it in tech too, when we're using this thing and someone comes in and is like, “Oh, you gotta use this language or this framework.” I'm on Stack Overflow. That's not helpful. If you're already an SQL shop, cool, great. PostgreSQL is awesome. I've been using it for years. I still use it. I think it's great. There are a lot of great use cases for it. And versus a document-based database, maybe a key-value store like Redis is a better fit for saving user session IDs, or Memcached for even faster lookups. We write to disk. It depends on the problem you're trying to solve, the types of data structures you're working with, and what you're already using.
DB: And the skill sets you have in house that you can already leverage.
JK: A hundred percent. If you're like an SQL master, cool, but I think the database should help make your life easier. If it's making it harder, that's probably a problem. If you're already super-efficient with a piece of tech, cool, go for it.
DB: You mentioned Postgres. Postgres supports JSONB now. Do you see it as a threat when relational databases are crossing the boundary and supporting NoSQL-type functionality?
JK: It's flattering, because I think we're seeing that from a lot of companies now. Amazon has DocumentDB and Azure has Cosmos DB, and they've totally ripped off MQL, the MongoDB Query Language syntax, which is great. It's super flattering. We're doing something right. The industry is moving towards the query language that we're designing, awesome. PostgreSQL is the same way. I'm actually generally seeing a trend where SQL databases are becoming more like NoSQL databases, and we're becoming more like SQL databases; we're supporting ACID transactions, all that stuff. But I get asked about JSONB a lot too, and it's important to understand what working with JSONB documents looks like compared to MongoDB documents.
For example, querying a JSONB document is a lot harder to do than with MQL. You have to use proprietary SQL. It usually takes pretty complicated SQL queries to get the data you could get with a much simpler MQL query. You're also going to have all of the legacy relational overhead that you wouldn't otherwise have. You still have to do the mapping. You still have to have an ORM to help interact with it, which is additional abstraction and an additional performance hit on you too. There's no data governance within the JSONB document. You have to have a client-side data governance model to protect what you can or cannot access, or the schema design within that JSON document.
It's basically a blob, right? With MongoDB, you can enforce the schema at the database level. You can control the structure, and you can add indexes to deeply nested components of that JSON document. It's very similar, but the gap in feature completeness and the additional overhead you have with SQL databases are a lot bigger. It's something that should be taken into account if you want to go that route. But I've used it. It's great. If it's not complicated JSON blobs and you're already using Postgres, that makes sense, go for it.
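As a rough illustration of database-level schema rules and nested indexes, the sketch below writes a validator in the `$jsonSchema` shape MongoDB accepts and an index key that uses dot notation for a nested field. The `name` and `address.city` fields are hypothetical, and the `conforms` helper is a tiny local checker covering only the rules used here; it is not MongoDB's validator, and the driver calls that would register the validator or create the index are deliberately left out.

```python
# Validation rules in the $jsonSchema shape (hypothetical fields).
validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "address"],
        "properties": {
            "name": {"bsonType": "string"},
            "address": {
                "bsonType": "object",
                "required": ["city"],
                "properties": {"city": {"bsonType": "string"}},
            },
        },
    }
}

# Dot notation names a nested field in an index key; 1 means ascending.
nested_index_keys = [("address.city", 1)]

# Only the two types used above are mapped; this is not a full BSON type map.
PYTHON_TYPES = {"object": dict, "string": str}


def conforms(doc, schema):
    """Check a value against the small subset of rules used above."""
    if not isinstance(doc, PYTHON_TYPES[schema["bsonType"]]):
        return False
    if schema["bsonType"] != "object":
        return True
    if any(field not in doc for field in schema.get("required", [])):
        return False
    return all(
        conforms(doc[field], sub)
        for field, sub in schema.get("properties", {}).items()
        if field in doc
    )
```

For example, `conforms({"name": "Ada", "address": {}}, validator["$jsonSchema"])` is false because the nested `city` field is required.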
DB: There's a use case for it. You mentioned Azure and the like, and AWS DocumentDB. There was a bit of controversy a couple of years ago when MongoDB changed its licensing model. It was one of the first to change the licensing model because, as I hear it, public cloud providers were using its tech without paying for it. It changed the licensing to get a revenue stream from those using it in a public cloud environment. Now there's the Atlas service, where you can host it with MongoDB. Was it a good decision, and is it working out well for the company? Do you see that as a general trend in the open-source space?
JK: I think we're seeing that more and more. I agree, there was a lot of backlash on that when it first came out. I also think there's a lot of misunderstanding about it. The SSPL hasn't been endorsed by the Open Source Initiative, but the key is this: if you're selling MongoDB as a service provider, you either have to open your stack up or pay us for the license. Everyone else is good to go. If you have an e-commerce shop using MongoDB, that part of the license does not apply to you, and the open source rules are still the same. It applies if you're making money off of code that we paid to produce. I think the industry is softening to that too. I think developers were weird about it.
I think, as a company, you need to make money. With open source, that's hard to do. I think the SSPL is a good way to do that without having a giant like Amazon, Google, or Microsoft rip you off or make money off of your intellectual work. I think there are a lot of benefits to it too. If you're not one of those humongous companies trying to monetize a service, you're fine. Atlas has been great for us, and it also allows us to be way more open.
If you go with Cosmos DB or DocumentDB, they're great databases, but you're locked into that vendor, and transferring around is super hard. We just unveiled multi-cloud last week. You can install a MongoDB cluster on AWS, Azure, and GCP all at the same time, and you can do replication, so if a whole data center goes down, you're totally fine. You can't do that with anyone else. We don't care where you go.
DB: Cloud agnostic. A lot of companies are spending a lot of effort trying to be cloud-agnostic, right?
JK: No one wants to be locked in. That's how they get you: it's cheap, they pull you in, and then they start jacking the prices up. But if you give me flexibility, that's a huge win.
DB: So, where do you see the competition mostly coming from? Is it from the cloud-native solutions, like your DocumentDBs? Or is it more your open source solutions, like CouchDB?
JK: I don't even think it's either of those. The important thing to note with competition like Cosmos DB and DocumentDB (let's just go with DocumentDB on AWS) is that they're based on the last fully open-source release we did, which is version 2.4, and we're on 4.4. In independent testing, we've seen about 65% feature completeness compared with our current MongoDB releases. They're massively short on features.
The other thing is that DocumentDB is backed by a SQL, relational database on the backend. So, what typically happens is they're copying the MQL query language but putting it on top of a relational database, and you're going to get the relational cons with that, which include not being able to shard or split up the data. Typically, it becomes massively expensive to run these databases on our competitors. The companies are great. They're awesome products. If you're already in that ecosystem, cool, makes sense. But it's important to understand what you're sacrificing there. I think people assume they're getting the full MongoDB experience, but they're not.
DB: As I understand it, it's running on a Postgres database where they've replicated the MongoDB API.
JK: Exactly. It's a flattering copy. We're super happy about it. It means we're doing something right. The developer community loves the MQL query language. It's easy, it's intuitive. But you lose the benefits.
DB: What about the open-source players?
JK: I mean, they're great. They're awesome. They're not seeing the massive amount of growth that we're seeing, or the massive investments. I think that we're trying to become a data platform, not just a database. We have a serverless platform built on top of it. We just unveiled a brand new GraphQL endpoint. Because we know the JSON-like structure of your database, you can hit a button and we'll generate a serverless GraphQL endpoint to make CRUD operations on your data instantly.
DB: Awesome. I wanted to ask you about that. I actually did read some of your content on GraphQL support. What makes MongoDB a natural choice when you're working with APIs?
JK: JSON is the payload data of the web. That's all we're sending around. That's how GraphQL works. We're even querying now with that same data structure. As developers, we're sending that data around and saving it, and we think about things in terms of nested key-value pairs, in terms of documents, JSON, objects, and dictionaries. It makes sense to save the data in the shape that you're passing around. If I can just save a document that's in the exact payload format I need to send back to a client, that makes total sense. You could be more detailed about declaring what it is. We don't save things as JSON.
DB: If you could elaborate on that GraphQL support you were talking about. That sort of native GraphQL query support, how does that work?
JK: You have a database, you set up a serverless provider to send some data to your client, you set up your front end, and then you transfer data around. But GraphQL, I love GraphQL. I think it's expressive. I think it's interesting. I helped implement it at a large e-commerce store at my last gig. It saved us a ton of time. It just makes front-end development mega faster, because I can just ask for what I'm looking for and get it back. And now it's even easier to get set up. I used to have to set up my own Node microservices to handle all the GraphQL endpoints, do all the schema design on there, and deploy it. That was a massive pain in the butt, but now we don't have to do it. I literally hit a button and it generates the whole thing for me.
DB: So, that's the difference we're talking about here. With native GraphQL support, there's no middleware required to make that query against a MongoDB database.
JK: Exactly right. We know your data better than anyone. We know the structure because it's saved as BSON. We know the data types. GraphQL is an opinionated, schema-based API. We figured out, hey, we can actually make that for you for free, which is so cool.
DB: I love it. I get so excited about it.
JK: I love it. I wish more people knew about it. Just from a time saving perspective, it's so easy.
DB: Now there's still a place for REST, obviously, and I don't want to get into a whole REST versus GraphQL debate. There's a place for each. We're obviously big fans of RESTful APIs, with OpenAPI support in particular, but we are also increasingly supporting GraphQL, and you'll see some stuff coming up from us in the GraphQL space as well.
JK: You can have both. We did the same thing at my last company. You can have both sitting side by side, making some REST API requests and some GraphQL requests.
DB: Yes. So in terms of REST support in MongoDB, does there need to be some sort of layer in between to make those REST requests?
JK: No, we still have a serverless platform. It's called Realm. We have a whole serverless front end that basically sits in front of your MongoDB databases. You can set up triggers: based on data changes, if you want to fire off some serverless function or make an API request, no problem. We do all that for you for free. Those are some things that you wouldn't get with the competition. We want to make handling data ridiculously easy. We want to make it so easy. Most developers don't care about most of this stuff. They don't care about replication and write concerns and sharding. Some people do, and it's important.
Don't get me wrong. Most people just want to put some data somewhere and get it back as easily as possible. Let's try to make that as easy as possible. This is my wild speculation for the future, but I think that the winner, the next Oracle, is going to be the one who makes working with data easier than anyone else. We're seeing time after time in the developer community that the tools that are easier to use win. Developers are smart, but if they have to jump through too many hoops, you're going to lose. I think we're going to see increased abstraction and ease of use for just retrieving and scaling up your data with little to no downtime.
JK: I have another wild speculation for the future. If anyone's listening to this in, like, 2026, I may be way off on this. My best guesstimate for the future of data is machine learning: having machine learning models make automatic adjustments to indexes based on your queries, and automatically sharding, partitioning, and distributing data based on use cases. Like, “Hey, you have a bunch of users in Hong Kong. Let me replicate this data over to a Hong Kong data center so it's super fast.”
Today, we all have to manage that as human beings. We're now seeing more alerts for doing that, like, “Hey, we're seeing this, here are these recommendations.” Because we're a data company, we can start building models to analyze at massive scale and make really smart recommendations for people. It makes it even easier. We already have query performance monitors and index suggestions, like, “Hey, we see that this query is being made a ton, and you can improve your average query time by blah, blah, blah milliseconds by implementing this index.” I'm fascinated to see how much automation is going to help us develop databases in the future.
DB: Kevin, put it in your diary for 2026, and we'll revisit this podcast.
JK: You can roast me on Twitter in 2026 if these don't come through. I don't know when this is going to happen. I think that cat's out of the bag for data modeling.
DB: That makes a lot of sense. You were talking about how we can now query MongoDB directly with RESTful requests or GraphQL. Is there any use case for having some sort of proxy in front of MongoDB where you might want to transform and manipulate data?
JK: Totally. Maybe you want to get data from a separate data source, like a relational database, and you want to massage those together before sending it off to whoever's requesting it, or maybe you set up your own. The great thing about GraphQL is that it's database agnostic. You could set up a bunch of different databases, and it's just querying a bunch of different things and consolidating it all together. Go for it. Cool.
DB: That was my natural segue to Martini, our product, which obviously facilitates that.
JK: Absolutely. Again, there's no silver bullet. Go for it. Mix it up. I think what's the term, polyglot? Use a bunch of stuff, whatever you want. Most of my applications end up using at least a key-value store, you know, and a document-based database.
DB: Good stuff. Thank you so much, Joe. We've run out of time. It's been a pleasure having you on the show.
JK: Oh my gosh. I had so much fun. This has been a blast. Thanks for having me. Hopefully, we can do this again sometime.
DB: Awesome. I'd love to.
JK: I was just in Kansas City last summer for the Kansas City dev con. Hopefully I'll be back soon.
DB: Good. Where can our listeners follow you and find out more about you?
JK: You all can roast me on Twitter at joekarlsson1. I make TikToks and funny videos on there. I also post programming tips, too often some would say, but I've got lots of great stuff. We can chit-chat about whatever. I'm hanging out there.
KM: Thank you very much, Joe Karlsson, for being with us. To our listeners, are you working with databases? Do you have any stories that you'd like to share? Let us know in the comments on whatever podcast platform you're listening on. Also, please visit our website at www.torocloud.com for our blogs and our products. We're also on social media: Facebook, LinkedIn, YouTube, Twitter, and Instagram. Talk to us there because we listen; just look for TORO Cloud. Again, thank you very much for listening to us today. This has been Joe Karlsson, David Brown, and Kevin Montalbo at your service for Coding Over Cocktails.
Published at DZone with permission of David Brown. See the original article here.