We're now less than a week away from PGConf US 2016 which runs from April 18th to the 20th at the New York Marriot Brooklyn Bridge. You should check out the full schedule of training sessions and talks to see if you're interested in attending. Also worth highlighting, the Postgres Turns 20, Anniversary Party is happening on the 19th—right smack in the middle of the conference. Going with an 80's throwback theme, the party should feel right at home in hipster paradise Brooklyn, NYC... haha, kidding of course. But in all seriousness, it should be a great time. As one synth-soaked 80's pop duo once noted, "sweet dreams are made of this."
Sorry if that GIF crossed the line...
Leading up to this event, I got the chance to speak with Joshua Drake, Founder and Director of United States PostgreSQL, and Amanda Gilmore, a Ruby developer on the Heroku Postgres team. Joshua's offering a training session entitled Elevating Your Confidence with PostgreSQL's Restoration Capabilities! and a talk by the name of Kicking the Donkey of PostgreSQL Replication, while Amanda's talk covers PostgreSQL’s secret NoSQL superpowers. Asking them each about their history with Postgres and more specifically about their upcoming talks, I was able to gather some insights that I hope you'll find interesting.
Many of these questions overlap, so I'll be using J and A to distinguish between Joshua and Amanda's answers. Now, on to the questions...
What are you working on right now? What's hard and/or interesting about it?
J: Outside of my non-profit work, my primary interest is in containers and microservices. Specifically in relation to PostgreSQL and how to make them scale at a level that is reasonable for production use.
A: Right now, I'm working on the infrastructure automation for Heroku Postgres. I'm really not at liberty to dive too much into the nitty gritty, but it's interesting and I'm enjoying it. Mainly it's interesting because it's hard to operate at the scale we do.
How did you get to where you are now?
J: Hard work. I have been at this game since the very early 90's. I have been an avid Linux user since the SLS distribution which lives on today from Slackware and SUSE. I moved into Postgres with Postgres95, before PostgreSQL itself existed.
A: Hah, entirely by accident! I was working as a QA engineer for a long time, and I was testing internal ad network reporting tools that required me to a. have a strong understanding of a complex database infrastructure (like, MySQL replicating data snapshots from regional DB nodes to a reporting MS-SQL Server instance) and b. be able to maintain data in our test environments that were true to production. It was a pretty daunting task, and I learned the hard way how to maintain data integrity between prod and staging. But, I learned a lot about SQL and really enjoyed working with it, so when I decided to switch to the dev side of things, the Postgres team was a perfect fit.
What's the current state of Postgres and database management?
J: PostgreSQL is in a great position right now. We are the market and community leaders in open source, diversity, and technology. Whether it be the implementation of parallel aggregates, hot standby, or multi-master replication, PostgreSQL is the most advanced database you can get, paid or free.
Where will Postgres go in the next year? Two years?
J: By far the biggest developments in Postgres are around scalability and flexibility. I expect that we will see in-core multi-master within two years. In the next year, we will see native logical (versus binary) replication as continued maturity of existing features that enable us to scale far past what competitors are able to offer.
Tell me about an engineering achievement (recent or past) that you're particularly proud of.
J: I was the architect of an original in-core, native replication system for PostgreSQL—a binary compatible fork called Mammoth. It was an excellent exercise in success and failure.
A: This is tricky since I've always worked in collaboration with others. The main thing I've done exclusively on my own is to make an app that analyzes heart rate data from the Google Fit API to flag when the user is experiencing high levels of stress, but honestly, I wrote that pretty early in my career and the code really isn't the best work I've ever done... which is probably a good thing I think, we should always be learning better ways to do things. Also, I was in the consulting and agency worlds for a long time so a lot of what I've worked on has been under NDA.
Anything particularly noteworthy in your upcoming talks that you’d like to mention here… without giving too much away of course!
J: I think the biggest thing for the Elevating training is showing Linux users the capabilities afforded to them with the Canonical ZFS inclusion. The Kicking talk is great for anyone who doesn't have a firm grasp on PostgreSQL replication. It educates the attendees on everything from the most basic setup to using more advanced and beta technologies such as pglogical.
While these previous questions were sent to Josh and Amanda through email, I also managed to catch a quick call with Amanda to discuss some of the details of her talk...
Can you begin by summing up some of the major points your going to discuss in PostgreSQL’s secret NoSQL superpowers?
A: So, my talk specifically is about how you can leverage the relational and non-relational aspects of the database world in Postgres. I’m sort of a classicist when it comes to data management—I like the relational model and having normalized database schemas. But, at the same time, I recognize the power in the scalability of databases like Cassandra and others like it. I don’t want to be rigid in prescribing how people use databases because there are different tools for different problems. And, I think it’s exciting that Postgres makes it possible to use both of these properties [relational and non-relational] at once.
The way it does that—it’s quite easy because whenever you're formatting a particular field in Postgres, it supports so many data types, pretty much anything you need, natively. And, my talk is going to cover the different data formats that are available to you natively in Postgres to help developers make an informed decision about how they might want to design their schema in their apps.
Another exciting feature that I want to cover, if I have enough time, is foreign data wrappers—they’re really great and very powerful! I’m really excited about them because, for example, returning to Cassandra, let’s say a developer has a giant data store that their data scientist are using in Cassandra, but they still want the ability to bring some of that data into Postgres for various reasons. Foreign data wrappers make that possible—so you can have the best of both worlds and give your developers a lot of flexibility in how they use that data. So, that’s very exciting.
The other nice thing… for example, I don’t know if you’ve ever used Heroku’s Postgres implementation, but they have a product called DataLinks that leverages the existing open source foreign data wrapper for Redis, specifically. I think it’s just redis_fdw in the package manager, so the SQL for that would be: create extension redis_fdw and then you’d create your table schema and point that table at your foreign database connection, and that’s it. But, because that open source software is there and readily available—it’s just super exciting and easy.
Would you elaborate on Postgres' relational store properties?
A: Like I said, Postgres is a pretty powerful relational database that supports a lot of different data formats. So, if you’re working with a document oriented database what that means is that it doesn’t have the ability to create relationships as easily between different blobs of data. But, the nice thing about Postgres is that everything is normalized into different tables. Whenever you’re working with a relational database schema, a best practice is to have a well-normalized schema meaning that you don't have redundant data. And, that’s why Postgres is so powerful... you can be very efficient with how you're storing your data by leveraging those relationships between tables to make sure that you’re not repeating yourself.
Recently, in my office, I was chatting with someone who was just visiting, and this person’s partner was working with a non-relational database. Anyway, they were very amused and confused that you couldn’t do a join with document-oriented databases. And, I pointed out that usually with document databases that concept of a join doesn’t exist because you don’t typically need it. So, a document-oriented database is going to be used in cases when your data model is more conceptually focused.
Actually, Sarah Mei wrote a pretty great blog post about this that gives a television review site as an example. She basically explains how an app that she was creating contained information which was all organized around particular TV shows, but she didn’t necessarily need the ability to create a relationship between the TV shows. So, all the information was contained and centered around the TV show concept... for instance, a single show has seasons and episodes but doesn't need to relate to other shows, which provides a great case for a document-oriented database. Now, the thing with Postgres—returning to that concept of normalization—if you’re using more dynamic data that does need the ability to create multidirectional relationships, when you design your database schema in a way that it’s not repeating that information in any one table, that’s what we call normalized. Where you have an id that other tables can use to access the information so that it’s not being repeated. Anytime you introduce a comparison into the equation you need relational information.
Looking at your abstract, I see you say "sometimes you only have a few rich data structures to deal with, or can’t afford the time and resource cost of running a NoSQL cluster alongside Postgres" can you dive into that a bit... specifically on how Postgres handles performance.
A: Absolutely, that’s an important point.
So, I mentioned that because database connections are expensive computationally, you need to weigh the costs of that… for instance, is it really worthwhile to bring that data from your document database into Postgres, and exactly why are you doing that? One of the reasons that you might want to do that is because Postgres' archiving capabilities are very strong… so, if you want to back data up in Postgres, that'd be a good option. Going back to Redis for example, it gives you the data structure of a key-value store—it's got workable data, the data structure itself is robust, but it’s not as persistent as Postgres. You can almost think of it as a Memcache layer. So, it’s sort of like this intermediary layer between an application cache and an actual persistent database. However, maybe you actually want to persist that data, so you could implement Redis for that intermediary caching layer to help your site performance, but then if you were wrapping Redis into Postgres then you can guarantee the persistence of that data. So, that might be a good use-case for that foreign data wrapper where the database connection is worthwhile, depending on what the data is.
Well, that's it for now. I'd like to thank my interviewees, Joshua and Amanda, for taking the time to speak with me about their experiences with Postgres and their upcoming talks. And of course, if you're able to make it to PGConf US 2016, you should join in for what's sure to be a fun and lively discussion amongst a diverse group of fellow Postgres adopters.