PostgreSQL Performance: Crunching Twitter with the JSON Data Type
People love their databases. Sometimes it's a deeply thought-out consideration of one's business needs, but other times, people just really love their databases. Consider this post from Bibhas Debnath about the performance of PostgreSQL. Debnath makes makes his position pretty clear early on:
I’m not going to start a DBMS comparison. I’ve spent a lot of time fiddling with structred data and PostgreSQL has been really kind to me. It’s stable, really feaure rich and it can literally do anything. You want to setup a cluster? Replication? Crunch geo spatial data? parse Gigabytes of CSV files? And want to do all of these really fast? Then PostgreSQL is for you, it can do all that and more out of the box.
If that's not a glowing review, I don't know what it is. But that's not to say that Debnath doesn't have the evidence to back up his claims; the bulk of his post is spent on the mechanics of a particular use case: fetching Tweets as an excuse to experiment with PostgreSQL's new JSON data type.
So, Debnath walks you through everything you need to do, complete with code:
- Create your tables
- Write a script to grab some Tweets
- Start narrowing in on the useful Tweets
- Start getting location coordinates involved
And, as you might have guessed, his final query makes pretty good time: a total runtime of 617.561 ms. Just to knock his point home, though, Debnath follows this post with a second part, which reveals just how much further PostgreSQL performance can be pushed with the addition of indexes. Even if you start outgrowing your indexes, though, you've still got options.
Take a look at the full article to see how it's all put together and how to maximize performance.