Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

PostgreSQL Performance: Crunching Twitter with the JSON Data Type

DZone's Guide to

PostgreSQL Performance: Crunching Twitter with the JSON Data Type

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

People love their databases. Sometimes it's a deeply thought-out consideration of one's business needs, but other times, people just really love their databases. Consider this post from Bibhas Debnath about the performance of PostgreSQL. Debnath makes makes his position pretty clear early on:

I’m not going to start a DBMS comparison. I’ve spent a lot of time fiddling with structred data and PostgreSQL has been really kind to me. It’s stable, really feaure rich and it can literally do anything. You want to setup a cluster? Replication? Crunch geo spatial data? parse Gigabytes of CSV files? And want to do all of these really fast? Then PostgreSQL is for you, it can do all that and more out of the box.

If that's not a glowing review, I don't know what it is. But that's not to say that Debnath doesn't have the evidence to back up his claims; the bulk of his post is spent on the mechanics of a particular use case: fetching Tweets as an excuse to experiment with PostgreSQL's new JSON data type.

So, Debnath walks you through everything you need to do, complete with code:

  • Create your tables
  • Write a script to grab some Tweets
  • Start narrowing in on the useful Tweets
  • Start getting location coordinates involved

And, as you might have guessed, his final query makes pretty good time: a total runtime of 617.561 ms. Just to knock his point home, though, Debnath follows this post with a second part, which reveals just how much further PostgreSQL performance can be pushed with the addition of indexes. Even if you start outgrowing your indexes, though, you've still got options.

Take a look at the full article to see how it's all put together and how to maximize performance.


Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.

Topics:

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}