DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • Formulating a Robust Strategy for Storage in Amazon Relational Database Service PostgreSQL Deployments
  • Just Use PostgreSQL, a Quick-Start Guide: Exploring Essential and Extended Capabilities of the Most Beloved Database
  • Keep Calm and Column Wise
  • Understanding RDS Costs

Trending

  • Next-Gen IoT Performance Depends on Advanced Power Management ICs
  • Beyond ChatGPT, AI Reasoning 2.0: Engineering AI Models With Human-Like Reasoning
  • Medallion Architecture: Why You Need It and How To Implement It With ClickHouse
  • How to Write for DZone Publications: Trend Reports and Refcards
  1. DZone
  2. Data Engineering
  3. Databases
  4. Using PostgreSQL® JSON Functions To Navigate Reviews of Restaurants in India

Using PostgreSQL® JSON Functions To Navigate Reviews of Restaurants in India

Parsing semi-structured dataset in a relational database seems scary. Read on for how PostgreSQL® JSON functions allow your SQL queries to work with json and jsonb data.

By 
Francesco Tisiot user avatar
Francesco Tisiot
·
Sep. 22, 23 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
2.5K Views

Join the DZone community and get the full member experience.

Join For Free

The original idea behind relational databases was "structure, then data": you needed to define what the data looked like before being able to insert any content. This strict data structure definition helped keeping datasets in order by verifying data types, referential integrity, and additional business conditions using dedicated constraints.

But sometimes, life can't be predicted, and data can take different shapes. To enable some sort of flexibility, modern databases like PostgreSQL® started adding semistructured column options JSON, where only a formal check on the shape of the data is done.

PostgreSQL actually offers two options in this space, json and jsonb. The first one validates that the content is in JSON format and stores it as a string. The second is a binary representation optimized for faster processing and better indexing. You can read more on Stack Overflow.

This blog post goes into detail about a few jsonb functions (with the JSON version being really similar without the b ending), by using a dataset containing restaurant information. Therefore, if you, like me, are always willing to discover new cuisines, take out your chef-investigator hat (a mix of Gordon Ramsay's and Sherlock Holmes's hat) and join me in the search for a good restaurant in our imaginary trip to India!

Deploy a PostgreSQL® Instance

Let's start with the basics: we need a database with some sort of parsing capabilities for JSON data. PostgreSQL is perfect for that, and you can either use your own or create an instance in Aiven via the CLI:

avn service create demo-pg \
    --service-type pg      \
    --plan hobbyist        \
    --cloud google-europe-west3


The above creates an Aiven for PostgreSQL (--service-type pg) service called demo-pg , with the bare minimum hobbyist plan in the google-europe-west3 cloud region, which is in Frankfurt, Germany. The service takes a couple of minutes to be ready; you can monitor that with the avn service wait command.

Get the Restaurant Data in PostgreSQL

Any research starts with a dataset, and this is no different! There's a nice Zomato restaurants dataset available in Kaggle that can serve our purposes, containing restaurant information for a lot of cities around the world.

All we need to download it is a valid Kaggle login. Once downloaded, we can unzip the archive file, and we'll get a folder containing five files (file1.json to file5.json). We can load them into our PostgreSQL service using the Aiven CLI, which will get the connection string and use the psql client:

avn service cli demo-pg


Create a rest_reviews table containing a unique column called reviews_data of jsonb type.

create table rest_reviews (reviews_data JSONB);


Then load the files with the following:

\copy rest_reviews from program 'sed -e ''s/\\/\\\\/g'' file1.json';
\copy rest_reviews from program 'sed -e ''s/\\/\\\\/g'' file2.json';
\copy rest_reviews from program 'sed -e ''s/\\/\\\\/g'' file3.json';
\copy rest_reviews from program 'sed -e ''s/\\/\\\\/g'' file4.json';
\copy rest_reviews from program 'sed -e ''s/\\/\\\\/g'' file5.json';


We are using sed -e ''s/\\/\\\\/g'' fileX.json to properly escape any \ characters, which are a problem in the \copy command (kudos to StackOverflow for the answer).

These files we just uploaded are nested JSON documents containing the Zomato API responses in an array like:

[
    {
        "results_found": 17151,
        "restaurants": [
            {"restaurant": {"name":"Hauz Khas Social",...}},
            {"restaurant": {"name":"Qubitos - The Terrace Cafe",...}},
            ...
    },
    {
        "results_found": 100,
        "restaurants": [
            {"restaurant": {"name":"Spezia Bistro",...}},
            {"restaurant": {"name":"Manhattan Brewery & Bar Exchange",...}},
            ...
    },
    ...
    {"message": "API limit exceeded", "code": 440, "status": ""}
    {"message": "API limit exceeded", "code": 440, "status": ""}
]


Extract the List of Restaurants With jsonb_array_elements

To access the list of restaurants (in the restaurants field) we need to:

  • Parse the outer array containing the list of API responses
  • Parse the array of the restaurants JSON item (this also removes the API limit exceeded errors)

We can do that with the following SQL query:

select restaurant -> 'restaurant' ->> 'name' restaurant_name
from rest_reviews
    cross join lateral jsonb_array_elements(reviews_data) dt
    cross join lateral jsonb_array_elements(dt -> 'restaurants') restaurant
limit 10;


In the above:

  • We use the jsonb_array_elements function to parse the jsonb array.
  • jsonb_array_elements(reviews_data) dt gives us dt, which is the outer array of API responses
  • ->retrieves a jsonb subitem. So
    • jsonb_array_elements(dt -> 'restaurants') restaurant gives us restaurant, which is the array contained in the restaurants field.
    • and restaurant -> 'restaurants' gives us the restaurant jsonb values from that array of restaurants.
  • Like ->, ->> retrieves the jsonb subitem, but this time as text. So 'restaurant' ->> 'name' retrieves the field name from the restaurant jsonb, as text.

When executing the above query, we can see the data being parsed correctly:

         restaurant_name
----------------------------------
Hauz Khas Social
Qubitos - The Terrace Cafe
The Hudson Cafe
Summer House Cafe
38 Barracks
Spezia Bistro
Manhattan Brewery & Bar Exchange
The Wine Company
Farzi Cafe
Indian Grill Room
(10 rows)


It would be nice to create a table having a row per restaurant. We can do that using a similar query. Note: This step could do more and parse more columns. It's just an example of what's achievable.

create table restaurant_data as
select
    (restaurant -> 'restaurant' ->> 'id')::int id,
    restaurant -> 'restaurant' jsonb_data
from rest_reviews
    cross join lateral  jsonb_array_elements(reviews_data) dt
    cross join lateral jsonb_array_elements(dt -> 'restaurants') restaurant;


We now have a table called restaurant_data with an integer field id and a jsonb field jsonb_data that we can use for further analysis.

Dive Deep Into the Restaurants Data

Now, we can start with our research. First, let's explore some fields. Apart from the id and name, there's a nice location JSON subitem where we can find the restaurant city amongst other information.

select
    jsonb_data ->> 'id' id,
    jsonb_data ->> 'name' name,
    jsonb_data -> 'location' ->> 'city' city
from restaurant_data limit 5;


In the above, check again the usage of -> to extract the jsonb subitem versus ->> to extract the same as text. The results are the following:

   id    |            name            |   city
---------+----------------------------+-----------
308322   | Hauz Khas Social           | New Delhi
18037817 | Qubitos - The Terrace Cafe | New Delhi
312345   | The Hudson Cafe            | New Delhi
307490   | Summer House Cafe          | New Delhi
18241537 | 38 Barracks                | New Delhi
(5 rows)


What Are the Top Prices in India? Filter Data Using @>

Let's talk money! What are the most expensive restaurants based on average_cost_for_two in Rs., (Indian Rupies) the local currency in India?

select
    jsonb_data ->> 'id' id,
    jsonb_data ->> 'name' name,
    (jsonb_data ->> 'average_cost_for_two')::int average_cost_for_two
from restaurant_data
where jsonb_data @> '{"currency": "Rs."}'
order by 3 desc
limit 5;


We use int again to say the price is an integer, and the @> operator to check that the jsonb_data JSON document contains {"currency": "Rs."}. An alternative would be to extract the currency subitem and filter with jsonb_data ->> 'currency' = 'Rs.'. The following are the results:

  id    |                         name                         | average_cost_for_two
--------+------------------------------------------------------+----------------------
3400072 | Dawat-e-Nawab - Radisson Blu                         |                 3600
2300187 | Waterside - The Landmark Hotel                       |                 3000
3400059 | Peshawri - ITC Mughal                                |                 2500
102216  | Chao Chinese Bistro - Holiday Inn Jaipur City Centre |                 2500
3400060 | Taj Bano - ITC Mughal                                |                 2500
(5 rows)


Check the rating with with_bucket

Okay, the above query gave me an idea of the cost. What about the quality? Let's explore the user_rating item and create a histogram with the following:

with agg_bucket as (
    select
        width_bucket((jsonb_data -> 'user_rating' ->> 'aggregate_rating')::numeric, 0, 5, 10) bucket,
        count(*) nr_restaurants
    from restaurant_data
    where jsonb_data @> '{"currency": "Rs."}'
    group by width_bucket((jsonb_data -> 'user_rating' ->> 'aggregate_rating')::numeric, 0, 5, 10)
    )
select bucket,
    numrange(bucket*0.5 -0.5, bucket*0.5) range,
    nr_restaurants
from agg_bucket
order by 1;


The above query uses the with_bucket function to assign the user_rating value to one of 10 buckets, each covering 0.5 in ratings (e.g., 0-0.5, 0.5-1) etc. The result below shows that we can safely filter for rating >= 4 and still retain a good choice of restaurants.

bucket |   range   | nr_restaurants
-------+-----------+----------------
     5 | [2.0,2.5) |              2
     6 | [2.5,3.0) |              1
     7 | [3.0,3.5) |             88
     8 | [3.5,4.0) |            266
     9 | [4.0,4.5) |            287
    10 | [4.5,5.0) |             76
(6 rows)


What’s the Best Affordable Restaurant?

Should we try to minimize the spending? Let's search if there is any restaurant with a rating greater or equal than four and an average_cost_for_two less than 1000 Indian Rupees. Again, we are casting both aggregate_rating and average_cost_for_two to integers before applying the filter.

select
    jsonb_data ->> 'id' id,
    jsonb_data ->> 'name' name,
    (jsonb_data ->> 'average_cost_for_two')::int average_cost_for_two,
    (jsonb_data -> 'user_rating' ->> 'aggregate_rating')::numeric aggregate_rating
from restaurant_data
where jsonb_data ->> 'currency' = 'Rs.'
and (jsonb_data ->> 'average_cost_for_two')::int < 1000
and (jsonb_data -> 'user_rating' ->> 'aggregate_rating')::numeric >= 4
order by 4 desc, 3 asc
limit 5;


This shows quite a good selection of not-too-expensive but still good restaurants!

  id    |           name            | average_cost_for_two | aggregate_rating
--------+---------------------------+----------------------+------------------
3400346 | Sheroes Hangout           |                    0 |              4.9
2600109 | Sagar Gaire Fast Food     |                  250 |              4.9
800468  | Grandson of Tunday Kababi |                  300 |              4.9
3001321 | CakeBee                   |                  350 |              4.9
96776   | Conçu                     |                  600 |              4.8
(5 rows)


A Deep Dive Into the Events Array, With jsonb_array_length and jsonb_array_elements

Let's refine our research. Are any of the resulting restaurants doing events? Checking events done in the past might give us some more insights into what the restaurant can offer. Since the item zomato_events contains an array of events, we can check for restaurants with at least two entries in that array so we can get a sense of what's available.

select
    jsonb_data ->> 'id' id,
    jsonb_data ->> 'name' name,
    (jsonb_data ->> 'average_cost_for_two')::int average_cost_for_two,
    (jsonb_data -> 'user_rating' ->> 'aggregate_rating')::numeric aggregate_rating,
    jsonb_array_length(jsonb_data -> 'zomato_events') nr_events
from restaurant_data
where jsonb_data ->> 'currency' = 'Rs.'
and (jsonb_data ->> 'average_cost_for_two')::int < 1000
and (jsonb_data -> 'user_rating' ->> 'aggregate_rating')::numeric >= 4
and jsonb_array_length(jsonb_data -> 'zomato_events') > 1
order by 5 desc, 4 desc, 3 asc
limit 5;


To filter on zomato_events, we're using the jsonb_array_length function, which returns the number of items in a specific jsonb array. Interestingly, we only get three rows.

   id    |      name      | average_cost_for_two | aggregate_rating | nr_events
---------+----------------+----------------------+------------------+-----------
103019   | Mutual's       |                  650 |              4.2 |         3
113537   | Puffizza       |                  700 |              4.3 |         2
18413814 | Aangan Horizon |                  900 |              4.0 |         2
(3 rows)


Now that we have that list of three restaurants, let's have a look at what events they created to make our final decision:

select
    jsonb_data ->> 'id' id,
    jsonb_data ->> 'name' name,
    (jsonb_data ->> 'average_cost_for_two')::int average_cost_for_two,
    (jsonb_data -> 'user_rating' ->> 'aggregate_rating')::numeric aggregate_rating,
    events -> 'event' ->> 'title' event_title
from restaurant_data
cross join lateral  jsonb_array_elements(jsonb_data -> 'zomato_events') events
where jsonb_data ->> 'currency' = 'Rs.'
and (jsonb_data ->> 'average_cost_for_two')::int < 1000
and (jsonb_data -> 'user_rating' ->> 'aggregate_rating')::numeric >= 4
and jsonb_array_length(jsonb_data -> 'zomato_events') > 1
order by 1,2,5
limit 5;


Wow, #FlauntYourPizza Contest seems interesting (and possibly very dangerous)!

  id   |   name   | average_cost_for_two | aggregate_rating |            event_title
-------+----------+----------------------+------------------+-----------------------------------
103019 | Mutual's |                  650 |              4.2 | ABHI SOLANKI LIVE
103019 | Mutual's |                  650 |              4.2 | FREEBIRDS
103019 | Mutual's |                  650 |              4.2 | IPL SPECIAL OFF
113537 | Puffizza |                  700 |              4.3 | Dim Light Dinner (Every Thursday)
113537 | Puffizza |                  700 |              4.3 | #FlauntYourPizza Contest
(5 rows)


Looks like we found our restaurant! Now it's time to go there, check the menu, and eat!

Conclusion

"Relational databases are too rigid," how many times did we hear that? It turns out they can also perform really well with semi-structured data like JSON, and PostgreSQL, in particular, has a deep set of functions for manipulating JSON objects. So, next time you have a JSON dataset, maybe give PostgreSQL a try.

Some more resources that you might find useful:

  • PostgreSQL JSON functions
  • JSON vs. jsonb, what are the differences?
Database JSON Relational database PostgreSQL

Published at DZone with permission of Francesco Tisiot. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Formulating a Robust Strategy for Storage in Amazon Relational Database Service PostgreSQL Deployments
  • Just Use PostgreSQL, a Quick-Start Guide: Exploring Essential and Extended Capabilities of the Most Beloved Database
  • Keep Calm and Column Wise
  • Understanding RDS Costs

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!