DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

How does AI transform chaos engineering from an experiment into a critical capability? Learn how to effectively operationalize the chaos.

Data quality isn't just a technical issue: It impacts an organization's compliance, operational efficiency, and customer satisfaction.

Are you a front-end or full-stack developer frustrated by front-end distractions? Learn to move forward with tooling and clear boundaries.

Developer Experience: Demand to support engineering teams has risen, and there is a shift from traditional DevOps to workflow improvements.

Related

  • An Overview of SQL Server Joins
  • How to Pivot and Join Time Series Data in Flux
  • Top 10 Web Developer Communities Developers Should Join in 2021
  • MongoDB to Couchbase: An Introduction to Developers and Experts

Trending

  • Kung Fu Code: Master Shifu Teaches Strategy Pattern to Po – The Functional Way
  • RAG vs. CAG: A Deep Dive into Context-Aware AI Generation Techniques
  • Data Storage and Indexing in PostgreSQL: Practical Guide With Examples and Performance Insights
  • Software Specs 2.0: An Elaborate Example
  1. DZone
  2. Data Engineering
  3. Databases
  4. How to Perform Joins in Apache Hive

How to Perform Joins in Apache Hive

Covering the basics of joins in hive in a detailed tutorial from Hardik Pandya.

By 
Hardik Pandya user avatar
Hardik Pandya
·
Mar. 05, 16 · Tutorial
Likes (5)
Comment
Save
Tweet
Share
27.3K Views

Join the DZone community and get the full member experience.

Join For Free

joins

We will be working with two tables — customer and orders — that we imported in my sqoop import article, and we'll perform the following joins:

  1. INNER JOIN – Select records that have matching values in both tables.
  2. LEFT JOIN (LEFT OUTER JOIN) – Returns all the values from the left table, plus the matched values from the right table, or NULL in case of no matching join predicate
  3. RIGHT JOIN (RIGHT OUTER JOIN) A RIGHT JOIN returns all the values from the right table, plus the matched values from the left table, or NULL in case of no matching join predicate
  4. FULL JOIN (FULL OUTER JOIN) – Selects all records that match either left or right table records.
  5. LEFT SEMI JOIN: Only returns the records from the left-hand table. Hive doesn’t support IN subqueries so you can’t do

SELECT * FROM TABLE_A WHERE TABLE_A.ID IN (SELECT ID FROM TABLE_B);

Customer Table

Hive Tip: to print column headers in command line:

hive> set hive.cli.print.header=true;
hive> select * from customers;
 OK
 customers.id customers.name
 1 John
 2 Kevin
 19 Alex
 3 Mark
 4 Jenna
 5 Robert
 6 Zoya
 7 Sam
 8 George
 9 Peter

Orders Table:

hive> select * from orders;
 OK
 order_id orders.order_date orders.customer_id orders.amount
 101 2016-01-01 7 3540
 102 2016-03-01 1 240
 103 2016-03-02 6 2340
 104 2016-02-12 3 5000
 105 2016-02-12 3 5500
 106 2016-02-14 9 3005
 107 2016-02-14 1 20
 108 2016-02-29 2 2000
 109 2016-02-29 3 2500
 110 2016-02-27 1 200

 

INNER JOIN

Select records that have matching values in both tables.

hive> select c.id, c.name, o.order_date, o.amount from customers c inner join orders o ON (c.id = o.customer_id);
Output
c.id c.name o.order_date o.amount
 7 Sam 2016-01-01 3540
 1 John 2016-03-01 240
 6 Zoya 2016-03-02 2340
 3 Mark 2016-02-12 5000
 3 Mark 2016-02-12 5500
 9 Peter 2016-02-14 3005
 1 John 2016-02-14 20
 2 Kevin 2016-02-29 2000
 3 Mark 2016-02-29 2500
 1 John 2016-02-27 200

LEFT JOIN (LEFT OUTER JOIN)

Returns all the values from the left table, plus the matched values from the right table, or NULL in case of no matching join predicate

hive> select c.id, c.name, o.order_date, o.amount from customers c left outer join orders o ON (c.id = o.customer_id);
Output
 c.id    c.name    o.order_date    o.amount
 1    John    2016-03-01    240
 1    John    2016-02-14    20
 1    John    2016-02-27    200
 2    Kevin    2016-02-29    2000
 19    Alex    NULL    NULL
 3    Mark    2016-02-12    5000
 3    Mark    2016-02-12    5500
 3    Mark    2016-02-29    2500
 4    Jenna    NULL    NULL
 5    Robert    NULL    NULL
 6    Zoya    2016-03-02    2340
 7    Sam    2016-01-01    3540
 8    George    NULL    NULL
 9    Peter    2016-02-14    3005
 Time taken: 40.462 seconds, Fetched: 14 row(s)

RIGHT JOIN (RIGHT OUTER JOIN)

Returns all the values from the right table, plus the matched values from the left table, or NULL in case of no matching join predicate

hive> select c.id, c.name, o.order_date, o.amount from customers c left outer join orders o ON (c.id = o.customer_id);
Output
 c.id    c.name    o.order_date    o.amount
 7    Sam    2016-01-01    3540
 1    John    2016-03-01    240
 6    Zoya    2016-03-02    2340
 3    Mark    2016-02-12    5000
 3    Mark    2016-02-12    5500
 9    Peter    2016-02-14    3005
 1    John    2016-02-14    20
 2    Kevin    2016-02-29    2000
 3    Mark    2016-02-29    2500
 1    John    2016-02-27    200

FULL JOIN (FULL OUTER JOIN)

Selects all records that match either left or right table records.

hive> select c.id, c.name, o.order_date, o.amount from customers c full outer join orders o ON (c.id = o.customer_id);

Output

c.id c.name o.order_date o.amount
 1 John 2016-02-27 200
 1 John 2016-02-14 20
 1 John 2016-03-01 240
 19 Alex NULL NULL
 2 Kevin 2016-02-29 2000
 3 Mark 2016-02-29 2500
 3 Mark 2016-02-12 5500
 3 Mark 2016-02-12 5000
 4 Jenna NULL NULL
 5 Robert NULL NULL
 6 Zoya 2016-03-02 2340
 7 Sam 2016-01-01 3540
 8 George NULL NULL
 9 Peter 2016-02-14 3005

LEFT SEMI JOINleft_semi_join

Find all the customers where at least one order exist or find all customer who has placed an order.

hive> select *  from customers  left semi join orders  ON 
(customers.id = orders.customer_id);

OUTPUT

customers.id    customers.name
1    John
2    Kevin
3    Mark
6    Zoya
7    Sam
9    Peter
Time taken: 56.362 seconds, Fetched: 6 row(s)

That’s it for this article. I hope you find this useful, thank you for reading!

Database Joins (concurrency library) Apache Hive

Published at DZone with permission of Hardik Pandya, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • An Overview of SQL Server Joins
  • How to Pivot and Join Time Series Data in Flux
  • Top 10 Web Developer Communities Developers Should Join in 2021
  • MongoDB to Couchbase: An Introduction to Developers and Experts

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • [email protected]

Let's be friends: