Uber System Design

This article provides a look into the system design of popular ride-share apps and what databases are used for their implementation.

by N K · Mar. 13, 23 · Tutorial

Popular ride-hailing services include the following:

  • Uber
  • Lyft
  • Curb
  • Grab

Requirements

  • The rider can see all the available nearby drivers
  • The driver can accept a trip requested by the rider
  • Once a trip is confirmed, the current locations of the rider and the driver should be published continuously

Data Storage

Database Schema

  • The primary entities are the riders, the drivers, the vehicles, and the trips tables
  • The relationship between the drivers and the vehicles table is 1-to-many
  • The relationship between the drivers and trips table is 1-to-many
  • The relationship between the riders and trips table is 1-to-many
  • The trips table is a join table to represent the relationship between the riders and the drivers
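
The relationships above can be sketched as a small relational schema. This is a minimal sketch using SQLite from the Python standard library; the column names (`name`, `capacity`, `status`) and the helper functions are assumptions made for illustration, since the article only names the tables and their relationships.

```python
import sqlite3

# Hypothetical columns; only the four tables and their 1-to-many
# relationships come from the article.
SCHEMA = """
CREATE TABLE riders  (rider_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE drivers (driver_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE vehicles (
    vehicle_id INTEGER PRIMARY KEY,
    driver_id  INTEGER NOT NULL REFERENCES drivers(driver_id),  -- one driver, many vehicles
    capacity   INTEGER NOT NULL
);
CREATE TABLE trips (
    trip_id   INTEGER PRIMARY KEY,
    rider_id  INTEGER NOT NULL REFERENCES riders(rider_id),    -- one rider, many trips
    driver_id INTEGER NOT NULL REFERENCES drivers(driver_id),  -- one driver, many trips
    status    TEXT NOT NULL DEFAULT 'requested'
);
"""


def create_schema(conn):
    """Create the four tables on an open SQLite connection."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


def trips_for_driver(conn, driver_id):
    """Return (trip_id, rider name) pairs for one driver via the trips join table."""
    rows = conn.execute(
        "SELECT t.trip_id, r.name FROM trips t "
        "JOIN riders r ON r.rider_id = t.rider_id "
        "WHERE t.driver_id = ? ORDER BY t.trip_id",
        (driver_id,),
    )
    return rows.fetchall()
```

Because each trip row carries both a `rider_id` and a `driver_id`, the trips table doubles as the join table between riders and drivers.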

Type of Data Store

  • A wide-column data store (LSM tree-based) such as Apache Cassandra persists the time-series location data of the clients (drivers and riders)
  • A cache server such as Redis stores the current location of the driver and the rider for quick lookups
  • A message queue such as Apache Kafka buffers the heavy write traffic
  • A relational database such as Postgres stores the metadata of the users

High-Level Design

  • DNS routes the requests from the clients (riders and drivers) to a nearby data center
  • The clients (riders and drivers) update the data stores with the Geohash of their real-time location
  • WebSocket is used for real-time bidirectional communication between the rider and the driver
  • Consistent hashing is used to partition the data stores geographically
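
To make the last point concrete, here is a minimal sketch of a consistent hash ring that maps a partition key (for example, a Geohash cell) to a shard. The class name `ConsistentHashRing`, the virtual-node count, and the choice of SHA-256 are assumptions for illustration; the article does not say how the ring is built.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Illustrative ring: each node owns several virtual positions."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owner:  # skip the (very unlikely) collision
                continue
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node):
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            if self._owner.get(pos) == node:
                del self._owner[pos]
                self._positions.remove(pos)

    def get_node(self, key):
        """Return the node owning the first position clockwise from the key."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._positions, self._hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]
```

The property that matters for repartitioning is that removing a node only moves the keys that node owned; keys on the remaining nodes keep their assignment.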

Write Path

  1. The client (driver) creates a WebSocket connection on the load balancer to publish the current location (latitude, longitude) of the driver in real-time
  2. The load balancer uses the round-robin algorithm to delegate the client’s connection to a server with free capacity in the nearby data center
  3. The Geohash of the driver location is persisted on the message queue to handle the heavy traffic
  4. The Geohash of the driver location is stored on the wide-column data store for durability
  5. The Geohash is stored on the point location cache to provide real-time location updates
  6. The client (rider) creates a WebSocket connection on the load balancer to publish the current location (latitude, longitude) of the rider in real-time
  7. The load balancer uses the round-robin algorithm to delegate the client’s connection to a server with free capacity in the nearby data center
  8. The Geohash of the rider location is persisted on the message queue to handle the heavy traffic
  9. The analytics service (MapReduce based) queries the wide-column data store to generate offline analytics on the trip data
  10. The controller service prevents hotspots by auto-repartitioning the stateful services
  11. The point location cache is denormalized by storing Geohashes of several lengths (precisions) for each location, which improves read performance and provides zoom functionality
  12. The server holding the rider’s WebSocket connection queries the point location cache to identify the available nearby drivers
  13. As a naive approach, the Euclidean distance can be used to find the nearest vehicles within a Geohash
  14. Sharding of the services can be implemented on multiple levels, such as the city level, geo sharding for further granularity, and the product level (capacity of the vehicle)
  15. The hotspots are handled through replication and further partitioning of the stateful services by the driver ID
  16. The wide-column data store is optimized for writes, while the cache server is optimized for reads
  17. The wide-column data store is replicated across multiple data centers for durability
  18. An LRU policy is used to evict entries from the cache server
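
Steps 11 to 13 can be sketched as follows. The encoder implements the standard Geohash bit-interleaving scheme; the `PointLocationCache` class, its method names, and the chosen precisions are hypothetical stand-ins for the cache server, held in process memory here only to keep the example self-contained.

```python
import math

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a Geohash string."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, bits, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        if use_lon:  # even bits halve the longitude range
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:        # odd bits halve the latitude range
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


class PointLocationCache:
    """Hypothetical cache keyed by Geohash at several precisions (step 11)."""

    def __init__(self, precisions=(4, 5, 6)):
        self.precisions = precisions
        self._cells = {}  # geohash -> {driver_id: (lat, lon)}
        self._last = {}   # driver_id -> last known (lat, lon)

    def update(self, driver_id, lat, lon):
        old = self._last.get(driver_id)
        if old is not None:  # drop the driver from the cells it has left
            for p in self.precisions:
                self._cells[geohash_encode(*old, p)].pop(driver_id, None)
        for p in self.precisions:
            cell = self._cells.setdefault(geohash_encode(lat, lon, p), {})
            cell[driver_id] = (lat, lon)
        self._last[driver_id] = (lat, lon)

    def nearby(self, lat, lon, precision=6, limit=3):
        """Rank drivers in the rider's cell by naive Euclidean distance (step 13)."""
        cell = self._cells.get(geohash_encode(lat, lon, precision), {})
        ranked = sorted(cell.items(), key=lambda item: math.dist((lat, lon), item[1]))
        return [driver_id for driver_id, _ in ranked[:limit]]
```

A shorter Geohash covers a larger cell, so querying with a lower `precision` widens the search area. Note that a single-cell lookup misses drivers who sit just across a cell boundary, and that Euclidean distance on raw degrees is only a rough approximation of ground distance.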

Read Path

Driver Accepting a Trip Request

  1. The client (driver) creates a WebSocket connection on the load balancer to receive updates on trip requests in real-time
  2. The load balancer uses the round-robin algorithm to delegate the client’s connection to a server with free capacity in the nearby data center
  3. The server holding the driver’s WebSocket connection must acquire a distributed lock to handle concurrency issues when accepting trip requests from concurrent unique riders
  4. The server holding the driver’s WebSocket connection invokes the trip service to confirm the trip
  5. The trip service queries the pub-sub server to create a one-to-one communication channel between the driver and the rider
  6. The server publishes the location updates by the driver on the pub-sub server
  7. The pub-sub server persists the driver’s location data on the trip data store for durability
  8. The pub-sub server delegates the location updates by the driver to the server that is holding the rider’s WebSocket connection using the publish-subscribe pattern
  9. The server delegates the driver’s location update to the load balancer that holds the rider’s WebSocket connection
  10. The driver’s location updates are published to the rider
  11. The state of the trip is cached on the client (rider and driver) for a fallback to another data center
  12. Chaos Engineering can be used for resiliency testing
  13. Services use a gossip protocol to share state for high availability
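
Step 3 of the list above, the per-driver lock, can be sketched as follows. This is an in-memory stand-in with set-if-absent and expiry semantics, not a real distributed lock; the names `DriverLockTable` and `accept_trip`, and the 10-second TTL, are assumptions for illustration. In production the same semantics would come from a shared lock service rather than process memory.

```python
import threading
import time


class DriverLockTable:
    """In-memory stand-in for a distributed lock service (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._mutex = threading.Lock()
        self._holders = {}  # driver_id -> (trip_id, expires_at)

    def try_acquire(self, driver_id, trip_id, ttl_seconds=10.0):
        """Take the driver's lock for trip_id unless another trip holds it."""
        now = self._clock()
        with self._mutex:
            holder = self._holders.get(driver_id)
            if holder is not None and holder[1] > now:
                # Still held: succeed only for the trip that already owns it.
                return holder[0] == trip_id
            self._holders[driver_id] = (trip_id, now + ttl_seconds)
            return True

    def release(self, driver_id, trip_id):
        """Release the lock; a no-op if trip_id is not the current holder."""
        with self._mutex:
            holder = self._holders.get(driver_id)
            if holder is not None and holder[0] == trip_id:
                del self._holders[driver_id]


def accept_trip(locks, confirmed, driver_id, trip_id):
    """Confirm trip_id for the driver only if the driver's lock is acquired."""
    if not locks.try_acquire(driver_id, trip_id):
        return False  # another rider's request reached this driver first
    confirmed[driver_id] = trip_id
    return True
```

The expiry guards against a server that crashes while holding the lock: once the TTL passes, a later request can take the driver's lock without manual cleanup.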

Published at DZone with permission of N K. See the original article here.
