DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workkloads.

Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Building REST API Backend Easily With Ballerina Language
  • A Data-Driven Approach to Application Modernization
  • Java Module Benefits With Example
  • Providing Enum Consistency Between Application and Data

Trending

  • GDPR Compliance With .NET: Securing Data the Right Way
  • Kubeflow: Driving Scalable and Intelligent Machine Learning Systems
  • *You* Can Shape Trend Reports: Join DZone's Software Supply Chain Security Research
  • Automatic Code Transformation With OpenRewrite
  1. DZone
  2. Data Engineering
  3. Data
  4. How a URL Shortening Application Works

How a URL Shortening Application Works

By 
Sunil P V user avatar
Sunil P V
·
Updated Feb. 13, 20 · Tutorial
Likes (47)
Comment
Save
Tweet
Share
49.7K Views

Join the DZone community and get the full member experience.

Join For Free

Introduction

Typically, a long URL is a system-generated URL, usually containing multiple query parameters that may be encrypted/encoded to mask those parameters. The need for shortened URLs arises when long URLs have to be shared for various purposes, like sharing links via SMSs, Emails, media types, like Twitter or WhatsApp, etc.

A URL shortening service might sound like a simple service at first, but it poses a very interesting and challenging engineering problem. Please pause here and think about how you could design a system to generate an infinite number of short URLs for possibly all long URLs in the world. The interesting part of the problem is generating unique, random identifiers for these long URLs. The identifiers should be short enough to fit in the URL with an acceptable character set.

Short URLs

A typical short URL could be something like this https://ts.dz/qp7h3zi, where ts.dz is a short domain for a particular application, service, or company, and the last part in the URL qp7h3zi is a unique, random identifier. This part is called as the token. Tokens are alphanumeric strings and act as identifiers to the actual long URL. The length of a token typically varies from 6-9 characters. While designing the URL shortening service, we pay close attention to the length of the token and the characters it contains. To be URL safe, consider smaller case characters a-z and numbers from 0-9, we have a total character set of 36 character set. The number of unique tokens that can be generated using various token lengths are shown below.

Token Length Unique Token Combination Range
6 2,176,782,336 > 2 Billion
7 78,364,164,096 > 78 Billion
8 2,821,109,907,456 > 2 Trillion
9 101,559,956,668,416 > 100 Trillion

From the above table, we can choose the length of our token to be 7 characters long, which can accommodate more than 78 billion unique tokens. If we require a larger range, then setting the token length to 8 characters gives us a larger set, which comes to around 2 trillion (we can build a service possibly with almost all URLs in the world! ). The token generation is covered below, after the high-level design section.

You may also like: Building Blocks of a Scalable Architecture.

High-Level Design

The High-Level Design (HLD) consists of two parts. 

  1. Short URL Generation — covers how to handle a request by the system to generate a short URL for an incoming long URL
  2. Short URL Redirection — covers how to handle a short URL click request by a user so that the user can be redirected to the actual location i.e. long URL

HLD - Short URL Generation

The URL Shortening service accepts a long URL as input, generates a short URL, and returns it. If the incoming long URL already exists in the system, then the previous generated short URL has to be returned. If the existing short URL has expired, then a new one has to be created and sent as a response.

URL Generation Workflow

URL Generation Workflow

HLD - Short URL Redirection

When a user clicks on the short URL, the request should be redirected to the URL Shortening Service, which would then redirect it to the actual long URL location. The short domain in the short URL has to be mapped to the service like below:

URL Redirection workflow

URL Redirection workflow

Token Generation

The logic to generate unique, random tokens can be tricky. The URL Shortening service must ensure that the token should not be assigned to two (or more) different long URLs. Otherwise, the system may incorrectly redirect the short URL to another location (long URL). Tokens can be generated by multiple strategies by using 3rd party libraries or using conventional methods like Base36 encoding. We'll cover each of these cases individually.

Third-Party Library

A very reliable library that generates unique, random tokens (ids) is Hashids (which has support in more than 30 languages!). Hashids can create short, unique Ids from numbers (positive numbers and zero) and allows custom alphabets as well as salt, so Ids are unique only to you. The Ids generated are mangled and are not easy to guess. The code footprint is very minimal, and there are no external library dependencies.

Sample tokens generated for salt = ''this is my salt" and character set = "abcdefghijklmnopqrstuvwxyz1234567890".

Hashid Input Token Generated Short URL
1 3joed16 https://test.com/3joed16
2 8lop31w https://test.com/8lop31w
3 l6o3k1n https://test.com/l6o3k1n
4 leo2418 https://test.com/leo2418

Base36 Encode

This is a very straightforward approach, and most languages have these functions. If you don't need all the fancy extras Hashids has to offer, this method will work just fine. It'll probably be faster too. The input range of numbers can be synchronized using a central Zookeeper service.

A high-level design for token generation using Hashids library is shown below. The input to the Hashids library is a number (long) that generates unique tokens. As the URL Shortening service will be working in a cluster with "N" nodes, the input numbers to the Hashids library should be non-repetitive. To handle this, we introduce a synchronization service, like Zookeeper, which can manage and handle the range of Ids (range of numbers) assigned to the URL Shortening application node. 

Every application node, on start-up, requests the Zookeeper service for a new range, and the same will be recorded by the Zookeeper. For every short URL request, the URL Shortening application node increments a number (from the range assigned by Zookeeper) and passes it to the Hashids library, which in-turn generates a unique token for a corresponding input number. This token will be saved into the database corresponding to the long URL. 

Once the assigned number range has been exhausted by the application node, the URL Shortening application requests for a new range from Zookeeper. This design gives us the flexibility to add any number of application nodes based on the incoming traffic.

Service workflow with Hashids

Service workflow with Hashids

The Zookeeper service can be replaced by an instance of DynamoDB, which can maintain a counter. The role of the Zookeeper service is to maintain the ranges that are already assigned to application nodes and assign new ones when requested. A simple table with a counter variable in DynamoDB can be stored. This counter variable is made atomic in nature so that any application node saving an incremental number value doesn't clash (writes-reads don't clash). See how to implement atomic counters in DynamoDB in this link. The DynamoDB can also be replaced by a MySQL table as well or any other system that offers atomic number generation like Redis INCR.

Complete System Design

The complete system design can be summarized, as shown below. The main points and how the work flows are maintained are described below the diagram.

Final architectural diagram

Final architectural diagram

Note: A Redis cache setup is added here to improve application performance, as newly generated short URLs can be responded to from the cache rather than querying the database. The cache entries (short URL and long URL map) can be configured with an expiry or an eviction policy, such as LRU (least recently used).

  1. The URL Shortening application nodes works in a cluster; whenever the application nodes start or when a new node is added to the cluster, they talk to a Zookeeper service.
  2. The Zookeeper service allocates a unique number range to each application node on request.
  3. A request to shorten a long URL reaches the Load Balancer, which then appropriately delegates the call to one of the URL Shortening application nodes in a round-robin fashion.
  4. The application checks if there is an existing short URL entry in the database/cache for the long URL. If an entry exists, and if the short URL is still valid (not expired), then the same short URL will be responded back with HTTP 200 OK.
  5. If the long URL does not exist in the database, then the URL Shortening application uses the next available number (from the range allotted by Zookeeper in step 2) and generates a new token by using the Hashid library. A short URL is generated using the token and the configured expiry is set and then saved to the database and to the cache. The short URL is sent back to the client with a status of HTTP 201 CREATED.
  6. Whenever the short URL is clicked (by any user), the request reaches the Load Balancer and is passed on to one of the URL Shortening application nodes.
  7. The application checks in the cache if a long URL exists for the short URL token and then goes to step 9.
  8. If there is no cache entry, then the application checks in the database and goes to step 9.
  9. Upon finding a long URL that the client has to be redirected to, the application responds with HTTP 302 with the long URL set in the HTTP header Location. This redirects the client from short URL to the actual location (i.e. long URL).
  10. If there is no such token found in the database/cache the application node responds with a status of HTTP 404 NOT FOUND.


Further Reading

  • Software Architecture: The 5 Patterns You Need to Know.
  • Top 20 System Design Interview Questions for Java Programmers.
  • Design Patterns for Beginners With Java Examples.
application Data Types Design Requests Database Library

Opinions expressed by DZone contributors are their own.

Related

  • Building REST API Backend Easily With Ballerina Language
  • A Data-Driven Approach to Application Modernization
  • Java Module Benefits With Example
  • Providing Enum Consistency Between Application and Data

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!