How to Parse and Standardize Street/Postal Addresses

Learn how to parse and standardize addresses with different tools, including regex, npm packages, online validators, and geocoding APIs.

By Alfiya Tarasenko · Oct. 11, 21 · Tutorial

Any app or website that works with addresses needs to parse, standardize, validate, and verify them. Different mechanisms suit different projects, so figuring out exactly what you need isn't always easy.

What Problems Appear Around Parsing and Standardization?

There are three primary issues that often occur in the parsing and standardization process.

  1. Addresses are rarely regular. An address can be a short string in a fixed format or a long fragment written in its own idiosyncratic way. A single abbreviation can also stand for more than one word (for example, "Dr" can mean "Drive" or "Doctor"). Most challenging of all, there is no universal open-source solution that splits and standardizes all of this.
  2. There are many ways to write an address. Some people enter street names and house numbers, while others use zip codes, post office boxes, etc. Punctuation varies as well, so a parser must be robust enough to cope with it.
  3. Countries and regions have different address formats, which makes parsing even more complicated.
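To illustrate the first problem, here is a minimal sketch of resolving the ambiguous abbreviation "St" by its position. The function name `expandSt` and its rule (treat "St" as "Street" when only directionals such as "NW" follow it, otherwise as "Saint") are hypothetical simplifications, not a standard algorithm; real addresses break rules like this easily, which is exactly why parsing is hard.

```javascript
// Hypothetical heuristic, for illustration only: decide whether "St" means
// "Street" or "Saint" based on what follows it in the street part of an address.
const DIRECTIONALS = new Set(['N', 'S', 'E', 'W', 'NE', 'NW', 'SE', 'SW']);

function expandSt(street) {
  const tokens = street.trim().split(/\s+/);
  return tokens
    .map((token, i) => {
      if (!/^st\.?$/i.test(token)) return token;
      // If only directionals (or nothing) follow, "St" is most likely a suffix: Street.
      // Otherwise it probably prefixes a name, as in "St Marks": Saint.
      const rest = tokens.slice(i + 1);
      const onlyDirectionalsFollow = rest.every((t) => DIRECTIONALS.has(t.toUpperCase()));
      return onlyDirectionalsFollow ? 'Street' : 'Saint';
    })
    .join(' ');
}

console.log(expandSt('123 W 34th St'));    // 123 W 34th Street
console.log(expandSt('420 Kenyon St NW')); // 420 Kenyon Street NW
console.log(expandSt('12 St Marks Pl'));   // 12 Saint Marks Pl
```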

Ways to Parse and Standardize Postal Addresses

With these three difficulties in mind, you need to choose a suitable method. Here are some of the most popular approaches, ordered from the simplest to the most complex and versatile.

Regex

This is the easiest solution when all your addresses follow a single regular format. You write a regular expression that reads this particular format and no other. For example, the format might be [HOUSE_NUMBER, STREET_NAME, CITY_NAME, STATE_NAME]. The regex then splits each string into the corresponding components.

Here is an example of a regex that works well for US addresses containing a house number, street, and city:

JavaScript
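// Address examples:
// 123 W 34th St, Richmond
// 3700 Crutchfield St, Richmond
// 202 E 35th St, Richmond
// 420 Kenyon St NW, Washington
// 102 Irving St NW, Washington

// House number (digits plus optional letters), then the street up to the comma, then the city
const addressRegex = /^(?<house_number>\d+[a-zA-Z]*)\s+(?<street>[^,]+),\s*(?<city>.+)$/u;

const address = '123 W 34th St, Richmond';
const match = addressRegex.exec(address);

// exec() returns null when the address does not fit the pattern
if (match) {
  const { house_number, street, city } = match.groups;
  console.log(house_number, street, city); // 123 W 34th St Richmond
} else {
  console.error(`Unrecognized address format: ${address}`);
}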


Try building and testing regular expressions at RegExr. 
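The example above handles three components. The four-part template [HOUSE_NUMBER, STREET_NAME, CITY_NAME, STATE_NAME] mentioned earlier can be sketched the same way, assuming the street, city, and state are separated by commas. The function and group names here are hypothetical.

```javascript
// Sketch for the [HOUSE_NUMBER, STREET_NAME, CITY_NAME, STATE_NAME] template,
// assuming comma-separated street, city, and state parts.
const templateRegex =
  /^(?<houseNumber>\d+[a-zA-Z]*)\s+(?<streetName>[^,]+),\s*(?<cityName>[^,]+),\s*(?<stateName>[^,]+)$/u;

function parseTemplatedAddress(address) {
  const match = templateRegex.exec(address.trim());
  // Anything that does not follow the template is rejected (null) rather than guessed
  return match ? { ...match.groups } : null;
}

// returns { houseNumber: '420', streetName: 'Kenyon St NW', cityName: 'Washington', stateName: 'DC' }
console.log(parseTemplatedAddress('420 Kenyon St NW, Washington, DC'));
// returns null: there is no house number
console.log(parseTemplatedAddress('Kenyon St NW, Washington'));
```

Returning `null` for non-matching input makes it explicit that a regex parser only understands the one format it was written for.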

A regex address parser doesn't need any external libraries or APIs, but it only works well with addresses in a standardized format. Address regexes also quickly become hard to read and almost impossible to debug, and a poorly written pattern can cause performance problems (for example, catastrophic backtracking on long inputs).
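One way to soften the readability and debugging problems is to assemble the regex from small, commented pieces that can be tested individually. This is a hypothetical approach, not part of the original example; the `parts` object and `matchesPart` helper are illustrative names.

```javascript
// Hypothetical approach: assemble the address regex from small named pieces.
// Negated classes such as [^,]+ stop at the comma instead of running to the
// end of the string and backtracking, which reduces matching work.
const parts = {
  houseNumber: String.raw`(?<houseNumber>\d+[a-zA-Z]*)`, // e.g. 3700, 12B
  street: String.raw`(?<street>[^,]+)`,                  // text up to the first comma
  city: String.raw`(?<city>[^,]+)`,                      // comma-free text up to the end
};

const addressRegex = new RegExp(
  String.raw`^${parts.houseNumber}\s+${parts.street},\s*${parts.city}$`,
  'u'
);

// Each piece can be checked on its own, which makes debugging easier
function matchesPart(name, input) {
  return new RegExp(String.raw`^${parts[name]}$`, 'u').test(input);
}

console.log(matchesPart('houseNumber', '3700'));   // true
console.log(matchesPart('houseNumber', 'W 34th')); // false
console.log(addressRegex.exec('102 Irving St NW, Washington').groups.city); // Washington
```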

Npm Packages

Another popular option is npm packages, which are (or contain) Node.js modules. There is a wide choice of packages, and most of them target one specific country or data type. Some popular ones are:

  • parse-address handles US addresses. It is regex-based, knows about many address features (prefixes, grid-based addresses, official abbreviations, etc.), and is very forgiving with user-entered addresses.
  • addresser takes an address string and converts it into structured geographic data. It handles abbreviations and normalizes them well. It also provides a getRandomCity function, which is helpful for testing.
  • humanparser parses human names, splitting a string into first name, last name, middle name, suffixes, and other components. It can also parse addresses using regular expressions.
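Packages like these rely on lookup tables of official abbreviations. The sketch below is not the API of any of the packages listed above; it is a hypothetical `normalizeSuffix` function with a tiny table modeled on USPS street suffix abbreviations, showing the idea. For simplicity it only checks the last word of the street name.

```javascript
// Hypothetical lookup table in the spirit of USPS street suffix abbreviations;
// real packages ship far larger tables.
const SUFFIXES = new Map([
  ['street', 'St'], ['str', 'St'], ['st', 'St'],
  ['avenue', 'Ave'], ['av', 'Ave'], ['ave', 'Ave'],
  ['boulevard', 'Blvd'], ['blvd', 'Blvd'],
  ['road', 'Rd'], ['rd', 'Rd'],
  ['highway', 'Hwy'], ['hwy', 'Hwy'],
]);

// Replaces the last word of a street name with its standard abbreviation, if known
function normalizeSuffix(street) {
  const tokens = street.trim().split(/\s+/);
  const last = tokens[tokens.length - 1].replace(/\.$/, '').toLowerCase();
  if (SUFFIXES.has(last)) tokens[tokens.length - 1] = SUFFIXES.get(last);
  return tokens.join(' ');
}

console.log(normalizeSuffix('Gravenstein Highway')); // Gravenstein Hwy
console.log(normalizeSuffix('Crutchfield Street'));  // Crutchfield St
console.log(normalizeSuffix('Main St.'));            // Main St
```

Note that "Kenyon Street NW" would pass through unchanged, because its last word is a directional; handling cases like that is where maintained packages earn their keep.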

Although these packages are community-driven, based on open data, and effective, they have drawbacks, mainly around licenses and dependencies. Be careful: many npm packages cannot be used in commercial projects.
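A quick first check is the `license` field in each package's `package.json`. Below is a minimal sketch, assuming a hypothetical allowlist that you would adapt to your own legal requirements. It deliberately flags SPDX expressions such as "(MIT OR Apache-2.0)" and packages without a string `license` field for manual review instead of interpreting them, and it is no substitute for proper legal review.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical allowlist: adjust it to your project's legal requirements.
const ALLOWED_LICENSES = new Set(['MIT', 'ISC', 'BSD-2-Clause', 'BSD-3-Clause', 'Apache-2.0']);

// Takes the parsed contents of a package.json and reports whether its
// declared "license" string is on the allowlist.
function checkLicense(pkg) {
  const license = typeof pkg.license === 'string' ? pkg.license : undefined;
  return {
    name: pkg.name,
    license: license ?? 'UNKNOWN',
    allowed: license !== undefined && ALLOWED_LICENSES.has(license),
  };
}

// Reads node_modules/<name>/package.json from the current working directory
function readInstalledPackage(name) {
  const file = path.join('node_modules', name, 'package.json');
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}

// Usage (after installing the package):
// console.log(checkLicense(readInstalledPackage('parse-address')));
```

Remember that transitive dependencies carry their own licenses too, so the same check should run over every installed package, not only the ones you import directly.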

Online Address Validator Tools

Have a one-time job? Then there is no need to reinvent the wheel: you can parse and standardize addresses with an online address validation tool. These tools usually accept CSV, Excel, and plain-text files. The tool verifies each address, and you receive a CSV file with the results for every row.

An online validator is convenient and straightforward, but it may limit how many addresses you can process. Try these tools to parse a batch of addresses:

  • Address Validation Tool by Geoapify
  • Address Standardization Tool by Geoapify
  • Batch Address Check Tool by Melissa
  • Bulk Address Validation Tool by SmartyStreets
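Before uploading, you may need to turn a list of addresses into a file. A minimal sketch, assuming the tool accepts a CSV file with a single column; the `address` header and the function names are hypothetical, so check which columns your chosen tool expects. Because addresses usually contain commas, fields are quoted following RFC 4180 conventions.

```javascript
// Minimal CSV writer, assuming one address per row in a single "address" column.
// Fields containing commas, quotes, or line breaks are wrapped in double quotes,
// and embedded quotes are doubled (RFC 4180 style).
function toCsvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function addressesToCsv(addresses) {
  const rows = ['address', ...addresses].map(toCsvField);
  return rows.join('\r\n') + '\r\n';
}

const csv = addressesToCsv(['123 W 34th St, Richmond', '420 Kenyon St NW, Washington']);
console.log(csv);
// address
// "123 W 34th St, Richmond"
// "420 Kenyon St NW, Washington"
```

To save the result, you could pass `csv` to Node's `fs.writeFileSync('addresses.csv', csv)` and upload that file.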

Geocoding API

The last and most powerful option on this list is a geocoding API. It covers all the operations you need, including parsing, postal address normalization, postal code lookup by address, validation, and verification.

Besides structuring an address, it returns the location's coordinates and additional information about it. The primary purpose of a geocoding API is not to split addresses into components but to find the best-matching location. For example, if you enter an address that does not exist, you'll get the closest match.
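A minimal sketch of calling a geocoding API from JavaScript, assuming Geoapify's geocoding search endpoint (`https://api.geoapify.com/v1/geocode/search`) with the `text`, `format`, and `apiKey` query parameters, and assuming the JSON response holds a `results` array whose items carry fields such as `housenumber`, `street`, `city`, `postcode`, `lat`, and `lon`. These names reflect Geoapify's public documentation as we understand it; verify them before use, since other providers use different URLs and fields. The helper function names are hypothetical.

```javascript
// Sketch of a Geoapify Geocoding API call. Endpoint, parameters, and response
// field names are assumptions based on the provider's docs; verify before use.
const GEOCODE_URL = 'https://api.geoapify.com/v1/geocode/search';

function buildGeocodeUrl(address, apiKey) {
  const params = new URLSearchParams({ text: address, format: 'json', apiKey });
  return `${GEOCODE_URL}?${params}`;
}

// Picks the structured components and coordinates from one result object
function toStructuredAddress(result) {
  return {
    houseNumber: result.housenumber,
    street: result.street,
    city: result.city,
    state: result.state,
    postcode: result.postcode,
    country: result.country,
    lat: result.lat,
    lon: result.lon,
  };
}

// Uses the global fetch available in Node.js 18+ and modern browsers
async function geocodeAddress(address, apiKey) {
  const response = await fetch(buildGeocodeUrl(address, apiKey));
  if (!response.ok) throw new Error(`Geocoding failed: HTTP ${response.status}`);
  const data = await response.json();
  return (data.results ?? []).map(toStructuredAddress);
}
```

For example, `geocodeAddress('3700 Crutchfield St, Richmond', process.env.GEOAPIFY_API_KEY).then(console.log)` would print the structured candidates; the environment variable name is only an example.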

Some geocoding APIs, such as the Geoapify Geocoding API, also return a confidence level for each found address. You can use it to judge the quality of the results and check that the found location really corresponds to the entered address.
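A confidence value makes it easy to separate trustworthy results from ones that need a human look. A minimal sketch, assuming each result exposes its score as `rank.confidence` in the range 0 to 1 (our reading of Geoapify's response format; other providers name and scale it differently) and using a hypothetical threshold of 0.8 that you would tune for your data.

```javascript
// Hypothetical threshold; tune it for your data.
const MIN_CONFIDENCE = 0.8;

// Splits geocoding results into accepted and needs-review lists. The score is
// read from result.rank.confidence (an assumed field name); results without
// a score are treated as low confidence.
function splitByConfidence(results, minConfidence = MIN_CONFIDENCE) {
  const accepted = [];
  const needsReview = [];
  for (const result of results) {
    const confidence = result.rank?.confidence ?? 0;
    (confidence >= minConfidence ? accepted : needsReview).push(result);
  }
  return { accepted, needsReview };
}
```

Sending low-confidence matches to a review queue, rather than silently accepting the top result, guards against the "nearest existing address" behavior described above.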

As an address parsing technology, a geocoding API will probably handle all your tasks reliably. Still, don't expect it to be a silver bullet that works for every address you pass: as in many other cases, the better the input you provide, the better the results you get. In addition, although most geocoding API providers offer a free tier, geocoding a large number of addresses is not free. You'll also need additional code and logic to handle addresses that are not found.
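Handling not-found addresses and keeping request counts down both take some extra code. A minimal sketch, assuming a hypothetical `geocode` function (for example, a wrapper around your provider's API) that resolves to an array of results and to an empty array when nothing is found. Duplicate inputs, after whitespace and case normalization, are looked up only once, which matters when the service charges per request.

```javascript
// Sketch of batch geocoding with explicit handling of not-found addresses.
// `geocode` is injected (hypothetical), so the logic can run without network access.
async function geocodeBatch(addresses, geocode) {
  const cache = new Map(); // normalized input -> results, to avoid paying twice
  const found = [];
  const notFound = [];

  for (const raw of addresses) {
    const key = raw.trim().replace(/\s+/g, ' ').toLowerCase();
    if (!key) {
      notFound.push(raw); // empty input: nothing to look up
      continue;
    }
    if (!cache.has(key)) cache.set(key, await geocode(raw));
    const results = cache.get(key);
    if (results.length > 0) found.push({ input: raw, result: results[0] });
    else notFound.push(raw); // keep for manual review or a retry with better input
  }
  return { found, notFound };
}
```

Requests run sequentially here to stay within typical rate limits; a production version might batch or throttle them instead.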

Which One to Choose?

With so many technologies, it might be challenging to choose the best one for your project. Here is some advice on picking the right one.

  • Use regex only if your addresses follow a strictly regular format. Otherwise, you can still use it to strip special characters that shouldn't appear in an address.
  • For projects limited to one country, npm packages work well. However, watch out for dependency issues and check each package's details, such as its license, carefully.
  • If you need to validate a small number of addresses, an online validator suits you well. For more powerful processing, move to a geocoding API. It simplifies the developer's work the most and provides high-precision data, which makes it suitable for almost any case.
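The first tip, using regex to strip special characters, can be sketched as follows. The whitelist of allowed characters (letters in any script, combining marks, digits, whitespace, and `, . # ' / -`) is a hypothetical choice; adjust it to the formats you actually receive.

```javascript
// Hypothetical whitelist of characters that commonly appear in addresses:
// letters in any script (\p{L}), combining marks (\p{M}), digits (\p{N}),
// whitespace, and a few punctuation marks. Everything else is removed.
function sanitizeAddress(input) {
  return input
    .normalize('NFC')                              // compose accented characters
    .replace(/[^\p{L}\p{M}\p{N}\s,.#'\/-]/gu, '')  // drop disallowed symbols
    .replace(/\s+/g, ' ')                          // collapse runs of whitespace
    .trim();
}

console.log(sanitizeAddress('123 W 34th St!!, Richmond ★')); // 123 W 34th St, Richmond
```

Using Unicode property escapes instead of `[a-zA-Z]` keeps non-English street names such as "Münchner Straße" intact.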

We hope you've found a suitable way to parse addresses. Try several approaches to see which fits best, feels most comfortable, and requires the least effort. Remember that different apps and websites might have different requirements!


Published at DZone with permission of Alfiya Tarasenko.
