Searching for Names in all the Wrong Places

Zone Leader Tim Spann shows us how to use the Soundex library with Elasticsearch to find people's names based on phonetic similarities and variations (Jim vs. Jimmy, etc.).

by Tim Spann · Dec. 20, 15 · Tutorial
Algorithms for Searching for People

As you can imagine, searching for people's names is not trivial. Besides the usual text issues such as mixed case, you have name variations, nicknames, and the like. This is where phonetic searching comes in.

You want to find Catherine when you search for Katherine. And searching for James should find you Jim and Jimmy.

There are a number of algorithms and libraries to help you do this more advanced name matching.

Soundex is the standard and is very commonly used. It's in most of the major databases and is reasonably good at finding matches. Soundex was adopted for indexing US census records, so it has been exercised against a pretty good test data set.
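To make the idea concrete, here is a minimal sketch of American Soundex in plain Java. It is an illustration written for this article, not the Apache Commons Codec source; the class name SoundexSketch is hypothetical, and the code assumes plain A to Z input (other characters are skipped).

```java
final class SoundexSketch {
    // Digit for each letter A..Z; '0' marks letters that are not coded
    // (vowels, H, W, Y). This is the usual US English Soundex mapping.
    private static final String CODES = "01230120022455012623010202";

    static String encode(String name) {
        StringBuilder out = new StringBuilder(4);
        char last = '0';
        for (int i = 0; i < name.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(name.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // assumption: ignore anything that is not A-Z
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                last = code;
            } else if (c == 'H' || c == 'W') {
                // H and W do not separate consonants with the same code
            } else {
                if (code != '0' && code != last) {
                    out.append(code);
                }
                last = code; // a vowel resets the "same code" check
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0'); // pad to letter plus three digits
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String n : new String[] {"Timothy", "Tim", "Timmy", "Catherine", "Katherine"}) {
            System.out.println(n + " = " + encode(n));
        }
    }
}
```

Because Soundex keeps the first letter, Catherine encodes to C365 and Katherine to K365, so plain Soundex alone will not match that pair. That is one reason to look at the other algorithms below.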

For those lucky enough to have Elasticsearch, there are a number of heavy-duty options. The examples I have been talking about relate to US/English names. Obviously, there are other languages and countries that have their own algorithms that make more sense for them.

So how do I put this cool searching into practice? Here is a list of some common Java solutions:

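import org.apache.commons.codec.language.DoubleMetaphone;
import org.apache.commons.codec.language.Nysiis;
import org.apache.commons.codec.language.Soundex;

// ...

// Soundex: first letter plus three digits
Soundex soundex = new Soundex();
String soundexTimothy = soundex.encode("Timothy");
String soundexTim = soundex.encode("Tim");
String soundexTimmy = soundex.encode("Timmy");

// Timothy = T530, Tim = T500, Timmy = T500

// NYSIIS
Nysiis nysiis = new Nysiis();
String nysiisTimothy = nysiis.encode("Timothy");
String nysiisTim = nysiis.encode("Tim");
String nysiisTimmy = nysiis.encode("Timmy");

// Timothy = TANATY, Tim = TAN, Timmy = TANY

// Double Metaphone
DoubleMetaphone doubleMetaphone = new DoubleMetaphone();
String metaphoneTimothy = doubleMetaphone.encode("Timothy");
String metaphoneTim = doubleMetaphone.encode("Tim");
String metaphoneTimmy = doubleMetaphone.encode("Timmy");

// Timothy = TM0, Tim = TM, Timmy = TM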

  • Apache Commons Codec

  • Soundex Java Example

  • Apache Commons Codec Soundex Javadoc

  • Soundex Algorithm at Princeton Java Class

  • Soundex in Oracle Database

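Levenshtein distance is another option, or at least an enhancer. It counts the single-character insertions, deletions, and substitutions needed to turn one string into another.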
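As a rough illustration of how Levenshtein distance could be computed, here is a small dynamic-programming sketch in plain Java. The class name LevenshteinSketch is hypothetical, and the comparison is case-sensitive, so you would probably normalize case first.

```java
final class LevenshteinSketch {
    // Classic two-row dynamic programming: prev holds the distances for the
    // previous prefix of a, curr is filled in for the current prefix.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("Tim", "Timmy"));
        System.out.println(distance("Catherine", "Katherine"));
    }
}
```

One way to use it as an enhancer is to first collect candidates by phonetic code and then rank them by edit distance to the query, so that closer spellings come first.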

A slightly better alternative to Soundex is NYSIIS. NYSIIS is implemented in Java by the Apache Commons Codec library.

NYSIIS is also pretty simple to implement on your own.

Metaphone is also very good, and it is likewise included in the Swiss army knife of text searching, Apache Commons Codec.

In my example code, for my name, Double Metaphone seems to work best. For really advanced queries, you may need to use multiple algorithms. Since Apache Commons Codec has them all and they all expose the same encode method, you should have no issues integrating this into your Java 8, Spring, Hadoop, or Spark code. It would be really easy to write a REST service in Spring Boot that looks up names and similar names with Apache Commons Codec, running in a Cloud Foundry instance.
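The lookup part of such a service could be sketched as a small in-memory index keyed by phonetic code. Everything here is hypothetical: the class name PhoneticIndexSketch, its methods, and the pluggable encoder function are illustration only, and the REST layer is left out. The stand-in encoder in main is deliberately crude (first letter only) so the example runs without any library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class PhoneticIndexSketch {
    private final Function<String, String> encoder;
    private final Map<String, List<String>> namesByCode = new HashMap<>();

    // The encoder turns a name into its phonetic code.
    PhoneticIndexSketch(Function<String, String> encoder) {
        this.encoder = encoder;
    }

    // Store the name under its phonetic code.
    void add(String name) {
        namesByCode.computeIfAbsent(encoder.apply(name), k -> new ArrayList<>()).add(name);
    }

    // Return every stored name whose code matches the query's code.
    List<String> lookup(String query) {
        return namesByCode.getOrDefault(encoder.apply(query), Collections.emptyList());
    }

    public static void main(String[] args) {
        // Stand-in encoder for the demo only: upper-cased first letter.
        PhoneticIndexSketch index =
                new PhoneticIndexSketch(s -> s.substring(0, 1).toUpperCase());
        index.add("Tim");
        index.add("Timmy");
        index.add("Bob");
        System.out.println(index.lookup("tom"));
    }
}
```

With Apache Commons Codec on the classpath, the stand-in could presumably be replaced by a method reference to one of the encoders shown earlier. To combine multiple algorithms, one option is to keep one index per encoder and merge the results.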
