DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Data
  4. What's Wrong With Hashcode in java.lang.String?

What's Wrong With Hashcode in java.lang.String?

This overview of strings and hashcode covers how hashcode works, how collisions happen, and the danger of improperly using strings as keys.

Artem Rukavytsia user avatar by
Artem Rukavytsia
·
Jun. 06, 17 · Tutorial
Like (35)
Save
Tweet
Share
57.87K Views

Join the DZone community and get the full member experience.

Join For Free

One of the most significant criteria of every hash function is the tendency for collisions. Hash functions inside the JDK are not the exception. The main idea of a collision attack is finding two different messages, m1 and m2, such that hash(m1) = hash(m2). In this article, I would like to show that this problem is reproducible in every program written in Java and how to get around it.

Firstly, let's consider the internals of the hashcode function in java.lang.String:

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }


As you see hashcode in java.lang.String class is computed as:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

This is nothing more than a polynomial accumulation method. The multiplier of this function is 31 and it has its own rationale: Joshua Bloch in Effective Java explained this solution: "The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically." (from Chapter 3, Item 9: Always override hashcode when you override equals, page 48).

The collisions in hashcode are generated trivially:

If String a and String b have a common prefix and the same length — if n and the statement 31*(b[n-2] - a[n-2]) == (a[n-1] - b[n-1]) is true — it means that first and second strings have the same hashcode.

Let's consider this example:

 String a = "Aa";
 String b = "BB";
 System.out.println(a.hashCode());
 System.out.println(b.hashCode());

 System.out.println(31 * ('C' - 'D') == ('B' - 'a'));
 System.out.println(31 * ('B' - 'A') == ('a' - 'B'));

 System.out.println("common_prefixDB".hashCode());
 System.out.println("common_prefixCa".hashCode());


If you run this example, you will see the same hashcode for "Aa" and "BB" — 2112 — and for the two last strings with "common_prefix" — 1027514780. You can generate your own collisions in a simple loop statement.

With this fact, it follows that the usage of a string like a key in HashMap is the mistake — for instance, if the programmer used a String key in a HashMap for storing HTTP headers, the attacker can use it for DoS attacks.

As a conclusion, be careful with hashcode functions.

Strings Data Types Collision (computer science) Java (programming language) PRIME (PLC) Data structure Java Development Kit Advantage (cryptography) Programmer (hardware) Clear (Unix)

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Revolutionizing Supply Chain Management With AI: Improving Demand Predictions and Optimizing Operations
  • ChatGPT: The Unexpected API Test Automation Help
  • Efficiently Computing Permissions at Scale: Our Engineering Approach
  • Tech Layoffs [Comic]

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: