DZone
Java Zone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
  • Refcardz
  • Trend Reports
  • Webinars
  • Zones
  • |
    • Agile
    • AI
    • Big Data
    • Cloud
    • Database
    • DevOps
    • Integration
    • IoT
    • Java
    • Microservices
    • Open Source
    • Performance
    • Security
    • Web Dev
DZone > Java Zone > What's Wrong With Hashcode in java.lang.String?

What's Wrong With Hashcode in java.lang.String?

This overview of strings and hashcode covers how hashcode works, how collisions happen, and the danger of improperly using strings as keys.

Artem Rukavytsia user avatar by
Artem Rukavytsia
·
Jun. 06, 17 · Java Zone · Tutorial
Like (35)
Save
Tweet
57.29K Views

Join the DZone community and get the full member experience.

Join For Free

One of the most significant criteria of every hash function is the tendency for collisions. Hash functions inside the JDK are not the exception. The main idea of a collision attack is finding two different messages, m1 and m2, such that hash(m1) = hash(m2). In this article, I would like to show that this problem is reproducible in every program written in Java and how to get around it.

Firstly, let's consider the internals of the hashcode function in java.lang.String:

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }


As you see hashcode in java.lang.String class is computed as:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

This is nothing more than a polynomial accumulation method. The multiplier of this function is 31 and it has its own rationale: Joshua Bloch in Effective Java explained this solution: "The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically." (from Chapter 3, Item 9: Always override hashcode when you override equals, page 48).

The collisions in hashcode are generated trivially:

If String a and String b have a common prefix and the same length — if n and the statement 31*(b[n-2] - a[n-2]) == (a[n-1] - b[n-1]) is true — it means that first and second strings have the same hashcode.

Let's consider this example:

 String a = "Aa";
 String b = "BB";
 System.out.println(a.hashCode());
 System.out.println(b.hashCode());

 System.out.println(31 * ('C' - 'D') == ('B' - 'a'));
 System.out.println(31 * ('B' - 'A') == ('a' - 'B'));

 System.out.println("common_prefixDB".hashCode());
 System.out.println("common_prefixCa".hashCode());


If you run this example, you will see the same hashcode for "Aa" and "BB" — 2112 — and for the two last strings with "common_prefix" — 1027514780. You can generate your own collisions in a simple loop statement.

With this fact, it follows that the usage of a string like a key in HashMap is the mistake — for instance, if the programmer used a String key in a HashMap for storing HTTP headers, the attacker can use it for DoS attacks.

As a conclusion, be careful with hashcode functions.

Strings Data Types Collision (computer science) Java (programming language) PRIME (PLC) Data structure Java Development Kit Advantage (cryptography) Programmer (hardware) Clear (Unix)

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • 12 Modern CSS Techniques For Older CSS Problems
  • Is NoOps the End of DevOps?
  • Automation Testing vs. Manual Testing: What's the Difference?
  • DZone's Article Submission Guidelines

Comments

Java Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • MVB Program
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends:

DZone.com is powered by 

AnswerHub logo