Why You Should Care About Equals and Hashcode

Take a look at the importance of equals and hashCode, something even experienced developers tend to ignore, and see the role they play in biased locking and more.

Jakub Kubrynski

Dec. 24, 16 · Tutorial


Equals and hash code are fundamental elements of every Java object. Their correctness and performance are crucial for your applications. However, we often see even experienced programmers ignoring this part of class development. In this post, I will go through some common mistakes and issues related to these two very basic methods.

Contract

What is crucial about these methods is something called the "contract." There are three rules for hashCode and five for equals (you can find them in the Javadoc of the Object class), but we'll talk about the three essential ones. Let's start with hashCode():

"Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified."

That means the hash code of an object doesn't have to be immutable: it may change whenever the information used by equals changes. So let's take a look at the code of a really simple Java object:

import java.util.Objects;
import java.util.UUID;

public class Customer {

    private UUID id;
    private String email;

    public UUID getId() {
        return id;
    }

    public void setId(final UUID id) {
        this.id = id;
    }

    public String getEmail() {
        return email;
    }

    public void setEmail(final String email) {
        this.email = email;
    }

    @Override
    public boolean equals(final Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        final Customer customer = (Customer) o;
        return Objects.equals(id, customer.id) &&
                Objects.equals(email, customer.email);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, email);
    }
}

As you probably noticed, equals and hashCode were generated automatically by our IDE. These methods are clearly not immutable (their results depend on mutable fields), and classes like this are definitely widely used. Maybe, if such classes are so common, there is nothing wrong with such an implementation? Let's take a look at a simple example:

def "should find cart for given customer after correcting email address"() {
    given:
        Cart sampleCart = new Cart()
        Customer sampleCustomer = new Customer()
        sampleCustomer.setId(UUID.randomUUID())
        sampleCustomer.setEmail("customer@example.com")

        Map<Customer, Cart> customerToCart = new HashMap<>()

    when:
        customerToCart.put(sampleCustomer, sampleCart)

    then:
        customerToCart.get(sampleCustomer) == sampleCart

    when: "the email is corrected, which changes hashCode()"
        sampleCustomer.setEmail("customer.new@example.com")

    then:
        customerToCart.get(sampleCustomer) == sampleCart
}

In the above test, we want to ensure that after changing the email of a sample customer, we're still able to find its cart. Unfortunately, this test fails. Why? Because HashMap stores keys in "buckets," and the bucket for a key is chosen based on its hash code. For simplicity, imagine that every bucket holds a particular range of hashes. Looking into a single bucket instead of scanning every entry is why hash maps are so fast.

But what happens if we store the key in the first bucket (responsible for hashes between 1 and 10), and then the hashCode method returns 11 instead of 5 (because the hash code is mutable)? The hash map tries to find the key, but it checks the second bucket (holding hashes 11 to 20), which is empty. So, as far as the map knows, there is simply no cart for the given customer. That's why having immutable hash codes is so important! The simplest way to achieve this is to use immutable objects. If, for some reason, that's impossible in your implementation, then remember to base the hashCode method only on immutable elements of your objects.
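The failing Spock test can be reproduced in plain Java. This is a sketch under a few assumptions: MutableCustomer mirrors the IDE-generated Customer above (equals and hashCode use both fields), StableCustomer is a hypothetical variant that bases both methods only on the immutable id, and the cart is represented by a plain String to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;

// Hypothetical interface so one helper can exercise both key types.
interface EmailHolder {
    void setEmail(String email);
}

// Mirrors the IDE-generated Customer: hashCode() depends on the mutable email.
class MutableCustomer implements EmailHolder {
    private final UUID id;
    private String email;

    MutableCustomer(UUID id, String email) {
        this.id = id;
        this.email = email;
    }

    @Override
    public void setEmail(String email) {
        this.email = email;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        MutableCustomer other = (MutableCustomer) o;
        return Objects.equals(id, other.id) && Objects.equals(email, other.email);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, email);
    }
}

// Hypothetical variant: equals and hashCode use only the immutable id.
class StableCustomer implements EmailHolder {
    private final UUID id;
    private String email;

    StableCustomer(UUID id, String email) {
        this.id = id;
        this.email = email;
    }

    @Override
    public void setEmail(String email) {
        this.email = email;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return id.equals(((StableCustomer) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}

class MutableKeyDemo {
    // Stores a cart under the customer, changes the email, then looks the cart up again.
    static String findCartAfterEmailChange(EmailHolder customer) {
        Map<EmailHolder, String> customerToCart = new HashMap<>();
        customerToCart.put(customer, "cart-1");
        customer.setEmail("new@example.com"); // may change hashCode()
        return customerToCart.get(customer);  // searched under the current hash
    }

    public static void main(String[] args) {
        System.out.println(findCartAfterEmailChange(
                new MutableCustomer(UUID.randomUUID(), "old@example.com"))); // null
        System.out.println(findCartAfterEmailChange(
                new StableCustomer(UUID.randomUUID(), "old@example.com"))); // cart-1
    }
}
```

The entry for the mutable key is still inside the map, but it sits under the old hash, so a lookup with the new hash never reaches it.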

The second hashCode rule tells us that if two objects are equal (according to the equals method), their hash codes must be the same. That means the two methods must be related, which is easiest to achieve by basing both of them on the same information (typically the same fields).
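To see what breaks when the two methods are based on different information, consider the sketch below. BrokenCustomer and FixedCustomer are hypothetical classes: the broken one compares only the id in equals but mixes the email into hashCode, so two objects that are equal can both end up in a HashSet.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Violates the contract: equals ignores the email, hashCode does not.
class BrokenCustomer {
    final long id;
    final String email;

    BrokenCustomer(long id, String email) {
        this.id = id;
        this.email = email;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return id == ((BrokenCustomer) o).id;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, email);
    }
}

// Respects the contract: both methods are based on the same field.
class FixedCustomer {
    final long id;
    final String email;

    FixedCustomer(long id, String email) {
        this.id = id;
        this.email = email;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return id == ((FixedCustomer) o).id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}

class HashContractDemo {
    static int brokenSetSize() {
        Set<BrokenCustomer> set = new HashSet<>();
        set.add(new BrokenCustomer(1L, "old@example.com"));
        set.add(new BrokenCustomer(1L, "new@example.com")); // equal to the first one,
        return set.size();                                   // but stored under another hash
    }

    static int fixedSetSize() {
        Set<FixedCustomer> set = new HashSet<>();
        set.add(new FixedCustomer(1L, "old@example.com"));
        set.add(new FixedCustomer(1L, "new@example.com")); // recognized as a duplicate
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(brokenSetSize()); // 2
        System.out.println(fixedSetSize());  // 1
    }
}
```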

Last but not least, the equals contract requires transitivity: if a equals b and b equals c, then a must equal c. It looks trivial, but it's not, at least once you bring inheritance into the picture. Imagine we have a date-time class extending a date class. It's easy to implement an equals method for a date: when both dates are the same, we return true. The same goes for date-times. But what happens when I want to compare a date to a date-time? Is it enough that they have the same day, month, and year? Can we compare the hours and minutes, when this information is not present on a date? If we decide to use such an approach, we're screwed. Consider this:

2016-11-28 == 2016-11-28 12:20
2016-11-28 == 2016-11-28 15:52

Due to the transitive nature of equals, we can conclude that 2016-11-28 12:20 is equal to 2016-11-28 15:52, which is, of course, nonsense. Yet that is exactly what the equals contract implies.
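Here is a minimal sketch of that trap. SimpleDate and SimpleDateTime are hypothetical classes (not java.time types): SimpleDateTime compares hours and minutes against another date-time, but falls back to comparing only the date part against a plain SimpleDate. Each comparison looks reasonable on its own, yet transitivity breaks.

```java
import java.util.Objects;

// Hypothetical date class, not java.time.LocalDate.
class SimpleDate {
    final int year;
    final int month;
    final int day;

    SimpleDate(int year, int month, int day) {
        this.year = year;
        this.month = month;
        this.day = day;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // instanceof (not getClass) so that a date can be equal to a date-time
        if (!(o instanceof SimpleDate)) return false;
        SimpleDate other = (SimpleDate) o;
        return year == other.year && month == other.month && day == other.day;
    }

    @Override
    public int hashCode() {
        return Objects.hash(year, month, day);
    }
}

// Hypothetical date-time class extending the date class.
class SimpleDateTime extends SimpleDate {
    final int hour;
    final int minute;

    SimpleDateTime(int year, int month, int day, int hour, int minute) {
        super(year, month, day);
        this.hour = hour;
        this.minute = minute;
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof SimpleDateTime) {
            SimpleDateTime other = (SimpleDateTime) o;
            return super.equals(other) && hour == other.hour && minute == other.minute;
        }
        // Against a plain date, only the date part can be compared.
        return super.equals(o);
    }

    // hashCode() is inherited: it uses only the date part, so equal objects
    // still share a hash code.
}

class TransitivityDemo {
    public static void main(String[] args) {
        SimpleDate date = new SimpleDate(2016, 11, 28);
        SimpleDateTime noon = new SimpleDateTime(2016, 11, 28, 12, 20);
        SimpleDateTime afternoon = new SimpleDateTime(2016, 11, 28, 15, 52);

        System.out.println(noon.equals(date));      // true
        System.out.println(date.equals(afternoon)); // true
        System.out.println(noon.equals(afternoon)); // false: transitivity is broken
    }
}
```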

A JPA Use Case

Now let's talk about JPA. It looks like implementing equals and hashCode here is really simple: we have a unique primary key for each entity, so an implementation based on this information seems right. But when is this unique ID assigned? During object creation, or just after flushing changes to the database? If you're assigning IDs manually, it's OK, but if you rely on the underlying engine, you can fall into a trap. Imagine the following situation:

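import java.util.HashSet;
import java.util.Set;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;

@Entity
public class Customer {

    @Id
    @GeneratedValue
    private Long id;

    @OneToMany(cascade = CascadeType.PERSIST)
    private Set<Address> addresses = new HashSet<>();

    public void addAddress(Address newAddress) {
        addresses.add(newAddress);
    }

    public boolean containsAddress(Address address) {
        return addresses.contains(address);
    }
}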

If the hashCode of Address is based on the ID, then before the Customer entity is saved, all addresses share the same hash code (for example, 0 if hashCode returns Objects.hashCode(id)), because there is simply no ID yet. After flushing the changes, the ID is assigned, which results in a new hash code value.

If you now invoke the containsAddress method for an address added before the flush, it will return false, for the same reasons explained in the first section about HashMap. How can we protect against such a problem? As far as I know, there is one valid solution: a UUID.

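import java.util.Objects;
import java.util.UUID;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
class Address {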

    @Id
    @GeneratedValue
    private Long id;

    private UUID uuid = UUID.randomUUID();

    // all other fields, with getters and setters if you need them

    @Override
    public boolean equals(final Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        final Address address = (Address) o;
        return Objects.equals(uuid, address.uuid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(uuid);
    }
}

The UUID field (which can be a UUID or simply a String) is assigned during object creation and stays immutable during the whole entity lifecycle. It's stored in the database and loaded into the field when the object is queried. It does, of course, add some overhead and footprint, but nothing comes for free. If you want to know more about the UUID approach, you can check two brilliant posts on the topic.
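The difference between the two approaches can be simulated without a JPA provider. In the sketch below, IdBasedAddress and UuidBasedAddress are hypothetical stand-ins for the Address entity, and assignId is a made-up method that imitates the persistence provider setting the generated ID during a flush. Only the UUID-based version can still be found in the HashSet after the ID is assigned.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;

// equals/hashCode based on the generated ID, which is null until "flushed".
class IdBasedAddress {
    private Long id;

    // Hypothetical: imitates the provider assigning a generated ID on flush.
    void assignId(long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(id, ((IdBasedAddress) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // 0 while id is null
    }
}

// equals/hashCode based on a UUID fixed at construction time.
class UuidBasedAddress {
    private Long id;
    private final UUID uuid = UUID.randomUUID();

    // Hypothetical: imitates the provider assigning a generated ID on flush.
    void assignId(long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(uuid, ((UuidBasedAddress) o).uuid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(uuid); // unaffected by the ID assignment
    }
}

class AddressSetDemo {
    static boolean idBasedFoundAfterFlush() {
        IdBasedAddress address = new IdBasedAddress();
        Set<IdBasedAddress> addresses = new HashSet<>();
        addresses.add(address); // stored under hash 0
        address.assignId(42L);  // hash becomes Long.hashCode(42), i.e. 42
        return addresses.contains(address);
    }

    static boolean uuidBasedFoundAfterFlush() {
        UuidBasedAddress address = new UuidBasedAddress();
        Set<UuidBasedAddress> addresses = new HashSet<>();
        addresses.add(address);
        address.assignId(42L);  // hash is unchanged
        return addresses.contains(address);
    }

    public static void main(String[] args) {
        System.out.println(idBasedFoundAfterFlush());   // false
        System.out.println(uuidBasedFoundAfterFlush()); // true
    }
}
```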

Biased Locking

For over 10 years, the default locking implementation in Java has used something called "biased locking" (it was later deprecated and disabled by default in JDK 15, see JEP 374). A brief description of this technique can be found in the documentation of its flag (source: Java Tuning White Paper):

-XX:+UseBiasedLocking

This enables a technique for improving the performance of uncontended synchronization. An object is "biased" toward the thread which first acquires its monitor via a monitorenter bytecode or synchronized method invocation; subsequent monitor-related operations performed by that thread are relatively much faster on multiprocessor machines. Some applications with significant amounts of uncontended synchronization may attain significant speedups with this flag enabled; some applications with certain patterns of locking may see slowdowns, though attempts have been made to minimize the negative impact.

What interests us in this post is how biased locking is implemented internally. The JVM uses the object header to store the ID of the thread the object is biased toward. The problem is that the object header layout is fixed (if you're interested, please refer to the OpenJDK source file hotspot/src/share/vm/oops/markOop.hpp) and it cannot simply be "extended." On 64-bit JVMs, this thread ID (in fact a JavaThread pointer) takes 54 bits, so we must decide whether we want to keep this ID or something else.

Unfortunately, that "something else" is the object's hash code (more precisely, the identity hash code, which is stored in the object header). This value is used whenever you invoke hashCode() on an object whose class doesn't override the implementation inherited from Object, or when you call System.identityHashCode() directly. That means that when you retrieve the default hash code of an object, you disable biased locking for that object. It's pretty easy to prove. Take a look at this code:

class BiasedHashCode {

    public static void main(String[] args) {
        Locker locker = new Locker();
        locker.lockMe();
        locker.hashCode();
    }

    static class Locker {
        synchronized void lockMe() {
            // do nothing
        }

        @Override
        public int hashCode() {
            return 1;
        }
    }
}

When you run the main method with the following VM flags...

-XX:BiasedLockingStartupDelay=0 -XX:+TraceBiasedLocking

...you can see that there is nothing interesting!

However, after removing the hashCode implementation from the Locker class, the situation changes. Now we can find the following lines in the log:

Revoking bias of object 0x000000076d2ca7e0
mark 0x00007ff83800a805
type BiasedHashCode$Locker 
prototype header 0x0000000000000005
allow rebias 0
requesting thread 0x00007ff83800a800

Why did this happen? Because we asked for the identity hash code. To sum up this part: if your class doesn't override hashCode() and something calls it, there is no biased locking for that object.
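To see why the identity hash code and the bias cannot coexist, the bit widths from the 64-bit layout comment in markOop.hpp (the JDK 8-era HotSpot file mentioned earlier) can simply be added up. The field names below follow that comment; the second "unused" field is renamed only to keep the map keys distinct. This sketch does arithmetic on those numbers and does not inspect a real object header.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MarkWordLayout {
    // 64-bit mark word of an unlocked, unbiased ("normal") object.
    static Map<String, Integer> normalObject() {
        Map<String, Integer> layout = new LinkedHashMap<>();
        layout.put("unused", 25);
        layout.put("hash", 31);
        layout.put("unused2", 1);
        layout.put("age", 4);
        layout.put("biased_lock", 1);
        layout.put("lock", 2);
        return layout;
    }

    // 64-bit mark word of a biased object: the thread pointer takes the hash's place.
    static Map<String, Integer> biasedObject() {
        Map<String, Integer> layout = new LinkedHashMap<>();
        layout.put("JavaThread*", 54);
        layout.put("epoch", 2);
        layout.put("unused", 1);
        layout.put("age", 4);
        layout.put("biased_lock", 1);
        layout.put("lock", 2);
        return layout;
    }

    static int totalBits(Map<String, Integer> layout) {
        int sum = 0;
        for (int bits : layout.values()) {
            sum += bits;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(totalBits(normalObject())); // 64
        System.out.println(totalBits(biasedObject())); // 64
        // Keeping the 31-bit hash next to the biased layout would need more than 64 bits.
        System.out.println(totalBits(biasedObject()) + normalObject().get("hash")); // 95
    }
}
```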


Published at DZone with permission of Jakub Kubrynski. See the original article here.
