Unique hashCodes is Not Enough to Avoid Collisions
Join the DZone community and get the full member experience.
Join For FreeThere is a common misconception that if you have unique hashCode() you won't have collisions. While unique, or almost unique, hashCodes are good, this is not the end of the story.
The problem is that the size of a HashMap is not unlimited (or at least 2^32 in size) This means the hashCode() number has to be reduced to a smaller number of bits.
The way HashMap, and thus HashSet and LinkedHashMap ,work is to mutate the bits in the following manner
h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4);
and then apply a mask for the lowest bits to select a bucket. The problem is that even with unique hashCode()s as Integer does, there will be values with different hash code map to the same bucket. You can research how Integer.hashCode() works ;)
public static void main(String[] args) { Set integers = new HashSet<>(); for (int i = 0; i <= 400; i++) if ((hash(i) & 0x1f) == 0) integers.add(i); Set integers2 = new HashSet<>(); for (int i = 400; i >= 0; i--) if ((hash(i) & 0x1f) == 0) integers2.add(i); System.out.println(integers); System.out.println(integers2); } static int hash(int h) { // This function ensures that hashCodes that differ only by // constant multiples at each bit position have a bounded // number of collisions (approximately 8 at default load factor). h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); }
[373, 343, 305, 275, 239, 205, 171, 137, 102, 68, 34, 0] [0, 34, 68, 102, 137, 171, 205, 239, 275, 305, 343, 373]
The entries as in the reverse order they were added as the HashMap is acting as a linked list, placing all entries into the same bucket.
Solutions?
Other notes
Published at DZone with permission of Peter Lawrey, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments