Over a million developers have joined DZone.

Hash Table (Stuff Formally Trained Programmers Know)

DZone's Guide to

Hash Table (Stuff Formally Trained Programmers Know)

In this post, we give an overview the concepts of hash tables, how developers can use hash tables to work with data, and more.

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

This post is one in a series of stuff formally trained programmers know – the rest of the series can be found in the series index.

The hash table, like the tree, needs a key. That key is then converted into a number using a hash function which results in the position in the array that is the data structure. So when you need to lookup an item, you pass in the key, get the same hash value and that points to the same place in memory so you get an O(1) lookup performance.

This is better than a pure array, where you cannot do a lookup so you have to search through giving an O(n) performance and better than a tree which also needs a search, so it ends at O(log n).

A hash table would be implemented with an array, which means that inserting is O(1) — unless the array is full then it is O(n), so this is a pretty good level of performance. An important note in comparing, O(1) when adding to the end of the array and O(1) with a hash table is that the hash table has additional computation so that while they have the same complexity it is still slower to use a hash table; this is why big O notation is useful for understanding it has limits on how it can help us choose what to use.

A hash table, in concept, is easy, but in practice, it is another thing altogether. First, you need a hash algorithm that ends up giving a value inside the bounds of the array - this is often done using modulus. For example, if we get a value of 14213 but our array is only 8 in size we can do 14213 % 8 to get its position, 5.

This converting of numbers brings a new problem, collisions; namely, we are taking a large number of possible numbers and converting them to a smaller number, so what happens when we get, say, 85? It also has a modulus of 5! Two items can't live in the same location. There is no single solution to collisions, there are many.

One option uses linked lists in the items, so if there is a collision then you can store it at the same index and then you only have to search the, hopefully, very limited set in the collisions list. Others use offsets to shift collided items around the existing array. For this series though, a complete analysis of the options is beyond the scope.


.NET has hashtable and Dictionary. Java comes with Hashtable and Kotlin has HashMap.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

big data ,hash table ,hashmap

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}