hashCode() and equals()

Before we go into Collection framework let us first understand few important concepts in simple language.

Hashing
We are given below characters and asked to store them in some way of our choice.

X, B, H, K, A, U, N, W, S, Q

We stored them in their order of coming from 0th index to 9th index as follows.

Somebody asked us to find 'N'. What do we do ? We have to start searching from 0th index until we find 'N'. So we searched at 0th index, at 1st index, at 2nd index, at 3rd index, at 4th index, at 5th index, at 6th index. Now we find 'N' and we stopped searching further. Here we wasted time by searching 6 indexes (0 to 5) unnecessarily. Is there any better technique where we can directly reach the 6th index where 'N' is present ?
Well, the answer is Yes. There is a technique called Hashing which is helpful in this kind of scenarios.

Let us define a function of our choice as below.
Give a number to each alphabet from 1 to 26 where A corresponds to 1, B corresponds to 2 and so on. Finally Z corresponds to 26.
Then sum up the digits. Example: for Z, the number is 26. Summing up the digits gives 2 + 6 = 8.
Then divide the sum by 10 and take only the remainder. This operation is called modulo and is represented as '%' operator in Java.

Example:

Similarly the results of our function for all the given characters are:


This function is called hash function and the results are called as hash values. Now based on the hash values of each character let's put them into the corresponding indexes (also called as Buckets). Example: the hash value of 'X' is 6. So we put 'X' into 6th index.


If somebody asks us to find 'N', we can calculate the hash value as per our hash function. Then we go to the 5th index directly (since 5 is the hash value of 'N'). There is possibility of getting same hash values for multiple inputs e.g. 'N' and 'W' both have hash value 5. This is called as Hash Collision. In this case they are stored in a linked list inside the 5th index. 


So we understand that hashing gives us faster access to an element position. Please note that this is an example of hash function and it doesn't say that all hash functions do the same calculation. There could be different functions to calculate the hash value. This example illustrates how to define a hash function, how to calculate a hash value and how to put the elements into a bucket based on the hash values.

hashCode() method in Object (super class of all the java classes) class performs similar tasks as described above. It calculates the hash value of an object.
We can override the hashCode() method to write our own hash function to calculate the  hash value. We may sometimes need to override this method for custom classes. 

In the above example, if we are asked to find 'W, then we have to calculate the hash value of 'W' i.e. 5. Then we go to the 5th bucket. But we see there is a linked list (similar to chain) instead of a single value. Now the question is how to find 'W' ?
Here we have no other choice than to visit the linked list from start until we find our desired element. 
So we start from the first node of the linked list. The value in the first node is 'N'. We will check if ('W' == 'N') is true. Since it is false we move to the next node and see the value is 'W'. ('W' == 'W') returns true, we found our desired element. 
You might ask that the idea of hashing is to avoid linear search. But we are again performing linear search inside the 5th bucket. Then how does it improve the performance. The answer is, instead of searching 6 indexes sequentially, we have searched only 2 indexes inside a bucket. That's how we improved the performance. The performance improvement would be significant where number of inputs are large (in this example we have total 10 inputs and 10 buckets).

Let us declare one custom class (Student) as follows and instead of storing the characters (as explained above) we shall store the objects of this custom class. In this case we can override the hashCode() method to calculate the hash value of an object and JVM will store the object into the corresponding bucket as already explained. But what if more than one object gets the same hash value and they are stored as linked list inside the bucket. How can we identify our desired object ? We cannot apply '==' operator for objects as we were doing in the previous example. Because '==' checks the memory address for objects and '==' checks the value for primitives (int, char, float, double, long). Now the question is how can we compare the values of two objects of same type (means both object belong to same class) ?
Here we need to override equals() method.


Here we have defined our own logic of comparing two objects of Student class i.e. if two student objects have same roll number, then we say both objects are same. We have also defined our logic to calculate the hash value of the object.

'HashMap' internally works exactly as described above. It stores the keys into different buckets and if any hash collision happens, then the key objects are stored in a linked list within the same bucket. 

Thanks,
S. R. Giri




Comments

Popular posts from this blog

Welcome