# Regarding load factor in collections

Hi Friends,

Can you please explain what will happen when the load factor increases to 0.95? Will it cause any performance issues on a hash table?


Commented:
http://en.wikipedia.org/wiki/Hash_table

Collision resolution

Hash collisions are practically unavoidable when hashing a random subset of a large set of possible keys. For example, if 2,500 keys are hashed into a million buckets, even with a perfectly uniform random distribution, according to the birthday problem there is a 95% chance of at least two of the keys being hashed to the same slot.

Therefore, most hash table implementations have some collision resolution strategy to handle such events. Some common strategies are described below. All these methods require that the keys (or pointers to them) be stored in the table, together with the associated values.

The performance of most collision resolution methods does not depend directly on the number n of stored entries, but depends strongly on the table's load factor, the ratio n/s between n and the size s of its array of buckets. Sometimes this is referred to as the fill factor, as it represents the portion of the s buckets in the structure that are filled with one of the n stored entries. With a good hash function, the average lookup cost is nearly constant as the load factor increases from 0 up to 0.7 (about 2/3 full) or so. Beyond that point, the probability of collisions and the cost of handling them increases.

On the other hand, as the load factor approaches zero, the proportion of the unused areas in the hash table increases but there is not necessarily any improvement in the search cost, resulting in wasted memory.
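The 95% figure quoted above can be checked directly: the probability that no two of n keys share a bucket among s buckets is the product of (1 - i/s) for i from 1 to n-1. A small illustrative program (class and method names are my own, not from any library):

```java
public class BirthdayCollision {

    // P(at least two of `keys` keys land in the same of `buckets` buckets),
    // assuming a perfectly uniform hash function.
    public static double collisionProbability(int keys, double buckets) {
        double pNoCollision = 1.0;
        for (int i = 1; i < keys; i++) {
            pNoCollision *= (1.0 - i / buckets);
        }
        return 1.0 - pNoCollision;
    }

    public static void main(String[] args) {
        // 2,500 keys into a million buckets, as in the quote above:
        // prints a value just over 0.95
        System.out.println(collisionProbability(2500, 1_000_000.0));
    }
}
```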
Commented:
This is more specific to Java, from the Hashtable API documentation:

An instance of Hashtable has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. Note that the hash table is open: in the case of a "hash collision", a single bucket stores multiple entries, which must be searched sequentially. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. The initial capacity and load factor parameters are merely hints to the implementation. The exact details as to when and whether the rehash method is invoked are implementation-dependent.

Generally, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the time cost to look up an entry (which is reflected in most Hashtable operations, including get and put).

The initial capacity controls a tradeoff between wasted space and the need for rehash operations, which are time-consuming. No rehash operations will ever occur if the initial capacity is greater than the maximum number of entries the Hashtable will contain divided by its load factor. However, setting the initial capacity too high can waste space.
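That sizing rule can be sketched in code: pick an initial capacity larger than the expected entry count divided by the load factor, and no rehash should ever occur. This assumes you know the maximum entry count up front; the class and key names here are purely illustrative:

```java
import java.util.Hashtable;

public class SizingDemo {

    // Build a Hashtable sized so that inserting `expectedEntries` entries
    // never triggers a rehash: capacity > expectedEntries / loadFactor.
    public static Hashtable<String, Integer> buildTable(int expectedEntries) {
        float loadFactor = 0.75f; // the documented default
        int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor) + 1;
        Hashtable<String, Integer> table =
                new Hashtable<>(initialCapacity, loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            table.put("key-" + i, i);
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(buildTable(10_000).size()); // prints 10000
    }
}
```

The trade-off the documentation mentions is visible here: for 10,000 entries at load factor 0.75 you pre-allocate over 13,000 buckets, spending space to avoid rehash time.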

Commented:
In practice, I take it you are talking about huge Hashtables with many thousands of entries. With small Hashtables, in most common tasks, the load factor would probably not matter much.
Commented:
A hash table lookup consists of two steps:
- first, finding the right bucket using the hash function. This is a constant-time operation (technically O(1)).
- second, searching the elements within that bucket to find the exact match.

If the hash table is larger than the number of elements being stored (i.e. the load factor is low) then on average there is a single element in each bucket and the second step is also constant time (O(1)).

However, as the load factor increases, this second step becomes a real search. The usual implementation keeps a linked list of entries per bucket and just walks that list. That means the second part of the lookup costs O(average bucket size), which is proportional to the load factor n/s; with a bad hash function it can degrade toward O(n). (Some implementations, such as Java 8's HashMap, convert long chains into balanced trees, which bounds a bucket search at O(log n).)

So yes it will get slower as the load factor increases, because the buckets must be searched.
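The two steps above can be sketched as a minimal separate-chaining map. This is purely illustrative, not the actual java.util.Hashtable source; the class and field names are my own:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal separate-chaining map: an array of buckets, each a list of entries.
public class ChainedMap {
    private static class Entry {
        final String key;
        int value;
        Entry(String key, int value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry>> buckets;

    public ChainedMap(int capacity) {
        buckets = new ArrayList<>(capacity);
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Step 1: constant-time bucket selection via the hash function.
    private int indexFor(String key) {
        return (key.hashCode() & 0x7fffffff) % buckets.size();
    }

    public void put(String key, int value) {
        List<Entry> bucket = buckets.get(indexFor(key));
        for (Entry e : bucket) {
            if (e.key.equals(key)) { e.value = value; return; }
        }
        bucket.add(new Entry(key, value));
    }

    // Step 2: walk the chain; this is the part whose cost grows
    // with the load factor, since chains get longer as the table fills.
    public Integer get(String key) {
        for (Entry e : buckets.get(indexFor(key))) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }
}
```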

Hope that helps,

Doug