Rohit Bajaj

asked on

Hashing strings to unique buckets

When hashing a list of strings into a hash table whose number of buckets is O(N), where N is the size of the list of strings, can one assume that each string will map to a different bucket? If not, and I use linear probing or quadratic probing on collisions so that each string maps to a different bucket, what will be the average-case time complexity? It seems the time complexity will increase, since normally the average-case complexity is Theta(1) for search, delete and insert.
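As a side note on the probing the question mentions, here is a minimal open-addressing sketch (an illustration added for clarity, not part of the thread): on a collision, linear probing scans forward to the next free slot, so every key does end up in its own slot, and the expected number of probes per operation stays O(1) as long as the load factor is kept bounded.

```java
// Sketch: open addressing with linear probing. Class and method names
// are illustrative, not from any library. Note: this toy version never
// resizes, so it assumes the table is not allowed to fill completely.
public class LinearProbing {
    private final String[] slots;

    public LinearProbing(int capacity) {
        slots = new String[capacity];
    }

    public void insert(String key) {
        int i = Math.floorMod(key.hashCode(), slots.length);
        while (slots[i] != null && !slots[i].equals(key)) {
            i = (i + 1) % slots.length;   // collision: probe the next slot
        }
        slots[i] = key;
    }

    public boolean contains(String key) {
        int i = Math.floorMod(key.hashCode(), slots.length);
        while (slots[i] != null) {
            if (slots[i].equals(key)) return true;
            i = (i + 1) % slots.length;   // keep probing until an empty slot
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbing t = new LinearProbing(16);
        t.insert("apple");
        t.insert("banana");
        System.out.println(t.contains("apple"));   // true
        System.out.println(t.contains("cherry"));  // false
    }
}
```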
ASKER CERTIFIED SOLUTION
d-glitch (United States of America)

This solution is only available to Experts Exchange members.
Rohit Bajaj (ASKER)

I find hash tables somewhat confusing. The average length of a bucket will be O(1), but what if the hash function is a bad one and puts everything into a single bucket? When we say the average length will be O(1), a lot depends on the choice of hash function. And in most cases I have seen, the hash is computed along with % c, where c is the table size. So even if the hash function is a good one, there could be a lot of collisions.

Another line of thought: while calculating the average time complexity, I could consider all possible hash tables and take the average over them.
Big-O is a measure of time, or of the relative number of operations.

The hash function and table size have to work together.

You are as unlikely to find a good hash function as a bad one. The best strategy is to assume a perfectly random one, so that you can do the analysis.

You don't get guaranteed O(1) performance from hashing N elements into N buckets: even a perfectly random hash will send some strings to the same bucket.
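To see why distinct buckets can't be assumed, here is a small simulation (a sketch added for illustration, not from the original answer): it throws N keys uniformly at random into N buckets and counts how many buckets are occupied. Even with a perfectly uniform hash, roughly a 1/e fraction (about 37%) of buckets stay empty, so many keys must collide.

```java
import java.util.Random;

// Sketch: hash N keys into N buckets with an idealized uniform hash
// (modeled here by a seeded Random) and measure bucket occupancy.
public class BucketOccupancy {
    public static void main(String[] args) {
        int n = 1_000_000;
        boolean[] occupied = new boolean[n];
        Random rnd = new Random(42);         // stands in for a uniform hash
        for (int i = 0; i < n; i++) {
            occupied[rnd.nextInt(n)] = true; // bucket index = hash % n
        }
        int used = 0;
        for (boolean b : occupied) if (b) used++;
        double fraction = (double) used / n;
        // Expected occupied fraction is 1 - 1/e, about 0.632
        System.out.printf("occupied fraction = %.3f%n", fraction);
    }
}
```

This is the birthday-problem effect: collisions are essentially certain long before the table is full, which is why chaining or probing is always needed.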
dpearson

>> But what if hash function is a bad hash function and puts everything inside one bucket.
>> When we say that average length will be O(1) a lot depends upon the choice of hash function also.

Yes, that's correct. You will get O(1) hash table performance if your hashing function produces an even spread of elements across buckets, so that the average length of the list inside a bucket remains a constant size (e.g. 1 or 2, not something proportional to the number of inserts).

If you choose a terrible hash function (e.g. hash(s) -> 100, i.e. a constant value), then you will end up with all elements in one bucket, and now your hash table insert performance will be O(n).

People often say that a hash table has O(1) average time performance but O(n) worst-case time performance (the worst case being where everything hashes to the same value).
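That contrast can be made concrete with a minimal sketch (illustrative names, not library code): a chained table's maximum chain length under a reasonable hash versus the constant hash(s) -> 100 described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Sketch: measure the longest chain a chained hash table would build
// for a given hash function, to contrast good and degenerate hashing.
public class ChainDemo {
    static int maxChain(List<String> keys, ToIntFunction<String> hash, int buckets) {
        int[] count = new int[buckets];
        int max = 0;
        for (String k : keys) {
            int b = Math.floorMod(hash.applyAsInt(k), buckets);
            max = Math.max(max, ++count[b]);
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) keys.add("key-" + i);

        int spread = maxChain(keys, String::hashCode, 16_381); // decent mixing
        int worst  = maxChain(keys, s -> 100, 16_381);         // constant hash

        System.out.println("max chain, good hash:    " + spread); // stays small
        System.out.println("max chain, hash(s)=100:  " + worst);  // 10000: one bucket
    }
}
```

With the constant hash every lookup degrades to a linear scan of all 10,000 keys, which is exactly the O(n) worst case.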
>> This is why most hash table implementations grow the buckets when you keep adding elements to them.

I don't see how resolving collisions via dynamic memory allocation can be efficient.  

Hashing is all and only about trading space for speed.
>> I don't see how resolving collisions via dynamic memory allocation can be efficient.  

It's not the collisions that cause memory allocation.  It's the growing number of elements in the table.

E.g. In Java, the standard hash table (HashMap) creates a table with an initial capacity and then expands that (creates more buckets and rebuilds the entire hash table) when that capacity is exceeded.  It uses a power of 2 expansion, so each rebuild will use twice as many buckets.

The individual rebuilds are O(n), so a single insert can be slow, but amortized over all inserts you get back to O(1) time complexity. As you say, this comes at the cost of an ever-expanding amount of space (which doubles each time the table needs to be expanded). In practice it's of course good to know how big your table will need to be and set the correct capacity at the start (which you can do).

But the goal here is that since the number of underlying buckets grows, the overall performance of the table remains what we'd expect, which is O(1) for inserts, deletes and gets - because the number of buckets remains proportional to n.  So space is growing but time is not.
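The amortized-O(1) claim for doubling can be checked with a quick sketch (added here for illustration, not from the thread): rebuilds happen when the element count passes each power of two, so the total rebuild work over n inserts is bounded by roughly 1 + 2 + 4 + ... + n < 2n, i.e. O(1) per insert on average.

```java
// Sketch: count the total rehashing work done by a power-of-2 doubling
// strategy over n inserts, and show it stays proportional to n.
public class DoublingCost {
    public static void main(String[] args) {
        int n = 1 << 20;                    // number of inserts
        long rebuildWork = 0;
        int capacity = 1;
        for (int inserted = 1; inserted <= n; inserted++) {
            if (inserted > capacity) {      // table full: double and rehash
                capacity *= 2;
                rebuildWork += inserted;    // rehash everything seen so far
            }
        }
        // Ratio stays below 2.0 no matter how large n grows
        System.out.println("total rebuild work / n = " + (double) rebuildWork / n);
    }
}
```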