Solved

Hash table lookup time

Posted on 2007-11-18
8
253 Views
Last Modified: 2010-04-15
I know that a regular hash table usually has O(1) lookup time, but of course this sometimes degrades due to collisions.

But the original paper on Linear Hashing says that Linear Hashing usually gets O(1) lookup time:

"A new bucket has therefore been appended to the last bucket of the file to which all the records have been moved in one access.  Since for all these records the bucket 100 is henceforward the primary bucket, they are all accessible in one access."

This seems to be saying that all the records which reside in buckets which were appended due to splits are accessible in O(1) time.  But this doesn't seem to be the case.  In order to access data in an appended bucket, first you need to look in the base region, and only then, if the key is not found, perform the second hash function and look in the extended region.  But how is this "one access", as the above quote says?  Shouldn't all records in the extended region require O(2) access time?
0
Comment
Question by:chsalvia
  • 5
  • 3
8 Comments
 
LVL 53

Accepted Solution

by:
Infinity08 earned 250 total points
ID: 20309405
>> But how is this "one access", as the above quote says?

Although you might have to use several hashing functions to calculate the bucket address, you only have to access one bucket. Because when you've found the address, you know the value you're looking for will be at that address.

So, you start with h0. If you need to, you use h1, h2 etc. untill you find the bucket address you need. Then you access the bucket and get the value out of it.
0
 

Author Comment

by:chsalvia
ID: 20309429
But wouldn't you still need to access the address in the base region to compare your key to see if it's there?

Say you have a table of size 10, and you append 2 buckets after 2 splits occur.  So now the split address is 2, and so buckets 0 and 1 are addressed with h1, while buckets 2 through 9 are addressed with h0.

Now say you lookup a key.  So you do h0(key).  And suppose this gives you bucket 0, which is below the split address.  Therefore, the key you're looking for could be in two places.  It might be in bucket 0, or it might be in bucket 10.  If it's not in bucket 0, you need call h1(key) to get to bucket 10.  Isn't this two accesses?  You have to access bucket 0 to check for the key, and then access bucket 10 since you don't find it in bucket 0.
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 20309442
>> If it's not in bucket 0, you need call h1(key) to get to bucket 10.  Isn't this two accesses?

You don't need to check bucket 0 if you don't want to ... you can immediately apply h1 to know for sure which bucket it will be in. All depends on which is faster, and which is the more plausible :)


btw O(2) doesn't "exist" - it's the same as O(1).
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 20309443
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 

Author Comment

by:chsalvia
ID: 20309465
Okay.  Just to clarify, you mentioned that it could take several hash functions to calculate an address.  But wouldn't it only ever take 2 hash functions at max?

I thought it worked as follows:  You start with a base table of N buckets, and your base hash functions is h0.  All appended buckets are addressed with h1.

After N splits, the table size is N * 2 and now you go up another "level".  So now your "base" hash function is h1, and all appended buckets are addressed with h2.  Then after another N * 2 splits, your table size is N * 4, and your base hash function becomes h2, and all appended buckets are addressed with h3, and so on...

So wouldn't you only need two hash functions max to calculate any address, no matter how big the table got?
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 20309483
>> But wouldn't it only ever take 2 hash functions at max?

Yes, you are absolutely right ;)
0
 

Author Comment

by:chsalvia
ID: 20309526
Actually, now that I look at it, what you originally said seems to be correct.  Because, for example, suppose you're up to h10.

hx = hx-1(key) + 2^(x-1) * N

So, if you're up to h10 you'd have

h10 = h9(key) + 2^9 * N

and h9 is h8(key) + 2^8 * N

Therefore, we have a recursive operation here.  h10 would actually be 10 different hash functions, as you retrace back to h0.  Right?
0
 
LVL 53

Expert Comment

by:Infinity08
ID: 20309534
>> hx = hx-1(key) + 2^(x-1) * N

That's just a requirement for the hx function. It does not actually involve recursively calling the other hashing functions. In other words :

h0 will generate 0 ... N-1
h1 will generate 0 ... 2N-1
h2 will generate 0 ... 4N-1
h3 will generate 0 ... 8N-1
etc.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Help with my python script 6 154
sumDigits  challenge 7 75
mapShare challenge 13 91
Re-position sub-options beneath the TAB 7 77
An Outlet in Cocoa is a persistent reference to a GUI control; it connects a property (a variable) to a control.  For example, it is common to create an Outlet for the text field GUI control and change the text that appears in this field via that Ou…
Summary: This tutorial covers some basics of pointer, pointer arithmetic and function pointer. What is a pointer: A pointer is a variable which holds an address. This address might be address of another variable/address of devices/address of fu…
This theoretical tutorial explains exceptions, reasons for exceptions, different categories of exception and exception hierarchy.
The viewer will learn additional member functions of the vector class. Specifically, the capacity and swap member functions will be introduced.

895 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

14 Experts available now in Live!

Get 1:1 Help Now