Before explain hash table, lets explain hash.

Hash is a number or a small set of numbers, calculated from a large set of numbers.

Supose you have a phonebook, composed by names and phone numbers and for each name you have a hash number. For example, if we add the code of the characters of each name, probably we have different results for all the names.

A=1, B=2, ..., Z=26

JOHN --> 10 + 15 + 8 + 14 = 47

ANNE --> 1 + 14 + 14 + 5 = 34

There are special algorithms to calculate hash numbers such that they are unique values or with few chances of equal hash numbers for different names.

Now suppose we have a phonebook where we have hash numbers as keys, something like:

01 0

02 0

:

:

33 0

34 2337=3431

35 0

:

:

47 2344-7839

In a common phonebook we have the names in alphabetic order and respectives numbers associated to them. If we want a phone number, the searching algorithm will scan the list in a character by character basis, until finding the name, then returning the phone number associated to such name.

In our hash oriented phonebook the algorithm gets the name, say JOHN, calculates its hash number, 34 and returns the phone number at register 34. No searching.

Easy to understand that the second algorithm is faster than the first one. There is a price to that: a lot of cells will be empty, and the database size is larger that the one of the traditional phonebook. Hash tables waste more memory and disk space, because the table is sparse. Optimized hashing can reduce the database size, but it will be larger than the traditional one.

Probably you will ask if different names can have the same hash number. Yes, it happens, but there are techniques to solve the collisions.

Now suppose we have much more data than a single phone number, say, for each name we have a street addres, security number, credit card number, e-mail, father's name, etc. We can't waste disk space with so large empty records, so we use a trick: we have a hash table as below:

01 0

02 0

:

:

33 0

34 1

35 0

:

:

47 2

such that if we want JONH's data, the algorithm returns "2", not the data. Then we use such number, 2, as a key number in another table, as below:

1 JONH 2344-7839 4532 Rodeo Street VISA 12345678

2 MARY 2337=3431 1234 Durango Rd MC 98765432

3 etc.

and find the data we are looking for.

That way, the hash table, which wastes space with empty registers, is small because stores just small key numbers. The real data is in a dense table, so no wasting room in the large data records.

By using hash tables, we find data very quickly and with the described trick, by wasting very few disk room.

Jose

Hash is a number or a small set of numbers, calculated from a large set of numbers.

Supose you have a phonebook, composed by names and phone numbers and for each name you have a hash number. For example, if we add the code of the characters of each name, probably we have different results for all the names.

A=1, B=2, ..., Z=26

JOHN --> 10 + 15 + 8 + 14 = 47

ANNE --> 1 + 14 + 14 + 5 = 34

There are special algorithms to calculate hash numbers such that they are unique values or with few chances of equal hash numbers for different names.

Now suppose we have a phonebook where we have hash numbers as keys, something like:

01 0

02 0

:

:

33 0

34 2337=3431

35 0

:

:

47 2344-7839

In a common phonebook we have the names in alphabetic order and respectives numbers associated to them. If we want a phone number, the searching algorithm will scan the list in a character by character basis, until finding the name, then returning the phone number associated to such name.

In our hash oriented phonebook the algorithm gets the name, say JOHN, calculates its hash number, 34 and returns the phone number at register 34. No searching.

Easy to understand that the second algorithm is faster than the first one. There is a price to that: a lot of cells will be empty, and the database size is larger that the one of the traditional phonebook. Hash tables waste more memory and disk space, because the table is sparse. Optimized hashing can reduce the database size, but it will be larger than the traditional one.

Probably you will ask if different names can have the same hash number. Yes, it happens, but there are techniques to solve the collisions.

Now suppose we have much more data than a single phone number, say, for each name we have a street addres, security number, credit card number, e-mail, father's name, etc. We can't waste disk space with so large empty records, so we use a trick: we have a hash table as below:

01 0

02 0

:

:

33 0

34 1

35 0

:

:

47 2

such that if we want JONH's data, the algorithm returns "2", not the data. Then we use such number, 2, as a key number in another table, as below:

1 JONH 2344-7839 4532 Rodeo Street VISA 12345678

2 MARY 2337=3431 1234 Durango Rd MC 98765432

3 etc.

and find the data we are looking for.

That way, the hash table, which wastes space with empty registers, is small because stores just small key numbers. The real data is in a dense table, so no wasting room in the large data records.

By using hash tables, we find data very quickly and with the described trick, by wasting very few disk room.

Jose