Link to home
Create AccountLog in
Avatar of cyber-33
cyber-33Flag for United States of America

asked on

Hash clusters in IBM DB2

Does IBM DB2 support the ability to store and retrieve rows according to a hash function (a.k.a. "hash clusters")?

Generally, databases locate rows using key values that the database stores in a separate index. Using the "hash clusters" feature, a database can physically store the rows of a table in a hash cluster and retrieve these rows according to a hash function. Very cool :)

Thank you.
Avatar of momi_sabag
Flag of United States of America image

Link to home
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
Are you asking about hash joins? The IBM database product DB2 has such a feature available
and is explained here in this PDF:
Avatar of cyber-33


momi_sabag  and wolfgang_93 - thank you for your suggestions.

It looks like key sequence table and range-clustered tables do NOT involve hashing (although the idea seems to be very similar). Here is what IBM says on point:

Hash Joins also do not allow to store and retrieve records accroding to a hash function - instead it seems to be an efficient method of performing join operations on the matching columns.

Ok, getting back to your original question...

DB2 does have an option whereby you can have a clustered index, which I believe
answers 1/2 of your question as I do not believe it uses a hash method for doing
searches on it. See:

My understanding in DB2 is that the method of searching of any index is that the indexes is
built up as a structure called a balanced B-tree which allows for very efficient and optimal
searching. Though a hash index seems cool and fast, it does not approach the efficiency
of a balanced B-tree index.
Wolfgang 93,

The reference that you provided describes clustered indexes. The indexes store only record IDs and not the entire records. To retreive information, the database has to: (1) locate the id in the index and (2) retrieve the values according to the id. This is exactly what hash clusters are designed to avoid.

As explained in my question above, the data retrieval of hash indexes can be performed in a single step, because the entire record (as opposed to just ids) is stored within the cluster.

momi_sabag had correctly identified the concept. Unfortunately his solution did not use hashing, but his answer was the closest.