I have an overall view about db2 index and its usage. I read this article http://www.dbazine.com/mullins_db2indexing.shtml about db2 index and usage, where i have few doubts.

i am using db2 v7.2 on aix/win2000 with fickpack 11

1)How to decide which index to use (btree or cluster index or any) under what cirsumstances. By looking at a query can we decide or any other information is required to decide this. some example's of query and how to decide this."few points are also mentioned in this site".

2)For all columns in GROUP BY and ORDER BY can i create a cluster index or i need to decide this based on some factors. if ,so what factors should i look and recommend for cluster index.

3)what does "If indexing a table, explicitly create a clustering index" mean ?

please also post some web reference.


Who is Participating?

Improve company productivity with a Business Account.Sign Up

LordSilkConnect With a Mentor Commented:
to make it complete (btw. nice and clear explanation BigSchmuh) :

3/explicitly create a clustering index
If you don't define one of your indexes as clustering (the index by which your data is ordered on disk), DB2 will decide the clustering index for you. It will take the oldest (first created) index as clustering index.
Problem with this is, when you drop your first index and recreate it, it is no longer the oldest. DB2 will then change the clustering index to (again) the oldest index, which can have a big effect on performance.
To avoid this, you can create 1 index as clustering index so you are always in control of which index is determining the sorting order of your data on disk.


BigSchmuhConnect With a Mentor Commented:
1/ Some definitions may help
BTree index is an efficient way to store a tree. This means you can browse it from lowest to highest and back.
Other kind of index generally does not allow to browse but are faster to retrieve one single value. Hash index are good examples.
A Clustered index is an index which is although used to sort the data pages.
Data pages should be kept very dense to avoid fragmentation.
Indexes based on non selective columns are never used because query optimizer detects it will retrieve more than 12-15% of the overall data pages and start with a full scan (which is faster in this case because it is faster to read straight forward a 1Go file than reading 15% of it by small chuncks).

2/ What and when to use indexes
a) Depend on your query
Looking at a specific query, you will find that joining, criteria, grouping or sorting columns are good candidates.
==> The problem is you must "guess" how many rows are to be retrieved at every step of your query to create the best indexes...
Some rules are:
-Criteria columns are to be considered only if they are restrictive enough (<15% of the overall data rows)
-You may look at joining columns as criteria column for Inner join
-Create an index using only the most restrictive columns
-Create a cluster index on the columns you use most of the time to sort your data
-Your Grouping columns should although be ordered from the most restrictive to the least one
-A "criteria" index may be sufixed with Grouping columns and prefixed by Sorting columns
==> Always keep in mind an index is unusefull if not restrictive enough

b) Depend on how many users and updates
To avoid contention, you may use a different clustered index if too many users modified rows on the same data pages than the sorting one.
To allow for faster modifications, you may keep only the necessary indexes.

Hope this helps.
banumaranAuthor Commented:

Thank you BigSchmuh & silk for your valuable guidence
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.