asked on

Query Tuning: CBO chooses a Full table Scan over the available indexes and doesn't recommend a new index to be built

I have a query that performs an inner join on 2 large files and has a condition or two in the where clause. When I run this query the optimizer checks the applicable indexes and choose to perform a Full Table Scan over using any of these indexes. For some reason or another it doesn't suggest an index to be built to avoid the table scan.

Any ideas on how I can get the optimizer to use an index? Or even how I can get it to suggest a index to be built?

askanivg

Most cases we have a composite index and in the where clause we have a part of the index. This could be one reason. Also check the order in which indexes are defined and try to have them in the same way in the query.

Second reason which could be happening in your case is that you are using distinct clause which makes the table to have a table scan.

Can you please send the table structure along with indexes and also the query.. Any ways you can try having indexes on the columns present in the where clause and check for some improvement.

HtH

askanivg

you can use the index advisor to suggest the indexes
If you are using UDB then use the
db2advis command.

$db2advis -d <database_name> -i <file_name.sql> -t 1

Above 1 stands for time.

HtH

Jim_McLean

ASKER

Bingo: "Second reason which could be happening in your case is that you are using distinct clause which makes the table to have a table scan."

Is there no way around this? I have muchos data (several millon rows) in these files and the table scan on my DISTINCT query is a real killer.

Lowfatspread

post the query...

with a brief explanation as to what your trying to achieve..

askanivg

In the join try having the table having less number of records as the first table and then the table with more number of records as the second table. Try having a left outer join or an inner join and see if you are getting the desired results.

Or think of any other table which is giving the same output of column1 but you do not need to have a distinct clause. Might be you can use that table as dummy but that would definitely avoid using the distinct on millions of records. Check these possibilities..

HtH

BigSchmuh

Having a Table FullScan is not always bad if your criteria are not restrictive enough.
Generally, db optimizers starts a table FullScan instead of an IndexScan when they estimates they will retrieve more than 15-20% of the table because it is faster to read 1 Go straight than to read 150 Mo 4K by 4K pages.

Other facts:
-Without up to date Statistics (RUNSTATS...), DB2 optimizer is totally blind
-Build your index so that the columns are ordered from most restrictive criteria to the least one
-Keep your indexes small by removing your non restrictive columns (Flags excepted)
-You may add columns from your ORDER BY clause in your index, this is very important if you sort your data (GROUP BY, DISTINCT, ORDER BY)
-Order columns are to be added before the first indexed columns that is not used as an equality criteria (LIKE, OR, BETWEEN,<, >...) otherwise DB2 can not use your index to directly search and sort through the data
-Statistics can be modified to force the DB2 optimizer to choose a specific index
-Rarely, you may build a specific index for a query which contains every selected columns at the end of the previously defined "search and sort" index
-I never "Try" a query, I first think of what should be a good AccessPlan for it and I verify that DB2 does it well

By the way, you probably does NOT need the DISTINCT keyword but I can not investigate with no data...
==> I agree with Lowfatspread, please post the SQL query...

Hope this helps.

Lowfatspread

agree with Bigschmuh
I'm sceptical that you need the distinct until we see your actual query..

Jim_McLean

ASKER

FYI,
I am running this query against a iSeries(AS/400) machine. From my reading I see that there is now way (or need) to update optimizer stats because the OS automatically manages that.

Here is the query:

SELECT DISTINCT b3hdr.bhacct, b3hdr.bhinam, b3sub.bsvnum, b3sub.bsvnm1, b3sub.bsvnm2, b3sub.bscity, b3sub.bsstat, b3sub.bszip
FROM raff01.b3hdr b3hdr INNER JOIN raff01.b3sub b3sub ON bhref=bsref
WHERE bhacct=? AND UPPER(bsvnm1) LIKE '%?%'
ORDER BY bsvnm1;

The b3hdr table is a header table that has sub header records in the b3sub table. So obviously I have a 1 to many in b3hdr to b3sub. Now what I am trying to do is get vendor information from the b3sub table, but the problem is that vendors can be repeated over sub headers. For instance for a single header I could have 10 subheaders. In these subheaders I could have only 2 unique vendors. I'm not quite sure how I could do this without a distinct, but I guess thats why I in here asking questions. :)

Thanks for all those tips BigSchmuh! They'll come in handy for sure.

ASKER CERTIFIED SOLUTION

BigSchmuh

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

BigSchmuh

I forgot to mention that I doubt that the statistics are up to date but I am not very fluent in DB2/OS390 (Same name but definitely not the same product than DB2/UDB)
==> Please check that SYSIBM.SYSTABLES has its CARD and NPAGES filled in

SOLUTION

Lowfatspread

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Jim_McLean

ASKER

Sorry, I've gotten hung up on some other issue... I'll get back to this within a week or two (Hopefully)