Query Tuning: CBO chooses a Full table Scan over the available indexes and doesn't recommend a new index to be built

I have a query that performs an inner join on 2 large files and has a condition or two in the where clause.  When I run this query the optimizer checks the applicable indexes and choose to perform a Full Table Scan over using any of these indexes.  For some reason or another it doesn't suggest an index to be built to avoid the table scan.

Any ideas on how I can get the optimizer to use an index?  Or even how I can get it to suggest a index to be built?
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Most cases we have a composite index and in the where clause we have a part of the index. This could be one reason.  Also check the order in which indexes are defined and try to have them in the same way in the query.

Second reason which could be happening in your case is that you are using distinct clause which makes the table to have a table scan.

Can you please send the table structure along with indexes and also the query.. Any ways you can try having indexes on the columns present in the where clause and check for some improvement.

you can use the index advisor to suggest the indexes
If you are using UDB then use the
db2advis command.

$db2advis -d <database_name> -i <file_name.sql> -t 1

Above 1 stands for time.

Jim_McLeanAuthor Commented:
Bingo: "Second reason which could be happening in your case is that you are using distinct clause which makes the table to have a table scan."

Is there no way around this?  I have muchos data (several millon rows) in these files and the table scan on my DISTINCT query is a real killer.
Learn SQL Server Core 2016

This course will introduce you to SQL Server Core 2016, as well as teach you about SSMS, data tools, installation, server configuration, using Management Studio, and writing and executing queries.

post the query...

with a brief explanation as to what your trying to achieve..
In the join try having the table having less number of records as the first table and then the table with more number of records as the second table. Try having a left outer join or an inner join and see if  you are getting the desired results.

Or think of any other table which is giving the same output of column1 but you do not need to have a distinct clause. Might be you can use that table as dummy but that would definitely avoid using the distinct on millions of records. Check these possibilities..

Having a Table FullScan is not always bad if your criteria are not restrictive enough.
Generally, db optimizers starts a table FullScan instead of an IndexScan when they estimates they will retrieve more than 15-20% of the table because it is faster to read 1 Go straight than to read 150 Mo 4K by 4K pages.

Other facts:
-Without up to date Statistics (RUNSTATS...), DB2 optimizer is totally blind
-Build your index so that the columns are ordered from most restrictive criteria to the least one
-Keep your indexes small by removing your non restrictive columns (Flags excepted)
-You may add columns from your ORDER BY clause in your index, this is very important if you sort your data (GROUP BY, DISTINCT, ORDER BY)
-Order columns are to be added before the first indexed columns that is not used as an equality criteria (LIKE, OR, BETWEEN,<, >...) otherwise DB2 can not use your index to directly search and sort through the data
-Statistics can be modified to force the DB2 optimizer to choose a specific index
-Rarely, you may build a specific index for a query which contains every selected columns at the end of the previously defined "search and sort" index
-I never "Try" a query, I first think of what should be a good AccessPlan for it and I verify that DB2 does it well

By the way, you probably does NOT need the DISTINCT keyword but I can not investigate with no data...
==> I agree with Lowfatspread, please post the SQL query...

Hope this helps.
agree with Bigschmuh
I'm sceptical that you need the distinct until we see your actual query..
Jim_McLeanAuthor Commented:
I am running this query against a iSeries(AS/400) machine.  From my reading I see that there is now way (or need) to update optimizer stats because the OS automatically manages that.

Here is the query:

SELECT DISTINCT b3hdr.bhacct, b3hdr.bhinam, b3sub.bsvnum, b3sub.bsvnm1, b3sub.bsvnm2, b3sub.bscity, b3sub.bsstat, b3sub.bszip
FROM raff01.b3hdr b3hdr INNER JOIN raff01.b3sub b3sub ON bhref=bsref
WHERE bhacct=? AND UPPER(bsvnm1) LIKE '%?%'
ORDER BY bsvnm1;

The b3hdr table is a header table that has sub header records in the b3sub table.  So obviously I have a 1 to many in b3hdr to b3sub.  Now what I am trying to do is get vendor information from the b3sub table, but the problem is that vendors can be repeated over sub headers.  For instance for a single header I could have 10 subheaders.  In these subheaders I could have only 2 unique vendors.  I'm not quite sure how I could do this without a distinct, but I guess thats why I in here asking questions. :)

Thanks for all those tips BigSchmuh!  They'll come in handy for sure.
It looks like your are searching for specific Vendors only and use an Uppercase LIKE match.
==> Optimizers will always estimate this kind of criteria not to be very restrictive

Some more questions:
-Is "b3hdr.bhacct" column a not-too-bad criteria (means less than 10% of the b3hdr rows)
-What is your release of DB2 (DB2-UDB v8 may let you create summary table for this kind of query)

Assuming bhacct is a fair criteria and already has a simple index defined [b3hdr(bhacct) without any more columns], I think DB2 can be helped with an index on b3sub(bsref, bsvnm1).

Anyway, you need a DISTINCT (or GROUP BY) clause so DB2 needs to sort your data and that is now where the performance problem is...
==> We can investigate this as well but we need many info about :
-the Temporary tablespaces created and their associated bufferpools and disk sub-systems
-the SORTHEAP and SHEAPTHRES (DB and DBM) parameters
-the use of your db (Mainly transactional with some huge queries like yours or Datamining only)

Hope this helps.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
I forgot to mention that I doubt that the statistics are up to date but I am not very fluent in DB2/OS390 (Same name but definitely not the same product than DB2/UDB)
==> Please check that SYSIBM.SYSTABLES has its CARD and NPAGES filled in
Ok it looks like you need the distinct....

I think the iSeries(AS/400)  uses another DB2 variant... and isn't quite the same as the OS/390 version either..

If your DB2 automatically keeps the statistics up to date then fine, however this isn't the behaviour I'm used to (DB2/OS390)...

what type of indexes do you have ?
(binary radix or vector encoded)

please post the Table and Index DDL (structures)
any chance you could store the name on the row in upper case (as well) to allow for easier index searching?
any chance you can normalise the design to have the vendor information on a separate table and just referenced?

Jim_McLeanAuthor Commented:
Sorry, I've gotten hung up on some other issue... I'll get back to this within a week or two (Hopefully)
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today

From novice to tech pro — start learning today.