Why does Oracle ignore my index?

AID: 8393
  • Status: Published

8640 points

  • By
  • TypeFAQs
  • Posted on2011-10-24 at 04:07:56
Awards
  • Community Pick
  • Experts Exchange Approved
Why doesn't the Oracle optimizer use my index?


Querying too much data



Most Oracle developers know that an index is useful when you can use it to restrict your result set to a small number of the total rows in a table.

So, the obvious side-effect of this is, the optimizer may ignore your index if your query will end up returning most of the rows anyway.  Then the optimizer may decide it's less resource intensive  to simply read the whole table.  That's sort of intuitively obvious to understand, but may not be obvious when writing the query.

Let's look at two of the more interesting, yet still common ways indexes are ignored.

Why might a "narrow" query still ignore your index?

Index points to data in too many table blocks



You might think of result sets in terms of rows coming from disk; but Oracle reads the data in "blocks".  So, lets say you write a query that should return just 1 percent of the total number of rows.  That would seem like a good index candidate; but, if those 1% happen to be spread evenly across all of the blocks on disk; then the optimizer will very likely decide that your index isn't worth it.

In this case, the optimizer would see that the total number of disk reads will be approximately equal to the number of disk reads of a full table scan.  In that case, it's faster to ignore the index and read the table sequentially rather than hop around on the disk following index order.

Even worse, by reading in index order you can actually cause double disk hits.
Here's a generic example of how this might happen.

Lets say my table takes up 10 blocks on disk and each block holds about 1000 rows (for a total of 10,000 rows.)  I write a query that will return 50 rows, just one half of one percent of the total number of rows in the table.  If those rows are distributed evenly, 5 rows per block then the engine would have to read the whole table anyway and should ignore the index.  But, if we force the use of our index (probably by HINTing) then things may get even worse.

The index order most likely will not correspond to the block order (amazing coincidence if it was.)  So, reading the blocks in index order may look something like this: 1,2,0,3,1,3,4,6,7,8,9,4,5,3,2,5,6,0,7,1...
You can probably see where this going.  Depending on the amount of available memory some of the blocks may get swapped out.  In this example, block 0 is read on the third pass,  it might stick around for a while but as memory fills with the other blocks, block-0 will move down the LRU (least recently used) list and eventually fall off and get swapped out of memory.  So, on the 18th index hit, we come back to block-0 and have to read it from disk again.  A double i/0 hit.

If we had used a full table scan to begin with, then we would have performed 10 sequential block reads: 0,1,2,3,4,5,6,7,8,9 and been done.

The most frequently seen example of ignoring indexes in favor of full table scans are for small "lookup" tables.  In the ideal case, the entire table is read in single pass, adding an index operation just adds extra work.  In this case it will be more efficient to simply FULL TABLE SCAN the whole thing rather than using an index.  

If you want to gauge for yourself how many blocks are involved for a query, try the techniques below.  The first statement determines blocks accessed by your query, the second the total number of blocks in the table.  Note, the second query may not be necessary; just look at dba_tables the "blocks" column should have the same number or close to it if doing estimates and your stats are fairly recent.

SELECT COUNT(DISTINCT dbms_rowid.rowid_block_number(ROWID)) blocks 
  FROM my_big_table
 WHERE my_big_table.column1 = 'something' 
   AND my_big_table.column2 = 'something else'


SELECT COUNT(DISTINCT dbms_rowid.rowid_block_number(ROWID)) blocks 
  FROM my_big_table
                                    
1:
2:
3:
4:
5:
6:
7:
8:

Select allOpen in new window



The Oracle optimizer doesn't do these block counts for every query and every index.  Instead index statistics include an attribute called "clustering factor" (visible in DBA/ALL/USER_INDEXES).  This number will be somewhere between the number of blocks in the table and the number rows in the table.  As the number approaches the number of blocks (or smaller), it means the amount of data clustering in the table, relative to that index increases.  Conversely, as the factor approaches the number of rows, the data  becomes scattered relative to the index.  In terms of performance choices of the optimizer.  Highly clustered data means a leaf block on the index will point to more data but with fewer data blocks in the table.  Thus making index access paths more efficient.

The mini-example above may seem rare and contrived but it's a surprisingly common scenario to come across in real applications.  A simple example would be time-based data.  A table of utility meters for kilowatts each hour, customer order lines per day, medical monitors patient vitals per minute, etc.  A simplified vesion of these types of tables might look something like this...

CREATE TABLE measurement
(
    meter_id    INTEGER,
    meter_time  DATE,
    meter_value NUMBER,
    CONSTRAINT pk_measurement PRIMARY KEY(meter_id, meter_time)
);
                                    
1:
2:
3:
4:
5:
6:
7:

Select allOpen in new window



Next will simulate hourly data loading for 1000 hours (a little over a month's worth of data for each meter.)  The data is ordered by meter_time on insert to simulate that the data would be loaded incrementally over time.  1am first, and hour later 2am, an hour later the 3am measurement,  etc.  Then we'll gather statistics to give the optimizer information about the table and primary key index.

INSERT INTO measurement(meter_id, meter_time, meter_value)
    SELECT meter_id, meter_time, MOD(ROWNUM, 97) meter_value
      FROM (SELECT LEVEL meter_id
              FROM DUAL
            CONNECT BY LEVEL <= 100),
           (SELECT TRUNC(SYSDATE, 'yyyy') + LEVEL / 24 meter_time
              FROM DUAL
            CONNECT BY LEVEL < 1000)
    ORDER BY meter_time, meter_id;

BEGIN
    DBMS_STATS.gather_table_stats(
        ownname            => USER,
        tabname            => 'MEASUREMENT',
        estimate_percent   => NULL,
        cascade            => TRUE
    );
END;
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:

Select allOpen in new window



Now, looking up information for a single meter produces a full table scan even though meter_id is the leading column of the primary key.

EXPLAIN PLAN
    FOR
        SELECT *
          FROM measurement
         WHERE meter_id = 50;

SELECT * FROM TABLE(DBMS_XPLAN.display);

Plan hash value: 190790823
 
---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |   999 | 13986 |   103   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| MEASUREMENT |   999 | 13986 |   103   (1)| 00:00:02 |
---------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("METER_ID"=50)
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:

Select allOpen in new window




Using the same table, I'll add a new index on time only.  Then I'll write a query that returns over 10 times the amount of data as the previous query but this one will use the index.

CREATE INDEX idx_measurement_time
    ON measurement(meter_time);

EXPLAIN PLAN
    FOR
        SELECT *
          FROM measurement
         WHERE meter_time > TRUNC(SYSDATE, 'yyyy') + 100 / 24
           AND meter_time < TRUNC(SYSDATE, 'yyyy') + 200 / 24;

SELECT * FROM TABLE(DBMS_XPLAN.display);

Plan hash value: 1512076386
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      | 10010 |   136K|    60   (0)| 00:00:01 |
|*  1 |  FILTER                      |                      |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| MEASUREMENT          | 10010 |   136K|    60   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IDX_MEASUREMENT_TIME | 10010 |       |    28   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(TRUNC(SYSDATE@!,'fmyyyy')+4.16666666666666666666666666666666666667<TRUNC(SYSDAT
              E@!,'fmyyyy')+8.33333333333333333333333333333333333333)
   3 - access("METER_TIME">TRUNC(SYSDATE@!,'fmyyyy')+4.16666666666666666666666666666666666667
               AND "METER_TIME"<TRUNC(SYSDATE@!,'fmyyyy')+8.33333333333333333333333333333333333333)
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:

Select allOpen in new window



Why does this happen?  Adding more data "should" make the index less useful, right?
If the small query fails to use an index, then making it bigger should be even less likekly.

SELECT index_name, clustering_factor
  FROM user_indexes
 WHERE table_name = 'MEASUREMENT';
 
INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
PK_MEASUREMENT                             26200
IDX_MEASUREMENT_TIME                         316
                                    
1:
2:
3:
4:
5:
6:
7:
8:

Select allOpen in new window



...and there's the answer.  The Primary points to data scattered throughout the table.  Whereas the time-based index points to data that is grouped together.  Using the queries above we can find exactly how many table blocks will be accessed to find the results.

All blocks in the table...

SELECT COUNT(DISTINCT DBMS_ROWID.rowid_block_number(ROWID)) blocks FROM measurement;

    BLOCKS
----------
       262
                                    
1:
2:
3:
4:
5:

Select allOpen in new window


        
Blocks needed to read one meter... Note, it's the same, meaning this a worst case scenario.  The data is distributed uniformly across the entire table.  The index will never help because it'll have read the whole table anyway.
        
SELECT COUNT(DISTINCT DBMS_ROWID.rowid_block_number(ROWID)) blocks
  FROM measurement
 WHERE meter_id = 50;

     BLOCKS
----------
       262
                                    
1:
2:
3:
4:
5:
6:
7:

Select allOpen in new window

        
        
Comparing the above to the time based query, we see dramatically fewer blocks accessed.

SELECT COUNT(DISTINCT DBMS_ROWID.rowid_block_number(ROWID)) blocks
  FROM measurement
 WHERE meter_time > TRUNC(SYSDATE, 'yyyy') + 100 / 24
   AND meter_time < TRUNC(SYSDATE, 'yyyy') + 200 / 24;

    BLOCKS
----------
        27   
                                    
1:
2:
3:
4:
5:
6:
7:
8:

Select allOpen in new window



This is an extreme example, but the same types of differences can and will occur anytime there is special grouping of data.  These were time based, but the same thing would happen if you loaded customer orders by customer id.  First all orders and line items for customer 1,  then all orders for customer 2, then customer 3, etc.

If you then try to index by order quantity, or popular line items, your index will be less useful than one where you index by the customer id.  To resolve this you could unload your data and then reload it sorted by quantity or item but then your indexes on customer will be less effective.  Alternately (if you have a license for it) you could partition by customer and then index by item.  Thus allowing the optimizer to take advantage of partition pruning and table clustering.

Functions applied to indexed columns



For the final index problem, let's take a small subset of the data in our measurement table.  Let's query a single day's worth of data.  This will return 2400 rows.  Once again, a small amount, and it's a time based query.  This should be nearly the ideal.


EXPLAIN PLAN
    FOR
        SELECT *
          FROM measurement
         WHERE TRUNC(meter_time) = TRUNC(SYSDATE, 'yyyy') + 10;
		 
SELECT * FROM TABLE(DBMS_XPLAN.display);		

Plan hash value: 190790823
 
---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |   999 | 13986 |   108   (6)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| MEASUREMENT |   999 | 13986 |   108   (6)| 00:00:02 |
---------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(TRUNC(INTERNAL_FUNCTION("METER_TIME"))=TRUNC(SYSDATE@!,'fmy
              yyy')+10)
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:

Select allOpen in new window



Unfortunately, this simple query, once again, ignores our indexes.  Why?  Because we are applying a function to the indexed column.

Indexes are like special tables, they have data values stored in them.

meter_time - these table values are stored in the index.
TRUNC(meter_time)  - this is not a value stored in the index.

In order for the index to be used, it would have to apply the TRUNC function to every value in all of the leave nodes to determine which applied.  Oracle will not do this.

As a solution,  you can either create a function-based index (that is an index where the function results have already been determined and stored.

CREATE INDEX idx_measurement_day
    ON measurement(trunc(meter_time));         
    
    
EXPLAIN PLAN
    FOR
        SELECT *
          FROM measurement
         WHERE TRUNC(meter_time) = TRUNC(SYSDATE, 'yyyy') + 10;
         
SELECT * FROM TABLE(DBMS_XPLAN.display); 

Plan hash value: 242868325
 
---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |   999 | 13986 |     8   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MEASUREMENT         |   999 | 13986 |     8   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_MEASUREMENT_DAY |   400 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access(TRUNC(INTERNAL_FUNCTION("METER_TIME"))=TRUNC(SYSDATE@!,'fmyyyy')+10)        
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:

Select allOpen in new window




Or, as alternative, simply DON'T use functions on the indexed columns.  Write the query in such a way that the same results can be achieved without executing a function.

EXPLAIN PLAN
    FOR
        SELECT *
          FROM measurement
         WHERE meter_time >= TRUNC(SYSDATE, 'yyyy') + 10
           AND meter_time < TRUNC(SYSDATE, 'yyyy') + 11;

SELECT * FROM TABLE(DBMS_XPLAN.display);

Plan hash value: 1512076386
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |  2502 | 35028 |    16   (0)| 00:00:01 |
|*  1 |  FILTER                      |                      |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| MEASUREMENT          |  2502 | 35028 |    16   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IDX_MEASUREMENT_TIME |  2502 |       |     8   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(TRUNC(SYSDATE@!,'fmyyyy')+10<TRUNC(SYSDATE@!,'fmyyyy')+11)
   3 - access("METER_TIME">=TRUNC(SYSDATE@!,'fmyyyy')+10 AND 
              "METER_TIME"<TRUNC(SYSDATE@!,'fmyyyy')+11)
       filter(TRUNC(INTERNAL_FUNCTION("METER_TIME"))>=TRUNC(TRUNC(SYSDATE@!,'fmyyyy')+10) 
              AND TRUNC(INTERNAL_FUNCTION("METER_TIME"))<=TRUNC(TRUNC(SYSDATE@!,'fmyyyy')+11))
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:

Select allOpen in new window




The above scenarios are the most common reasons I see on in my own work and here on Experts-Exchange where indexes aren't used.  To summarize:

1 - selecting a lot of the table data, more than can be efficiently accessed via index.
2 - the table data is scattered such that an index will end up visiting most/all of the table anyway
3 - a function is applied to indexed columns, thereby "disqualifying" that index

I hope you found this helpful.
Thanks for reading
Asked On
2011-10-24 at 04:07:56ID8393
Tags

Oracle

,

SQL

,

Tuning

,

Index

,

"not used"

Topic

Oracle Database

Views
2876

Comments

Expert Comment

by: wasimibm on 2011-12-19 at 02:48:33ID: 33795

Superb...Excellent article to understand how indexes works..!!!!

Add your Comment

Please Sign up or Log in to comment on this article.

Join Experts Exchange Today

Gain Access to all our Tech Resources

Get personalized answers

Ask unlimited questions

Access Proven Solutions

Search 3.2 million solutions

Read In-Depth How-To Guides

1000+ articles, demos, & tips

Watch Step by Step Tutorials

Learn direct from top tech pros

And Much More!

Your complete tech resource

See Plans and Pricing

30-day free trial. Register in 60 seconds.

Loading Advertisement...

Top Oracle Database Experts

  1. slightwv

    811,605

    Sage

    12,800 points yesterday

    Profile
    Rank: Genius
  2. sdstuber

    578,296

    Sage

    3,540 points yesterday

    Profile
    Rank: Genius
  3. wasimibm

    159,939

    Guru

    2,100 points yesterday

    Profile
    Rank: Guru
  4. tosse

    83,962

    Master

    1,510 points yesterday

    Profile
    Rank: Master
  5. dvz

    77,992

    Master

    900 points yesterday

    Profile
    Rank: Sage
  6. flow01

    70,666

    Master

    0 points yesterday

    Profile
    Rank: Sage
  7. OP_Zaharin

    66,395

    Master

    0 points yesterday

    Profile
    Rank: Sage
  8. Geert_Gruwez

    66,198

    Master

    800 points yesterday

    Profile
    Rank: Genius
  9. awking00

    63,850

    Master

    0 points yesterday

    Profile
    Rank: Genius
  10. MikeOM_DBA

    55,954

    Master

    10 points yesterday

    Profile
    Rank: Genius
  11. johnsone

    46,104

    0 points yesterday

    Profile
    Rank: Genius
  12. schwertner

    43,375

    0 points yesterday

    Profile
    Rank: Genius
  13. slobaray

    42,921

    20 points yesterday

    Profile
    Rank: Master
  14. Bajwa

    35,334

    1,000 points yesterday

    Profile
  15. gatorvip

    33,868

    0 points yesterday

    Profile
    Rank: Sage
  16. praveencpk

    33,355

    0 points yesterday

    Profile
    Rank: Master
  17. sujith80

    30,896

    0 points yesterday

    Profile
    Rank: Genius
  18. sventhan

    30,548

    0 points yesterday

    Profile
    Rank: Sage
  19. markgeer

    29,292

    0 points yesterday

    Profile
    Rank: Genius
  20. paquicuba

    26,860

    0 points yesterday

    Profile
    Rank: Genius
  21. HainKurt

    24,922

    0 points yesterday

    Profile
    Rank: Genius
  22. Milleniumaire

    20,300

    0 points yesterday

    Profile
    Rank: Sage
  23. ytarkan

    17,352

    0 points yesterday

    Profile
  24. mwvisa1

    15,600

    0 points yesterday

    Profile
    Rank: Genius
  25. angelIII

    14,509

    0 points yesterday

    Profile
    Rank: Elite

Hall Of Fame