<

Go Premium for a chance to win a PS4. Enter to Win

x

Why does Oracle ignore my index?

Published on
18,431 Points
6,731 Views
7 Endorsements
Last Modified:
Awarded
Why doesn't the Oracle optimizer use my index?


Querying too much data


Most Oracle developers know that an index is useful when you can use it to restrict your result set to a small number of the total rows in a table.

So, the obvious side-effect of this is, the optimizer may ignore your index if your query will end up returning most of the rows anyway.  Then the optimizer may decide it's less resource intensive  to simply read the whole table.  That's sort of intuitively obvious to understand, but may not be obvious when writing the query.

Let's look at two of the more interesting, yet still common ways indexes are ignored.

Why might a "narrow" query still ignore your index?

Index points to data in too many table blocks


You might think of result sets in terms of rows coming from disk; but Oracle reads the data in "blocks".  So, lets say you write a query that should return just 1 percent of the total number of rows.  That would seem like a good index candidate; but, if those 1% happen to be spread evenly across all of the blocks on disk; then the optimizer will very likely decide that your index isn't worth it.

In this case, the optimizer would see that the total number of disk reads will be approximately equal to the number of disk reads of a full table scan.  In that case, it's faster to ignore the index and read the table sequentially rather than hop around on the disk following index order.

Even worse, by reading in index order you can actually cause double disk hits.
Here's a generic example of how this might happen.

Lets say my table takes up 10 blocks on disk and each block holds about 1000 rows (for a total of 10,000 rows.)  I write a query that will return 50 rows, just one half of one percent of the total number of rows in the table.  If those rows are distributed evenly, 5 rows per block then the engine would have to read the whole table anyway and should ignore the index.  But, if we force the use of our index (probably by HINTing) then things may get even worse.

The index order most likely will not correspond to the block order (amazing coincidence if it was.)  So, reading the blocks in index order may look something like this: 1,2,0,3,1,3,4,6,7,8,9,4,5,3,2,5,6,0,7,1...
You can probably see where this going.  Depending on the amount of available memory some of the blocks may get swapped out.  In this example, block 0 is read on the third pass,  it might stick around for a while but as memory fills with the other blocks, block-0 will move down the LRU (least recently used) list and eventually fall off and get swapped out of memory.  So, on the 18th index hit, we come back to block-0 and have to read it from disk again.  A double i/0 hit.

If we had used a full table scan to begin with, then we would have performed 10 sequential block reads: 0,1,2,3,4,5,6,7,8,9 and been done.

The most frequently seen example of ignoring indexes in favor of full table scans are for small "lookup" tables.  In the ideal case, the entire table is read in single pass, adding an index operation just adds extra work.  In this case it will be more efficient to simply FULL TABLE SCAN the whole thing rather than using an index.  

If you want to gauge for yourself how many blocks are involved for a query, try the techniques below.  The first statement determines blocks accessed by your query, the second the total number of blocks in the table.  Note, the second query may not be necessary; just look at dba_tables the "blocks" column should have the same number or close to it if doing estimates and your stats are fairly recent.

SELECT COUNT(DISTINCT dbms_rowid.rowid_block_number(ROWID)) blocks 
  FROM my_big_table
 WHERE my_big_table.column1 = 'something' 
   AND my_big_table.column2 = 'something else'


SELECT COUNT(DISTINCT dbms_rowid.rowid_block_number(ROWID)) blocks 
  FROM my_big_table

Open in new window


The Oracle optimizer doesn't do these block counts for every query and every index.  Instead index statistics include an attribute called "clustering factor" (visible in DBA/ALL/USER_INDEXES).  This number will be somewhere between the number of blocks in the table and the number rows in the table.  As the number approaches the number of blocks (or smaller), it means the amount of data clustering in the table, relative to that index increases.  Conversely, as the factor approaches the number of rows, the data  becomes scattered relative to the index.  In terms of performance choices of the optimizer.  Highly clustered data means a leaf block on the index will point to more data but with fewer data blocks in the table.  Thus making index access paths more efficient.

The mini-example above may seem rare and contrived but it's a surprisingly common scenario to come across in real applications.  A simple example would be time-based data.  A table of utility meters for kilowatts each hour, customer order lines per day, medical monitors patient vitals per minute, etc.  A simplified vesion of these types of tables might look something like this...

CREATE TABLE measurement
(
    meter_id    INTEGER,
    meter_time  DATE,
    meter_value NUMBER,
    CONSTRAINT pk_measurement PRIMARY KEY(meter_id, meter_time)
);

Open in new window


Next will simulate hourly data loading for 1000 hours (a little over a month's worth of data for each meter.)  The data is ordered by meter_time on insert to simulate that the data would be loaded incrementally over time.  1am first, and hour later 2am, an hour later the 3am measurement,  etc.  Then we'll gather statistics to give the optimizer information about the table and primary key index.

INSERT INTO measurement(meter_id, meter_time, meter_value)
    SELECT meter_id, meter_time, MOD(ROWNUM, 97) meter_value
      FROM (SELECT LEVEL meter_id
              FROM DUAL
            CONNECT BY LEVEL <= 100),
           (SELECT TRUNC(SYSDATE, 'yyyy') + LEVEL / 24 meter_time
              FROM DUAL
            CONNECT BY LEVEL < 1000)
    ORDER BY meter_time, meter_id;

BEGIN
    DBMS_STATS.gather_table_stats(
        ownname            => USER,
        tabname            => 'MEASUREMENT',
        estimate_percent   => NULL,
        cascade            => TRUE
    );
END;

Open in new window


Now, looking up information for a single meter produces a full table scan even though meter_id is the leading column of the primary key.

EXPLAIN PLAN
    FOR
        SELECT *
          FROM measurement
         WHERE meter_id = 50;

SELECT * FROM TABLE(DBMS_XPLAN.display);

Plan hash value: 190790823
 
---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |   999 | 13986 |   103   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| MEASUREMENT |   999 | 13986 |   103   (1)| 00:00:02 |
---------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("METER_ID"=50)

Open in new window



Using the same table, I'll add a new index on time only.  Then I'll write a query that returns over 10 times the amount of data as the previous query but this one will use the index.

CREATE INDEX idx_measurement_time
    ON measurement(meter_time);

EXPLAIN PLAN
    FOR
        SELECT *
          FROM measurement
         WHERE meter_time > TRUNC(SYSDATE, 'yyyy') + 100 / 24
           AND meter_time < TRUNC(SYSDATE, 'yyyy') + 200 / 24;

SELECT * FROM TABLE(DBMS_XPLAN.display);

Plan hash value: 1512076386
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      | 10010 |   136K|    60   (0)| 00:00:01 |
|*  1 |  FILTER                      |                      |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| MEASUREMENT          | 10010 |   136K|    60   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IDX_MEASUREMENT_TIME | 10010 |       |    28   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(TRUNC(SYSDATE@!,'fmyyyy')+4.16666666666666666666666666666666666667<TRUNC(SYSDAT
              E@!,'fmyyyy')+8.33333333333333333333333333333333333333)
   3 - access("METER_TIME">TRUNC(SYSDATE@!,'fmyyyy')+4.16666666666666666666666666666666666667
               AND "METER_TIME"<TRUNC(SYSDATE@!,'fmyyyy')+8.33333333333333333333333333333333333333)

Open in new window


Why does this happen?  Adding more data "should" make the index less useful, right?
If the small query fails to use an index, then making it bigger should be even less likekly.

SELECT index_name, clustering_factor
  FROM user_indexes
 WHERE table_name = 'MEASUREMENT';
 
INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ -----------------
PK_MEASUREMENT                             26200
IDX_MEASUREMENT_TIME                         316

Open in new window


...and there's the answer.  The Primary points to data scattered throughout the table.  Whereas the time-based index points to data that is grouped together.  Using the queries above we can find exactly how many table blocks will be accessed to find the results.

All blocks in the table...

SELECT COUNT(DISTINCT DBMS_ROWID.rowid_block_number(ROWID)) blocks FROM measurement;

    BLOCKS
----------
       262

Open in new window

        
Blocks needed to read one meter... Note, it's the same, meaning this a worst case scenario.  The data is distributed uniformly across the entire table.  The index will never help because it'll have read the whole table anyway.
        
SELECT COUNT(DISTINCT DBMS_ROWID.rowid_block_number(ROWID)) blocks
  FROM measurement
 WHERE meter_id = 50;

     BLOCKS
----------
       262

Open in new window

        
        
Comparing the above to the time based query, we see dramatically fewer blocks accessed.

SELECT COUNT(DISTINCT DBMS_ROWID.rowid_block_number(ROWID)) blocks
  FROM measurement
 WHERE meter_time > TRUNC(SYSDATE, 'yyyy') + 100 / 24
   AND meter_time < TRUNC(SYSDATE, 'yyyy') + 200 / 24;

    BLOCKS
----------
        27   

Open in new window


This is an extreme example, but the same types of differences can and will occur anytime there is special grouping of data.  These were time based, but the same thing would happen if you loaded customer orders by customer id.  First all orders and line items for customer 1,  then all orders for customer 2, then customer 3, etc.

If you then try to index by order quantity, or popular line items, your index will be less useful than one where you index by the customer id.  To resolve this you could unload your data and then reload it sorted by quantity or item but then your indexes on customer will be less effective.  Alternately (if you have a license for it) you could partition by customer and then index by item.  Thus allowing the optimizer to take advantage of partition pruning and table clustering.

Functions applied to indexed columns


For the final index problem, let's take a small subset of the data in our measurement table.  Let's query a single day's worth of data.  This will return 2400 rows.  Once again, a small amount, and it's a time based query.  This should be nearly the ideal.


EXPLAIN PLAN
    FOR
        SELECT *
          FROM measurement
         WHERE TRUNC(meter_time) = TRUNC(SYSDATE, 'yyyy') + 10;
		 
SELECT * FROM TABLE(DBMS_XPLAN.display);		

Plan hash value: 190790823
 
---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |   999 | 13986 |   108   (6)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| MEASUREMENT |   999 | 13986 |   108   (6)| 00:00:02 |
---------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(TRUNC(INTERNAL_FUNCTION("METER_TIME"))=TRUNC(SYSDATE@!,'fmy
              yyy')+10)

Open in new window


Unfortunately, this simple query, once again, ignores our indexes.  Why?  Because we are applying a function to the indexed column.

Indexes are like special tables, they have data values stored in them.

meter_time - these table values are stored in the index.
TRUNC(meter_time)  - this is not a value stored in the index.

In order for the index to be used, it would have to apply the TRUNC function to every value in all of the leave nodes to determine which applied.  Oracle will not do this.

As a solution,  you can either create a function-based index (that is an index where the function results have already been determined and stored.

CREATE INDEX idx_measurement_day
    ON measurement(trunc(meter_time));         
    
    
EXPLAIN PLAN
    FOR
        SELECT *
          FROM measurement
         WHERE TRUNC(meter_time) = TRUNC(SYSDATE, 'yyyy') + 10;
         
SELECT * FROM TABLE(DBMS_XPLAN.display); 

Plan hash value: 242868325
 
---------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                     |   999 | 13986 |     8   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MEASUREMENT         |   999 | 13986 |     8   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_MEASUREMENT_DAY |   400 |       |     6   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access(TRUNC(INTERNAL_FUNCTION("METER_TIME"))=TRUNC(SYSDATE@!,'fmyyyy')+10)        

Open in new window



Or, as alternative, simply DON'T use functions on the indexed columns.  Write the query in such a way that the same results can be achieved without executing a function.

EXPLAIN PLAN
    FOR
        SELECT *
          FROM measurement
         WHERE meter_time >= TRUNC(SYSDATE, 'yyyy') + 10
           AND meter_time < TRUNC(SYSDATE, 'yyyy') + 11;

SELECT * FROM TABLE(DBMS_XPLAN.display);

Plan hash value: 1512076386
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |  2502 | 35028 |    16   (0)| 00:00:01 |
|*  1 |  FILTER                      |                      |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| MEASUREMENT          |  2502 | 35028 |    16   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | IDX_MEASUREMENT_TIME |  2502 |       |     8   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(TRUNC(SYSDATE@!,'fmyyyy')+10<TRUNC(SYSDATE@!,'fmyyyy')+11)
   3 - access("METER_TIME">=TRUNC(SYSDATE@!,'fmyyyy')+10 AND 
              "METER_TIME"<TRUNC(SYSDATE@!,'fmyyyy')+11)
       filter(TRUNC(INTERNAL_FUNCTION("METER_TIME"))>=TRUNC(TRUNC(SYSDATE@!,'fmyyyy')+10) 
              AND TRUNC(INTERNAL_FUNCTION("METER_TIME"))<=TRUNC(TRUNC(SYSDATE@!,'fmyyyy')+11))

Open in new window



The above scenarios are the most common reasons I see on in my own work and here on Experts-Exchange where indexes aren't used.  To summarize:

1 - selecting a lot of the table data, more than can be efficiently accessed via index.
2 - the table data is scattered such that an index will end up visiting most/all of the table anyway
3 - a function is applied to indexed columns, thereby "disqualifying" that index

I hope you found this helpful.
Thanks for reading
7
Comment
Author:sdstuber
1 Comment
 
LVL 16

Expert Comment

by:Wasim Akram Shaik
Superb...Excellent article to understand how indexes works..!!!!
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Join & Write a Comment

This video explains at a high level with the mandatory Oracle Memory processes are as well as touching on some of the more common optional ones.
This video explains at a high level about the four available data types in Oracle and how dates can be manipulated by the user to get data into and out of the database.

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month