Function Based Indexes may drive you crazy.

Javier MoralesOwner
CERTIFIED EXPERT
Oracle DBA teacher and consultant for almost 20 years.
Oracle Certified Professional. Writer of “Optimización SQL en Oracle".
Published:
Updated:
Many people say it’s quite difficult to tune queries that use Function Based Indexes and sometimes it’s because implicit conversions can transform the function used in the query not matching exactly the index function.

Here’s an example: let’s build a table called TEST with rows populated from DBA_OBJECTS, just to ensure we have a few sample rows to play with.

 
SQL> create table test (id number, dt date);
                      
                      Table created.
                      
                      SQL> insert into test select rownum,
                        2  to_date('07/01/2014 13:00','DD/MM/YYYY HH24:MI')+rownum/144
                        3  from dba_objects;
                      
                      98900 rows created.
                      
                      SQL> alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS';
                      
                      Session altered.
                      
                      SQL> select * from test where dt between '10-APR-2014 15:40:00' and '10-APR-2014 17:10:00';
                      
                              ID DT
                      ---------- --------------------
                           13408 10-APR-2014 15:40:00
                           13409 10-APR-2014 15:50:00
                           13410 10-APR-2014 16:00:00
                           13411 10-APR-2014 16:10:00
                           13412 10-APR-2014 16:20:00
                           13413 10-APR-2014 16:30:00
                           13414 10-APR-2014 16:40:00
                           13415 10-APR-2014 16:50:00
                           13416 10-APR-2014 17:00:00
                           13417 10-APR-2014 17:10:00
                      
                      10 rows selected.

Open in new window


Let’s see how it works with common indexes. If we create a standard B+Tree index for the date field, and we query the same range of data of 10 rows, the execution plan looks like this:

 
SQL> explain plan for
                        2  select * from test where dt between '10-APR-2014 15:40:00' and '10-APR-2014 17:10:00';
                      
                      Explained.
                      
                      SQL>  @?/rdbms/admin/utlxpls
                      
                      PLAN_TABLE_OUTPUT
                      ----------------------------------------------------------------------------------------------
                      Plan hash value: 3739404388
                      
                      ----------------------------------------------------------------------------------------------
                      | Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
                      ----------------------------------------------------------------------------------------------
                      |   0 | SELECT STATEMENT            |                |    10 |   220 |     3   (0)| 00:00:01 |
                      |   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |    10 |   220 |     3   (0)| 00:00:01 |
                      |*  2 |   INDEX RANGE SCAN          | IDX_TEST_FECHA |    10 |       |     2   (0)| 00:00:01 |
                      ----------------------------------------------------------------------------------------------
                      
                      Predicate Information (identified by operation id):
                      ---------------------------------------------------
                      
                         2 - access("DT">=TO_DATE(' 2014-04-10 15:40:00', 'syyyy-mm-dd hh24:mi:ss') AND
                                    "DT"<=TO_DATE(' 2014-04-10 17:10:00', 'syyyy-mm-dd hh24:mi:ss'))

Open in new window


Cool, 10 rows, but what if we want a timestamp precision for the filter? Timestamp datatypes allow also fractions of seconds as data precision.

 
SQL> explain plan for
                        2  select * from test where dt between to_timestamp('10-APR-2014 15:40:00') and '10-APR-2014 17:10:00';
                      
                      Explained.
                      
                      SQL> @?/rdbms/admin/utlxpls
                      
                      PLAN_TABLE_OUTPUT
                      ----------------------------------------------------------------------------------------------
                      Plan hash value: 3739404388
                      
                      ----------------------------------------------------------------------------------------------
                      | Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
                      ----------------------------------------------------------------------------------------------
                      |   0 | SELECT STATEMENT            |                |     3 |    66 |    28   (0)| 00:00:01 |
                      |   1 |  TABLE ACCESS BY INDEX ROWID| TEST           |     3 |    66 |    28   (0)| 00:00:01 |
                      |*  2 |   INDEX RANGE SCAN          | IDX_TEST_FECHA |   414 |       |    26   (0)| 00:00:01 |
                      ----------------------------------------------------------------------------------------------
                      
                      Predicate Information (identified by operation id):
                      ---------------------------------------------------
                      
                         2 - access("DT"<=TO_DATE(' 2014-04-10 17:10:00', 'syyyy-mm-dd hh24:mi:ss'))
                             filter(INTERNAL_FUNCTION("DT")>=TO_TIMESTAMP('10-APR-2014 15:40:00'))

Open in new window


Look at the internal function part in the access filter. The optimizer is getting all the dates prior to TO_DATE(' 2014-04-10 17:10:00') using the index and converting all DT field dates of 414 rows with an INTERNAL_FUNCTION (well, we may guess it’s an implicit to_timestamp() function) to filter by the entire range. The estimated number of rows now it’s 3 instead of 10.

Here, the index is only used because one value of the range is set as a date datatype. Just testing filtering by two timestamp dates makes the disaster:

 
SQL> explain plan for
                        2  select * from test
                        3  where dt between to_timestamp('10-APR-2014 15:40:00')
                        4               and to_timestamp('10-APR-2014 17:10:00');
                      
                      Explained.
                      
                      SQL> @?/rdbms/admin/utlxpls
                      
                      PLAN_TABLE_OUTPUT
                      --------------------------------------------------------------------------
                      Plan hash value: 1357081020
                      
                      --------------------------------------------------------------------------
                      | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
                      --------------------------------------------------------------------------
                      |   0 | SELECT STATEMENT  |      |     3 |    66 |    73   (6)| 00:00:01 |
                      |*  1 |  TABLE ACCESS FULL| TEST |     3 |    66 |    73   (6)| 00:00:01 |
                      --------------------------------------------------------------------------
                      
                      Predicate Information (identified by operation id):
                      ---------------------------------------------------
                      
                         1 - filter(INTERNAL_FUNCTION("DT")>=TO_TIMESTAMP('10-APR-2014
                                    15:40:00') AND INTERNAL_FUNCTION("DT")<=TO_TIMESTAMP('10-APR-2014
                                    17:10:00'))

Open in new window


A TEST table full scan. Cost has been increased from 3, to 28 and finally to 73. And so, we followed the standard rules about leaving the field part of the filter condition free of functions. But, well, nothing special by now.

Well, let’s play with Function Based Indexes. What if we create a FBI for being used with the function TO_TIMESTAMP(dt)?


 
SQL> create index idx_fbi_test_dt on test(to_timestamp(dt));
                      create index idx_fbi_test_dt on test(to_timestamp(dt))
                                                           *
                      ERROR at line 1:
                      ORA-01743: only pure functions can be indexed

Open in new window


TO_TIMESTAMP() isn’t pure. Even if it sounds weird, TO_TIMESTAMP() is a very nice function and pretty well implemented, but it’s not deterministic. It means that for the same input values, different results may occur. Keep in mind that timestamp datatypes also contain the GMT location and time may vary on globalization applications depending on the client’s local timezone.

So, let’s assume that our database and application run locally and the application doesn't need globalization support. We need a pure TO_TIMESTAMP function in order to use Function Based Indexes when filtering dates using timestamp values. In other words, in our particular case TO_TIMESTAMP looks quite deterministic for us (in fact, it is as we don’t use different timezones).

 
SQL> create or replace function to_timestamp_det (v_date timestamp) return timestamp DETERMINISTIC
                        2  is
                        3  begin
                        4      return to_timestamp(v_date);
                        5  end;
                        6  /
                      
                      Function created.
                      
                      SQL> create index idx_fbi_test_dt on test(to_timestamp_det(dt));
                      
                      Index created.

Open in new window


Now, we may change the query to use our “pure” TO_TIMESTAMP_DET() function and see how the optimizer is choosing our Function Based Index properly.

 
SQL> explain plan for
                        2  select * from test
                        3  where to_timestamp_det(dt) between to_timestamp_det('10-APR-2014 15:40:00')
                        4                                 and to_timestamp_det('10-APR-2014 17:10:00');
                      
                      Explained.
                      
                      SQL> @?/rdbms/admin/utlxpls
                      
                      PLAN_TABLE_OUTPUT
                      ------------------------------------------------------------------------------------------------
                      Plan hash value: 3697605509
                      
                      ------------------------------------------------------------------------------------------------
                      | Id  | Operation                    | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
                      ------------------------------------------------------------------------------------------------
                      |   0 | SELECT STATEMENT             |                 |   223 |  9812 |     5   (0)| 00:00:01 |
                      |*  1 |  FILTER                      |                 |       |       |            |          |
                      |   2 |   TABLE ACCESS BY INDEX ROWID| TEST            |   223 |  9812 |     5   (0)| 00:00:01 |
                      |*  3 |    INDEX RANGE SCAN          | IDX_FBI_TEST_DT |   401 |       |     3   (0)| 00:00:01 |
                      ------------------------------------------------------------------------------------------------
                      
                      Predicate Information (identified by operation id):
                      ---------------------------------------------------
                      
                         1 - filter("TO_TIMESTAMP_DET"(TO_TIMESTAMP('10-APR-2014
                                    15:40:00'))<="TO_TIMESTAMP_DET"(TO_TIMESTAMP('10-APR-2014 17:10:00')))
                         3 - access("SYS"."TO_TIMESTAMP_DET"(INTERNAL_FUNCTION("DT"))>="TO_TIMESTAMP_DET"(TO_T
                                    IMESTAMP('10-APR-2014 15:40:00')) AND "SYS"."TO_TIMESTAMP_DET"(INTERNAL_FUNCTION("DT"))<
                                    ="TO_TIMESTAMP_DET"(TO_TIMESTAMP('10-APR-2014 17:10:00')))

Open in new window


Please, take a look at the INTERNAL_FUNCTION part. We know that all these internal functions are TO_TIMESTAMP functions, but that’s causing an alteration to the estimation of results. The optimizer expects that Function Based Index returns 401 rows and a match with the filter operation by 223 rows. We know already that it’s only 10 rows all the time.

So, let’s go further in that general INTERNAL_FUNCTION behavior.

The following Function Based Index will set a text value ‘NULL’ for the NULL values. That would allow us to locate NULL values through the Function Based Index.

 
SQL> select * from test
                        2  where NVL(TO_CHAR(DT),'NULL') = 'NULL';
                      
                      no rows selected
                      
                      SQL> create index fbi_date on test(NVL(TO_CHAR(DT),'NULL'));
                      
                      Index created.
                      
                      SQL> explain plan for
                        2  select * from test
                        3  where NVL(TO_CHAR(DT),'NULL') = 'NULL';
                      
                      Explained.
                      
                      SQL> @?/rdbms/admin/utlxpls
                      
                      PLAN_TABLE_OUTPUT
                      --------------------------------------------------------------------------
                      Plan hash value: 1357081020
                      
                      --------------------------------------------------------------------------
                      | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
                      --------------------------------------------------------------------------
                      |   0 | SELECT STATEMENT  |      |     3 |    66 |    73   (6)| 00:00:01 |
                      |*  1 |  TABLE ACCESS FULL| TEST |     3 |    66 |    73   (6)| 00:00:01 |
                      --------------------------------------------------------------------------
                      
                      Predicate Information (identified by operation id):
                      ---------------------------------------------------
                      
                         1 - filter(NVL(TO_CHAR(INTERNAL_FUNCTION("DT")),'NULL')='NULL')

Open in new window


Yep, another FULL SCAN operation. We’re back to 73 cost for a query that returns no rows and doesn’t use our Function Based Index. Why? The function used in the query is exactly the same as the function used in the index creation syntax. It’s NVL(TO_CHAR(DT),'NULL') It’s just a TO_CHAR over the date field with an NVL assignation for the NULL values. What’s wrong with it?

Is the function used by the query exactly the same as the index function expression? Well, it’s not. Just take a look at the filter operation. Again a INTERNAL_FUNCTION (we don’t know really which exact function is) made the functions defined in the query and the index don’t match.

The answer is hidden in the Function Based Index definition and in one of the first session parameters set in the article. We used the syntax NVL(TO_CHAR(DT),'NULL') but It’s not what’s really stored in index definition.

 
SQL> select index_name, column_expression
                        2  from user_ind_expressions
                        3  where index_name='FBI_DATE';
                      
                      INDEX_NAME                     COLUMN_EXPRESSION
                      ----------------------------- ----------------------------------------------------
                      FBI_DATE                       NVL(TO_CHAR("DT",'DD-MON-YYYY HH24:MI:SS'),'NULL')

Open in new window


The default data mask was stored in the Function Based Index expression definition, and it forced the INTERNAL_FUNCTION to make values match, unmatching the functions used. That made the optimizer not being able to consider the Function Based Index as valid.

Now, a slightly change in the query will make the functions finally match.

 
SQL> explain plan for
                        2  select * from test
                        3  where NVL(TO_CHAR(DT,'DD-MON-YYYY HH24:MI:SS'),'NULL') = 'NULL';
                      
                      Explained.
                      
                      SQL> @?/rdbms/admin/utlxpls
                      
                      PLAN_TABLE_OUTPUT
                      ----------------------------------------------------------------------------------------
                      Plan hash value: 2626164178
                      
                      ----------------------------------------------------------------------------------------
                      | Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
                      ----------------------------------------------------------------------------------------
                      |   0 | SELECT STATEMENT            |          |   891 | 30294 |     7   (0)| 00:00:01 |
                      |   1 |  TABLE ACCESS BY INDEX ROWID| TEST     |   891 | 30294 |     7   (0)| 00:00:01 |
                      |*  2 |   INDEX RANGE SCAN          | FBI_DATE |   356 |       |     3   (0)| 00:00:01 |
                      ----------------------------------------------------------------------------------------
                      
                      Predicate Information (identified by operation id):
                      ---------------------------------------------------
                      
                         2 - access(NVL(TO_CHAR(INTERNAL_FUNCTION("DT"),'DD-MON-YYYY
                                    HH24:MI:SS'),'NULL')='NULL')

Open in new window


As you can see, the key to know whatever our Function Based Index are not being used starts matching the function existing in the definition stored for these particular indexes, with the SQL code used for filtering. And so, implicit conversions may transform the code and it's important to understand the execution plan carefully to ensure the index it's being used or discarded efficiently.
1
2,974 Views
Javier MoralesOwner
CERTIFIED EXPERT
Oracle DBA teacher and consultant for almost 20 years.
Oracle Certified Professional. Writer of “Optimización SQL en Oracle".

Comments (0)

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.