how do I create a function index for to sql commands

I am trying to improve the performance on a query I am running and it was suggested that I try a function index for the two sql commands. They are the following:

regexp_substr(DOCUMENT_ID,'[^/]+$')

and

site||'-'||pdf_name  

The query is as follows:

select folder_id from dcs_rule where regexp_substr(DOCUMENT_ID,'[^/]+$') in (select site||'-'||pdf_name from holdings where publication_date >='01-JAN-10' and publication_date<='31-DEC-10');

Now the text that is left after all content up until the forward slash in the dcs_rule table is the equivalent of two fields in the holdings table. They are the site and pdf_name fields. So for example:

In the dcs_rule table the value of the document_ID field would look like this: RDBAFM_ORA10://cpia_lib/CPIA-1995-0525

In the query the REGEXP_SUBSTR(DOCUMENT_ID,'[^/]+$') command would strip everything and leave the following: CPIA-1995-0525

In the holdings table CPIA would be the site value and 1995-0525 would be the pdf_name value

I am trying to get the records that are in both records.  What would be the syntax for creating a function index for both sql commands?


sikyalaSenior Database AdministratorAsked:
Who is Participating?

Improve company productivity with a Business Account.Sign Up

x
 
slightwv (䄆 Netminder)Connect With a Mentor Commented:
The syntax to create them is straight forward and samples are all over the web:
http://www.dba-oracle.com/oracle_tips_index_scan_fbi_sql.htm

The potential gotcha is (they may have fixed it by now) but not to long ago for Oracle to use a Function Based Index, the syntax/case in the query must match the EXACT case used to create the index.

for example: you create the index with: regexp_substr(DOCUMENT_ID,'[^/]+$')

and in your code you use: REGEXP_SUBSTR(DOCUMENT_ID,'[^/]+$')

the index would not be used.



0
 
sikyalaSenior Database AdministratorAuthor Commented:
I created the indexes now how do I use them in my query?
0
 
slightwv (䄆 Netminder) Commented:
Just use the exact regexp syntax in a query and it should use them.

The Explain Plan will tell you if it is using them or not.
0
Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
AkenathonConnect With a Mentor Commented:
They are used automatically, you don't need to do anything else. Doublecheck that they are being used by firing the SQLs in Sqlplus after entering SET AUTOTRACE ON
0
 
sdstuberCommented:
create index fbi on dcs_rule (regexp_substr(DOCUMENT_ID,'[^/]+$') ) ;

Open in new window

this created an index with this syntax:
     REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')      


I also created an index on the publication_date of the holdings table

Then I analyzed the tables and generated a plan for this query...

SELECT folder_id
  FROM dcs_rule
 WHERE REGEXP_SUBSTR(document_id, '[^/]+$') IN
           (SELECT site || '-' || pdf_name
              FROM holdings
             WHERE publication_date >= TO_DATE('01-JAN-10', 'dd-mon-rr')
                   AND publication_date <= TO_DATE('31-DEC-10', 'dd-mon-rr'));

Open in new window



Plan hash value: 4128481947
 
-----------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |  3050 |   154K|   298   (2)| 00:00:04 |
|   1 |  VIEW                 | VM_NWVW_2 |  3050 |   154K|   298   (2)| 00:00:04 |
|   2 |   HASH UNIQUE         |           |  3050 |   190K|   298   (2)| 00:00:04 |
|*  3 |    FILTER             |           |       |       |            |          |
|*  4 |     HASH JOIN         |           |  3050 |   190K|   297   (2)| 00:00:04 |
|*  5 |      TABLE ACCESS FULL| HOLDINGS  |  3050 | 94550 |    74   (2)| 00:00:01 |
|   6 |      TABLE ACCESS FULL| DCS_RULE  |   214K|  6899K|   221   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter(TO_DATE('01-JAN-10','dd-mon-rr')<=TO_DATE('31-DEC-10','dd-mon
              -rr'))
   4 - access( REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')="SITE"||'-'||"PDF_NAM
              E")
   5 - filter("PUBLICATION_DATE">=TO_DATE('01-JAN-10','dd-mon-rr') AND 
              "PUBLICATION_DATE"<=TO_DATE('31-DEC-10','dd-mon-rr'))

Open in new window



Note, neither index was used,  really surprising because I arranged test data so both the subquery and main query would each have exactly one row that matched their respective  where clauses.
0
 
sdstuberCommented:
in my test data:
dcs_rule has 214082 rows -
   with one row that has a document_id that matches the CPIA-1995-0525 expression

holdings has 50000 rows
    with one row with data in 2010

0
 
slightwv (䄆 Netminder)Connect With a Mentor Commented:
Check the case....

Back when I used them the syntax used to create the index must match EXACLY how it is used in the select.

try:
SELECT folder_id
  FROM dcs_rule
 WHERE regexp_substr(DOCUMENT_ID, '[^/]+$') IN
           (SELECT site || '-' || pdf_name
              FROM holdings
             WHERE publication_date >= TO_DATE('01-JAN-10', 'dd-mon-rr')
                   AND publication_date <= TO_DATE('31-DEC-10', 'dd-mon-rr'));
0
 
sdstuberCommented:
ah,  I reanalyzed with histograms and the indexes got picked up
Plan hash value: 3121295905
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                   |    15 | 30525 |    63   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                   |                   |    15 | 30525 |    63   (2)| 00:00:01 |
|   2 |   VIEW                          | VW_NSO_1          |    15 | 30030 |    18   (6)| 00:00:01 |
|   3 |    HASH UNIQUE                  |                   |    15 |   465 |    18   (6)| 00:00:01 |
|*  4 |     FILTER                      |                   |       |       |            |          |
|   5 |      TABLE ACCESS BY INDEX ROWID| HOLDINGS          |    15 |   465 |    17   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN          | HOLDINGS_PUB_DATE |    15 |       |     2   (0)| 00:00:01 |
|   7 |   TABLE ACCESS BY INDEX ROWID   | DCS_RULE          |     1 |    33 |     3   (0)| 00:00:01 |
|*  8 |    INDEX RANGE SCAN             | FBI               |     1 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - filter(TO_DATE('01-JAN-10','dd-mon-rr')<=TO_DATE('31-DEC-10','dd-mon-rr'))
   6 - access("PUBLICATION_DATE">=TO_DATE('01-JAN-10','dd-mon-rr') AND 
              "PUBLICATION_DATE"<=TO_DATE('31-DEC-10','dd-mon-rr'))
   8 - access("SITE||'-'||PDF_NAME"= REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$'))

Open in new window

0
 
sdstuberCommented:
and you'll note, the syntax I created the index with and the syntax I queryed with are not identical but both are canonicalized to the same format

REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')

you should check this with your own version though,  slightwv is correct that not all versions have been successful.  I haven't had any problems with this since 10.2.0.4 but definitely did have problems in 9i.
0
 
sdstuberCommented:
histograms may or may not be needed or even desired with your real data.

I needed them for my manually created and artificially skewed data set
0
 
sikyalaSenior Database AdministratorAuthor Commented:
I don't know how to interpret the results below:

Execution Plan
----------------------------------------------------------
Plan hash value: 2696889323

--------------------------------------------------------------------------------
--------------------------------------------------------

| Id  | Operation                          | Name                 | Rows  | Byte
s | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |

--------------------------------------------------------------------------------
--------------------------------------------------------

|   0 | SELECT STATEMENT                   |                      |  1047 | 8899
5 | 32482   (1)| 00:06:30 |        |      |            |

|*  1 |  PX COORDINATOR                    |                      |       |
  |            |          |        |      |            |

|   2 |   PX SEND QC (RANDOM)              | :TQ10005             |  1047 | 8899
5 | 32482   (1)| 00:06:30 |  Q1,05 | P->S | QC (RAND)  |

|   3 |    HASH UNIQUE                     |                      |  1047 | 8899
5 | 32482   (1)| 00:06:30 |  Q1,05 | PCWP |            |

|   4 |     PX RECEIVE                     |                      | 15152 |  125
7K| 32480   (1)| 00:06:30 |  Q1,05 | PCWP |            |

|   5 |      PX SEND HASH                  | :TQ10004             | 15152 |  125
7K| 32480   (1)| 00:06:30 |  Q1,04 | P->P | HASH       |

|*  6 |       FILTER                       |                      |       |
  |            |          |  Q1,04 | PCWC |            |

|*  7 |        HASH JOIN                   |                      | 15152 |  125
7K| 32480   (1)| 00:06:30 |  Q1,04 | PCWP |            |

|   8 |         PX RECEIVE                 |                      | 10752 |   27
3K| 25664   (1)| 00:05:08 |  Q1,04 | PCWP |            |

|   9 |          PX SEND HASH              | :TQ10003             | 10752 |   27
3K| 25664   (1)| 00:05:08 |  Q1,03 | P->P | HASH       |

|  10 |           VIEW                     | index$_join$_002     | 10752 |   27
3K| 25664   (1)| 00:05:08 |  Q1,03 | PCWP |            |

|* 11 |            HASH JOIN BUFFERED      |                      |       |
  |            |          |  Q1,03 | PCWP |            |

|  12 |             BUFFER SORT            |                      |       |
  |            |          |  Q1,03 | PCWC |            |

|  13 |              PX RECEIVE            |                      | 10752 |   27
3K|  6872   (1)| 00:01:23 |  Q1,03 | PCWP |            |

|  14 |               PX SEND HASH         | :TQ10000             | 10752 |   27
3K|  6872   (1)| 00:01:23 |        | S->P | HASH       |

|  15 |                INDEX FAST FULL SCAN| HOLD_PK          | 10752 |   27
3K|  6872   (1)| 00:01:23 |        |      |            |

|  16 |             PX RECEIVE             |                      | 10752 |   27
3K| 25156   (1)| 00:05:02 |  Q1,03 | PCWP |            |

|  17 |              PX SEND HASH          | :TQ10002             | 10752 |   27
3K| 25156   (1)| 00:05:02 |  Q1,02 | P->P | HASH       |

|  18 |               PX BLOCK ITERATOR    |                      | 10752 |   27
3K| 25156   (1)| 00:05:02 |  Q1,02 | PCWC |            |

|* 19 |                INDEX FAST FULL SCAN| IX_BH_TITLE_PUB      | 10752 |   27
3K| 25156   (1)| 00:05:02 |  Q1,02 | PCWP |            |

|  20 |         BUFFER SORT                |                      |       |
  |            |          |  Q1,04 | PCWC |            |

|  21 |          PX RECEIVE                |                      |  2081K|   11
7M|  6793   (2)| 00:01:22 |  Q1,04 | PCWP |            |

|  22 |           PX SEND HASH             | :TQ10001             |  2081K|   11
7M|  6793   (2)| 00:01:22 |        | S->P | HASH       |

|  23 |            TABLE ACCESS FULL       | DCS_RULE |  2081K|   11
7M|  6793   (2)| 00:01:22 |        |      |            |

--------------------------------------------------------------------------------
--------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_DATE('01-JAN-10')<=TO_DATE('31-DEC-10'))
   6 - filter(TO_DATE('01-JAN-10')<=TO_DATE('31-DEC-10'))
   7 - access( REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')="SITE"||'-'||"PDF_NAME")
  11 - access(ROWID=ROWID)
  19 - filter("PUBLICATION_DATE"<='31-DEC-10' AND "PUBLICATION_DATE">='01-JAN-10
')

I didn't really see any improved performance.
0
 
AkenathonConnect With a Mentor Commented:
Try hinting your SQL like this for the execution plan to be more readable:

select /*+NOPARALLEL*/ folder_id from dcs_rule where regexp_substr(DOCUMENT_ID,'[^/]+$') in (select site||'-'||pdf_name from holdings where publication_date >='01-JAN-10' and publication_date<='31-DEC-10');

What's the name of your new function-based index? Note that you only need one index, on regexp_substr(...)

Exception: If your "holdings" between 01-JAN-10 and 31-DEC-10 are a small percentage of the table, you *could* create another function-based index on (publication_date, "SITE"||'-'||"PDF_NAME"), so that the subquery is entirely solved in the index.
0
 
sikyalaSenior Database AdministratorAuthor Commented:
Sorry for the late reply
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.