Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

how do I create a function index for to sql commands

Posted on 2011-03-01
13
Medium Priority
?
913 Views
Last Modified: 2013-12-12
I am trying to improve the performance on a query I am running and it was suggested that I try a function index for the two sql commands. They are the following:

regexp_substr(DOCUMENT_ID,'[^/]+$')

and

site||'-'||pdf_name  

The query is as follows:

select folder_id from dcs_rule where regexp_substr(DOCUMENT_ID,'[^/]+$') in (select site||'-'||pdf_name from holdings where publication_date >='01-JAN-10' and publication_date<='31-DEC-10');

Now the text that is left after all content up until the forward slash in the dcs_rule table is the equivalent of two fields in the holdings table. They are the site and pdf_name fields. So for example:

In the dcs_rule table the value of the document_ID field would look like this: RDBAFM_ORA10://cpia_lib/CPIA-1995-0525

In the query the REGEXP_SUBSTR(DOCUMENT_ID,'[^/]+$') command would strip everything and leave the following: CPIA-1995-0525

In the holdings table CPIA would be the site value and 1995-0525 would be the pdf_name value

I am trying to get the records that are in both records.  What would be the syntax for creating a function index for both sql commands?


0
Comment
Question by:sikyala
  • 5
  • 3
  • 3
  • +1
13 Comments
 
LVL 78

Accepted Solution

by:
slightwv (䄆 Netminder) earned 1000 total points
ID: 35010214
The syntax to create them is straight forward and samples are all over the web:
http://www.dba-oracle.com/oracle_tips_index_scan_fbi_sql.htm

The potential gotcha is (they may have fixed it by now) but not to long ago for Oracle to use a Function Based Index, the syntax/case in the query must match the EXACT case used to create the index.

for example: you create the index with: regexp_substr(DOCUMENT_ID,'[^/]+$')

and in your code you use: REGEXP_SUBSTR(DOCUMENT_ID,'[^/]+$')

the index would not be used.



0
 

Author Comment

by:sikyala
ID: 35010560
I created the indexes now how do I use them in my query?
0
 
LVL 78

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35010600
Just use the exact regexp syntax in a query and it should use them.

The Explain Plan will tell you if it is using them or not.
0
Learn Veeam advantages over legacy backup

Every day, more and more legacy backup customers switch to Veeam. Technologies designed for the client-server era cannot restore any IT service running in the hybrid cloud within seconds. Learn top Veeam advantages over legacy backup and get Veeam for the price of your renewal

 
LVL 11

Assisted Solution

by:Akenathon
Akenathon earned 1000 total points
ID: 35010622
They are used automatically, you don't need to do anything else. Doublecheck that they are being used by firing the SQLs in Sqlplus after entering SET AUTOTRACE ON
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35010747
create index fbi on dcs_rule (regexp_substr(DOCUMENT_ID,'[^/]+$') ) ;

Open in new window

this created an index with this syntax:
     REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')      


I also created an index on the publication_date of the holdings table

Then I analyzed the tables and generated a plan for this query...

SELECT folder_id
  FROM dcs_rule
 WHERE REGEXP_SUBSTR(document_id, '[^/]+$') IN
           (SELECT site || '-' || pdf_name
              FROM holdings
             WHERE publication_date >= TO_DATE('01-JAN-10', 'dd-mon-rr')
                   AND publication_date <= TO_DATE('31-DEC-10', 'dd-mon-rr'));

Open in new window



Plan hash value: 4128481947
 
-----------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |  3050 |   154K|   298   (2)| 00:00:04 |
|   1 |  VIEW                 | VM_NWVW_2 |  3050 |   154K|   298   (2)| 00:00:04 |
|   2 |   HASH UNIQUE         |           |  3050 |   190K|   298   (2)| 00:00:04 |
|*  3 |    FILTER             |           |       |       |            |          |
|*  4 |     HASH JOIN         |           |  3050 |   190K|   297   (2)| 00:00:04 |
|*  5 |      TABLE ACCESS FULL| HOLDINGS  |  3050 | 94550 |    74   (2)| 00:00:01 |
|   6 |      TABLE ACCESS FULL| DCS_RULE  |   214K|  6899K|   221   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter(TO_DATE('01-JAN-10','dd-mon-rr')<=TO_DATE('31-DEC-10','dd-mon
              -rr'))
   4 - access( REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')="SITE"||'-'||"PDF_NAM
              E")
   5 - filter("PUBLICATION_DATE">=TO_DATE('01-JAN-10','dd-mon-rr') AND 
              "PUBLICATION_DATE"<=TO_DATE('31-DEC-10','dd-mon-rr'))

Open in new window



Note, neither index was used,  really surprising because I arranged test data so both the subquery and main query would each have exactly one row that matched their respective  where clauses.
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35010774
in my test data:
dcs_rule has 214082 rows -
   with one row that has a document_id that matches the CPIA-1995-0525 expression

holdings has 50000 rows
    with one row with data in 2010

0
 
LVL 78

Assisted Solution

by:slightwv (䄆 Netminder)
slightwv (䄆 Netminder) earned 1000 total points
ID: 35010776
Check the case....

Back when I used them the syntax used to create the index must match EXACLY how it is used in the select.

try:
SELECT folder_id
  FROM dcs_rule
 WHERE regexp_substr(DOCUMENT_ID, '[^/]+$') IN
           (SELECT site || '-' || pdf_name
              FROM holdings
             WHERE publication_date >= TO_DATE('01-JAN-10', 'dd-mon-rr')
                   AND publication_date <= TO_DATE('31-DEC-10', 'dd-mon-rr'));
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35010935
ah,  I reanalyzed with histograms and the indexes got picked up
Plan hash value: 3121295905
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                   |    15 | 30525 |    63   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                   |                   |    15 | 30525 |    63   (2)| 00:00:01 |
|   2 |   VIEW                          | VW_NSO_1          |    15 | 30030 |    18   (6)| 00:00:01 |
|   3 |    HASH UNIQUE                  |                   |    15 |   465 |    18   (6)| 00:00:01 |
|*  4 |     FILTER                      |                   |       |       |            |          |
|   5 |      TABLE ACCESS BY INDEX ROWID| HOLDINGS          |    15 |   465 |    17   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN          | HOLDINGS_PUB_DATE |    15 |       |     2   (0)| 00:00:01 |
|   7 |   TABLE ACCESS BY INDEX ROWID   | DCS_RULE          |     1 |    33 |     3   (0)| 00:00:01 |
|*  8 |    INDEX RANGE SCAN             | FBI               |     1 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - filter(TO_DATE('01-JAN-10','dd-mon-rr')<=TO_DATE('31-DEC-10','dd-mon-rr'))
   6 - access("PUBLICATION_DATE">=TO_DATE('01-JAN-10','dd-mon-rr') AND 
              "PUBLICATION_DATE"<=TO_DATE('31-DEC-10','dd-mon-rr'))
   8 - access("SITE||'-'||PDF_NAME"= REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$'))

Open in new window

0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35010951
and you'll note, the syntax I created the index with and the syntax I queryed with are not identical but both are canonicalized to the same format

REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')

you should check this with your own version though,  slightwv is correct that not all versions have been successful.  I haven't had any problems with this since 10.2.0.4 but definitely did have problems in 9i.
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35010965
histograms may or may not be needed or even desired with your real data.

I needed them for my manually created and artificially skewed data set
0
 

Author Comment

by:sikyala
ID: 35011205
I don't know how to interpret the results below:

Execution Plan
----------------------------------------------------------
Plan hash value: 2696889323

--------------------------------------------------------------------------------
--------------------------------------------------------

| Id  | Operation                          | Name                 | Rows  | Byte
s | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |

--------------------------------------------------------------------------------
--------------------------------------------------------

|   0 | SELECT STATEMENT                   |                      |  1047 | 8899
5 | 32482   (1)| 00:06:30 |        |      |            |

|*  1 |  PX COORDINATOR                    |                      |       |
  |            |          |        |      |            |

|   2 |   PX SEND QC (RANDOM)              | :TQ10005             |  1047 | 8899
5 | 32482   (1)| 00:06:30 |  Q1,05 | P->S | QC (RAND)  |

|   3 |    HASH UNIQUE                     |                      |  1047 | 8899
5 | 32482   (1)| 00:06:30 |  Q1,05 | PCWP |            |

|   4 |     PX RECEIVE                     |                      | 15152 |  125
7K| 32480   (1)| 00:06:30 |  Q1,05 | PCWP |            |

|   5 |      PX SEND HASH                  | :TQ10004             | 15152 |  125
7K| 32480   (1)| 00:06:30 |  Q1,04 | P->P | HASH       |

|*  6 |       FILTER                       |                      |       |
  |            |          |  Q1,04 | PCWC |            |

|*  7 |        HASH JOIN                   |                      | 15152 |  125
7K| 32480   (1)| 00:06:30 |  Q1,04 | PCWP |            |

|   8 |         PX RECEIVE                 |                      | 10752 |   27
3K| 25664   (1)| 00:05:08 |  Q1,04 | PCWP |            |

|   9 |          PX SEND HASH              | :TQ10003             | 10752 |   27
3K| 25664   (1)| 00:05:08 |  Q1,03 | P->P | HASH       |

|  10 |           VIEW                     | index$_join$_002     | 10752 |   27
3K| 25664   (1)| 00:05:08 |  Q1,03 | PCWP |            |

|* 11 |            HASH JOIN BUFFERED      |                      |       |
  |            |          |  Q1,03 | PCWP |            |

|  12 |             BUFFER SORT            |                      |       |
  |            |          |  Q1,03 | PCWC |            |

|  13 |              PX RECEIVE            |                      | 10752 |   27
3K|  6872   (1)| 00:01:23 |  Q1,03 | PCWP |            |

|  14 |               PX SEND HASH         | :TQ10000             | 10752 |   27
3K|  6872   (1)| 00:01:23 |        | S->P | HASH       |

|  15 |                INDEX FAST FULL SCAN| HOLD_PK          | 10752 |   27
3K|  6872   (1)| 00:01:23 |        |      |            |

|  16 |             PX RECEIVE             |                      | 10752 |   27
3K| 25156   (1)| 00:05:02 |  Q1,03 | PCWP |            |

|  17 |              PX SEND HASH          | :TQ10002             | 10752 |   27
3K| 25156   (1)| 00:05:02 |  Q1,02 | P->P | HASH       |

|  18 |               PX BLOCK ITERATOR    |                      | 10752 |   27
3K| 25156   (1)| 00:05:02 |  Q1,02 | PCWC |            |

|* 19 |                INDEX FAST FULL SCAN| IX_BH_TITLE_PUB      | 10752 |   27
3K| 25156   (1)| 00:05:02 |  Q1,02 | PCWP |            |

|  20 |         BUFFER SORT                |                      |       |
  |            |          |  Q1,04 | PCWC |            |

|  21 |          PX RECEIVE                |                      |  2081K|   11
7M|  6793   (2)| 00:01:22 |  Q1,04 | PCWP |            |

|  22 |           PX SEND HASH             | :TQ10001             |  2081K|   11
7M|  6793   (2)| 00:01:22 |        | S->P | HASH       |

|  23 |            TABLE ACCESS FULL       | DCS_RULE |  2081K|   11
7M|  6793   (2)| 00:01:22 |        |      |            |

--------------------------------------------------------------------------------
--------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_DATE('01-JAN-10')<=TO_DATE('31-DEC-10'))
   6 - filter(TO_DATE('01-JAN-10')<=TO_DATE('31-DEC-10'))
   7 - access( REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')="SITE"||'-'||"PDF_NAME")
  11 - access(ROWID=ROWID)
  19 - filter("PUBLICATION_DATE"<='31-DEC-10' AND "PUBLICATION_DATE">='01-JAN-10
')

I didn't really see any improved performance.
0
 
LVL 11

Assisted Solution

by:Akenathon
Akenathon earned 1000 total points
ID: 35011837
Try hinting your SQL like this for the execution plan to be more readable:

select /*+NOPARALLEL*/ folder_id from dcs_rule where regexp_substr(DOCUMENT_ID,'[^/]+$') in (select site||'-'||pdf_name from holdings where publication_date >='01-JAN-10' and publication_date<='31-DEC-10');

What's the name of your new function-based index? Note that you only need one index, on regexp_substr(...)

Exception: If your "holdings" between 01-JAN-10 and 31-DEC-10 are a small percentage of the table, you *could* create another function-based index on (publication_date, "SITE"||'-'||"PDF_NAME"), so that the subquery is entirely solved in the index.
0
 

Author Closing Comment

by:sikyala
ID: 35148536
Sorry for the late reply
0

Featured Post

NFR key for Veeam Agent for Linux

Veeam is happy to provide a free NFR license for one year.  It allows for the non‑production use and valid for five workstations and two servers. Veeam Agent for Linux is a simple backup tool for your Linux installations, both on‑premises and in the public cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I remember the day when someone asked me to create a user for an application developement. The user should be able to create views and materialized views and, so, I used the following syntax: (CODE) This way, I guessed, I would ensure that use…
Why Shell Scripting? Shell scripting is a powerful method of accessing UNIX systems and it is very flexible. Shell scripts are required when we want to execute a sequence of commands in Unix flavored operating systems. “Shell” is the command line i…
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
This video shows how to copy an entire tablespace from one database to another database using Transportable Tablespace functionality.
Suggested Courses

782 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question