asked on

how do I create a function index for to sql commands

I am trying to improve the performance on a query I am running and it was suggested that I try a function index for the two sql commands. They are the following:

regexp_substr(DOCUMENT_ID,'[^/]+$')

and

site||'-'||pdf_name

The query is as follows:

select folder_id from dcs_rule where regexp_substr(DOCUMENT_ID,'[^/]+$') in (select site||'-'||pdf_name from holdings where publication_date >='01-JAN-10' and publication_date<='31-DEC-10');

Now the text that is left after all content up until the forward slash in the dcs_rule table is the equivalent of two fields in the holdings table. They are the site and pdf_name fields. So for example:

In the dcs_rule table the value of the document_ID field would look like this: RDBAFM_ORA10://cpia_lib/CPIA-1995-0525

In the query the REGEXP_SUBSTR(DOCUMENT_ID,'[^/]+$') command would strip everything and leave the following: CPIA-1995-0525

In the holdings table CPIA would be the site value and 1995-0525 would be the pdf_name value

I am trying to get the records that are in both records. What would be the syntax for creating a function index for both sql commands?

ASKER CERTIFIED SOLUTION

slightwv (䄆 Netminder)

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

sikyala

ASKER

I created the indexes now how do I use them in my query?

slightwv (䄆 Netminder)

Just use the exact regexp syntax in a query and it should use them.

The Explain Plan will tell you if it is using them or not.

SOLUTION

Akenathon

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Sean Stuber

create index fbi on dcs_rule (regexp_substr(DOCUMENT_ID,'[^/]+$') ) ;

Open in new window

this created an index with this syntax:
REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')

I also created an index on the publication_date of the holdings table

Then I analyzed the tables and generated a plan for this query...

SELECT folder_id
  FROM dcs_rule
 WHERE REGEXP_SUBSTR(document_id, '[^/]+$') IN
           (SELECT site || '-' || pdf_name
              FROM holdings
             WHERE publication_date >= TO_DATE('01-JAN-10', 'dd-mon-rr')
                   AND publication_date <= TO_DATE('31-DEC-10', 'dd-mon-rr'));

Open in new window

Plan hash value: 4128481947
 
-----------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |  3050 |   154K|   298   (2)| 00:00:04 |
|   1 |  VIEW                 | VM_NWVW_2 |  3050 |   154K|   298   (2)| 00:00:04 |
|   2 |   HASH UNIQUE         |           |  3050 |   190K|   298   (2)| 00:00:04 |
|*  3 |    FILTER             |           |       |       |            |          |
|*  4 |     HASH JOIN         |           |  3050 |   190K|   297   (2)| 00:00:04 |
|*  5 |      TABLE ACCESS FULL| HOLDINGS  |  3050 | 94550 |    74   (2)| 00:00:01 |
|   6 |      TABLE ACCESS FULL| DCS_RULE  |   214K|  6899K|   221   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter(TO_DATE('01-JAN-10','dd-mon-rr')<=TO_DATE('31-DEC-10','dd-mon
              -rr'))
   4 - access( REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')="SITE"||'-'||"PDF_NAM
              E")
   5 - filter("PUBLICATION_DATE">=TO_DATE('01-JAN-10','dd-mon-rr') AND 
              "PUBLICATION_DATE"<=TO_DATE('31-DEC-10','dd-mon-rr'))

Open in new window

Note, neither index was used, really surprising because I arranged test data so both the subquery and main query would each have exactly one row that matched their respective where clauses.

Sean Stuber

in my test data:
dcs_rule has 214082 rows -
with one row that has a document_id that matches the CPIA-1995-0525 expression

holdings has 50000 rows
with one row with data in 2010

SOLUTION

slightwv (䄆 Netminder)

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Sean Stuber

ah, I reanalyzed with histograms and the indexes got picked up

Plan hash value: 3121295905
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                   |    15 | 30525 |    63   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                   |                   |    15 | 30525 |    63   (2)| 00:00:01 |
|   2 |   VIEW                          | VW_NSO_1          |    15 | 30030 |    18   (6)| 00:00:01 |
|   3 |    HASH UNIQUE                  |                   |    15 |   465 |    18   (6)| 00:00:01 |
|*  4 |     FILTER                      |                   |       |       |            |          |
|   5 |      TABLE ACCESS BY INDEX ROWID| HOLDINGS          |    15 |   465 |    17   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN          | HOLDINGS_PUB_DATE |    15 |       |     2   (0)| 00:00:01 |
|   7 |   TABLE ACCESS BY INDEX ROWID   | DCS_RULE          |     1 |    33 |     3   (0)| 00:00:01 |
|*  8 |    INDEX RANGE SCAN             | FBI               |     1 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - filter(TO_DATE('01-JAN-10','dd-mon-rr')<=TO_DATE('31-DEC-10','dd-mon-rr'))
   6 - access("PUBLICATION_DATE">=TO_DATE('01-JAN-10','dd-mon-rr') AND 
              "PUBLICATION_DATE"<=TO_DATE('31-DEC-10','dd-mon-rr'))
   8 - access("SITE||'-'||PDF_NAME"= REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$'))

Open in new window

Sean Stuber

and you'll note, the syntax I created the index with and the syntax I queryed with are not identical but both are canonicalized to the same format

REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')

you should check this with your own version though, slightwv is correct that not all versions have been successful. I haven't had any problems with this since 10.2.0.4 but definitely did have problems in 9i.

Sean Stuber

histograms may or may not be needed or even desired with your real data.

I needed them for my manually created and artificially skewed data set

sikyala

ASKER

I don't know how to interpret the results below:

Execution Plan
----------------------------------------------------------
Plan hash value: 2696889323

--------------------------------------------------------------------------------
--------------------------------------------------------

| Id | Operation | Name | Rows | Byte
s | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |

--------------------------------------------------------------------------------
--------------------------------------------------------

| 0 | SELECT STATEMENT | | 1047 | 8899
5 | 32482 (1)| 00:06:30 | | | |

|* 1 | PX COORDINATOR | | |
| | | | | |

| 2 | PX SEND QC (RANDOM) | :TQ10005 | 1047 | 8899
5 | 32482 (1)| 00:06:30 | Q1,05 | P->S | QC (RAND) |

| 3 | HASH UNIQUE | | 1047 | 8899
5 | 32482 (1)| 00:06:30 | Q1,05 | PCWP | |

| 4 | PX RECEIVE | | 15152 | 125
7K| 32480 (1)| 00:06:30 | Q1,05 | PCWP | |

| 5 | PX SEND HASH | :TQ10004 | 15152 | 125
7K| 32480 (1)| 00:06:30 | Q1,04 | P->P | HASH |

|* 6 | FILTER | | |
| | | Q1,04 | PCWC | |

|* 7 | HASH JOIN | | 15152 | 125
7K| 32480 (1)| 00:06:30 | Q1,04 | PCWP | |

| 8 | PX RECEIVE | | 10752 | 27
3K| 25664 (1)| 00:05:08 | Q1,04 | PCWP | |

| 9 | PX SEND HASH | :TQ10003 | 10752 | 27
3K| 25664 (1)| 00:05:08 | Q1,03 | P->P | HASH |

| 10 | VIEW | index$_join$_002 | 10752 | 27
3K| 25664 (1)| 00:05:08 | Q1,03 | PCWP | |

|* 11 | HASH JOIN BUFFERED | | |
| | | Q1,03 | PCWP | |

| 12 | BUFFER SORT | | |
| | | Q1,03 | PCWC | |

| 13 | PX RECEIVE | | 10752 | 27
3K| 6872 (1)| 00:01:23 | Q1,03 | PCWP | |

| 14 | PX SEND HASH | :TQ10000 | 10752 | 27
3K| 6872 (1)| 00:01:23 | | S->P | HASH |

| 15 | INDEX FAST FULL SCAN| HOLD_PK | 10752 | 27
3K| 6872 (1)| 00:01:23 | | | |

| 16 | PX RECEIVE | | 10752 | 27
3K| 25156 (1)| 00:05:02 | Q1,03 | PCWP | |

| 17 | PX SEND HASH | :TQ10002 | 10752 | 27
3K| 25156 (1)| 00:05:02 | Q1,02 | P->P | HASH |

| 18 | PX BLOCK ITERATOR | | 10752 | 27
3K| 25156 (1)| 00:05:02 | Q1,02 | PCWC | |

|* 19 | INDEX FAST FULL SCAN| IX_BH_TITLE_PUB | 10752 | 27
3K| 25156 (1)| 00:05:02 | Q1,02 | PCWP | |

| 20 | BUFFER SORT | | |
| | | Q1,04 | PCWC | |

| 21 | PX RECEIVE | | 2081K| 11
7M| 6793 (2)| 00:01:22 | Q1,04 | PCWP | |

| 22 | PX SEND HASH | :TQ10001 | 2081K| 11
7M| 6793 (2)| 00:01:22 | | S->P | HASH |

| 23 | TABLE ACCESS FULL | DCS_RULE | 2081K| 11
7M| 6793 (2)| 00:01:22 | | | |

--------------------------------------------------------------------------------
--------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(TO_DATE('01-JAN-10')<=TO_DATE('31-DEC-10'))
6 - filter(TO_DATE('01-JAN-10')<=TO_DATE('31-DEC-10'))
7 - access( REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')="SITE"||'-'||"PDF_NAME")
11 - access(ROWID=ROWID)
19 - filter("PUBLICATION_DATE"<='31-DEC-10' AND "PUBLICATION_DATE">='01-JAN-10
')

I didn't really see any improved performance.

SOLUTION

Akenathon

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

sikyala

ASKER

Sorry for the late reply