Solved

how do I create a function index for to sql commands

Posted on 2011-03-01
13
890 Views
Last Modified: 2013-12-12
I am trying to improve the performance on a query I am running and it was suggested that I try a function index for the two sql commands. They are the following:

regexp_substr(DOCUMENT_ID,'[^/]+$')

and

site||'-'||pdf_name  

The query is as follows:

select folder_id from dcs_rule where regexp_substr(DOCUMENT_ID,'[^/]+$') in (select site||'-'||pdf_name from holdings where publication_date >='01-JAN-10' and publication_date<='31-DEC-10');

Now the text that is left after all content up until the forward slash in the dcs_rule table is the equivalent of two fields in the holdings table. They are the site and pdf_name fields. So for example:

In the dcs_rule table the value of the document_ID field would look like this: RDBAFM_ORA10://cpia_lib/CPIA-1995-0525

In the query the REGEXP_SUBSTR(DOCUMENT_ID,'[^/]+$') command would strip everything and leave the following: CPIA-1995-0525

In the holdings table CPIA would be the site value and 1995-0525 would be the pdf_name value

I am trying to get the records that are in both records.  What would be the syntax for creating a function index for both sql commands?


0
Comment
Question by:sikyala
  • 5
  • 3
  • 3
  • +1
13 Comments
 
LVL 76

Accepted Solution

by:
slightwv (䄆 Netminder) earned 250 total points
ID: 35010214
The syntax to create them is straight forward and samples are all over the web:
http://www.dba-oracle.com/oracle_tips_index_scan_fbi_sql.htm

The potential gotcha is (they may have fixed it by now) but not to long ago for Oracle to use a Function Based Index, the syntax/case in the query must match the EXACT case used to create the index.

for example: you create the index with: regexp_substr(DOCUMENT_ID,'[^/]+$')

and in your code you use: REGEXP_SUBSTR(DOCUMENT_ID,'[^/]+$')

the index would not be used.



0
 

Author Comment

by:sikyala
ID: 35010560
I created the indexes now how do I use them in my query?
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35010600
Just use the exact regexp syntax in a query and it should use them.

The Explain Plan will tell you if it is using them or not.
0
 
LVL 11

Assisted Solution

by:Akenathon
Akenathon earned 250 total points
ID: 35010622
They are used automatically, you don't need to do anything else. Doublecheck that they are being used by firing the SQLs in Sqlplus after entering SET AUTOTRACE ON
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 35010747
create index fbi on dcs_rule (regexp_substr(DOCUMENT_ID,'[^/]+$') ) ;

Open in new window

this created an index with this syntax:
     REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')      


I also created an index on the publication_date of the holdings table

Then I analyzed the tables and generated a plan for this query...

SELECT folder_id
  FROM dcs_rule
 WHERE REGEXP_SUBSTR(document_id, '[^/]+$') IN
           (SELECT site || '-' || pdf_name
              FROM holdings
             WHERE publication_date >= TO_DATE('01-JAN-10', 'dd-mon-rr')
                   AND publication_date <= TO_DATE('31-DEC-10', 'dd-mon-rr'));

Open in new window



Plan hash value: 4128481947
 
-----------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |  3050 |   154K|   298   (2)| 00:00:04 |
|   1 |  VIEW                 | VM_NWVW_2 |  3050 |   154K|   298   (2)| 00:00:04 |
|   2 |   HASH UNIQUE         |           |  3050 |   190K|   298   (2)| 00:00:04 |
|*  3 |    FILTER             |           |       |       |            |          |
|*  4 |     HASH JOIN         |           |  3050 |   190K|   297   (2)| 00:00:04 |
|*  5 |      TABLE ACCESS FULL| HOLDINGS  |  3050 | 94550 |    74   (2)| 00:00:01 |
|   6 |      TABLE ACCESS FULL| DCS_RULE  |   214K|  6899K|   221   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter(TO_DATE('01-JAN-10','dd-mon-rr')<=TO_DATE('31-DEC-10','dd-mon
              -rr'))
   4 - access( REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')="SITE"||'-'||"PDF_NAM
              E")
   5 - filter("PUBLICATION_DATE">=TO_DATE('01-JAN-10','dd-mon-rr') AND 
              "PUBLICATION_DATE"<=TO_DATE('31-DEC-10','dd-mon-rr'))

Open in new window



Note, neither index was used,  really surprising because I arranged test data so both the subquery and main query would each have exactly one row that matched their respective  where clauses.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 35010774
in my test data:
dcs_rule has 214082 rows -
   with one row that has a document_id that matches the CPIA-1995-0525 expression

holdings has 50000 rows
    with one row with data in 2010

0
Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

 
LVL 76

Assisted Solution

by:slightwv (䄆 Netminder)
slightwv (䄆 Netminder) earned 250 total points
ID: 35010776
Check the case....

Back when I used them the syntax used to create the index must match EXACLY how it is used in the select.

try:
SELECT folder_id
  FROM dcs_rule
 WHERE regexp_substr(DOCUMENT_ID, '[^/]+$') IN
           (SELECT site || '-' || pdf_name
              FROM holdings
             WHERE publication_date >= TO_DATE('01-JAN-10', 'dd-mon-rr')
                   AND publication_date <= TO_DATE('31-DEC-10', 'dd-mon-rr'));
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 35010935
ah,  I reanalyzed with histograms and the indexes got picked up
Plan hash value: 3121295905
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                   |    15 | 30525 |    63   (2)| 00:00:01 |
|   1 |  NESTED LOOPS                   |                   |    15 | 30525 |    63   (2)| 00:00:01 |
|   2 |   VIEW                          | VW_NSO_1          |    15 | 30030 |    18   (6)| 00:00:01 |
|   3 |    HASH UNIQUE                  |                   |    15 |   465 |    18   (6)| 00:00:01 |
|*  4 |     FILTER                      |                   |       |       |            |          |
|   5 |      TABLE ACCESS BY INDEX ROWID| HOLDINGS          |    15 |   465 |    17   (0)| 00:00:01 |
|*  6 |       INDEX RANGE SCAN          | HOLDINGS_PUB_DATE |    15 |       |     2   (0)| 00:00:01 |
|   7 |   TABLE ACCESS BY INDEX ROWID   | DCS_RULE          |     1 |    33 |     3   (0)| 00:00:01 |
|*  8 |    INDEX RANGE SCAN             | FBI               |     1 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - filter(TO_DATE('01-JAN-10','dd-mon-rr')<=TO_DATE('31-DEC-10','dd-mon-rr'))
   6 - access("PUBLICATION_DATE">=TO_DATE('01-JAN-10','dd-mon-rr') AND 
              "PUBLICATION_DATE"<=TO_DATE('31-DEC-10','dd-mon-rr'))
   8 - access("SITE||'-'||PDF_NAME"= REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$'))

Open in new window

0
 
LVL 73

Expert Comment

by:sdstuber
ID: 35010951
and you'll note, the syntax I created the index with and the syntax I queryed with are not identical but both are canonicalized to the same format

REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')

you should check this with your own version though,  slightwv is correct that not all versions have been successful.  I haven't had any problems with this since 10.2.0.4 but definitely did have problems in 9i.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 35010965
histograms may or may not be needed or even desired with your real data.

I needed them for my manually created and artificially skewed data set
0
 

Author Comment

by:sikyala
ID: 35011205
I don't know how to interpret the results below:

Execution Plan
----------------------------------------------------------
Plan hash value: 2696889323

--------------------------------------------------------------------------------
--------------------------------------------------------

| Id  | Operation                          | Name                 | Rows  | Byte
s | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |

--------------------------------------------------------------------------------
--------------------------------------------------------

|   0 | SELECT STATEMENT                   |                      |  1047 | 8899
5 | 32482   (1)| 00:06:30 |        |      |            |

|*  1 |  PX COORDINATOR                    |                      |       |
  |            |          |        |      |            |

|   2 |   PX SEND QC (RANDOM)              | :TQ10005             |  1047 | 8899
5 | 32482   (1)| 00:06:30 |  Q1,05 | P->S | QC (RAND)  |

|   3 |    HASH UNIQUE                     |                      |  1047 | 8899
5 | 32482   (1)| 00:06:30 |  Q1,05 | PCWP |            |

|   4 |     PX RECEIVE                     |                      | 15152 |  125
7K| 32480   (1)| 00:06:30 |  Q1,05 | PCWP |            |

|   5 |      PX SEND HASH                  | :TQ10004             | 15152 |  125
7K| 32480   (1)| 00:06:30 |  Q1,04 | P->P | HASH       |

|*  6 |       FILTER                       |                      |       |
  |            |          |  Q1,04 | PCWC |            |

|*  7 |        HASH JOIN                   |                      | 15152 |  125
7K| 32480   (1)| 00:06:30 |  Q1,04 | PCWP |            |

|   8 |         PX RECEIVE                 |                      | 10752 |   27
3K| 25664   (1)| 00:05:08 |  Q1,04 | PCWP |            |

|   9 |          PX SEND HASH              | :TQ10003             | 10752 |   27
3K| 25664   (1)| 00:05:08 |  Q1,03 | P->P | HASH       |

|  10 |           VIEW                     | index$_join$_002     | 10752 |   27
3K| 25664   (1)| 00:05:08 |  Q1,03 | PCWP |            |

|* 11 |            HASH JOIN BUFFERED      |                      |       |
  |            |          |  Q1,03 | PCWP |            |

|  12 |             BUFFER SORT            |                      |       |
  |            |          |  Q1,03 | PCWC |            |

|  13 |              PX RECEIVE            |                      | 10752 |   27
3K|  6872   (1)| 00:01:23 |  Q1,03 | PCWP |            |

|  14 |               PX SEND HASH         | :TQ10000             | 10752 |   27
3K|  6872   (1)| 00:01:23 |        | S->P | HASH       |

|  15 |                INDEX FAST FULL SCAN| HOLD_PK          | 10752 |   27
3K|  6872   (1)| 00:01:23 |        |      |            |

|  16 |             PX RECEIVE             |                      | 10752 |   27
3K| 25156   (1)| 00:05:02 |  Q1,03 | PCWP |            |

|  17 |              PX SEND HASH          | :TQ10002             | 10752 |   27
3K| 25156   (1)| 00:05:02 |  Q1,02 | P->P | HASH       |

|  18 |               PX BLOCK ITERATOR    |                      | 10752 |   27
3K| 25156   (1)| 00:05:02 |  Q1,02 | PCWC |            |

|* 19 |                INDEX FAST FULL SCAN| IX_BH_TITLE_PUB      | 10752 |   27
3K| 25156   (1)| 00:05:02 |  Q1,02 | PCWP |            |

|  20 |         BUFFER SORT                |                      |       |
  |            |          |  Q1,04 | PCWC |            |

|  21 |          PX RECEIVE                |                      |  2081K|   11
7M|  6793   (2)| 00:01:22 |  Q1,04 | PCWP |            |

|  22 |           PX SEND HASH             | :TQ10001             |  2081K|   11
7M|  6793   (2)| 00:01:22 |        | S->P | HASH       |

|  23 |            TABLE ACCESS FULL       | DCS_RULE |  2081K|   11
7M|  6793   (2)| 00:01:22 |        |      |            |

--------------------------------------------------------------------------------
--------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_DATE('01-JAN-10')<=TO_DATE('31-DEC-10'))
   6 - filter(TO_DATE('01-JAN-10')<=TO_DATE('31-DEC-10'))
   7 - access( REGEXP_SUBSTR ("DOCUMENT_ID",'[^/]+$')="SITE"||'-'||"PDF_NAME")
  11 - access(ROWID=ROWID)
  19 - filter("PUBLICATION_DATE"<='31-DEC-10' AND "PUBLICATION_DATE">='01-JAN-10
')

I didn't really see any improved performance.
0
 
LVL 11

Assisted Solution

by:Akenathon
Akenathon earned 250 total points
ID: 35011837
Try hinting your SQL like this for the execution plan to be more readable:

select /*+NOPARALLEL*/ folder_id from dcs_rule where regexp_substr(DOCUMENT_ID,'[^/]+$') in (select site||'-'||pdf_name from holdings where publication_date >='01-JAN-10' and publication_date<='31-DEC-10');

What's the name of your new function-based index? Note that you only need one index, on regexp_substr(...)

Exception: If your "holdings" between 01-JAN-10 and 31-DEC-10 are a small percentage of the table, you *could* create another function-based index on (publication_date, "SITE"||'-'||"PDF_NAME"), so that the subquery is entirely solved in the index.
0
 

Author Closing Comment

by:sikyala
ID: 35148536
Sorry for the late reply
0

Featured Post

Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

Join & Write a Comment

Working with Network Access Control Lists in Oracle 11g (part 1) Part 2: http://www.e-e.com/A_9074.html So, you upgraded to a shiny new 11g database and all of a sudden every program that used UTL_MAIL, UTL_SMTP, UTL_TCP, UTL_HTTP or any oth…
This post first appeared at Oracleinaction  (http://oracleinaction.com/undo-and-redo-in-oracle/)by Anju Garg (Myself). I  will demonstrate that undo for DML’s is stored both in undo tablespace and online redo logs. Then, we will analyze the reaso…
This video explains at a high level with the mandatory Oracle Memory processes are as well as touching on some of the more common optional ones.
This video shows setup options and the basic steps and syntax for duplicating (cloning) a database from one instance to another. Examples are given for duplicating to the same machine and to different machines

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now