How can I index a multiple column wildcard search using LIKE or CATSEARCH or CONTAINS in Oracle

hedgebob
hedgebob used Ask the Experts™
on
I have a table that has 5 VARCHAR2 columns that I need to do a "text" search on.  The WHERE clause used to be:

WHERE
DIST_NUM = '088880'
AND UPPER(field1) LIKE '%SOMETHING%'
AND UPPER(field2) LIKE '%SOMETHINGELSE%'
AND UPPER(field3) LIKE '%HIMOM%'
AND UPPER(field4) LIKE '%WHATEVER%'
AND UPPER(field5) LIKE '%LASTONE%'

There are two tables to index this way, one that is static and has about 4 millions rows, and one that will be growing at a rate of about 200,000 a month with an initial count of about 1 million.

Obviously the LIKE clause causes a full table scan, and I tried a CONTEXT DOMAIN index but when switching the LIKE clauses to CONTAINS, it only seems to use one of the DOMAIN indexes.  I read about CTXCAT, but I need multiple wildcard clauses.  The VARCHAR2 data in each of these columns is not very large at all, maybe 255 at most.
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Naveen KumarProduction Manager / Application Support Manager

Commented:
why did it use the domain index ? any idea ?

1) paste the complete sql statement,
2) what context domain index did you create
3) explain plan statement for the sql statement

need this info. to understand what is going on...

Author

Commented:
It used the domain index because I switched to a CONTAINS clause for each of the LIKE clauses but I want it to use all the clauses for indexing and it only selects one.

Here are the indexes

CREATE INDEX idx_bfextract_txtepd on BFEXTRACT_FORTOM(EXT_PRODUCT_DESC) INDEXTYPE IS CTXSYS.CONTEXT;
CREATE INDEX idx_bfextract_txtedic on BFEXTRACT_FORTOM(EXT_DIST_I_CODE) INDEXTYPE IS CTXSYS.CONTEXT;
CREATE INDEX idx_bfextract_txtedmc on BFEXTRACT_FORTOM(EXT_D_MFG_CODE) INDEXTYPE IS CTXSYS.CONTEXT;
CREATE INDEX idx_bfextract_txtedmp on BFEXTRACT_FORTOM(EXT_D_MFG_PROD_CODE) INDEXTYPE IS CTXSYS.CONTEXT;
CREATE INDEX idx_bfextract_txteuom on BFEXTRACT_FORTOM(EXT_UOM) INDEXTYPE IS CTXSYS.CONTEXT;


Here is the SQL using CONTAINS (also trying CATSEARCH with CTXCAT indexes too with no luck yet)

SELECT * FROM bfextract_fortom
  WHERE  DIST_NUM = '000002'
  AND CONTAINS(ext_product_desc,'syringe') > 0
  AND CONTAINS(ext_d_mfg_code,'0000114314') > 0
  AND CONTAINS(ext_d_mfg_prod_code,'WREMIH442260') > 0
  AND CONTAINS(ext_uom,'CS') > 0    ;

And I am attaching the explain plan from the above statement exported via SQLDeveloper (only option is html)

I am not as concerned with why this particular query isn't optimized as much as I am looking for a solution for replacing the first query in the original post with a indexable solution and CONTEXT indexes don't seem to be it so far.



explainplan.html
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
What you're after here is a MULTI_COLUMN_DATASTORE.

I do a lot with Oracle Text and have used these in the past.  If you have any questions, please ask.

Finding certain Text topics in the online docs is problematic at times.  Certain nice features only have a line or two buried deep in only one chapter.

Since you want wild-cards this should be of interest:
SUBSTRING_INDEX

Here's a couple more that appear to be nice but when I tried them they GREATLY inflated the index size and never really helped search times:
PREFIX_INDEX and PREFIX_MIN_LENGTH

Not knowing how much you already know about Text indexes, I'm assuming you want to SYNC on commit.

You'll need to run an optimize every once in a while to keep performance up:
ctx_ddl.optimize_index
Success in ‘20 With a Profitable Pricing Strategy

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Author

Commented:
This looks very promising form what I just researched on it.  You are right about the documentation and this is my first attempt using Oracle Text so thanks.  I will try it out and check back here when done.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
No problem.  I'll be here!!!

It's taken me 10 years of using it to get where I am and there's parts of it that I still don't understand completely.  And I didn't know about this site when I started using it (not even sure this sited existed back then).

Author

Commented:
Actually, after testing this out using some examples, I am not sure this is what I need.  I need to query specific columns for specific values (with the wildcard at the beginning and end), but the MULTI_COLUMN_DATASTORE seems to query the list of columns with the same value.  Here is the code. I was testing

CTX_DDL.CREATE_PREFERENCE('test_concat_datastore', 'MULTI_COLUMN_DATASTORE');
CTX_DDL.SET_ATTRIBUTE('test_concat_datastore', 'COLUMNS', 'ext_product_description', 'ext_d_mfg_code');

CREATE INDEX idx_bfextract_text
ON bfextract_fortom (INT_LAST_MODIFIED)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('datastore CTXSYS.test_concat_datastore')

SELECT *
FROM   bfextract_fortom
WHERE  CONTAINS (int_last_modified, 'syringe') > 0;

And what I need is to search the multiple columns in that table with different values for each.  Any thoughts?

Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Sorry I completely missed the question here.  Oracle text might be overkill.


I'm not at work right now so this is from memory.  I'll try to provide some better solutions tomorrow when I have DB access.

>>Obviously the LIKE clause causes a full table scan
Try function based indexes (FBI):

create index dx_bfextract_txtepd on BFEXTRACT_FORTOM(UPPER(EXT_PRODUCT_DESC));

The note here is the SQL needs to match the call exactly (again going from memory).  So given the index above the following query won't use it:
select * from BFEXTRACT_FORTOM where UpPeR(EXT_PRODUCT_DESC) = 'HI';


Back to Oracle Text:

The quick and dirty:
If the data in the individual columns is pretty distinct, check out NEAR (typed from memory so syntax is probably wrong):
contains(int_last_modified,'near(("syringe","000114314","WREMIH442260"),100,FALSE)') > 0;

Possible better (not sure it this works with multi column or not):
Set up section groups and search specific groups for terms.

Author

Commented:
The function-based index won't work because I would need WHERE EXT_PRODUCT_DESC LIKE '%whatevertext%' so that doesn't use an index unless I dropped off the first wildcard and just did WHERE EXT_PRODUCT_DESC LIKE 'whatevertext%'.  But that isn't an option

And, the data can be all over the place so we would never be able to be sure about the uniqueness of the data elemenent.  However, I think I may be on to something with the CONTAINS syntax and using the WITHIN keyword, so like CONTAINS(INT_LAST_MODIFIED,'whatever WITHIN EXT_PRODUCT_DESC AND himom WITHIN EXT_D_MFG_PROD') > 0.  Something like that which is what I think you meant with the sections.  I will keep at it and let you know, and if you think of anything in the meantime, feel free to comment.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
I have to apologize again.  Not knowing your data it's hard to come up with all possibilities.

Having thought about it for a few minutes, I'm now thinking that Text might be the way to go after all.

With the section groups you can do all the calls with one contains.

something like (again untested):
contains(int_last_modified,'(%syringe% within ext_product_desc) and (%0000114314% within ext_d_mfg_code,)') > 0;


http://download.oracle.com/docs/cd/B14117_01/text.101/b10730/cqoper.htm#i998525
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>Something like that which is what I think you meant with the
lol...  Caught me typing...


>>so that doesn't use an index unless I dropped off
DUH...  Correct.  To many ideas to little thought behind them.

Author

Commented:
So, I am trying the WITHIN syntax of CONTAINS and I was under the impression that when you included columns in the MULTI_COLUMN_DATASTORE it automatically created sections for each column, but I keep getting an error about my section not existing.  I even tried adding a section group with BASIC_SECTION_GROUP.  The documentation seems to indicate this would happen automatically, but no luck.  Here is what is going on....


First attempt....

BEGIN
CTX_DDL.CREATE_PREFERENCE('test_concat_datastore', 'MULTI_COLUMN_DATASTORE');
CTX_DDL.SET_ATTRIBUTE('test_concat_datastore', 'COLUMNS', 'ext_product_desc,ext_d_mfg_code,ext_d_mfg_prod_code');
END;
/

CREATE INDEX idx_bfextract_text
ON bfextract_fortom (INT_BKT_TARGET)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('datastore MI_DW.test_concat_datastore');

SELECT * FROM bfextract_fortom
WHERE CONTAINS(INT_BKT_TARGET,'SYRINGE WITHIN ext_product_desc')>0;

Get the error:
ORA-29902: error in executing ODCIIndexStart() routine
ORA-20000: Oracle Text error:
DRG-10837: section ext_product_desc does not exist
29902. 00000 -  "error in executing ODCIIndexStart() routine"
*Cause:    The execution of ODCIIndexStart routine caused an error.
*Action:   Examine the error messages produced by the indextype code and
           take appropriate action


Second attemp with the BASIC_SECTION_GROUP....

BEGIN
CTX_DDL.DROP_PREFERENCE ('test_concat_datastore');
CTX_DDL.CREATE_PREFERENCE('test_concat_datastore', 'MULTI_COLUMN_DATASTORE');
CTX_DDL.SET_ATTRIBUTE('test_concat_datastore', 'COLUMNS', 'ext_product_desc,ext_d_mfg_code,ext_d_mfg_prod_code');
CTX_DDL.create_section_group('field_group', 'BASIC_SECTION_GROUP');
END;
/


CREATE INDEX idx_bfextract_text
ON bfextract_fortom (INT_BKT_TARGET)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('datastore MI_DW.test_concat_datastore SECTION GROUP FIELD_GROUP');

SELECT * FROM bfextract_fortom
WHERE CONTAINS(INT_BKT_TARGET,'SYRINGE WITHIN ext_product_desc')>0;

Get the error:
ORA-29902: error in executing ODCIIndexStart() routine
ORA-20000: Oracle Text error:
DRG-10837: section ext_product_desc does not exist
29902. 00000 -  "error in executing ODCIIndexStart() routine"
*Cause:    The execution of ODCIIndexStart routine caused an error.
*Action:   Examine the error messages produced by the indextype code and
           take appropriate action

Author

Commented:
So, I revised my section group code and got it to work with the WITHIN clause as expected.

BEGIN
CTX_DDL.CREATE_PREFERENCE('test_concat_datastore', 'MULTI_COLUMN_DATASTORE');
CTX_DDL.SET_ATTRIBUTE('test_concat_datastore', 'COLUMNS', 'ext_product_desc as extproductdesc,ext_d_mfg_code as extdmfgcode,ext_d_mfg_prod_code as extdmfgprodcode');
CTX_DDL.create_section_group('field_group', 'BASIC_SECTION_GROUP');
CTX_DDL.ADD_FIELD_SECTION('field_group','extproductdesc','extproductdesc',TRUE);
CTX_DDL.ADD_FIELD_SECTION('field_group','extdmfgcode','extdmfgcode',TRUE);
CTX_DDL.ADD_FIELD_SECTION('field_group','extdmfgprodcode','extdmfgprodcode',TRUE);
END;
/

CREATE INDEX idx_bfextract_text
ON bfextract_fortom (INT_BKT_TARGET)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('datastore MI_DW.test_concat_datastore SECTION GROUP field_group');

This works great....

SELECT * FROM bfextract_fortom
WHERE CONTAINS(INT_BKT_TARGET,SYRINGE WITHIN extproductdesc AND KENDAL WITHIN extdmfgcode AND 8881833 WITHIN extdmfgprodcode')>0 AND dist_num = '002143';

However, what I really need is this...

SELECT * FROM bfextract_fortom
WHERE CONTAINS(INT_BKT_TARGET,%SYRINGE% WITHIN extproductdesc AND %KENDAL% WITHIN extdmfgcode AND %8881833% WITHIN extdmfgprodcode')>0 AND dist_num = '002143';

...and this is very slow (20 seconds compared to the 1 second of the first).  Do you know of any tricks to get it to index within words?  My meaning is if I have data that is 18881833810 in extdmfgprodcode, the first query doesn't find it, but the second does, but very slowly.  I would like to see something closer to the first query's 1 second performance with the functionality of the second.  Any parameters I can use for this to tell it to index it all?  I'll keep looking but I feel we are close.  Thanks for your help so far.
Most Valuable Expert 2012
Distinguished Expert 2018
Commented:
Thanks for the update.  I was just getting ready to start working on this for you.  You saved me a few steps.

I briefly mentioned it before but like the docs, very briefly.

Try this:

begin
      ctx_ddl.create_preference('MY_PREF', 'BASIC_WORDLIST');
      ctx_ddl.set_attribute('MY_PREF','SUBSTRING_INDEX','TRUE');
        --depending on the amount of data you have and resources you might
        --need to tweak the default on the following
      --ctx_ddl.set_attribute('MY_PREF','WILDCARD_MAXTERMS','50000');
end;
/

create index ...
PARAMETERS ('wordlist MY_PREF datastore MI_DW.test_concat_datastore SECTION GROUP field_group')


I also mentioned two other paramters that caused me more headaches with index size than I gained in performance but they might work for you.  I would do a LOT of testing without them before I would justify the increase in index size.

      --ctx_ddl.set_attribute('MY_PREF','PREFIX_INDEX','TRUE');
      --ctx_ddl.set_attribute('MY_PREF','PREFIX_MIN_LENGTH','3');



Author

Commented:
So that worked great for me, thanks.  I am able to query each of my five columns with the wildcards and it is fast (though creating the index takes hours, but understandable).  

Now I have another problem.  In addition to the 5 text columns I want to use with the CONTAINS statement, I have two VARCHAR2 columns that are not part of the wildcard search, just simple WHERE colA='sometihing' AND colB = 'somethingelse'...  So, when I plug this all in, the big query doesn't use tha CONTEXT index because of these two extra columns.  So, is there some way to combine this CONTEXT query with these two regularly indexed queries?  Something like the CTXCAT functionality, or should I just include those two extra columns in my original field list for the CONTEXT index?  If so, I need an exact match on those two fields, but the wildcard match on the pervious 5.  I suspect there must be some way in the CONTAINS syntax to specify this so I will look for that, but wanted to know if there was a better way using the CTXCAT type of functionality.  I was hoping for something like this...

WHERE dist_num = '000002' AND int_bkt = 'BKT_COMPLETE'
AND CONTAINS (INT_BKT_TARGET,'%123456778% WITHIN extdmfgcode AND %2323244% WITHIN extdisticode AND %EA% WITHIN extuom AND %XX66TT% WITHIN extdmfgprodcode')>0

If I have to put them all in the field list of the CONTEXT index, I would need a way to specify an EQUAL not a WITHIN for the CONTAINS syntax for these two columns.


Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>though creating the index takes hours, but understandable
Been there.  On my old hardware, it would take about a week to rebuild from scratch.

Later we'll also need to cover proper maintenance to keep optimal performance.  What I did was set up a database job to execute ctx_ddl.optimize_index('INDEX_NAME','FULL',max_minutes_to_run) in off peak times.  Again, the docs talk about this.

You might also look into 'logging' the output of this task: ctx_output.start_log and ctx_output.end_log.

>>when I plug this all in, the big query doesn't use tha CONTEXT index because of these two extra columns

I'm guessing you mean the query slows down because plan shows the domain index last in the list.  It has to hit the domain index at some point.

I don't CTXCAT will help here.

When was the last time you computed statistics on the table?  The optimizer must thing it's better to use the context index later in the plan.

Can you post the explain plans for the two examples (one with the two columns and one w/o)?

Author

Commented:
It doesn't show the domain index at all.  Just a full tablescan with the filter predicates being the contains and the other two column clauses.  I will try to get you an example since the query that this is being used in is a big confusing join.  I will break it down to the part I care about and send you the explain plans.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>filter predicates being the contains and the other two column clauses

Might be a 'feature' of the plan ouptut.  As long as if shows a predicate, I'm thinking it is working behind the scenes.  It might take openind an SR with Oracle Support to confirm this.

I'm guessing that the query slows down when adding the two additional columns or are you just looking at the plans for now?  We might be able to come up with a better plan with hints.

Author

Commented:
It is slower, but also, the filter predicates show the table name and the access as "FULL" as opposed to all the other tables in the query that show the index name being used to access the table along with the "ACCESS PREDICATES" plus, when I did it with just the contains clause it showed the domain index along with the "ACCESS PRECIDATES" as the contains clause so it is definitely doing a full tablescan.  I will get you an explain plan to show what I mean (along with the indexes that are on the table in question).
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Not sure why but last night I was thinking about this question and remembered NOT seeing something important.  I mentioned it before in passing but wanted to make sure you understood.

Oracle Text indexes do not automatically sync themselves.  I didn't see a 'sync(on commit)' in your create index statement.  If you don't have this, you will need to set up a database job that syncs new rows at some interval.

Waiting rows can be seen in the CTX_USER_PENDING view.

How goes the explain plans?

Author

Commented:
Yeah, I changed that in my final index after I read that in the documentation.  One of the tables is completely static so it isn't an issue, but the other isn't.

Also, I found out something funny about the queries as I was playing around with the explain plan, it was only the weird JOINs I was doing that made it not want to use the domain index with the other two columns included, when I did as a single-table query, it used the domain index whether those columns were present or not.  So, I redid the queries with those two exact match columns included in the column list of the CONTEXT index and changed my JOIN a bit to just use the CONTAINS statement for the 5 wildcard searches plus the two exact match columns and it seemed to work great.  Now I have to wait until a QA build tomorrow to see how it works in the wild.  I will keep you posted.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
Cool.  

I'm surprised to hear this type of problem using 11g.  The Optimizer 'glitches' were supposed to be a lot fewer in 11g.  Back in early 10g, I hated the CBO since it did some stupid things.

>>Yeah, I changed that in my final index after I read that in the documentation

Did you catch my post about a optimize_index job at regular intervals so the index doesn't get to badly fragmented?

Author

Commented:
Yep, I did.  I will see how often I want to run that based on user acceptance testing (and functional testing).  I assume it will take some system resources.
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
>>I assume it will take some system resources

Yes it will but it is necessary to keep the index in shape.  I was lucky in that we really aren't a 24X7 shop and I could beat up my database server after hours.  I let mine run for almost 10 hours every night.
Hello Slightwv and Author,

I am interested in this article i believe it would solve a massive problem that we have in our business on so many levels.

I would like to request that one of you put together a start to finish example of this question thread.  

Some final stats on performance would be ideal too if you can afford the time?

Many Thanks
Most Valuable Expert 2012
Distinguished Expert 2018

Commented:
ringoStasher,

I'm not sure what you are asking to have put together.  If you ask a question on the site I'll do everything I can to help you out.  As long as it's in an Oracle zone, I should get the alert.

If you are asking me to write an Oracle Text article for submission to Experts-Exchange, you aren't the only one that has suggested I write one on Oracle Text.  I might some day but really don't have the time right now to do an article justice.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial