Oracle Text Full Search PL/SQL Syntax

Posted on 2011-04-22
Medium Priority
Last Modified: 2012-05-11
I am developing a small search function using Oracle Text and so far it works pretty well but I am in the need of using fuzzy searches for my users. I am developing the search in Oracle Apex and my current PL/SQL to produce the search results in attached below. I just am not sure how to go about implementing fuzzy searches within this code. For example if I input into the text field: $calls the procedure will do a correct fuzzy search, however I do not want my suers to have to remember '$' is used for fuzzy.

I tried putting the term: fuzzy right after document and before the variable, however my syntax did not appear correct to oracle.

   v_query   VARCHAR2 (4000);
   v_query := 'SELECT   ';
   IF :p1_search_string IS NOT NULL
      v_query := v_query || 'score (1) relevance, ';
      v_query := v_query || '''100%'' relevance, ';
   END IF;
   v_query :=
      v_query || 'SUBSTR (website, INSTR (website, ''/'') + 1) '
              || 'document,' || 'ID, ';
   IF :p1_search_string IS NOT NULL
      v_query :=
         || 'NVL2 '
         || '(:p1_search_string, '
         || 'ctx_doc.snippet '
         || '(''docsx'', '
         || 'ROWID, '
         || 'NVL (:p1_search_string, ''%''), '
         || '''<span style="color:darkgreen;'
         || 'font-weight:bold;font-size:14px;">'', '
         || '''</span>'' '
         || '), '
         || 'NULL '
         || ') snippet ';
      v_query := v_query || 'NULL snippet ';
   END IF;
   v_query := v_query || 'FROM docs ';
   IF :p1_search_string IS NOT NULL
      v_query :=
         || 'WHERE contains (document, :p1_search_string, 1) > 0 ';
   END IF;
   v_query := v_query || 'ORDER BY 1 DESC ';
   RETURN (v_query);

Open in new window

Question by:mjfigur
  • 3
  • 3
LVL 78

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35450702
You say 'fuzzy' but mention '$'.  The '$' is 'stemming' and is different.

Which is it?

You will likely need to pre-process the 'terms' yourself to get it in a format you want.  You will likely need to do some of this anyway.  If you have a text index for testing:  Try searching for the term SYN.

That said:  Are you sure you want to do this?  I ask because I've been using Text for over 10 years and initially started out wanting fuzzy AND stemming.  What I found was it was returning a lot of noise and I was after targeted search results.

Did you see the Bing advertising campaign against Google?  Bign was marketing a more 'targeted' search result...  Did it actually work for them?

I hope you have run actual test cases through your app to verify you can use this as opposed to just reading about it and saying "This is EXACTLY what I need".

Also, SNIPPET does just that:  a snippet.  There is no weight or reason for what it returns.  I had to write my own 'context' around the selected words because I wanted ALL hits shown.

While we're at it:  In my opinion, SCORE isn't much better...


Author Comment

ID: 35451349
Thank you so much. I was just reviewing the documentation and got confused between fuzzy and stemming. Stemming seemed to work well for instance I was trying to search for the word: "call" and within the documents stemming helped to find "calls" "called" "calling", but I agree it has a major potential for noise rather than actual results especially once our index gets much larger.

I thought about using a custom <query> whereby I would used some relaxed methods and that worked well, however I read that SNIPPET will not work with <query>. How do you go about handling this and would I just need to write my own 'context' rather than use SNIPPET?
LVL 78

Accepted Solution

slightwv (䄆 Netminder) earned 1000 total points
ID: 35451393
I'm away from keyboard right now so theory only.

Originally I added radio buttons for the options: exact, sounds like (fuzzy) , 'the works' (stemming and fuzzy).

I'm not sure what you mean by snippet will not work with query.

After a time of monitoring what was used (I captured the searches in a table), I remove the 'advanced' options.

I used ctx_doc.highlight with a highlight tab.  This stores the position and offsets of the found results.  Then it's a semi-simple matter of looping through the highlight tab and build a CLOB as my 'result'.

I say semi-simple because you need to account for overlapping 'results' and how you want to deal with it.

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

LVL 24

Assisted Solution

johanntagle earned 1000 total points
ID: 35451457
Yes take into account what slightwv is saying.  If you still want to do it, I suggest you make the fuzzy search optional by having a checkbox on your search field to enable/disable it.  Then pass that as a parameter to your procedure.  Then you can have something like

if p_fuzzy_search = 1 then
  v_search_string := '?' || replace(pl_search_string,' ',' ?'); 
  v_search_string := pl_search_string;
end if;

Open in new window

Going further, I find SCORE more useful if used with progressive relaxation.  See http://www.oracle.com/technetwork/database/enterprise-edition/prog-relax-099280.html

With progressive relaxation you form v_search_string like this

v_search_string := '<query>
       <seq>'||replace(pl_search_string,' ',' $')||'</seq>
       <seq>'||replace(pl_search_string,' ',' ?')||'</seq>

Open in new window

Using the formed v_search_string in your contains function will score exact matches highest, then stem matches, then fuzzy matches.
LVL 24

Expert Comment

ID: 35451470
(took me a while to type, didn't realize there was another exchange between slightwv and mjfigur.  But I hope you still find what I posted useful).
LVL 78

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35451567
>>...didn't realize there...

Never a problem.  That's is what is great about the site.  I've used Text for ten years and never really knew progressive relaxation.  I'm interested.  Have you used it?  Does it make the SCORE actually usable?

I'll need to play around with it in my own setup!

LVL 24

Expert Comment

ID: 35451821
Hi slightwv,

Yes I've used progressive relaxation the moment we upgraded to 10g (it was only introduced then).  It makes SCORE usable in such that you can sort by it since progressive relaxation will always score those matching your first criterion higher.  This way I can also display the score so that if only fuzzy search got a match, the user will know from the score that it wasn't a very good match.


Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this series, we will discuss common questions received as a database Solutions Engineer at Percona. In this role, we speak with a wide array of MySQL and MongoDB users responsible for both extremely large and complex environments to smaller singl…
Lotus Notes has been used since a very long time as an e-mail client and is very popular because of it's unmatched security. In this article we are going to learn about  RRV Bucket corruption and understand various methods to Fix "RRV Bucket Corrupt…
Via a live example, show how to take different types of Oracle backups using RMAN.
Despite its rising prevalence in the business world, "the cloud" is still misunderstood. Some companies still believe common misconceptions about lack of security in cloud solutions and many misuses of cloud storage options still occur every day. …
Suggested Courses

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question