Oracle Text Full Search PL/SQL Syntax

I am developing a small search function using Oracle Text and so far it works pretty well but I am in the need of using fuzzy searches for my users. I am developing the search in Oracle Apex and my current PL/SQL to produce the search results in attached below. I just am not sure how to go about implementing fuzzy searches within this code. For example if I input into the text field: $calls the procedure will do a correct fuzzy search, however I do not want my suers to have to remember '$' is used for fuzzy.

I tried putting the term: fuzzy right after document and before the variable, however my syntax did not appear correct to oracle.

Thanks
 
DECLARE
   v_query   VARCHAR2 (4000);
BEGIN
   v_query := 'SELECT   ';
 
   IF :p1_search_string IS NOT NULL
   THEN
      v_query := v_query || 'score (1) relevance, ';
   ELSE
      v_query := v_query || '''100%'' relevance, ';
   END IF;
 
   v_query :=
      v_query || 'SUBSTR (website, INSTR (website, ''/'') + 1) '
              || 'document,' || 'ID, ';
 
   IF :p1_search_string IS NOT NULL
   THEN
      v_query :=
            v_query
         || 'NVL2 '
         || '(:p1_search_string, '
         || 'ctx_doc.snippet '
         || '(''docsx'', '
         || 'ROWID, '
         || 'NVL (:p1_search_string, ''%''), '
         || '''<span style="color:darkgreen;'
         || 'font-weight:bold;font-size:14px;">'', '
         || '''</span>'' '
         || '), '
         || 'NULL '
         || ') snippet ';
   ELSE
      v_query := v_query || 'NULL snippet ';
   END IF;
 
   v_query := v_query || 'FROM docs ';
 
   IF :p1_search_string IS NOT NULL
   THEN
      v_query :=
            v_query
         || 'WHERE contains (document, :p1_search_string, 1) > 0 ';
   END IF;
 
   v_query := v_query || 'ORDER BY 1 DESC ';
   RETURN (v_query);
END;

Open in new window

mjfigurAsked:
Who is Participating?
 
slightwv (䄆 Netminder) Commented:
I'm away from keyboard right now so theory only.

Originally I added radio buttons for the options: exact, sounds like (fuzzy) , 'the works' (stemming and fuzzy).

I'm not sure what you mean by snippet will not work with query.

After a time of monitoring what was used (I captured the searches in a table), I remove the 'advanced' options.

I used ctx_doc.highlight with a highlight tab.  This stores the position and offsets of the found results.  Then it's a semi-simple matter of looping through the highlight tab and build a CLOB as my 'result'.

I say semi-simple because you need to account for overlapping 'results' and how you want to deal with it.

0
 
slightwv (䄆 Netminder) Commented:
You say 'fuzzy' but mention '$'.  The '$' is 'stemming' and is different.

Which is it?

You will likely need to pre-process the 'terms' yourself to get it in a format you want.  You will likely need to do some of this anyway.  If you have a text index for testing:  Try searching for the term SYN.

That said:  Are you sure you want to do this?  I ask because I've been using Text for over 10 years and initially started out wanting fuzzy AND stemming.  What I found was it was returning a lot of noise and I was after targeted search results.

Did you see the Bing advertising campaign against Google?  Bign was marketing a more 'targeted' search result...  Did it actually work for them?

I hope you have run actual test cases through your app to verify you can use this as opposed to just reading about it and saying "This is EXACTLY what I need".

Also, SNIPPET does just that:  a snippet.  There is no weight or reason for what it returns.  I had to write my own 'context' around the selected words because I wanted ALL hits shown.

While we're at it:  In my opinion, SCORE isn't much better...

0
 
mjfigurAuthor Commented:
Thank you so much. I was just reviewing the documentation and got confused between fuzzy and stemming. Stemming seemed to work well for instance I was trying to search for the word: "call" and within the documents stemming helped to find "calls" "called" "calling", but I agree it has a major potential for noise rather than actual results especially once our index gets much larger.

I thought about using a custom <query> whereby I would used some relaxed methods and that worked well, however I read that SNIPPET will not work with <query>. How do you go about handling this and would I just need to write my own 'context' rather than use SNIPPET?
0
Network Scalability - Handle Complex Environments

Monitor your entire network from a single platform. Free 30 Day Trial Now!

 
johanntagleCommented:
Yes take into account what slightwv is saying.  If you still want to do it, I suggest you make the fuzzy search optional by having a checkbox on your search field to enable/disable it.  Then pass that as a parameter to your procedure.  Then you can have something like

if p_fuzzy_search = 1 then
  v_search_string := '?' || replace(pl_search_string,' ',' ?'); 
else
  v_search_string := pl_search_string;
end if;

Open in new window


Going further, I find SCORE more useful if used with progressive relaxation.  See http://www.oracle.com/technetwork/database/enterprise-edition/prog-relax-099280.html

With progressive relaxation you form v_search_string like this

v_search_string := '<query>
   <textquery>
     <progression>
       <seq>'||pl_search_string||'</seq>
       <seq>'||replace(pl_search_string,' ',' $')||'</seq>
       <seq>'||replace(pl_search_string,' ',' ?')||'</seq>
     </progression>
   </textquery>
</query>';

Open in new window


Using the formed v_search_string in your contains function will score exact matches highest, then stem matches, then fuzzy matches.
0
 
johanntagleCommented:
(took me a while to type, didn't realize there was another exchange between slightwv and mjfigur.  But I hope you still find what I posted useful).
0
 
slightwv (䄆 Netminder) Commented:
>>...didn't realize there...

Never a problem.  That's is what is great about the site.  I've used Text for ten years and never really knew progressive relaxation.  I'm interested.  Have you used it?  Does it make the SCORE actually usable?

I'll need to play around with it in my own setup!

0
 
johanntagleCommented:
Hi slightwv,

Yes I've used progressive relaxation the moment we upgraded to 10g (it was only introduced then).  It makes SCORE usable in such that you can sort by it since progressive relaxation will always score those matching your first criterion higher.  This way I can also display the score so that if only fuzzy search got a match, the user will know from the score that it wasn't a very good match.

Johann
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.