Oracle Text Full Search PL/SQL Syntax

Posted on 2011-04-22
Last Modified: 2012-05-11
I am developing a small search function using Oracle Text and so far it works pretty well but I am in the need of using fuzzy searches for my users. I am developing the search in Oracle Apex and my current PL/SQL to produce the search results in attached below. I just am not sure how to go about implementing fuzzy searches within this code. For example if I input into the text field: $calls the procedure will do a correct fuzzy search, however I do not want my suers to have to remember '$' is used for fuzzy.

I tried putting the term: fuzzy right after document and before the variable, however my syntax did not appear correct to oracle.

   v_query   VARCHAR2 (4000);
   v_query := 'SELECT   ';
   IF :p1_search_string IS NOT NULL
      v_query := v_query || 'score (1) relevance, ';
      v_query := v_query || '''100%'' relevance, ';
   END IF;
   v_query :=
      v_query || 'SUBSTR (website, INSTR (website, ''/'') + 1) '
              || 'document,' || 'ID, ';
   IF :p1_search_string IS NOT NULL
      v_query :=
         || 'NVL2 '
         || '(:p1_search_string, '
         || 'ctx_doc.snippet '
         || '(''docsx'', '
         || 'ROWID, '
         || 'NVL (:p1_search_string, ''%''), '
         || '''<span style="color:darkgreen;'
         || 'font-weight:bold;font-size:14px;">'', '
         || '''</span>'' '
         || '), '
         || 'NULL '
         || ') snippet ';
      v_query := v_query || 'NULL snippet ';
   END IF;
   v_query := v_query || 'FROM docs ';
   IF :p1_search_string IS NOT NULL
      v_query :=
         || 'WHERE contains (document, :p1_search_string, 1) > 0 ';
   END IF;
   v_query := v_query || 'ORDER BY 1 DESC ';
   RETURN (v_query);

Open in new window

Question by:mjfigur
    LVL 76

    Expert Comment

    by:slightwv (䄆 Netminder)
    You say 'fuzzy' but mention '$'.  The '$' is 'stemming' and is different.

    Which is it?

    You will likely need to pre-process the 'terms' yourself to get it in a format you want.  You will likely need to do some of this anyway.  If you have a text index for testing:  Try searching for the term SYN.

    That said:  Are you sure you want to do this?  I ask because I've been using Text for over 10 years and initially started out wanting fuzzy AND stemming.  What I found was it was returning a lot of noise and I was after targeted search results.

    Did you see the Bing advertising campaign against Google?  Bign was marketing a more 'targeted' search result...  Did it actually work for them?

    I hope you have run actual test cases through your app to verify you can use this as opposed to just reading about it and saying "This is EXACTLY what I need".

    Also, SNIPPET does just that:  a snippet.  There is no weight or reason for what it returns.  I had to write my own 'context' around the selected words because I wanted ALL hits shown.

    While we're at it:  In my opinion, SCORE isn't much better...


    Author Comment

    Thank you so much. I was just reviewing the documentation and got confused between fuzzy and stemming. Stemming seemed to work well for instance I was trying to search for the word: "call" and within the documents stemming helped to find "calls" "called" "calling", but I agree it has a major potential for noise rather than actual results especially once our index gets much larger.

    I thought about using a custom <query> whereby I would used some relaxed methods and that worked well, however I read that SNIPPET will not work with <query>. How do you go about handling this and would I just need to write my own 'context' rather than use SNIPPET?
    LVL 76

    Accepted Solution

    I'm away from keyboard right now so theory only.

    Originally I added radio buttons for the options: exact, sounds like (fuzzy) , 'the works' (stemming and fuzzy).

    I'm not sure what you mean by snippet will not work with query.

    After a time of monitoring what was used (I captured the searches in a table), I remove the 'advanced' options.

    I used ctx_doc.highlight with a highlight tab.  This stores the position and offsets of the found results.  Then it's a semi-simple matter of looping through the highlight tab and build a CLOB as my 'result'.

    I say semi-simple because you need to account for overlapping 'results' and how you want to deal with it.

    LVL 24

    Assisted Solution

    Yes take into account what slightwv is saying.  If you still want to do it, I suggest you make the fuzzy search optional by having a checkbox on your search field to enable/disable it.  Then pass that as a parameter to your procedure.  Then you can have something like

    if p_fuzzy_search = 1 then
      v_search_string := '?' || replace(pl_search_string,' ',' ?'); 
      v_search_string := pl_search_string;
    end if;

    Open in new window

    Going further, I find SCORE more useful if used with progressive relaxation.  See

    With progressive relaxation you form v_search_string like this

    v_search_string := '<query>
           <seq>'||replace(pl_search_string,' ',' $')||'</seq>
           <seq>'||replace(pl_search_string,' ',' ?')||'</seq>

    Open in new window

    Using the formed v_search_string in your contains function will score exact matches highest, then stem matches, then fuzzy matches.
    LVL 24

    Expert Comment

    (took me a while to type, didn't realize there was another exchange between slightwv and mjfigur.  But I hope you still find what I posted useful).
    LVL 76

    Expert Comment

    by:slightwv (䄆 Netminder)
    >>...didn't realize there...

    Never a problem.  That's is what is great about the site.  I've used Text for ten years and never really knew progressive relaxation.  I'm interested.  Have you used it?  Does it make the SCORE actually usable?

    I'll need to play around with it in my own setup!

    LVL 24

    Expert Comment

    Hi slightwv,

    Yes I've used progressive relaxation the moment we upgraded to 10g (it was only introduced then).  It makes SCORE usable in such that you can sort by it since progressive relaxation will always score those matching your first criterion higher.  This way I can also display the score so that if only fuzzy search got a match, the user will know from the score that it wasn't a very good match.


    Featured Post

    How your wiki can always stay up-to-date

    Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
    - Increase transparency
    - Onboard new hires faster
    - Access from mobile/offline

    Join & Write a Comment

    This article explains all about SQL Server Piecemeal Restore with examples in step by step manner.
    From implementing a password expiration date, to datatype conversions and file export options, these are some useful settings I've found in Jasper Server.
    This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
    This video shows syntax for various backup options while discussing how the different basic backup types work.  It explains how to take full backups, incremental level 0 backups, incremental level 1 backups in both differential and cumulative mode a…

    754 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    17 Experts available now in Live!

    Get 1:1 Help Now