[Webinar] Streamline your web hosting managementRegister Today

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1653
  • Last Modified:

Similarity search - PL/SQL

Hi all!
I am developing a database containg hundreds of thousands of rows - will increase to over a million within the year.
They each have a 'title'-value which is possible to perform a search on. However, I am looking for a way to provide a search that is not an exact match (either on the whole field or parts of the field), but som sort of query that returns all rows that are similar. This is to allow for users to write the search criteria like they think it is written (for instance a search for 'misisipi' should return 'Mississippi'). The title-field is a varchar2 column.

I have installed the Intermedia Context-option (on Oracle 8.1.7), but I can't seem to make it work properly. As far as I have understood, this is supposed to be what I am looking for.

I have also tried to use a procedure that calculates the number of characters that must be altered to make a match (the Levenshtein Distance) - details can be found at http://www.merriampark.com/ld.htm , which is excactly what I want, but it is far too slow in my case due to the amount of data.

Any ideas?

Help will be greatly appreciated!!!
0
lassetyr
Asked:
lassetyr
1 Solution
 
venkotchCommented:
Probably these examples may help ...

SELECT resume
  FROM emp WHERE CONTAINS (resume, 'DBA') > 0;

-- Order results by fuzzy score (range: 0 to 80)
SELECT SCORE(1), empname FROM emp
 WHERE CONTAINS(resume, 'dog', 1 ) > 0
 ORDER BY SCORE(1) DESC;

-- Use ABOUT to narrow things down
SELECT empname, resume FROM emp
 WHERE  CONTAINS (resume, 'about(EJB)', 1) > 0;

-- $ = Stem searching (eg, $sing: sing, sang, sung)
SELECT empname, resume FROM emp
 WHERE  CONTAINS (resume, '$consultant', 1) > 0;

-- ? = Fuzzy matching (dog, doug, dojo)
SELECT empname, resume FROM emp
 WHERE  CONTAINS (resume, '?developer', 1) > 0;

I've got them from
http://www.orafaq.org/faqctx.htm
0
 
SANALCommented:

Hi lassetyr,

You need not worry about the Levenshtein distance algorithm and the implementation of similar stuff. Use the advance search facility in Oracle; use fuzzy, stem ... etc. Oracle will do it for you.

For more details check out :

http://metalink.oracle.com/help/usaeng/Search/search.html


~Sanal
0
 
iozturkCommented:
Hi lassetyr,
I think you did most of the work. Try partition your table as if a dictionary if you can.
0
The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

 
SwanyCommented:
use the fuzzy search query operator on your CONTEXT type
index.  

select score(1), title, id
  from all_titles /* don't know your table name */
 where contains(title,
                '?misisipi',  
                1)
 order by score(1) desc            
0
 
lassetyrAuthor Commented:
The problem is that the fuzzy-search included in Oracle is for English language text only (please tell me if I am wrong) - most of the title-fields are in Norwegian, and thus the fuzzy query returns unnatural results due to the linguistic differences.

Am I wrong? Is the fuzzy search available for several languages?

The reason I started on the Levenshtein distance algorithm is that it is language independent - it only looks at the separate characters in the strings...

0
 
lassetyrAuthor Commented:
Found a solution to provide support several languages on http://ccm.redhat.com/bboard-archive/webdb/000cQl.html

Thanks for the help anyway...
0
 
SpideyModCommented:
PAQ'd and all 250 points refunded.

SpideyMod
Community Support Moderator @Experts Exchange
0

Featured Post

The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now