?
Solved

Similarity search - PL/SQL

Posted on 2003-04-01
7
Medium Priority
?
1,618 Views
Last Modified: 2007-12-19
Hi all!
I am developing a database containg hundreds of thousands of rows - will increase to over a million within the year.
They each have a 'title'-value which is possible to perform a search on. However, I am looking for a way to provide a search that is not an exact match (either on the whole field or parts of the field), but som sort of query that returns all rows that are similar. This is to allow for users to write the search criteria like they think it is written (for instance a search for 'misisipi' should return 'Mississippi'). The title-field is a varchar2 column.

I have installed the Intermedia Context-option (on Oracle 8.1.7), but I can't seem to make it work properly. As far as I have understood, this is supposed to be what I am looking for.

I have also tried to use a procedure that calculates the number of characters that must be altered to make a match (the Levenshtein Distance) - details can be found at http://www.merriampark.com/ld.htm , which is excactly what I want, but it is far too slow in my case due to the amount of data.

Any ideas?

Help will be greatly appreciated!!!
0
Comment
Question by:lassetyr
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
7 Comments
 
LVL 6

Expert Comment

by:venkotch
ID: 8245596
Probably these examples may help ...

SELECT resume
  FROM emp WHERE CONTAINS (resume, 'DBA') > 0;

-- Order results by fuzzy score (range: 0 to 80)
SELECT SCORE(1), empname FROM emp
 WHERE CONTAINS(resume, 'dog', 1 ) > 0
 ORDER BY SCORE(1) DESC;

-- Use ABOUT to narrow things down
SELECT empname, resume FROM emp
 WHERE  CONTAINS (resume, 'about(EJB)', 1) > 0;

-- $ = Stem searching (eg, $sing: sing, sang, sung)
SELECT empname, resume FROM emp
 WHERE  CONTAINS (resume, '$consultant', 1) > 0;

-- ? = Fuzzy matching (dog, doug, dojo)
SELECT empname, resume FROM emp
 WHERE  CONTAINS (resume, '?developer', 1) > 0;

I've got them from
http://www.orafaq.org/faqctx.htm
0
 

Expert Comment

by:SANAL
ID: 8248043

Hi lassetyr,

You need not worry about the Levenshtein distance algorithm and the implementation of similar stuff. Use the advance search facility in Oracle; use fuzzy, stem ... etc. Oracle will do it for you.

For more details check out :

http://metalink.oracle.com/help/usaeng/Search/search.html


~Sanal
0
 
LVL 4

Expert Comment

by:iozturk
ID: 8249366
Hi lassetyr,
I think you did most of the work. Try partition your table as if a dictionary if you can.
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 1

Expert Comment

by:Swany
ID: 8249814
use the fuzzy search query operator on your CONTEXT type
index.  

select score(1), title, id
  from all_titles /* don't know your table name */
 where contains(title,
                '?misisipi',  
                1)
 order by score(1) desc            
0
 

Author Comment

by:lassetyr
ID: 8251729
The problem is that the fuzzy-search included in Oracle is for English language text only (please tell me if I am wrong) - most of the title-fields are in Norwegian, and thus the fuzzy query returns unnatural results due to the linguistic differences.

Am I wrong? Is the fuzzy search available for several languages?

The reason I started on the Levenshtein distance algorithm is that it is language independent - it only looks at the separate characters in the strings...

0
 

Author Comment

by:lassetyr
ID: 8251949
Found a solution to provide support several languages on http://ccm.redhat.com/bboard-archive/webdb/000cQl.html

Thanks for the help anyway...
0
 

Accepted Solution

by:
SpideyMod earned 0 total points
ID: 8253468
PAQ'd and all 250 points refunded.

SpideyMod
Community Support Moderator @Experts Exchange
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Configuring and using Oracle Database Gateway for ODBC Introduction First, a brief summary of what a Database Gateway is.  A Gateway is a set of driver agents and configurations that allow an Oracle database to communicate with other platforms…
How to Unravel a Tricky Query Introduction If you browse through the Oracle zones or any of the other database-related zones you'll come across some complicated solutions and sometimes you'll just have to wonder how anyone came up with them.  …
This video shows information on the Oracle Data Dictionary, starting with the Oracle documentation, explaining the different types of Data Dictionary views available by group and permissions as well as giving examples on how to retrieve data from th…
This video shows syntax for various backup options while discussing how the different basic backup types work.  It explains how to take full backups, incremental level 0 backups, incremental level 1 backups in both differential and cumulative mode a…
Suggested Courses

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question