Solved

Oracle SQL - Parsing & Searching Strings

Posted on 2014-01-15
14
517 Views
Last Modified: 2014-01-27
Hi,

I was wondering if there is a way to parse texts (such as business names) in Oracle SQL and search for components. For example, if my search word is '100 General Store Express', I would like to find records that contain all 4 components of my search word - '100', 'General','Store', 'Express' in any order.

So, if my records are :

1. 100 Express General Store
2. 100 General Store Express Way
3. 100 Express General

Then, the search would return only record 1 and 2.

Thank you.
0
Comment
Question by:iamnamja
  • 5
  • 5
  • 4
14 Comments
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39783220
Brute force way:

select * from mytable where
instr(mycolumn,'100') > 0 and  instr(mycolumn,'General')>0 and instr(mycolumn,'Store')>0 and  instr(mycolumn,'Express')>0;


Depending on your actual data/requirements Oracle Text is a special index type that can easily do this and make searching large amounts of text pretty efficient.

If you can provide more information on your requirements we can go from there.
0
 

Author Comment

by:iamnamja
ID: 39783416
Thank you for your comment.
However, my search word list (Table A) contains about 80,000 names, and my target data (Table B) where I have to perform the serach contains about 15 million names.

Target Data (Table B) may contain a little variation of names from Table A, such as special characters, branch number, etc. I tried the following code:

SELECT  A.*, B.* FROM Table A
INNER JOIN Table B
ON REGEXP_REPLACE(UPPER(A.NAME),'[^A-Z0-9 ]','')  LIKE '%' || UPPER(B.NAME) || '%'

Open in new window


However, this returns too many false hits if B.NAME contains A.NAME in any part of the string. So I want to enhance the fuzzy matching logic. Any help would be greatly appreciated.

Thanks.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39783434
Is this a one-time type of search or will you be searching 80,000 names against 15 million rows on a regular basis?

How often does the 15 million rows change?

I'm leaning towards Oracle Text but using Text isn't as simple as building a normal index.  It requires some hand holding.  I don't want to over-complicate things if I don't need to.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39783442
From your query,  I assume tableB.name looks like this ...

100 Express General Store
100 General Store Express Way
100 Express General

What does tableA.name look like?
0
 

Author Comment

by:iamnamja
ID: 39783501
TableA.name looks like

100 General Store Express
25 Ink, Inc
3566 Food Market
....

Basically, they look same as tableB.name but table A may contain small typos, special characters, store number.. etc.. small variations.

This is a one-time type of search. They are both static data.
0
 
LVL 76

Assisted Solution

by:slightwv (䄆 Netminder)
slightwv (䄆 Netminder) earned 200 total points
ID: 39783503
I'm taking a guess that tableA.name contains '100 General Store Express'

Assuming that is correct, below is a working example of using Oracle Text to return rows 1 & 2.

It uses dynamic SQL.  I couldn't think of a quick way to convert '100 General Store Express' into something Oracle Text could work with.

drop table tab1 purge;
create table tab1(
	name varchar2(100)
);

drop table tab2 purge;
create table tab2(
	id	number,
	name varchar2(100)
);

insert into tab1 values('100 General Store Express');

insert into tab2 values(1,'100 Express General Store');
insert into tab2 values(2,'100 General Store Express Way');
insert into tab2 values(3,'100 Express General ');
commit;

create index name_idx on tab2(name) indextype is ctxsys.context parameters('sync(on commit)');

declare
	mysql varchar2(4000) := 'select id from tab2 where contains(name,:myvar) > 0';
	myresult number;
	mycur sys_refcursor;
begin
	for i in (select name  from tab1) loop
		open mycur for mysql using 'near((' || replace(i.name,' ',',') || '),100,false)';
		fetch mycur into myresult;
		loop
			exit when mycur%notfound;
			dbms_output.put_line('Got: ' || myresult);
			fetch mycur into myresult;
		end loop;
		close mycur;
	end loop;
end;
/

Open in new window

0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39783536
>>but table A may contain small typos, special characters

This could cause mine to barf.  You might want to add your original regexp_replace inside of my replace to get rid of them:

replace(regexp_replace(i.name,),'[^A-Z0-9 ]'),' ',',')

If they can have mumtiple spaces, it will barf as well.  I created my own clean_string function that takes raw data and makes it as pretty as I can before I pass it off to the Text query.

If you want to use Text, I can continue helping but don't want to add a lot of stuff if it will never be used.
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 73

Expert Comment

by:sdstuber
ID: 39783542
80K rows of regexps against 15M rows of regexps will likely take awhile, but this should do the trick  (worked for me anyway on small scale data)


SELECT bname,aname
  FROM (SELECT b.name bname, a.name aname,
               (    SELECT SET(
                               CAST(
                                   COLLECT(
                                       REGEXP_SUBSTR(
                                           REGEXP_REPLACE(UPPER(a.name), '[^A-Z0-9 ]'),
                                           '[^ ]+',
                                           1,
                                           LEVEL
                                       )
                                   ) AS ora_mining_varchar2_nt
                               )
                           )
                      FROM DUAL
                CONNECT BY LEVEL <=
                               REGEXP_COUNT(REGEXP_REPLACE(UPPER(a.name), '[^A-Z0-9 ]'), '[^ ]+'))
                   awords,
               (    SELECT SET(
                               CAST(
                                   COLLECT(
                                       REGEXP_SUBSTR(
                                           UPPER(b.name),
                                           '[^ ]+',
                                           1,
                                           LEVEL
                                       )
                                   ) AS ora_mining_varchar2_nt
                               )
                           )
                      FROM DUAL
                CONNECT BY LEVEL <= REGEXP_COUNT(b.name, '[^ ]+'))
                   bwords
          FROM tablea a, tableb b)
 WHERE awords MULTISET INTERSECT bwords = awords
0
 

Author Comment

by:iamnamja
ID: 39783697
slightwv - I'm getting the following error:
ORA-29902: error in executing ODCIIndexStart() routine
ORA-20000: Oracle Text error:
DRG-50901: text query parser syntax error on line 1, column 14

sdstuber - I'm getting the following error:
ORA-00904: "REGEXP_COUNT": invalid identifier

Would it be because I have a lower version of Oracle?
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39783721
>>Would it be because I have a lower version of Oracle?

What is your version of Oracle (all 4 numbers please, like 10.2.0.4)?

>>slightwv - I'm getting the following error:

When running my example test case or trying to incorporate into what you have?
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39783724
>>> Would it be because I have a lower version of Oracle?

yes

change

CONNECT BY LEVEL <=
                               REGEXP_COUNT(REGEXP_REPLACE(UPPER(a.name), '[^A-Z0-9 ]'), '[^ ]+')

to

CONNECT BY REGEXP_SUBSTR(UPPER(a.name), '[^A-Z0-9 ]', '[^ ]+',1,LEVEL) is not null



and change

 CONNECT BY LEVEL <= REGEXP_COUNT(b.name, '[^ ]+')

to

CONNECT BY  REGEXP_SUBSTR(b.name, '[^ ]+',1,LEVEL) is not null
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39783729
also note,  if you don't have a type called ora_mining_varchar2_nt

you may need to create one


create or replace type ora_mining_varchar2_nt as table of varchar2(4000);


you can rename it if you want, just use the same name in CAST
0
 

Author Comment

by:iamnamja
ID: 39783799
sdstuber - I still get this error after changing to what you suggested above:

ORA-01722: invalid number
01722. 00000 -  "invalid number"
*Cause:    
*Action:
0
 
LVL 73

Accepted Solution

by:
sdstuber earned 300 total points
ID: 39783810
oh sorry, in my cut-and-paste I dropped an entire function call


change


CONNECT BY LEVEL <=
                               REGEXP_COUNT(REGEXP_REPLACE(UPPER(a.name), '[^A-Z0-9 ]'), '[^ ]+')


to

CONNECT BY REGEXP_SUBSTR(REGEXP_REPLACE(UPPER(a.name), '[^A-Z0-9 ]'), '[^ ]+',1,LEVEL) is not null
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Join & Write a Comment

Truncate is a DDL Command where as Delete is a DML Command. Both will delete data from table, but what is the difference between these below statements truncate table <table_name> ?? delete from <table_name> ?? The first command cannot be …
Configuring and using Oracle Database Gateway for ODBC Introduction First, a brief summary of what a Database Gateway is.  A Gateway is a set of driver agents and configurations that allow an Oracle database to communicate with other platforms…
This video shows setup options and the basic steps and syntax for duplicating (cloning) a database from one instance to another. Examples are given for duplicating to the same machine and to different machines
This video shows how to copy an entire tablespace from one database to another database using Transportable Tablespace functionality.

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now