Solved

Search string for whole words in Oracle

Posted on 2011-02-22
24
640 Views
Last Modified: 2012-05-11
Hi,

I am looking for the most simple and down-to-earth formulation of an SQL search query for whole words in a stored string field.

The input will be a single word, e.g. 'foo'. And the string stored in the DB will be a semicolon-delimited string that might contain the single search word.

How should I write the SQL so that I will get a hit if the field looks like
'foo; foo1; foo2'
...but NO hit if the field looks like
'foo1; foo2''
?

Obviously, this won't work:
SELECT *
FROM thetable
WHERE thesemicolonstring LIKE '%foo%'
...since that will get me a hit in both cases.

Thanks,

Cos
0
Comment
Question by:thecosimist
  • 10
  • 7
  • 4
  • +1
24 Comments
 
LVL 76

Accepted Solution

by:
slightwv (䄆 Netminder) earned 150 total points
ID: 34951119
If you can guarantee the spacing:

select * from thetable where ';' || thesemicolonstring || ';' like '%;foo;%'
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 34951140
You might also research Oracle Text indexes.  They require a little more maintenance because of how they work.  You will want to optimize them on a regular basis.

create index thetable_idx on thetable(thesemicolonstring) indextype is ctxsys.context;

Then you can do:
select * from thetable where contains(thesemicolonstring,'foo) > 0;
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 34951226
where regexp_like(thesemicolonstring,' ?foo;')
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 34951238
or, if the semicolon might not be present at the end

where regexp_like(thesemicolonstring,' ?foo;?')
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 34951242
in your example you have spaces after your semi-colons,  if that's not guaranteed then

where regexp_like(thesemicolonstring,'[ ;]?foo;?')
0
 

Author Comment

by:thecosimist
ID: 34951245
Thanks!

However, the semicolon-delimited string cannot be guaranteed to have semicolons on each side of the values. So two values may be stored like this: 'foo; foo1'

For the other solution - this SQL is going to be used in a system that is not maintained by very database-savvy people. So preferably nothing that needs regular optimizing.

Aren't there special characters that can be used for quasi-regexp purposes? Like the underscore character '_' or percent '%'.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 34951253
or maybe this would be better

where regexp_like(thesemicolonstring,'([ ;]?foo;)|([ ;]?foo$)')
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 34951260
>>However, the semicolon-delimited string cannot be guaranteed

This is why I pre-pended and appended my own semi colon in the very first post.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 34951270
>>'([ ;]?foo;)|([ ;]?foo$)

Won't this also hit:  many_foo
0
 
LVL 73

Assisted Solution

by:sdstuber
sdstuber earned 100 total points
ID: 34951277
jeez, not enough caffeine yet this morning,
not sure why I was trying to make the space and semicolon optional


where regexp_like(thesemicolonstring,'(^foo[;])|([ ;]foo;)|([ ;]foo$)')
0
 

Author Comment

by:thecosimist
ID: 34951301
sdstuber,

I tried your solution but realized that a search for "foo" will result in a hit where thesemicolonstring is just "foo" but also if it's "foo1", i.e., if there is just one value and no semicolons.

So to clarify, a search for 'foo' should return a hit if thesemicolonstring looks like this:
'foo'
- or -
'xyz; foo; abc'

...but result in NO hits if the thesemicolonstring looks like this:
'foo1'
- or -
'foo1; foo2'
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 34951322
Did you not try my LIKE select above?
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 34951327
I would still suggest Oracle Text if you have a large amount of data to search through.  It is what it was created to do.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 34951342
>> I tried your solution

which one?  I posted several

anyway, I forgot the simplest case of just 'foo'
also, you never mentioned the space-semicolon,  is that reliable or not?


 WHERE REGEXP_LIKE(thesemicolonstring, '^foo$|(^foo[;])|([ ;]foo;)|([ ;]foo$)')
0
 

Author Comment

by:thecosimist
ID: 34951347
:-)

Thanks for the answers!... I think I've had too much caffeine actually.

However, sorry for being picky, but I would prefer to not use the advanced (and neat!) regexp solution you posted sdstuber, since the system is used by people who are not very good in Oracle/advanced SQL, or very good at reading regexps. Is there a simpler solution?

If there's no simpler solution, that's an answer as well of course. Part of the question is to find out what's best practice, and curiosity about the simplest possible SQL.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 34951401
simplest - use the concatenated semicolon string suggested by slightwv, unless the ";" vs " ;" isn't reliable

next - use text indexes, also suggested by slightwv

other suggestion, use a function that will parse yoru string into a collection and check for membership in the collection (search EE for str2tbl)
0
 

Author Comment

by:thecosimist
ID: 34951408
Ooops, I misread your first suggestion, slightwv. That will work perfectly. Too much caffeine, indeed. However, I will also have use for the regexp you posted, sdstuber.

Since I'm kinda new to experts-exchange, is it possible to accept two posts as solutions? I remember trying it once before and failing miserably.

Thanks a lot for the fast and reliable input!!!

/Cos
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 34951417
another option, DON'T use a concatenated string,  instead split the values and put them into a child table keyed back to that row
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 34951430
yes you can accept multiple solutions and divide the points however you want among all accepted answers
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 34951435
>> but I would prefer to not use the advanced

This would be Oracle Text.  It's a simple 'contains' query.

>> is it possible to accept two posts as solutions?

Yes.  Click the 'Accept Multiple Solutions link.
0
 
LVL 20

Expert Comment

by:gatorvip
ID: 34951467
How about replacing punctuation with a space, then searching with INSTR (LIKE should work too):

instr(
   translate( ' '|| lower(thesemicolonstring), ';', ' '),
   ' foo '
)>0

Due to the space before and after 'foo', neither foo1 nor foo2 will be returned.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 34951472
gatorvip,  that doesn't find "foo"
0
 
LVL 20

Expert Comment

by:gatorvip
ID: 34951507
True, would have needed another space at the end. I should have refreshed the page before posting as I missed a bunch of replies!

instr(
   translate( ' '|| lower(thesemicolonstring) || ' ', ';', ' '),
   ' foo '
)>0
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Join & Write a Comment

This article started out as an Experts-Exchange question, which then grew into a quick tip to go along with an IOUG presentation for the Collaborate confernce and then later grew again into a full blown article with expanded functionality and legacy…
Background In several of the companies I have worked for, I noticed that corporate reporting is off loaded from the production database and done mainly on a clone database which needs to be kept up to date daily by various means, be it a logical…
Via a live example show how to connect to RMAN, make basic configuration settings changes and then take a backup of a demo database
This video shows setup options and the basic steps and syntax for duplicating (cloning) a database from one instance to another. Examples are given for duplicating to the same machine and to different machines

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now