[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

Oracle string extraction via regular expression

Posted on 2011-04-28
29
Medium Priority
?
416 Views
Last Modified: 2012-05-11
I am parsing an Oracle (11gR2) table field (varchar2 datatype) to extract some values with specific pattern.

I use this:
Select TRIM(REGEXP_SUBSTR(upper(F1) ,'ik([^ :.><,_]+)',1,1,'i')) as F1_updated

These are the needed INPUT/OUTPUT:
IK234-565 / IK234-565
This is ik34 ff / ik34
These are ikval-3456 test / ikval-3456
These are IKva-345996_456 test/  IKva-345996
These are IK-345996:456 test/  IK-345996
These are IK-345996-456 test/ IK-345996

All above example works except the last one. It returns IK-345996-456 instead. But I need IK-345996. It needs to stop when it sees the second "-" char in the string. Is there any way to use the same function with some changes to achieve this?
0
Comment
Question by:toooki
  • 10
  • 10
  • 8
28 Comments
 
LVL 74

Expert Comment

by:sdstuber
ID: 35485766
your expected results don't reflect the functionality of forcing UPPER case
0
 

Author Comment

by:toooki
ID: 35485979
Thank you.
Sorry, case is not that important but still wanted the upper case in the output. So the correct pairs are:

IK234-565 / IK234-565
This is ik34 ff / IK34
These are ikval-3456 test / IKVAL-3456
These are IKva-345996_456 test/  IKVA-345996
These are IK-345996:456 test/  IK-345996
These are IK-345996-456 test/ IK-345996
These are IK-345996-456AbcD test/ IK-345996
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35486006
try this...

REGEXP_SUBSTR(UPPER(F1) ,'IK[^ :.><,_-]*-?([^ :.><,_-]+)')
0
Get your Conversational Ransomware Defense e‑book

This e-book gives you an insight into the ransomware threat and reviews the fundamentals of top-notch ransomware preparedness and recovery. To help you protect yourself and your organization. The initial infection may be inevitable, so the best protection is to be fully prepared.

 

Author Comment

by:toooki
ID: 35486465
Thanks a lot.
I tested and it work for:

select
TRIM(REGEXP_SUBSTR(UPPER('IKTEST23TEST-191-FRAMEWORK-RELATED') ,'IK[^ :.><,_-]*-?([^ :.><,_-]+)')) from dual;
INPUT/OUTPUT:
IKTEST23TEST-191-FRAMEWORK-RELATED / IKTEST23TEST-191

But does not work for:
select
TRIM(REGEXP_SUBSTR(UPPER('IKTEST23TEST-191FRAMEWORK-RELATED') ,'IK[^ :.><,_-]*-?([^ :.><,_-]+)')) from dual;

I actually need:
INPUT/OUTPUT:
IKTEST23TEST-191FRAMEWORK-RELATED / IKTEST23TEST-191
(It needs to stop right after getting a non-numeric value (here "F") after the first "-" character in the first string starting with "IK")....


0
 
LVL 78

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35487380
See if this works.   If it doesn't please add records to the test and explain why it doesn't work.
drop table tab1 purge;
create table tab1 (col1 varchar(50));

insert into tab1 values('IK234-565');
insert into tab1 values('This is ik34 ff');
insert into tab1 values('These are ikval-3456 test');
insert into tab1 values('These are IKva-345996_456 test');
insert into tab1 values('These are IK-345996:456 test');
insert into tab1 values('These are IK-345996-456 test');
insert into tab1 values('IKTEST23TEST-191FRAMEWORK-RELATED');
commit;

select regexp_substr(upper(col1),
	'IK[-A-Z0-9]*[0-9]+'
)
from tab1;

Open in new window

0
 
LVL 78

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35488341
I was informed I had a bug in my previous post.  See if this is better:


select regexp_substr(upper(col1),
      'IK[A-Z0-9_]*-?[A-Z0-9]*[0-9]+'
)
from tab1;


0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35488372
REGEXP_SUBSTR(UPPER(col1) ,'IK[^ :.><,_-]*-?[0-9]+')
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35488389
oops, forgot the after dash part

REGEXP_SUBSTR(UPPER(col1) ,'IK[^ :.><,_-]*-?[^ :.><,_-]*[0-9]+')


if only letters and numbers are allowed then slightwv's   [A-Z0-9]  is simpler than [^ :.><,_-]
0
 
LVL 78

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35488415
Yes, please clarify the actual requirements.

For example, what is the expected results for:  
'IKTEST23@TEST-191FRAMEWORK-RELATED'
0
 

Author Comment

by:toooki
ID: 35488942
Many thanks to you all.
This is INPUT/OUTPUT
IKTEST23@TEST-191FRAMEWORK-RELATED / IKTEST23@TEST-191

slightwv's query I tried and that fixed the issue for the string I was having problem with:
select regexp_substr(upper(col1),
      'IK[-A-Z0-9]*[0-9]+'
)

I am trying this on other strings in the table to see if the values I get are as needed..
I am also trying other queries you sent.
0
 
LVL 78

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35490653
>>IKTEST23@TEST-191FRAMEWORK-RELATED / IKTEST23@TEST-191

Then sdstubers works and mine does not.

Mine does with a quick add of 'allowed' values:  'IK[-A-Z0-9@]*[0-9]+'

It all depends on which path you wish to go down:  include all 'allowed' or exclude all not-allowed.  That is basically the difference between the two methods.
0
 

Author Comment

by:toooki
ID: 35497728
Thank you all.

Yes that is right..

I tried:
TRIM(REGEXP_SUBSTR(UPPER(col1), 'IK[^ :.><,_-]*-?[^ :.><,_-]*[0-9]+'))
And that gives me the correct O/P for all inputs I had in the field in the table.

select regexp_substr(upper(col1),
      'IK[-A-Z0-9]*[0-9]+'
)

The above as giving a bit incorrect Output for Input strings like "IKTEST-12325-12345"
Was giving IKTEST-12325 but wanted "IKTEST-12325-12345"

For TRIM(REGEXP_SUBSTR(UPPER(col1), 'IK[^ :.><,_-]*-?[^ :.><,_-]*[0-9]+'))
I was trying to understand the meaning.
One question, which syntax in this query cause to exclude the pattern we put in it?
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35498381
'IK[^ :.><,_-]*-?[^ :.><,_-]*[0-9]+'

breaking it into pieces...


IK                = IK

[^ :.><,_-]* =  0 or more characters that are NOT one of these ":.><,_-"

-?                = 0 or 1  "-"

[0-9]+         = 1 or more digits
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35498382
you should ask the exclusion question as a new question
0
 
LVL 78

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35498494
>>The above as giving a bit incorrect Output for Input strings like "IKTEST-12325-12345"
>>Was giving IKTEST-12325 but wanted "IKTEST-12325-12345"

Your requirements keep changing.  In the original post you said "These are IK-345996-456 test/ IK-345996"

Which is it?

If you want:  IKTEST-12325-12345

Just add the '-' to the second part of mine:        'IK[A-Z0-9_]*-?[A-Z0-9-]*[0-9]+'

0
 

Author Comment

by:toooki
ID: 35498837
Thank you.

Thank you sdstuber and slightwv for the explanations. I now understand.

My apology. My original post was correct:
Needed INPUT/OUTPUT:
"These are IK-345996-456 test/ IK-345996"

It was my mistake in the post 35497728.
I wanted to mean just the opposite:
/* The above as giving a bit incorrect Output for Input strings like "IKTEST-12325-12345"
Was getting "IKTEST-12325-12345" but wanted "IKTEST-12325". Very sorry. */

So far
TRIM(REGEXP_SUBSTR(UPPER(col1), 'IK[^ :.><,_-]*-[^ :.><,_-]*[0-9]+'))
this worked for me . The "-" is always there in input so I made that mandatory. But I need a small change and  for that I submitted a new question...
http://www.experts-exchange.com/Database/Oracle/Q_26987344.html
0
 

Author Comment

by:toooki
ID: 35498869
Ok, thank you slightwv. Sorry I misunderstood. So I put the details here:

Needed this Input/Output:
This is the needed INPUT/OUTPOT:
These are IK-345996-456 test/ IK-345996
There are IK there areIK-345996-456 test/ IK-345996
sdcaIKsdIKtest-1234TTT / IKtest-1234

The query:
TRIM(REGEXP_SUBSTR(UPPER(col1), 'IK[^ :.><,_-]*-[^ :.><,_-]*[0-9]+'))
gives me correct O/P for all of the above except the last.
I needed to extract the string starting with IK before the first occurrence of "IK" before the "-" string and ending at the last occurrence of a number (0 to 9) after "-" character.

I tried:
TRIM(REGEXP_SUBSTR(UPPER(col1), 'IK[^ IK:.><,_-]*-[^ :.><,_-]*[0-9]+'))
that works for:
sdcaIKsdIKtest-1234TTT / IKtest-1234
But fails if the input is IKtesti-1234 or IKteskt-1234
0
 

Author Comment

by:toooki
ID: 35498875
In the last post I wanted to mean:
I needed to extract the string starting with IK with the first occurrence of "IK" before the "-" string and ending at the last occurrence of a number (0 to 9) after the "-" character.

0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35498902
if the requirements change they should be new questions.

I think this one has been answered hasn't it?
0
 
LVL 78

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35498918
Please add to my test case.

Insert new rows to the table as needed and post expected results.  This way we can tweak the expression and ensure all previous ones still work.  Also post why or why the row doesn't work.

Check out below

Here is the output from that run:

IK234-565
IK34
IKVAL-3456
IKVA-345996
IK-345996
IK-345996
IKTEST23TEST-191
IKTEST23TEST-191
IKTEST23TEST-191
IKTEST23
IK-345996
IKTEST-1234
IKTESTI-1234
IKTESKT-1234

14 rows selected.

Open in new window

drop table tab1 purge;
create table tab1 (col1 varchar(50));

insert into tab1 values('IK234-565');
insert into tab1 values('This is ik34 ff');
insert into tab1 values('These are ikval-3456 test');
insert into tab1 values('These are IKva-345996_456 test');
insert into tab1 values('These are IK-345996:456 test');
insert into tab1 values('These are IK-345996-456 test');
insert into tab1 values('IKTEST23TEST-191FRAMEWORK-RELATED');
insert into tab1 values('IKTEST23TEST-191FRAMEWORK-RELATED');
insert into tab1 values('IKTEST23TEST-191FRAMEWORK-RELATED');
insert into tab1 values('IKTEST23@TEST-191FRAMEWORK-RELATED');
insert into tab1 values('These are IK-345996-456 test/ IK-345996');
insert into tab1 values('There are IK there areIK-345996-456 test/ IK-345996');
insert into tab1 values('sdcaIKsdIKtest-1234TTT / IKtest-1234');
insert into tab1 values('IKtesti-1234');
insert into tab1 values('IKteskt-1234');
commit;

select regexp_replace(upper(col1),
	'(.*)(IK[A-Z0-9_]*-?[A-Z0-9]*[0-9]+)(.*)','\2'
)
from tab1;

Open in new window

0
 
LVL 78

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35498926
I disagree with sdstuber on asking a new question every time you change the requirements.  It could end up with 100 questions every time you find a new example that breaks.

What I do ask is to please make sure you provide us with as complete requirements as possible.

I feel that you are running what we provide, find a 'failure', the change the requirements.  Can you not go through all the examples and provide a complete list?
0
 
LVL 78

Assisted Solution

by:slightwv (䄆 Netminder)
slightwv (䄆 Netminder) earned 948 total points
ID: 35498950
Found a bug in my last code.  forgot the '@' example.

You could just as easily go with the 'everything except' method probably by tweaking the '[A-Z0-9_@]'

I'm just not clear on the final requirements and which approach is best:
either 'include all these' or 'exclude these'.

select id, regexp_replace(upper(col1),
      '(.*)(IK[A-Z0-9_@]*-?[A-Z0-9]*[0-9]+)(.*)','\2'
) newcol1
from tab1;

0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35499144
>>> with 100 questions every time you find a new example that breaks

that's fine

because I've already answered this question and the first exception and the followup.

It's time to close this one and if there are new questions (even if related) that's fine.
0
 

Author Comment

by:toooki
ID: 35500634
Thank you all.

select regexp_replace(upper(col1),
      '(.*)(IK[A-Z0-9_@]*-?[A-Z0-9]*[0-9]+)(.*)','\2'
)

The above looks to be working for most sample inputs I have.  And it does work on example:
INPUT/OUTPUT
sdcaIKsdIKtest-1234TTT / IKtest-1234


Previous solutions that you and sdstuber gave to me used REGEXP_SUBSTR but this is using REGEXP_REPLACE. Like in post 35498381 could you please write a line or two to explain this query, if possible?
Many thanks.

0
 
LVL 74

Expert Comment

by:sdstuber
ID: 35500648
\2 is called a backreference

count parentheses from the left and the 2nd block is the substring that is referenced, in this case...   (IK[A-Z0-9_@]*-?[A-Z0-9]*[0-9]+)

the replace looks for the entire string as described above. and replaces it with the back referenced substring


the only significantly different from what was posted before is the bracketed lists

[A-Z0-9]  - this an INCLUSION list,  as opposed to [^ IK:.><,_-]  which is an EXCLUSION list  note the "^" as the first character.
0
 

Author Comment

by:toooki
ID: 35501074
Thanks a lot. That explanation really helps.

Sorry I still have a question.

All outputs are coming correctly.
Except..
If the INPUT does not match the pattern IK*number
Then it needs to return null...

Example (needed):
This is IK /
IK-1234 / IK-1234
IK1234 / IK1234
0
 
LVL 74

Accepted Solution

by:
sdstuber earned 952 total points
ID: 35501249
select
case where regexp_like(upper(col1),
      '(.*)(IK[A-Z0-9_@]*-?[A-Z0-9]*[0-9]+)(.*)') then  
regexp_replace(upper(col1),
      '(.*)(IK[A-Z0-9_@]*-?[A-Z0-9]*[0-9]+)(.*)','\2')
end
from....
0
 

Author Comment

by:toooki
ID: 35509565
Many thanks. It seems I am all ok with this solution.
Thanks!
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Instead of error trapping or hard-coding for non-updateable fields when using QODBC, let VBA automatically disable them when forms open. This way, users can view but not change the data. Part 1 explained how to use schema tables to do this. Part 2 h…
MSSQL DB-maintenance also needs implementation of multiple activities. However, unprecedented errors can hamper the database management. In that case, deploying Stellar SQL Database Toolkit ensures fast and accurate database and backup repair as wel…
This video shows how to recover a database from a user managed backup
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…
Suggested Courses

834 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question