toooki
asked on
Oracle string extraction via regular expression
I am parsing an Oracle (11gR2) table field (varchar2 datatype) to extract some values with specific pattern.
I use this:
Select TRIM(REGEXP_SUBSTR(upper(F 1) ,'ik([^ :.><,_]+)',1,1,'i')) as F1_updated
These are the needed INPUT/OUTPUT:
IK234-565 / IK234-565
This is ik34 ff / ik34
These are ikval-3456 test / ikval-3456
These are IKva-345996_456 test/ IKva-345996
These are IK-345996:456 test/ IK-345996
These are IK-345996-456 test/ IK-345996
All above example works except the last one. It returns IK-345996-456 instead. But I need IK-345996. It needs to stop when it sees the second "-" char in the string. Is there any way to use the same function with some changes to achieve this?
I use this:
Select TRIM(REGEXP_SUBSTR(upper(F
These are the needed INPUT/OUTPUT:
IK234-565 / IK234-565
This is ik34 ff / ik34
These are ikval-3456 test / ikval-3456
These are IKva-345996_456 test/ IKva-345996
These are IK-345996:456 test/ IK-345996
These are IK-345996-456 test/ IK-345996
All above example works except the last one. It returns IK-345996-456 instead. But I need IK-345996. It needs to stop when it sees the second "-" char in the string. Is there any way to use the same function with some changes to achieve this?
your expected results don't reflect the functionality of forcing UPPER case
ASKER
Thank you.
Sorry, case is not that important but still wanted the upper case in the output. So the correct pairs are:
IK234-565 / IK234-565
This is ik34 ff / IK34
These are ikval-3456 test / IKVAL-3456
These are IKva-345996_456 test/ IKVA-345996
These are IK-345996:456 test/ IK-345996
These are IK-345996-456 test/ IK-345996
These are IK-345996-456AbcD test/ IK-345996
Sorry, case is not that important but still wanted the upper case in the output. So the correct pairs are:
IK234-565 / IK234-565
This is ik34 ff / IK34
These are ikval-3456 test / IKVAL-3456
These are IKva-345996_456 test/ IKVA-345996
These are IK-345996:456 test/ IK-345996
These are IK-345996-456 test/ IK-345996
These are IK-345996-456AbcD test/ IK-345996
try this...
REGEXP_SUBSTR(UPPER(F1) ,'IK[^ :.><,_-]*-?([^ :.><,_-]+)')
REGEXP_SUBSTR(UPPER(F1) ,'IK[^ :.><,_-]*-?([^ :.><,_-]+)')
ASKER
Thanks a lot.
I tested and it work for:
select
TRIM(REGEXP_SUBSTR(UPPER(' IKTEST23TE ST-191-FRA MEWORK-REL ATED') ,'IK[^ :.><,_-]*-?([^ :.><,_-]+)')) from dual;
INPUT/OUTPUT:
IKTEST23TEST-191-FRAMEWORK -RELATED / IKTEST23TEST-191
But does not work for:
select
TRIM(REGEXP_SUBSTR(UPPER(' IKTEST23TE ST-191FRAM EWORK-RELA TED') ,'IK[^ :.><,_-]*-?([^ :.><,_-]+)')) from dual;
I actually need:
INPUT/OUTPUT:
IKTEST23TEST-191FRAMEWORK- RELATED / IKTEST23TEST-191
(It needs to stop right after getting a non-numeric value (here "F") after the first "-" character in the first string starting with "IK")....
I tested and it work for:
select
TRIM(REGEXP_SUBSTR(UPPER('
INPUT/OUTPUT:
IKTEST23TEST-191-FRAMEWORK
But does not work for:
select
TRIM(REGEXP_SUBSTR(UPPER('
I actually need:
INPUT/OUTPUT:
IKTEST23TEST-191FRAMEWORK-
(It needs to stop right after getting a non-numeric value (here "F") after the first "-" character in the first string starting with "IK")....
See if this works. If it doesn't please add records to the test and explain why it doesn't work.
drop table tab1 purge;
create table tab1 (col1 varchar(50));
insert into tab1 values('IK234-565');
insert into tab1 values('This is ik34 ff');
insert into tab1 values('These are ikval-3456 test');
insert into tab1 values('These are IKva-345996_456 test');
insert into tab1 values('These are IK-345996:456 test');
insert into tab1 values('These are IK-345996-456 test');
insert into tab1 values('IKTEST23TEST-191FRAMEWORK-RELATED');
commit;
select regexp_substr(upper(col1),
'IK[-A-Z0-9]*[0-9]+'
)
from tab1;
I was informed I had a bug in my previous post. See if this is better:
select regexp_substr(upper(col1),
'IK[A-Z0-9_]*-?[A-Z0-9]*[0 -9]+'
)
from tab1;
select regexp_substr(upper(col1),
'IK[A-Z0-9_]*-?[A-Z0-9]*[0
)
from tab1;
REGEXP_SUBSTR(UPPER(col1) ,'IK[^ :.><,_-]*-?[0-9]+')
oops, forgot the after dash part
REGEXP_SUBSTR(UPPER(col1) ,'IK[^ :.><,_-]*-?[^ :.><,_-]*[0-9]+')
if only letters and numbers are allowed then slightwv's [A-Z0-9] is simpler than [^ :.><,_-]
REGEXP_SUBSTR(UPPER(col1) ,'IK[^ :.><,_-]*-?[^ :.><,_-]*[0-9]+')
if only letters and numbers are allowed then slightwv's [A-Z0-9] is simpler than [^ :.><,_-]
Yes, please clarify the actual requirements.
For example, what is the expected results for:
'IKTEST23@TEST-191FRAMEWOR K-RELATED'
For example, what is the expected results for:
'IKTEST23@TEST-191FRAMEWOR
ASKER
Many thanks to you all.
This is INPUT/OUTPUT
IKTEST23@TEST-191FRAMEWORK -RELATED / IKTEST23@TEST-191
slightwv's query I tried and that fixed the issue for the string I was having problem with:
select regexp_substr(upper(col1),
'IK[-A-Z0-9]*[0-9]+'
)
I am trying this on other strings in the table to see if the values I get are as needed..
I am also trying other queries you sent.
This is INPUT/OUTPUT
IKTEST23@TEST-191FRAMEWORK
slightwv's query I tried and that fixed the issue for the string I was having problem with:
select regexp_substr(upper(col1),
'IK[-A-Z0-9]*[0-9]+'
)
I am trying this on other strings in the table to see if the values I get are as needed..
I am also trying other queries you sent.
>>IKTEST23@TEST-191FRAMEWO RK-RELATED / IKTEST23@TEST-191
Then sdstubers works and mine does not.
Mine does with a quick add of 'allowed' values: 'IK[-A-Z0-9@]*[0-9]+'
It all depends on which path you wish to go down: include all 'allowed' or exclude all not-allowed. That is basically the difference between the two methods.
Then sdstubers works and mine does not.
Mine does with a quick add of 'allowed' values: 'IK[-A-Z0-9@]*[0-9]+'
It all depends on which path you wish to go down: include all 'allowed' or exclude all not-allowed. That is basically the difference between the two methods.
ASKER
Thank you all.
Yes that is right..
I tried:
TRIM(REGEXP_SUBSTR(UPPER(c ol1), 'IK[^ :.><,_-]*-?[^ :.><,_-]*[0-9]+'))
And that gives me the correct O/P for all inputs I had in the field in the table.
select regexp_substr(upper(col1),
'IK[-A-Z0-9]*[0-9]+'
)
The above as giving a bit incorrect Output for Input strings like "IKTEST-12325-12345"
Was giving IKTEST-12325 but wanted "IKTEST-12325-12345"
For TRIM(REGEXP_SUBSTR(UPPER(c ol1), 'IK[^ :.><,_-]*-?[^ :.><,_-]*[0-9]+'))
I was trying to understand the meaning.
One question, which syntax in this query cause to exclude the pattern we put in it?
Yes that is right..
I tried:
TRIM(REGEXP_SUBSTR(UPPER(c
And that gives me the correct O/P for all inputs I had in the field in the table.
select regexp_substr(upper(col1),
'IK[-A-Z0-9]*[0-9]+'
)
The above as giving a bit incorrect Output for Input strings like "IKTEST-12325-12345"
Was giving IKTEST-12325 but wanted "IKTEST-12325-12345"
For TRIM(REGEXP_SUBSTR(UPPER(c
I was trying to understand the meaning.
One question, which syntax in this query cause to exclude the pattern we put in it?
'IK[^ :.><,_-]*-?[^ :.><,_-]*[0-9]+'
breaking it into pieces...
IK = IK
[^ :.><,_-]* = 0 or more characters that are NOT one of these ":.><,_-"
-? = 0 or 1 "-"
[0-9]+ = 1 or more digits
breaking it into pieces...
IK = IK
[^ :.><,_-]* = 0 or more characters that are NOT one of these ":.><,_-"
-? = 0 or 1 "-"
[0-9]+ = 1 or more digits
you should ask the exclusion question as a new question
>>The above as giving a bit incorrect Output for Input strings like "IKTEST-12325-12345"
>>Was giving IKTEST-12325 but wanted "IKTEST-12325-12345"
Your requirements keep changing. In the original post you said "These are IK-345996-456 test/ IK-345996"
Which is it?
If you want: IKTEST-12325-12345
Just add the '-' to the second part of mine: 'IK[A-Z0-9_]*-?[A-Z0-9-]*[ 0-9]+'
>>Was giving IKTEST-12325 but wanted "IKTEST-12325-12345"
Your requirements keep changing. In the original post you said "These are IK-345996-456 test/ IK-345996"
Which is it?
If you want: IKTEST-12325-12345
Just add the '-' to the second part of mine: 'IK[A-Z0-9_]*-?[A-Z0-9-]*[
ASKER
Thank you.
Thank you sdstuber and slightwv for the explanations. I now understand.
My apology. My original post was correct:
Needed INPUT/OUTPUT:
"These are IK-345996-456 test/ IK-345996"
It was my mistake in the post 35497728.
I wanted to mean just the opposite:
/* The above as giving a bit incorrect Output for Input strings like "IKTEST-12325-12345"
Was getting "IKTEST-12325-12345" but wanted "IKTEST-12325". Very sorry. */
So far
TRIM(REGEXP_SUBSTR(UPPER(c ol1), 'IK[^ :.><,_-]*-[^ :.><,_-]*[0-9]+'))
this worked for me . The "-" is always there in input so I made that mandatory. But I need a small change and for that I submitted a new question...
https://www.experts-exchange.com/questions/26987344/Oracle-string-extraction-via-regular-expression.html
Thank you sdstuber and slightwv for the explanations. I now understand.
My apology. My original post was correct:
Needed INPUT/OUTPUT:
"These are IK-345996-456 test/ IK-345996"
It was my mistake in the post 35497728.
I wanted to mean just the opposite:
/* The above as giving a bit incorrect Output for Input strings like "IKTEST-12325-12345"
Was getting "IKTEST-12325-12345" but wanted "IKTEST-12325". Very sorry. */
So far
TRIM(REGEXP_SUBSTR(UPPER(c
this worked for me . The "-" is always there in input so I made that mandatory. But I need a small change and for that I submitted a new question...
https://www.experts-exchange.com/questions/26987344/Oracle-string-extraction-via-regular-expression.html
ASKER
Ok, thank you slightwv. Sorry I misunderstood. So I put the details here:
Needed this Input/Output:
This is the needed INPUT/OUTPOT:
These are IK-345996-456 test/ IK-345996
There are IK there areIK-345996-456 test/ IK-345996
sdcaIKsdIKtest-1234TTT / IKtest-1234
The query:
TRIM(REGEXP_SUBSTR(UPPER(c ol1), 'IK[^ :.><,_-]*-[^ :.><,_-]*[0-9]+'))
gives me correct O/P for all of the above except the last.
I needed to extract the string starting with IK before the first occurrence of "IK" before the "-" string and ending at the last occurrence of a number (0 to 9) after "-" character.
I tried:
TRIM(REGEXP_SUBSTR(UPPER(c ol1), 'IK[^ IK:.><,_-]*-[^ :.><,_-]*[0-9]+'))
that works for:
sdcaIKsdIKtest-1234TTT / IKtest-1234
But fails if the input is IKtesti-1234 or IKteskt-1234
Needed this Input/Output:
This is the needed INPUT/OUTPOT:
These are IK-345996-456 test/ IK-345996
There are IK there areIK-345996-456 test/ IK-345996
sdcaIKsdIKtest-1234TTT / IKtest-1234
The query:
TRIM(REGEXP_SUBSTR(UPPER(c
gives me correct O/P for all of the above except the last.
I needed to extract the string starting with IK before the first occurrence of "IK" before the "-" string and ending at the last occurrence of a number (0 to 9) after "-" character.
I tried:
TRIM(REGEXP_SUBSTR(UPPER(c
that works for:
sdcaIKsdIKtest-1234TTT / IKtest-1234
But fails if the input is IKtesti-1234 or IKteskt-1234
ASKER
In the last post I wanted to mean:
I needed to extract the string starting with IK with the first occurrence of "IK" before the "-" string and ending at the last occurrence of a number (0 to 9) after the "-" character.
I needed to extract the string starting with IK with the first occurrence of "IK" before the "-" string and ending at the last occurrence of a number (0 to 9) after the "-" character.
if the requirements change they should be new questions.
I think this one has been answered hasn't it?
I think this one has been answered hasn't it?
Please add to my test case.
Insert new rows to the table as needed and post expected results. This way we can tweak the expression and ensure all previous ones still work. Also post why or why the row doesn't work.
Check out below
Insert new rows to the table as needed and post expected results. This way we can tweak the expression and ensure all previous ones still work. Also post why or why the row doesn't work.
Check out below
Here is the output from that run:
IK234-565
IK34
IKVAL-3456
IKVA-345996
IK-345996
IK-345996
IKTEST23TEST-191
IKTEST23TEST-191
IKTEST23TEST-191
IKTEST23
IK-345996
IKTEST-1234
IKTESTI-1234
IKTESKT-1234
14 rows selected.
drop table tab1 purge;
create table tab1 (col1 varchar(50));
insert into tab1 values('IK234-565');
insert into tab1 values('This is ik34 ff');
insert into tab1 values('These are ikval-3456 test');
insert into tab1 values('These are IKva-345996_456 test');
insert into tab1 values('These are IK-345996:456 test');
insert into tab1 values('These are IK-345996-456 test');
insert into tab1 values('IKTEST23TEST-191FRAMEWORK-RELATED');
insert into tab1 values('IKTEST23TEST-191FRAMEWORK-RELATED');
insert into tab1 values('IKTEST23TEST-191FRAMEWORK-RELATED');
insert into tab1 values('IKTEST23@TEST-191FRAMEWORK-RELATED');
insert into tab1 values('These are IK-345996-456 test/ IK-345996');
insert into tab1 values('There are IK there areIK-345996-456 test/ IK-345996');
insert into tab1 values('sdcaIKsdIKtest-1234TTT / IKtest-1234');
insert into tab1 values('IKtesti-1234');
insert into tab1 values('IKteskt-1234');
commit;
select regexp_replace(upper(col1),
'(.*)(IK[A-Z0-9_]*-?[A-Z0-9]*[0-9]+)(.*)','\2'
)
from tab1;
I disagree with sdstuber on asking a new question every time you change the requirements. It could end up with 100 questions every time you find a new example that breaks.
What I do ask is to please make sure you provide us with as complete requirements as possible.
I feel that you are running what we provide, find a 'failure', the change the requirements. Can you not go through all the examples and provide a complete list?
What I do ask is to please make sure you provide us with as complete requirements as possible.
I feel that you are running what we provide, find a 'failure', the change the requirements. Can you not go through all the examples and provide a complete list?
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
>>> with 100 questions every time you find a new example that breaks
that's fine
because I've already answered this question and the first exception and the followup.
It's time to close this one and if there are new questions (even if related) that's fine.
that's fine
because I've already answered this question and the first exception and the followup.
It's time to close this one and if there are new questions (even if related) that's fine.
ASKER
Thank you all.
select regexp_replace(upper(col1) ,
'(.*)(IK[A-Z0-9_@]*-?[A-Z0 -9]*[0-9]+ )(.*)','\2 '
)
The above looks to be working for most sample inputs I have. And it does work on example:
INPUT/OUTPUT
sdcaIKsdIKtest-1234TTT / IKtest-1234
Previous solutions that you and sdstuber gave to me used REGEXP_SUBSTR but this is using REGEXP_REPLACE. Like in post 35498381 could you please write a line or two to explain this query, if possible?
Many thanks.
select regexp_replace(upper(col1)
'(.*)(IK[A-Z0-9_@]*-?[A-Z0
)
The above looks to be working for most sample inputs I have. And it does work on example:
INPUT/OUTPUT
sdcaIKsdIKtest-1234TTT / IKtest-1234
Previous solutions that you and sdstuber gave to me used REGEXP_SUBSTR but this is using REGEXP_REPLACE. Like in post 35498381 could you please write a line or two to explain this query, if possible?
Many thanks.
\2 is called a backreference
count parentheses from the left and the 2nd block is the substring that is referenced, in this case... (IK[A-Z0-9_@]*-?[A-Z0-9]*[ 0-9]+)
the replace looks for the entire string as described above. and replaces it with the back referenced substring
the only significantly different from what was posted before is the bracketed lists
[A-Z0-9] - this an INCLUSION list, as opposed to [^ IK:.><,_-] which is an EXCLUSION list note the "^" as the first character.
count parentheses from the left and the 2nd block is the substring that is referenced, in this case... (IK[A-Z0-9_@]*-?[A-Z0-9]*[
the replace looks for the entire string as described above. and replaces it with the back referenced substring
the only significantly different from what was posted before is the bracketed lists
[A-Z0-9] - this an INCLUSION list, as opposed to [^ IK:.><,_-] which is an EXCLUSION list note the "^" as the first character.
ASKER
Thanks a lot. That explanation really helps.
Sorry I still have a question.
All outputs are coming correctly.
Except..
If the INPUT does not match the pattern IK*number
Then it needs to return null...
Example (needed):
This is IK /
IK-1234 / IK-1234
IK1234 / IK1234
Sorry I still have a question.
All outputs are coming correctly.
Except..
If the INPUT does not match the pattern IK*number
Then it needs to return null...
Example (needed):
This is IK /
IK-1234 / IK-1234
IK1234 / IK1234
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Many thanks. It seems I am all ok with this solution.
Thanks!
Thanks!