need Sql trick to strip out non-standard text in Oracle

I need to strip out "BY ARRANGEMENT" from text, but it's spelled differently :-(

Here are the variations
F2,T56,BY ARRANGEM
M5,T56,BY ARRANGEM
M6,T56,BY ARRANGEM
MTWR1256,FBYARRANG
MWF3,BYARRANGEMENT
MW6,BY ARRANGMENT
T56,BYARRANGEMENT
T56,F2,BYARRANGEME
TWF7,BY ARRANGEMN
TWF7,BYARRANGEMENT
TR34,BY ARRANGEMEN
TR56,BY ARRANGEMEN
TR10,BYARRANGEMENT
TR8,BYARRANGEMENT
TR9,BYARRANGEMENT
WF5,BYARRANGEMENT
T56,BYARRANGEMENT
BY ARRANGEMENT,F1
T56,BYARRANGEMENT
W6,BYARRANGEMENT


BY ARRANGEMENT
F2,BY ARRANGEMENT
F3,BY ARRANGEMENT
F4,BY ARRANGEMENT
M3, BY ARRANGEMENT
M5,BY ARRANGEMENT
M6,BY ARRANGEMENT
MF3,BY ARRANGEMENT
MF7,BY ARRANGEMENT
MW2,BY ARRANGEMENT
MW6,BY ARRANGEMENT
R12,BY ARRANGEMENT
R34,BY ARRANGEMENT
R56,BY ARRANGEMENT
T2,BY ARRANGEMENT
T34,BY ARRANGEMENT
T5,BY ARRANGEMENT
TF7,BY ARRANGEMENT
W2,BY ARRANGEMENT
W5,BY ARRANGEMENT
W7,BY ARRANGEMENT
WF2,BY ARRANGEMENT
WF3,BY ARRANGEMENT
T34,BY ARRANGEMENT
F1,BY ARRANGEMENT
T34,BY ARRANGEMENT
Gadsden ConsultingIT SpecialistAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

slightwv (䄆 Netminder) Commented:
Interesting...

I think I'm close but need clarification.

What is the expected output from:
 MTWR1256,FBYARRANG
0
sdstuberCommented:
Is it always the last string in the comma delimited list?

if so, try this

regexp_replace(str,'[^,]+$')

and if you need to remove the trailing comma when there is one try this

regexp_replace(str,',?[^,]+$')
0
slightwv (䄆 Netminder) Commented:
>>Is it always the last string in the comma delimited list?

Thought that myself until I found:
BY ARRANGEMENT,F1
BY ARRANGEMENT

Here is what I have:
select regexp_replace(col1,'([ ,]+)?BY[^,]+([ ,]+)?') from tab1;

It produces the following:

F2,T56
M5,T56
M6,T56
MTWR1256,F
MWF3
MW6
T56
T56,F2
TWF7
TWF7
TR34
TR56
TR10
TR8
TR9
WF5
T56
F1
T56
W6

F2
F3
F4
M3
M5
M6
MF3
MF7
MW2
MW6
R12
R34
R56
T2
T34
T5
TF7
W2
W5
W7
WF2
WF3
T34
F1
T34

Open in new window

0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

sdstuberCommented:
alternately, if you're looking for strings of  M,T,W,R,F  followed by digits then try this


regexp_substr(str,'([MWTRF]+[0-9]+,?)+')

and again, if you want to remove the trailing comma

rtrim(regexp_substr(str,'([MWTRF]+[0-9]+,?)+'),',')
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
slightwv (䄆 Netminder) Commented:
To add to my last post:  It might work until they misspell 'BY'...

It will fail on something like:
MI6,BI ARRANGEMENT
MI6,B Y ARRANGEMENT
0
sdstuberCommented:
I didn't see the "BY ARRANGEMT,F1"  - thanks

my first post would have handled the "BY ARRANGEMENT"  though.

My second post does them all (at least as far as I can tell)
0
Gadsden ConsultingIT SpecialistAuthor Commented:
ok, I'm working on it, thanks very much !

this is an outlier: MTWR1256,FBYARRANG - that's "Friday By Arrangement" but breaks the rules . . . :-(
0
Gadsden ConsultingIT SpecialistAuthor Commented:
>>select regexp_replace(col1,'([ ,]+)?BY[^,]+([ ,]+)?') from tab1;
- Whoa ! looks really good, checking now . . .
0
slightwv (䄆 Netminder) Commented:
Forget mine for now.

Look at what sdstuber posted in  http:#a40354505

I think that is more what you want and is much cleaner than mine.
0
Gadsden ConsultingIT SpecialistAuthor Commented:
sdstuber, thanks ! works perfectly !!!!

---
slightwv, you had all but the FBYArrangement, but that was slick too !
0
sdstuberCommented:
Since the part you want seems to have more rigid construction rules, it's a lot easier to search for that

here's a slight improvement on my last post, this explictly checks for the pattern at the beginning of the string and also correctly checks patterns like   M2W3 (bad)  vs M2,W3 (good)

REGEXP_SUBSTR(str, '^([MWTRF]+[0-9]+,)+([MWTRF]+[0-9]+)?')     -- keeping the trailing comma


REGEXP_SUBSTR(str, '^([MWTRF]+[0-9]+,)*[MWTRF]+[0-9]+')    -- removing the trailing comma
0
Gadsden ConsultingIT SpecialistAuthor Commented:
Thanks, my boss will be impressed !!!! :-)
0
sdstuberCommented:
you already identified one outlier   "MTWR1256,FBYARRANG"  - the expression in post 40354502 doesn't handle that correctly, at least I don't think so.  You didn't actually specify what the output should be for that one, so I guessed with mine.
0
sdstuberCommented:
The expressions produce different results - so they can't both be correct.
0
Gadsden ConsultingIT SpecialistAuthor Commented:
sdstuber,
>>also correctly checks patterns like   M2W3 (bad)  vs M2,W3 (good)
- actually M2W3 is valid, and your expression works --> F1R56 produces F1R56

I'm using regexp_substr(REPLACE(mtg_time,' ',''),'([MWTRFS]+[0-9]+,?)+')

which also produces correctly: MTWR1256,FBYARRANG = MTWR1256, which is correct.

I'm not worried about the comma.
0
slightwv (䄆 Netminder) Commented:
Gadsden Consulting,

As much as I appreciate the points I don't feel I deserve them.  With your permission I would like to unaccept this so you can award the points where they are due.
0
awking00Commented:
Just to add a kink to the solutions, if a record looked like 'BY34, BY ARRANGEM', neither solution works, although slightwv's could be modifed to - select regexp_replace(text,'([ ,]+)?BY ?[A-Z]+,?') from table to accommodate that scenario.
0
slightwv (䄆 Netminder) Commented:
>> if a record looked like 'BY34 ...

Not an issue.  Check the asker's previous question.  Allowed values are MTWRF

these are class periods, so M=Mon, T=Tues, R=Thurs, etc.

http://www.experts-exchange.com/Database/Oracle/Q_28528466.html
0
awking00Commented:
Whew! I didn't see all of those posts. So the 'MTWR1256,FBYARRANG' should produce just MTRW1256? Have you confirmed that only the letters M, W, T, R, F, or S can be found prior to the "...ARRANG..." segment?
0
sdstuberCommented:
even without precedent - I declared the criteria as part of my suggestions.  

If that assumption was not true it was on the asker to clarify that point.
I also stated the goal of my approach was to search for rigorous "inclusive" rules, rather than fuzzy "exclusive" rules.

So, whatever the inclusive part was, if it could be rigorously defined, then that was the way to go.
0
Gadsden ConsultingIT SpecialistAuthor Commented:
slightwv,
>>As much as I appreciate the points I don't feel I deserve them.  I would like to unaccept this so you can award the points where they are due.
- sure, but I like to give points when someone is making a significant contribution.

awking00, thanks for weighing in
>>BY34, BY ARRANGEM
-- this wouldn't happen (but it could if the user was having a really bad day . . . )
- the provided solution returns null, which is correct.

>> Have you confirmed that only the letters M, W, T, R, F, or S can be found prior to the "...ARRANG..." segment?
- yes, but the user could start typing with their nose, I suppose . . . the value isn't validated.

--------------
note: I was about to post another question to strip out characters and leave only numbers. That way, I can compare the numbers in the original meeting time, e.g.,  F2,T56,BYARRANGEM --> F2,T56,

then I said to myself, hey, why not try out your new skill . . . and voila
regexp_substr(mtg_time,'([0-9])+') works like a champ :-)

so MTWR1256,FBYARRANG = 1256 as does the stripped period MTWR1256,

and I am good-to-go . . .
0
sdstuberCommented:
don't use the () unless you have to insider regular expressions they define sub-expressions which are expensive to unnest

regexp_substr(mtg_time,'[0-9]+')
0
sdstuberCommented:
>>>  sure, but I like to give points when someone is making a significant contribution.

you can leave as is, or resplit with weighted points for example 300/200, 400/100  instead of 250/250
0
Gadsden ConsultingIT SpecialistAuthor Commented:
Oh, hope you don't mind a quickie . . .

SELECT regexp_substr('M78910,','([710])+') FROM dual;
- this returns 7, how come it doesn't return 710 ?

SELECT regexp_substr('M78910,','([0-8])+') FROM dual;
- this returns 78, how come it doesn't return 7810 ?
0
slightwv (䄆 Netminder) Commented:
([710])+ means 7 or 1 or 0 the "+" means one or more in series.

So, you are asking for 1 or more characters that are 7, 1 or 0 "together" in series.

In M78910, the first string that matches that pattern is the 7 (the 8 isn't in the list so it stops the match).

If you only want the 7's 1's and 0's 0 to remain:
SELECT regexp_replace('M78910,','[^710]+') FROM dual;

The '^ inside the square brackets means NOT.  So the replace says, replace everything that is NOT a 7, 1 or 0.
0
Gadsden ConsultingIT SpecialistAuthor Commented:
ok, thanks, slightwv. I'm still scratching my head on this a bit . . . will post other questions as needed.
0
slightwv (䄆 Netminder) Commented:
Keep this in mind when dealing with regular expressions:
It is all about patterns.

If you can find the pattern in what you want to do, you can write a regex to get it.

They are INCREDIBLY POWERFUL and as such, can be INCREDIBLY CONFUSING.

The other frustrating thing about them is different implementations allow different 'features'.

Not everything in the book I'm about to recommend can be used in Oracle's implementation but I found it a good book to learn about regular expressions.  Well, back when I actually purchased books...

I'm also partial to this publisher when it comes to tech books:
http://shop.oreilly.com/product/9780596528126.do
0
sdstuberCommented:
small correction/expansion on slightwv's explanation

'([710])+'

can be read as...

start expression
"(" - start a sub-expression    (you don't want this, as noted above)
"[" - start a character set
"710" - these are the characters in the character set
"]" - end the character set
")" - end the sub expression
"+" - repeat the previous pattern 1 or more times, the previous pattern is ([710])
end expression

the sub-expression in this usage does nothing for you except make your expression slower

regular expression parsers are generally very literal, so even though the () would be removed as a dummy operation in many languages, not so in a regexp.  Those () are instructions and the parser follows them even if they aren't helpful
0
Gadsden ConsultingIT SpecialistAuthor Commented:
slighttwv,
>>They are INCREDIBLY POWERFUL and as such, can be INCREDIBLY CONFUSING.
- agreed on the confusing part :-)

>>I'm also partial to this publisher when it comes to tech books:
- excellent suggestion, thx.

----
sdstuber,
great, thanks for the expln.
0
sdstuberCommented:
another great resource on regexps

http://www.regular-expressions.info/tutorial.html
0
Gadsden ConsultingIT SpecialistAuthor Commented:
ststuber,

thanks as well, a new trick for my toolbag . . .
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Oracle Database

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.