Solved

regexp_replace - further filtering needed

Posted on 2011-03-08
18
489 Views
Last Modified: 2012-05-11
I'm currently using:

regexp_replace(ee.sku,' ?\([^)]+\)') as sku_cleansed

which makes the string before:
MC HDMIB 2m (6.56 ft.)

look like:
MC HDMIB 2m

Is there a way to further filter the "2m"  - or anything matching: ##.##m ? I would also want to remove "12.4m"  (and ft.).

Thanks!
0
Comment
Question by:globalwm2
  • 10
  • 8
18 Comments
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35073448
Why did you close your previous question if that part of the requirement wasn't complete?

It is also why I asked what I did in:
http://www.experts-exchange.com/Database/Oracle/Q_26869077.html?#a35060315

We need to know more about the entire set of 'good' and 'bad' data or we'll solve one then you'll need to continue posting new questions.

For example can the following be in the data:  'QWE 2m' and you only want QWE?
0
 

Author Comment

by:globalwm2
ID: 35073506
Sorry Expert, at the time in my validation it was workable. Then the specs changed from the Business Unit. Originally we thought we satisfied the requirement.

Back to your q:

yes, want to strip of references to meters and feet (measurements).
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35073693
>>Then the specs changed

removing "2m" was part of the original question "would also like to strip out the "0.6m" if possible too."

>>meters and feet (measurements).

So the Feet will ALWAYS be in paranthesis.  Meters will ALWAYS be numbers, a possible decimal point, no space and a lowercase m ( ###.###m ).

You can never have 1,000.98m?

It's also safe to remove ANYTHING in that format or will the meters always appear to the left of the feet in parans?
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35073745
Try this:

with mytab as (
select 'Chocolate 0.6m (1.96 ft.) Braided' myCol from dual
union all
select 'MC HDMIB 2m (6.56 ft.)' from dual
)
select regexp_replace(regexp_replace(myCol,'\(.*\)|[0-9]*.[0-9]+m',''),'[ ]+',' ') from mytab
/
0
 

Author Comment

by:globalwm2
ID: 35073770
I have all kinds of information within the parens - here are some examples:

AP4N Bamboo (Pr)
SoundWare White (Ea)
Pearl 0.6m (1.96 ft.) Black/White
Solo HD (PRODUCT)RED
SoundDock (R) Portable Black
ML MTH (2 pr)
FPTAG01(U)
CFBFAG (2 Pr)

but when the parens has "ft" - then preceding it is some metric measurement like '0.6m'

0
 

Author Comment

by:globalwm2
ID: 35073795
That looks promising.

So, as a newbie here -- you suggesting I don't close this until I complete my script?

I had posted some examples above while you were also replying....

Sorry, just trying to be fair and not have an endless string of comments...

0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35073828
'ft' with no space ALWAYS?

Also your sample data says 'ft.'.  Is it 'ft' or 'ft.' or both?

Try this one

with mytab as (
select 'Chocolate 0.6m (1.96 ft.) Braided' myCol from dual
union all
select 'MC HDMIB 2m (6.56 ft.)' from dual
union all
select 'AP4N Bamboo (Pr)' from dual
union all
select 'SoundWare White (Ea)' from dual
union all
select 'Pearl 0.6m (1.96 ft.) Black/White' from dual
union all
select 'Solo HD (PRODUCT)RED' from dual
union all
select 'SoundDock (R) Portable Black' from dual
union all
select 'ML MTH (2 pr)' from dual
union all
select 'FPTAG01(U)' from dual
union all
select 'CFBFAG (2 Pr)' from dual
)
select regexp_replace(regexp_replace(myCol,'\([0-9]*\.[0-9]+ ?ft\.?\)|[0-9]*.[0-9]+m',''),'[ ]+',' ') from mytab
/

Open in new window

0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35073879
>>So, as a newbie here -- you suggesting I don't close this until I complete my script?

First: Welcome to Experts-Exchange!

Close it when you feel the question asked, has been answered.  If we didn't answer it, then you shouldn't close.  If you aren't getting the response you feel you should, click the 'Request Attention' link above and a Moderator will do everything they can to find you the right Experts.

Occasionally people will ask more than one question per question and the Experts will ask you to close the current one and as a related one.  That's OK.  

You should just try to not close it until you are satisfied it is fully answered.  Asking to closely related questions might cause you to violate the rules of 500 points for a single question.  You might get a warning from a Moderator or Zone Advisor and as long as you do not abuse it, everything will be fine.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35073917
I should add:  that it is perfectly fine to ask related questions like:
In previous question <paste link here> I thought it was ZZZ but now I find it's XXX and need further assistance.

Just try to not make a habit out of it.  ;)

You can also ask related questions like:
In previous question <paste link here> my question was answered but an Expert also mentioned ZZZ and I would like to get additional information.  Can someone provide any information on ZZZ?
0
How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

 

Author Comment

by:globalwm2
ID: 35073955
Thank you!

> Occasionally people will ask more than one question per question and the Experts will ask you to close the current one and as a related one.  That's OK.  

ok...  I was trying to be aware of that not wanting to keep "piling on" so to speak.  :)

Thanks for the welcome!

Back to your previous response, so I may learn to "speak" regexp - can you explain your string:

regexp_replace(regexp_replace(myCol,'\([0-9]*\.[0-9]+ ?ft\.?\)|[0-9]*.[0-9]+m',''),'[ ]+',' ')

??

0
 

Author Comment

by:globalwm2
ID: 35073993
Oh... got it!

> ou can also ask related questions like:
In previous question <paste link here> my question was answered but an Expert also mentioned ZZZ and I would like to get additional information.  Can someone provide any information on ZZZ?
0
 

Author Comment

by:globalwm2
ID: 35074044
I just modified your line to be:

select regexp_replace(regexp_replace(regexp_replace(myCol,'\([0-9]*\.[0-9]+ ?ft\.?\)|[0-9]*.[0-9]+m',''),'[ ]+',' '),' ?\([^)]+\)') from mytab

(combined with the first statement)

Looking good - I need to incorporate into my script and further validate. brb
0
 

Author Comment

by:globalwm2
ID: 35074125
One refinement I see is when the column begins with a "mm" value - for example:  

67mm UV Haze is becoming: m UV Haze

EF 70200mm f/2.8L IS II USM is becoming: EFm f/2.8L IS II USM

Is there a way to 'delay' the replacement or only replace when a '(' character is found?
0
 

Author Comment

by:globalwm2
ID: 35074377
Okay, looks like I'm not going to have to go this deep anymore.

So far, they are liking the results with just:

regexp_replace(ee.sku,' ?\([^)]+\)') as sku_cleansed

0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 35074732
Again, this is why we ask for the possibilities/examples.

Regex is a powerful pattern matching program.  Once you define the pattern(s), it's just coming up expression to meet that pattern.
0
 

Author Comment

by:globalwm2
ID: 35074778
It was a good learning event nonetheless.  Thank you for your help!

I'm going to go with your statement containing:

select regexp_replace(regexp_replace(myCol,'\(.*\)|[0-9]*.[0-9]+m',''),'[ ]+',' ') from mytab

as a correct response to what I originally asked for?
0
 
LVL 76

Accepted Solution

by:
slightwv (䄆 Netminder) earned 125 total points
ID: 35074941
Accept whichever post or posts helped you get the answer.  Since this question really had no answer, you can award points for the learning experience.

>>One refinement I see is when the column begins with a "mm" value - for example:  

For what it's worth, you can get around this with some 'space' tricks (may not work in all instances but appears to for the current test cases).

You keep throwing them out, we'll try to keep tweaking the regex or close it out.  It's up to you.  We'll play as long as you want (maybe...).


with mytab as (
select 'Chocolate 0.6m (1.96 ft.) Braided' myCol from dual
union all
select 'MC HDMIB 2m (6.56 ft.)' from dual
union all
select 'AP4N Bamboo (Pr)' from dual
union all
select 'SoundWare White (Ea)' from dual
union all
select 'Pearl 0.6m (1.96 ft.) Black/White' from dual
union all
select 'Solo HD (PRODUCT)RED' from dual
union all
select 'SoundDock (R) Portable Black' from dual
union all
select 'ML MTH (2 pr)' from dual
union all
select 'FPTAG01(U)' from dual
union all
select 'CFBFAG (2 Pr)' from dual
union all
select '67mm UV Haze is becoming: m UV Haze' from dual
union all
select 'EF 70200mm f/2.8L IS II USM is becoming: EFm f/2.8L IS II USM' from dual
)
select trim(regexp_replace(regexp_replace(myCol || ' ','\([0-9]*\.[0-9]+ ?ft\.?\)|[0-9]*.[0-9]+m ',''),'[ ]+',' ')) from mytab
/

Open in new window

0
 

Author Closing Comment

by:globalwm2
ID: 35074974
Pts awarded on overall learning experience with reg expressions and EE etiquette.

Thank you!
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

Suggested Solutions

Cursors in Oracle: A cursor is used to process individual rows returned by database system for a query. In oracle every SQL statement executed by the oracle server has a private area. This area contains information about the SQL statement and the…
If you find yourself in this situation “I have used SELECT DISTINCT but I’m getting duplicates” then I'm sorry to say you are using the wrong SQL technique as it only does one thing which is: produces whole rows that are unique. If the results you a…
This video explains at a high level with the mandatory Oracle Memory processes are as well as touching on some of the more common optional ones.
This video shows how to configure and send email from and Oracle database using both UTL_SMTP and UTL_MAIL, as well as comparing UTL_SMTP to a manual SMTP conversation with a mail server.

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now