Improve company productivity with a Business Account.Sign Up

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 504
  • Last Modified:

regexp_replace - further filtering needed

I'm currently using:

regexp_replace(ee.sku,' ?\([^)]+\)') as sku_cleansed

which makes the string before:
MC HDMIB 2m (6.56 ft.)

look like:
MC HDMIB 2m

Is there a way to further filter the "2m"  - or anything matching: ##.##m ? I would also want to remove "12.4m"  (and ft.).

Thanks!
0
globalwm2
Asked:
globalwm2
  • 10
  • 8
1 Solution
 
slightwv (䄆 Netminder) Commented:
Why did you close your previous question if that part of the requirement wasn't complete?

It is also why I asked what I did in:
http://www.experts-exchange.com/Database/Oracle/Q_26869077.html?#a35060315

We need to know more about the entire set of 'good' and 'bad' data or we'll solve one then you'll need to continue posting new questions.

For example can the following be in the data:  'QWE 2m' and you only want QWE?
0
 
globalwm2Author Commented:
Sorry Expert, at the time in my validation it was workable. Then the specs changed from the Business Unit. Originally we thought we satisfied the requirement.

Back to your q:

yes, want to strip of references to meters and feet (measurements).
0
 
slightwv (䄆 Netminder) Commented:
>>Then the specs changed

removing "2m" was part of the original question "would also like to strip out the "0.6m" if possible too."

>>meters and feet (measurements).

So the Feet will ALWAYS be in paranthesis.  Meters will ALWAYS be numbers, a possible decimal point, no space and a lowercase m ( ###.###m ).

You can never have 1,000.98m?

It's also safe to remove ANYTHING in that format or will the meters always appear to the left of the feet in parans?
0
Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

 
slightwv (䄆 Netminder) Commented:
Try this:

with mytab as (
select 'Chocolate 0.6m (1.96 ft.) Braided' myCol from dual
union all
select 'MC HDMIB 2m (6.56 ft.)' from dual
)
select regexp_replace(regexp_replace(myCol,'\(.*\)|[0-9]*.[0-9]+m',''),'[ ]+',' ') from mytab
/
0
 
globalwm2Author Commented:
I have all kinds of information within the parens - here are some examples:

AP4N Bamboo (Pr)
SoundWare White (Ea)
Pearl 0.6m (1.96 ft.) Black/White
Solo HD (PRODUCT)RED
SoundDock (R) Portable Black
ML MTH (2 pr)
FPTAG01(U)
CFBFAG (2 Pr)

but when the parens has "ft" - then preceding it is some metric measurement like '0.6m'

0
 
globalwm2Author Commented:
That looks promising.

So, as a newbie here -- you suggesting I don't close this until I complete my script?

I had posted some examples above while you were also replying....

Sorry, just trying to be fair and not have an endless string of comments...

0
 
slightwv (䄆 Netminder) Commented:
'ft' with no space ALWAYS?

Also your sample data says 'ft.'.  Is it 'ft' or 'ft.' or both?

Try this one

with mytab as (
select 'Chocolate 0.6m (1.96 ft.) Braided' myCol from dual
union all
select 'MC HDMIB 2m (6.56 ft.)' from dual
union all
select 'AP4N Bamboo (Pr)' from dual
union all
select 'SoundWare White (Ea)' from dual
union all
select 'Pearl 0.6m (1.96 ft.) Black/White' from dual
union all
select 'Solo HD (PRODUCT)RED' from dual
union all
select 'SoundDock (R) Portable Black' from dual
union all
select 'ML MTH (2 pr)' from dual
union all
select 'FPTAG01(U)' from dual
union all
select 'CFBFAG (2 Pr)' from dual
)
select regexp_replace(regexp_replace(myCol,'\([0-9]*\.[0-9]+ ?ft\.?\)|[0-9]*.[0-9]+m',''),'[ ]+',' ') from mytab
/

Open in new window

0
 
slightwv (䄆 Netminder) Commented:
>>So, as a newbie here -- you suggesting I don't close this until I complete my script?

First: Welcome to Experts-Exchange!

Close it when you feel the question asked, has been answered.  If we didn't answer it, then you shouldn't close.  If you aren't getting the response you feel you should, click the 'Request Attention' link above and a Moderator will do everything they can to find you the right Experts.

Occasionally people will ask more than one question per question and the Experts will ask you to close the current one and as a related one.  That's OK.  

You should just try to not close it until you are satisfied it is fully answered.  Asking to closely related questions might cause you to violate the rules of 500 points for a single question.  You might get a warning from a Moderator or Zone Advisor and as long as you do not abuse it, everything will be fine.
0
 
slightwv (䄆 Netminder) Commented:
I should add:  that it is perfectly fine to ask related questions like:
In previous question <paste link here> I thought it was ZZZ but now I find it's XXX and need further assistance.

Just try to not make a habit out of it.  ;)

You can also ask related questions like:
In previous question <paste link here> my question was answered but an Expert also mentioned ZZZ and I would like to get additional information.  Can someone provide any information on ZZZ?
0
 
globalwm2Author Commented:
Thank you!

> Occasionally people will ask more than one question per question and the Experts will ask you to close the current one and as a related one.  That's OK.  

ok...  I was trying to be aware of that not wanting to keep "piling on" so to speak.  :)

Thanks for the welcome!

Back to your previous response, so I may learn to "speak" regexp - can you explain your string:

regexp_replace(regexp_replace(myCol,'\([0-9]*\.[0-9]+ ?ft\.?\)|[0-9]*.[0-9]+m',''),'[ ]+',' ')

??

0
 
globalwm2Author Commented:
Oh... got it!

> ou can also ask related questions like:
In previous question <paste link here> my question was answered but an Expert also mentioned ZZZ and I would like to get additional information.  Can someone provide any information on ZZZ?
0
 
globalwm2Author Commented:
I just modified your line to be:

select regexp_replace(regexp_replace(regexp_replace(myCol,'\([0-9]*\.[0-9]+ ?ft\.?\)|[0-9]*.[0-9]+m',''),'[ ]+',' '),' ?\([^)]+\)') from mytab

(combined with the first statement)

Looking good - I need to incorporate into my script and further validate. brb
0
 
globalwm2Author Commented:
One refinement I see is when the column begins with a "mm" value - for example:  

67mm UV Haze is becoming: m UV Haze

EF 70200mm f/2.8L IS II USM is becoming: EFm f/2.8L IS II USM

Is there a way to 'delay' the replacement or only replace when a '(' character is found?
0
 
globalwm2Author Commented:
Okay, looks like I'm not going to have to go this deep anymore.

So far, they are liking the results with just:

regexp_replace(ee.sku,' ?\([^)]+\)') as sku_cleansed

0
 
slightwv (䄆 Netminder) Commented:
Again, this is why we ask for the possibilities/examples.

Regex is a powerful pattern matching program.  Once you define the pattern(s), it's just coming up expression to meet that pattern.
0
 
globalwm2Author Commented:
It was a good learning event nonetheless.  Thank you for your help!

I'm going to go with your statement containing:

select regexp_replace(regexp_replace(myCol,'\(.*\)|[0-9]*.[0-9]+m',''),'[ ]+',' ') from mytab

as a correct response to what I originally asked for?
0
 
slightwv (䄆 Netminder) Commented:
Accept whichever post or posts helped you get the answer.  Since this question really had no answer, you can award points for the learning experience.

>>One refinement I see is when the column begins with a "mm" value - for example:  

For what it's worth, you can get around this with some 'space' tricks (may not work in all instances but appears to for the current test cases).

You keep throwing them out, we'll try to keep tweaking the regex or close it out.  It's up to you.  We'll play as long as you want (maybe...).


with mytab as (
select 'Chocolate 0.6m (1.96 ft.) Braided' myCol from dual
union all
select 'MC HDMIB 2m (6.56 ft.)' from dual
union all
select 'AP4N Bamboo (Pr)' from dual
union all
select 'SoundWare White (Ea)' from dual
union all
select 'Pearl 0.6m (1.96 ft.) Black/White' from dual
union all
select 'Solo HD (PRODUCT)RED' from dual
union all
select 'SoundDock (R) Portable Black' from dual
union all
select 'ML MTH (2 pr)' from dual
union all
select 'FPTAG01(U)' from dual
union all
select 'CFBFAG (2 Pr)' from dual
union all
select '67mm UV Haze is becoming: m UV Haze' from dual
union all
select 'EF 70200mm f/2.8L IS II USM is becoming: EFm f/2.8L IS II USM' from dual
)
select trim(regexp_replace(regexp_replace(myCol || ' ','\([0-9]*\.[0-9]+ ?ft\.?\)|[0-9]*.[0-9]+m ',''),'[ ]+',' ')) from mytab
/

Open in new window

0
 
globalwm2Author Commented:
Pts awarded on overall learning experience with reg expressions and EE etiquette.

Thank you!
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Get your problem seen by more experts

Be seen. Boost your question’s priority for more expert views and faster solutions

  • 10
  • 8
Tackle projects and never again get stuck behind a technical roadblock.
Join Now