Link to home
Start Free TrialLog in
Avatar of globalwm2
globalwm2Flag for United States of America

asked on

regexp_replace - further filtering needed

I'm currently using:

regexp_replace(ee.sku,' ?\([^)]+\)') as sku_cleansed

which makes the string before:
MC HDMIB 2m (6.56 ft.)

look like:
MC HDMIB 2m

Is there a way to further filter the "2m"  - or anything matching: ##.##m ? I would also want to remove "12.4m"  (and ft.).

Thanks!
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

Why did you close your previous question if that part of the requirement wasn't complete?

It is also why I asked what I did in:
https://www.experts-exchange.com/questions/26869077/SQL-parse-parens-and-contents-in-between.html?&anchorAnswerId=35060315#a35060315

We need to know more about the entire set of 'good' and 'bad' data or we'll solve one then you'll need to continue posting new questions.

For example can the following be in the data:  'QWE 2m' and you only want QWE?
Avatar of globalwm2

ASKER

Sorry Expert, at the time in my validation it was workable. Then the specs changed from the Business Unit. Originally we thought we satisfied the requirement.

Back to your q:

yes, want to strip of references to meters and feet (measurements).
>>Then the specs changed

removing "2m" was part of the original question "would also like to strip out the "0.6m" if possible too."

>>meters and feet (measurements).

So the Feet will ALWAYS be in paranthesis.  Meters will ALWAYS be numbers, a possible decimal point, no space and a lowercase m ( ###.###m ).

You can never have 1,000.98m?

It's also safe to remove ANYTHING in that format or will the meters always appear to the left of the feet in parans?
Try this:

with mytab as (
select 'Chocolate 0.6m (1.96 ft.) Braided' myCol from dual
union all
select 'MC HDMIB 2m (6.56 ft.)' from dual
)
select regexp_replace(regexp_replace(myCol,'\(.*\)|[0-9]*.[0-9]+m',''),'[ ]+',' ') from mytab
/
I have all kinds of information within the parens - here are some examples:

AP4N Bamboo (Pr)
SoundWare White (Ea)
Pearl 0.6m (1.96 ft.) Black/White
Solo HD (PRODUCT)RED
SoundDock (R) Portable Black
ML MTH (2 pr)
FPTAG01(U)
CFBFAG (2 Pr)

but when the parens has "ft" - then preceding it is some metric measurement like '0.6m'

That looks promising.

So, as a newbie here -- you suggesting I don't close this until I complete my script?

I had posted some examples above while you were also replying....

Sorry, just trying to be fair and not have an endless string of comments...

'ft' with no space ALWAYS?

Also your sample data says 'ft.'.  Is it 'ft' or 'ft.' or both?

Try this one

with mytab as (
select 'Chocolate 0.6m (1.96 ft.) Braided' myCol from dual
union all
select 'MC HDMIB 2m (6.56 ft.)' from dual
union all
select 'AP4N Bamboo (Pr)' from dual
union all
select 'SoundWare White (Ea)' from dual
union all
select 'Pearl 0.6m (1.96 ft.) Black/White' from dual
union all
select 'Solo HD (PRODUCT)RED' from dual
union all
select 'SoundDock (R) Portable Black' from dual
union all
select 'ML MTH (2 pr)' from dual
union all
select 'FPTAG01(U)' from dual
union all
select 'CFBFAG (2 Pr)' from dual
)
select regexp_replace(regexp_replace(myCol,'\([0-9]*\.[0-9]+ ?ft\.?\)|[0-9]*.[0-9]+m',''),'[ ]+',' ') from mytab
/

Open in new window

>>So, as a newbie here -- you suggesting I don't close this until I complete my script?

First: Welcome to Experts-Exchange!

Close it when you feel the question asked, has been answered.  If we didn't answer it, then you shouldn't close.  If you aren't getting the response you feel you should, click the 'Request Attention' link above and a Moderator will do everything they can to find you the right Experts.

Occasionally people will ask more than one question per question and the Experts will ask you to close the current one and as a related one.  That's OK.  

You should just try to not close it until you are satisfied it is fully answered.  Asking to closely related questions might cause you to violate the rules of 500 points for a single question.  You might get a warning from a Moderator or Zone Advisor and as long as you do not abuse it, everything will be fine.
I should add:  that it is perfectly fine to ask related questions like:
In previous question <paste link here> I thought it was ZZZ but now I find it's XXX and need further assistance.

Just try to not make a habit out of it.  ;)

You can also ask related questions like:
In previous question <paste link here> my question was answered but an Expert also mentioned ZZZ and I would like to get additional information.  Can someone provide any information on ZZZ?
Thank you!

> Occasionally people will ask more than one question per question and the Experts will ask you to close the current one and as a related one.  That's OK.  

ok...  I was trying to be aware of that not wanting to keep "piling on" so to speak.  :)

Thanks for the welcome!

Back to your previous response, so I may learn to "speak" regexp - can you explain your string:

regexp_replace(regexp_replace(myCol,'\([0-9]*\.[0-9]+ ?ft\.?\)|[0-9]*.[0-9]+m',''),'[ ]+',' ')

??

Oh... got it!

> ou can also ask related questions like:
In previous question <paste link here> my question was answered but an Expert also mentioned ZZZ and I would like to get additional information.  Can someone provide any information on ZZZ?
I just modified your line to be:

select regexp_replace(regexp_replace(regexp_replace(myCol,'\([0-9]*\.[0-9]+ ?ft\.?\)|[0-9]*.[0-9]+m',''),'[ ]+',' '),' ?\([^)]+\)') from mytab

(combined with the first statement)

Looking good - I need to incorporate into my script and further validate. brb
One refinement I see is when the column begins with a "mm" value - for example:  

67mm UV Haze is becoming: m UV Haze

EF 70200mm f/2.8L IS II USM is becoming: EFm f/2.8L IS II USM

Is there a way to 'delay' the replacement or only replace when a '(' character is found?
Okay, looks like I'm not going to have to go this deep anymore.

So far, they are liking the results with just:

regexp_replace(ee.sku,' ?\([^)]+\)') as sku_cleansed

Again, this is why we ask for the possibilities/examples.

Regex is a powerful pattern matching program.  Once you define the pattern(s), it's just coming up expression to meet that pattern.
It was a good learning event nonetheless.  Thank you for your help!

I'm going to go with your statement containing:

select regexp_replace(regexp_replace(myCol,'\(.*\)|[0-9]*.[0-9]+m',''),'[ ]+',' ') from mytab

as a correct response to what I originally asked for?
ASKER CERTIFIED SOLUTION
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Pts awarded on overall learning experience with reg expressions and EE etiquette.

Thank you!