lightspeedvt
 asked on

RegExp to complete

Hi Experts,

I've come up with a RegExp on my own that is nearly perfect for my needs, but there is one issue left to solve.

Here is the incoming string:

section{  font-size  :  22px  ;    color : #cecece; } @media { #holder{ color:#cecece; font-size : 10px; border-color:#cecece; font:color sasd;   font:border-color; }  article{ font-size:14px; } header { color:#ffffff; } } .element{font-size:20px;} section{ font-size:12px; color:#cecece; } @media() {.element{font-size:20px;}} section{ font-size:12px; color:#cecece; font-weight:bold; } footer{color:#cacaca;  font:border-color, color, more, color; color:border-color, color, more, color; font-size:1111; font-size:color getting, better; color: color; font-size: more color;} test{font-size:10px;color:zzz;}



Regular expression:

(?!{)(?!;)(?![^{;]*(border-color|color)[^{;]*)\b[^{;]*;



Here is the highlighted result screenshot at http://regexpal.com/ 
Results
The issue is that, for some reason, it selects all the text that comes after the words "border-color" and "color" (the ones listed in (border-color|color)) up to the ";" symbol. I am trying to fix it so that this text is not selected if a needed word appears before the ";". I've made a screenshot, marked with a red rectangle, of what should be omitted from the search results:
Results marked with red rectangle that shouldn't be included
Also, here is the logical explanation of what I am trying to achieve: the RegExp has to select all text inside { .... } that doesn't contain a needed word, up to the ";" symbol (including that symbol). If at least one of the needed words exists inside that part of the text, it should remain, up to the ";" symbol.

Note that it needs to work in JavaScript, so lookbehind assertions won't work.
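A minimal sketch of the behaviour described above, assuming a shortened input and a Node.js or browser console (the variable names are illustrative and not from the post). A lookahead is only checked at the position where a match attempt starts, so after failing in front of "color" the engine retries further to the right and succeeds just after the keyword:

```javascript
// The pattern from the question, with "{" escaped for portability.
const pattern = /(?!\{)(?!;)(?![^{;]*(border-color|color)[^{;]*)\b[^{;]*;/g;

// Shortened input (an assumption made for this sketch).
const input = "a{ color : #cecece; }";

// Attempts that start before or at "color" fail the negative lookahead,
// attempts inside the word fail \b, but the attempt that starts right
// after "color" no longer sees the keyword ahead of it and matches the
// rest of the declaration.
const matches = input.match(pattern);
const stripped = input.replace(pattern, "");

console.log(matches);  // [ ' : #cecece;' ]
console.log(stripped); // a{ color }
```

This reproduces, on a small scale, the "section{ color }" style of output shown in the replace results.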

This question is connected with my previous question: https://www.experts-exchange.com/questions/27878123/Javascript-Regular-Expression-to-parse-CSS-text.html
I still wasn't able to get everything into one RegExp in that question, so I've decided to work more on my own and simplify the task.

I would be very thankful if somebody could help me complete this.
Regular Expressions, Scripting Languages, JavaScript


8/22/2022 - Mon
Sean Stuber

if you're only looking for numeric font sizes or the font-weight:bold try this...

font-size[:0-9 px]+;|font-weight:bold;
lightspeedvt

ASKER
There could be any text. As you can see from my RegExp, we are just looking for the specified words, and if at least one of them is found, all of that text up to the ";" symbol shouldn't be in the match.
Sean Stuber

I'm afraid I don't understand.

Can you give an example of where the expression above doesn't yield the results you're looking for?
lightspeedvt

ASKER
Sure, I will try to explain better.

Inside the { ... } we have text that is delimited by ";". Example:

myTag1{ someText1: some value 1; someText2:some value 2; someText3 :some value 3 ;}

RegExp has to filter the text that goes inside { } :
a) someText1: some value 1;
b) someText2:some value 2;
c) someText3 :some value 3 ;

And match only the text that doesn't contain any of the specified words.

Real example:
List of words that must not appear in the matched text: "border-color", "color", "url".

Incoming string:
myTag{ font-size: 10px; color:black; font :color ; padding: 10px 5px; border-color:color 1px solid; background: url repeat; font-weight: bold; }

I've marked in bold the text that shouldn't be in the match.

So, the result of replacing the matches with an empty string will be:
myTag{ color:black; font :color ; border-color:color 1px solid; background: url repeat; }

If you check my question, it has a complete, very good example of a possible incoming string and 2 screenshots (how it matches with my RegExp, and the text, marked with a red rectangle, that still must not be included in the match results).
lightspeedvt

ASKER
Here is the result of replacing the matches with an empty string in the incoming string that I provided in my question:

section{  color : #cecece; } @media { #holder{ color:#cecece; border-color:#cecece; font:color sasd;   font:border-color; }  article{ } header { color:#ffffff; } } .element{} section{ color:#cecece; } @media() {.element{}} section{ color:#cecece; } footer{color:#cacaca;  font:border-color, color, more, color; color:border-color, color, more, color; font-size:color getting, better; color: color; font-size: more color;} test{color:zzz;}


Sean Stuber

After removing all strings found with the regular expression I posted above from your first string, you get the string you just posted.

Other than some extra whitespace, the results are identical.  You're right -  it was a good example.  Using it I was able to produce what you asked for.

However, the results are not the same with your second example, but the rules changed there and are inconsistent.  You say you want to remove anything with a word found, but you want to keep "font-weight: bold;"?  "bold" is a word, but it doesn't follow that rule?  In your examples, you removed font-weight: bold;,  but in your descriptions and highlighted examples of what was supposed to be removed you don't.

which are correct?
Sean Stuber

Using only the example input strings and the expected output strings,
Then  this regexp seems to work

 *[^{;:]+: *([0-9]+(px *)?|bold)+;
lightspeedvt

ASKER
OK. Just to keep everything consistent, I am listing only one example.

Incoming string:

section{  font-size  :  22px  ;    color : #cecece; } @media { #holder{ color:#cecece; font-size : 10px; border-color:#cecece; font:color sasd;   font:border-color; }  article{ font-size:14px; } header { color:#ffffff; } } .element{font-size:20px;} section{ font-size:12px; color:#cecece; } @media() {.element{font-size:20px;}} section{ font-size:12px; color:#cecece; font-weight:bold; } footer{color:#cacaca;  font:border-color, color, more, color; color:border-color, color, more, color; font-size:1111; font-size:color getting, better; color: color; font-size: more color;} test{font-size:10px;color:zzz;}



Regular Expression that I have:

(?!{)(?!;)(?![^{;]*(border-color|color)[^{;]*)\b[^{;]*;



Screenshot of match results that I am getting with my expression:
Results with my RegExp
Result of replacing the matches with an empty string that I am getting with my expression:

section{ color } @media { #holder{ color border-color font:color font:border-color; } article{ } header { color } } .element{} section{ color } @media() {.element{}} section{ color } footer{color font:border-color, color, more, color; color:border-color, color, more, color; font-size:color color: color; font-size: more color;} test{color}



Result of replacing the matches with an empty string that I need to get with my expression:

section{  color : #cecece; } @media { #holder{ color:#cecece; border-color:#cecece; font:color sasd;   font:border-color; }  article{ } header { color:#ffffff; } } .element{} section{ color:#cecece; } @media() {.element{}} section{ color:#cecece; } footer{color:#cacaca;  font:border-color, color, more, color; color:border-color, color, more, color; font-size:color getting, better; color: color; font-size: more color;} test{color:zzz;}



Screenshot of the match results I am getting with my expression, with the text that should also be left unmatched marked with a red rectangle:
Results with my RegExp marked with red rectangle text that shouldn't be in match too
Screenshot of the match results I am getting with my expression, overlaid with black rectangles around the text that should be matched:
Matched that should be in result overlayed with result of my RegExp
Screenshot of the match results I am getting with my expression, with all the text that should be in the match results marked with a red background:
Matches that has to be in result
Screenshot of the result that I need to get:
Results that I need to get
As for whitespace, it is not important. The matches are not tied to particular words (font-size, font-weight, etc.). The matches have to exclude all text that contains the specified words: "color" or "border-color". And all of this text is delimited by the ";" symbol and located inside "{" ... "}".

Let me know if I need to explain more based just on this example.

Thanks.
Sean Stuber

using the expression  in my previous post http:#a38451518 returns what you requested.

if it's still not what you want, please post an expression where it does not work

btw,  you don't need to post the screen shots,
only the incoming and expected results after replacement

everything else simply added confusion
lightspeedvt

ASKER
Hello sdstuber.

I sent my comment right after yours. The RegExp doesn't have to look for "font-size" or "font-weight". That text is just an example, as there could be any text: "padding", "margin", "height", etc. And each of them can have its own values in a different format. So it is impossible to list all the text that has to be replaced. That is why I am keying on the words that must not be matched. Those words are "color" and "border-color" (this way I will be able to add more words to exclude).
lightspeedvt

ASKER
Here is the part of incoming string that won't work with your way (you will see what I mean):

section{  padding:20px 10px; font-size  :  22px  ;    color : #cecece; background:#cecece; }


Sean Stuber

what result are you expecting with that input?

I get this as the result using the regexp above

section{ color : #cecece; background:#cecece; }
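For reference, a sketch that applies the expression from the earlier post to this input (the variable names are illustrative). Because it is a positive pattern for numeric and "bold" values only, the background declaration, whose value is a hex colour, is left in place:

```javascript
// Positive pattern from the earlier post: it only matches declarations
// whose value is made of numbers (optionally with px) or "bold".
const valuePattern = / *[^{;:]+: *([0-9]+(px *)?|bold)+;/g;

const input =
  "section{  padding:20px 10px; font-size  :  22px  ;    color : #cecece; background:#cecece; }";

// Remove every match, then collapse whitespace runs for easier comparison.
const result = input.replace(valuePattern, "").replace(/\s+/g, " ");

console.log(result); // section{ color : #cecece; background:#cecece; }
```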
lightspeedvt

ASKER
In that input I've added a couple more pieces of text, and all of them have to be replaced with an empty string. So only the text with "color" or "border-color" remains:

section{  color : #cecece; }



I can add more and more text. And that is why it is impossible to list all text that has to be replaced.
lightspeedvt

ASKER
For example this:

section{  color : #cecece; float: left; background: url(http://site.com/files/rocker/images/bg_primaryNav_right.gif) right bottom no-repeat; padding: 0 .8em 2px; margin: 0; font-size: 1.2em; text-decoration: none; border-left: 1px solid #0F1924; cursor: default; line-height: 1; position: absolute; bottom: calc(100% - 15px); text-indent: -9999999px;  background-position: -626px -25px;}



And result still should be the same:

section{  color : #cecece; }


Sean Stuber

since you can't use negative lookarounds (i.e. look for strings that are NOT x) you can't do what you want with simple regexp replace.

you either have to do positive searches (i.e. look for strings that ARE x) for what you want to remove as I have shown above

or do positive searches for what you want to keep and then construct a new string by concatenating whatever you find
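The second option might look like the following sketch. It assumes, as in the examples in this thread, that every declaration ends with ";" and that the CSS contains no strings or comments with "{", "}" or ";" in them. The function name keepDeclarations and the token pattern are made up for this illustration and are not the expression from any post here.

```javascript
// Hypothetical helper: keep selectors, braces and only those declarations
// that contain one of the given words; drop every other declaration.
function keepDeclarations(css, words) {
  // Words are assumed to contain no regex metacharacters.
  const wanted = new RegExp(words.join("|"));
  // A token is a run of non-delimiters followed by "{", "}" or ";",
  // or a trailing run with no delimiter at the very end of the input.
  const tokens = css.match(/[^{};]*[{};]|[^{};]+$/g) || [];
  return tokens
    // Tokens ending in ";" are declarations; everything else is structure.
    .filter((t) => !t.endsWith(";") || wanted.test(t))
    .join("");
}

const input =
  "myTag{ font-size: 10px; color:black; font :color ; padding: 10px 5px; " +
  "border-color:color 1px solid; background: url repeat; font-weight: bold; }";

console.log(keepDeclarations(input, ["border-color", "color", "url"]));
// myTag{ color:black; font :color ; border-color:color 1px solid; background: url repeat; }
```

Filtering in code instead of inside the pattern means no "not x" construct is needed at all: the pattern only finds pieces, and the keep-or-drop decision is an ordinary test on each piece.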
lightspeedvt

ASKER
It is possible to do positive and negative lookaheads; only lookbehinds are impossible. So I built my own regexp that has to match all text that doesn't contain the needed words. It does not match the text before the needed word or the needed word itself, but it still matches the text after the needed word. My main concern is that I am missing something in my regexp:

(?!{)(?!;)(?![^{;]*(border-color|color)[^{;]*)\b[^{;]*;
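One way around this, sketched under the same assumptions as the rest of the thread (every declaration ends with ";", and there is no "{", "}" or ";" inside strings), is to let the pattern match every declaration and move the keyword test into a replace callback. The helper name stripDeclarations is invented for this sketch. Because each match starts immediately after the previous delimiter, the engine cannot begin a match in the middle of a declaration:

```javascript
// Hypothetical helper: remove every "...;" declaration that does not
// mention one of the keywords.
function stripDeclarations(css, keywords) {
  return css.replace(/[^{};]*;/g, (decl) => (keywords.test(decl) ? decl : ""));
}

const input =
  "section{  color : #cecece; float: left; " +
  "background: url(http://site.com/files/rocker/images/bg_primaryNav_right.gif) right bottom no-repeat; " +
  "padding: 0 .8em 2px; margin: 0; bottom: calc(100% - 15px); text-indent: -9999999px;}";

// No "g" flag on the keyword pattern, so test() keeps no state between calls.
console.log(stripDeclarations(input, /border-color|color/));
// section{  color : #cecece;}
```

More words to exclude from removal can be added to the keyword pattern without touching the declaration pattern.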
ahoffmann

can you please explain in short simple words what the regex should match
please post the match(es) you expect as plain ASCII text here
lightspeedvt

ASKER
Hello ahoffmann.

In short: the RegExp has to match all text inside { .... } that doesn't contain a needed word, up to the ";" symbol (including that symbol). If at least one of the needed words exists inside that part of the text, it should not be matched, up to the ";" symbol.

Incoming string:
section{  font-size  :  22px  ;    color : #cecece; } @media { #holder{ color:#cecece; font-size : 10px; border-color:#cecece; font:color sasd;   font:border-color; }  article{ font-size:14px; } header { color:#ffffff; } } .element{font-size:20px;} section{ font-size:12px; color:#cecece; } @media() {.element{font-size:20px;}} section{ font-size:12px; color:#cecece; font-weight:bold; } footer{color:#cacaca;  font:border-color, color, more, color; color:border-color, color, more, color; font-size:1111; font-size:color getting, better; color: color; font-size: more color;} test{font-size:10px;color:zzz;}



RegExp that I have:
(?!{)(?!;)(?![^{;]*(border-color|color)[^{;]*)\b[^{;]*;

Result that replacing the matches with an empty string should give:
section{  color : #cecece; } @media { #holder{ color:#cecece; border-color:#cecece; font:color sasd;   font:border-color; }  article{ } header { color:#ffffff; } } .element{} section{ color:#cecece; } @media() {.element{}} section{ color:#cecece; } footer{color:#cacaca;  font:border-color, color, more, color; color:border-color, color, more, color; font-size:color getting, better; color: color; font-size: more color;} test{color:zzz;}



The incoming text can vary, so it is impossible to list all patterns, but it has a common logic: all of that text is inside { ... } and delimited by ";".

A very good explanation with screenshots is in my comment posted on 2012-10-01 at 10:03:17 (ID: 38451526).
ahoffmann

do I understand correctly that you have nested scopes of { ... } ?
and your regex should match in each scope separately?

if so, regex is the wrong approach to do it, as you always may find (and get) examples where it fails, you need a proper parser for that
I'm not telling that's impossible, but you'll have more pain than you can imagine ...

just one example to think about:
element.before{content:"nasty;} color:#dead;";}
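To see why this example is nasty, here is a sketch (with an illustrative token pattern, not one taken from this thread) that splits the input on "{", "}" and ";" without looking at quotes:

```javascript
// Naive tokenizer: a run of non-delimiters followed by one delimiter.
const naiveToken = /[^{};]*[{};]/g;

const css = 'element.before{content:"nasty;} color:#dead;";}';
const tokens = css.match(naiveToken);

// The quoted value is cut into pieces, and " color:#dead;" looks like a
// real declaration even though it is part of the string.
console.log(tokens);
// [ 'element.before{', 'content:"nasty;', '}', ' color:#dead;', '";', '}' ]
```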
lightspeedvt

ASKER
Hello  ahoffmann,

I have nested text delimited by ";" inside { ... }.

The thing is that the incoming string is CSS-formatted text, so it is always formatted correctly. Each "{" has a closing "}" (similar to HTML tags). And the text inside { ... } has a ";" symbol at the end of each part. That is all there is to the formatting logic. So the RegExp should rely on only these 3 symbols: "{", "}" and ";". If the RegExp is built only with these symbols, it will be safe for CSS text. So the situation with " that you described in your example will work fine. And that is normal. But an additional closing "}" can't be inside CSS if there is no opening "{". So this is fine too. Even ":" can be used, if it is really needed.

Based on that logic, finding the unneeded text between the braces seems to be the right job for regular expressions...
ASKER CERTIFIED SOLUTION
Sean Stuber

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
lightspeedvt

ASKER
You are a genius!!! While I was stuck trying to not match, you inverted the logic: what is needed is to match! And the result of joining the matches with an empty string will be correct!

Thank you very much!
lightspeedvt

ASKER
This Expert has very good logic!
ahoffmann

> But additional closing "}" can't be inside CSS if it hasn't opened "{".
this statement applies to your (expected) examples but not to CSS in general, does it?
lightspeedvt

ASKER
Sure, we can add an additional closing "}" to CSS, but in that case the stylesheet will not be processed correctly. A site with such CSS will look broken. The developer who is coding that site will see that it has an issue, so he will find the error and fix it, because the site won't look good until the issue is fixed. So it makes no sense to have "}" without "{".
ahoffmann

> ..  but in this case stylesheet will be processed not correctly.

you mean your code will not process it correctly, right?
my example is W3C-conformant CSS syntax and not broken!

if so, my question would be: why should I use a tool which forces me to use non-standards
;-)
lightspeedvt

ASKER
Let's take W3C CSS Validator:
http://jigsaw.w3.org/css-validator/

Select "By direct input".

Now place this simple CSS code:
td{ color:#cecece; } }



You will get an error, as this is the standard in CSS.
ahoffmann

hmm, let's take W3C CSS Validator:
http://jigsaw.w3.org/css-validator/

Select "By direct input".

Now place this simple CSS code (see my first comment above):
element.before{content:"nasty;} color:#dead;";}

You will get "congratulations" as it is standard in CSS.
----
no offence meant, just wondering about a regex which can't validate w3c CSS
lightspeedvt

ASKER
Got it! Thanks for the example!

The "content" property is the only exception (at least in the current CSS3 revision). So the regex from sdstuber has to be fixed so that it does not select "{" and "}" inside " .... " that is inside { ..... }
lightspeedvt

ASKER
And keep in mind that the " symbol can be escaped with "\" inside "....". Example:
content: "some text \" {  inside }"
ahoffmann

as I said; such nested scopes are very hard to detect properly with regex, you better go with a parser
now you have also to escape
  { }
  \" but only inside "
  \' but only inside '
  CR and NL but only inside " or ' (if not escaped)
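A rough sketch of how the first three items could be handled in JavaScript: a tokenizer that treats a whole quoted string, including backslash escapes, as part of the current token. The pattern and names are assumptions for illustration; comments (/* ... */), unterminated strings and unescaped line breaks inside strings are not handled.

```javascript
// One token = any mix of ordinary characters and complete quoted strings,
// followed by a single "{", "}" or ";" delimiter. Inside a string, a
// backslash plus any character (including a line break) counts as one unit.
const quoteAwareToken =
  /(?:[^{};"']|"(?:\\[\s\S]|[^"\\])*"|'(?:\\[\s\S]|[^'\\])*')*[{};]/g;

const css = 'element.before{content:"nasty;} color:#dead;";}';
const tokens = css.match(quoteAwareToken);

console.log(tokens);
// [ 'element.before{', 'content:"nasty;} color:#dead;";', '}' ]
```

With tokens like these the declaration stays in one piece, although a keyword test on it would still see the word "color" inside the string.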
lightspeedvt

ASKER
Hard, but possible. Using a parser is not an option, as it would take too many CPU resources and cause browser freezes.

To @sdstuber - do you think you can update the regex to escape them?
ahoffmann

I never said impossible, but very, very hard (best you ask Friedl :)

here's just a beginner to match simple quoted strings:

(["'])(?:\\?+.)*?\1

note that it does not identify \ at end of line to escape CR or NL, feel free to add it ...
if you get it working you can build an ored group with similar regex for your braces, and so on ...
NOTE that you have to properly adapt it to work with JavaScript, in particular the intended usage of \
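As a starting point for that adaptation, here is a hedged sketch. JavaScript has no possessive quantifiers, so the "?+" in the expression above is a syntax error there; the version below uses an explicit alternation instead. It is an illustration rather than a drop-in replacement, and the variable names are made up:

```javascript
// A quoted string: opening quote, then either an escaped character
// (backslash plus any character, including a line break) or any character
// that is neither a backslash nor the opening quote, then the same quote.
const quoted = /(["'])(?:\\[\s\S]|(?!\1)[^\\])*\1/g;

const sample = 'content: "some text \\" {  inside }"; quotes: \'a"b\';';
const found = sample.match(quoted);

found.forEach((s) => console.log(s));
// "some text \" {  inside }"
// 'a"b'
```

Matching strings first, and only then looking for braces and semicolons outside of them, is what keeps the delimiters inside "..." from being picked up.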

things are much easier in Perl, where a regex can contain code and can reference/call itself, and you have full support for all kinds of positive and negative lookbehind and lookahead

sorry, I'm too lazy to build and test a regex in days, when a parser can be built in seconds
hope this helps, good luck