musickmann asked:

Grep pattern in TextWrangler

I'm trying to determine a grep pattern that will allow me to find and replace all lines that look like this:
<style type="text/css">/*<![CDATA[*/    span.c1 {BACKGROUND-COLOR: #c0c0c0}    /*]]>*/    </style>

Basically, anything between style tags.
<style?[^>]*> will get me that first style tag, but I'm hitting a wall getting the selection to extend to the last </style>.

I'm putting this in a generic Linux category because I would assume the grep rules are pretty close.
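To see where the asker's pattern stops, here is a minimal sketch that uses Python's re module as a stand-in for TextWrangler's grep (the two agree on these constructs). The variable names are illustrative only.

```python
import re

# Sample line from the question.
line = ('<style type="text/css">/*<![CDATA[*/    span.c1 {BACKGROUND-COLOR: #c0c0c0}'
        '    /*]]>*/    </style>')

# The asker's pattern: "<styl", an optional "e", then a run of characters
# that are not ">", then the first ">".
opening = re.compile(r'<style?[^>]*>')

m = opening.search(line)
print(m.group(0))  # <style type="text/css">
```

The match ends at the first ">", so the CSS and the closing </style> are never reached.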
Gerwin Jansen

Do you want to replace the complete line or just what's in between the tags? Also: will the opening tags vary, with </style> always as the closing tag? Replacing what's in between the tags could be done with sed, using 3 REs and replacing the 2nd RE with whatever you want.
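The "3 REs" idea can be illustrated with capture groups: one group for the opening tag, one for the content, one for the closing tag, then only the middle group is replaced. This is a hypothetical sketch in Python's re module, not the sed command itself; the sample HTML and replacement text are made up.

```python
import re

# Group 1: opening tag, group 2: content (lazy), group 3: closing tag.
# DOTALL lets the content span several lines.
three_part = re.compile(r'(<style[^>]*>)(.*?)(</style>)', re.DOTALL)

html = '<p>keep</p><style type="text/css">span.c1 {color: red}</style>'

# \1 and \3 write the tags back unchanged; only group 2 is replaced.
result = three_part.sub(r'\1/* replaced */\3', html)
print(result)  # <p>keep</p><style type="text/css">/* replaced */</style>
```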
musickmann

ASKER

I am looking to remove the style tags completely. I'm not sure what sed and 3 REs refer to, though.
RE == Regular Expression.

sed == Stream EDitor - Command line program

You probably should have added the Mac zones, since TextWrangler is a Mac product.

You should do it in 2 passes, since that's a little easier. If you already have the first pattern, use that; the 2nd one should just be </style>.
I'm assuming the style tags are across multiple lines, right?
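A quick sketch of the two-pass approach, using Python's re module as a stand-in for TextWrangler's find and replace: the first pass removes the opening tag, the second removes </style>. Note that the CSS between the tags survives, which is why a single pattern covering the whole block is needed to remove everything.

```python
import re

line = '<style type="text/css">span.c1 {BACKGROUND-COLOR: #c0c0c0}</style>'

# Pass 1: remove the opening tag with the asker's pattern.
pass1 = re.sub(r'<style?[^>]*>', '', line)
# Pass 2: remove the literal closing tag.
pass2 = re.sub(r'</style>', '', pass1)

print(pass2)  # span.c1 {BACKGROUND-COLOR: #c0c0c0}
```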
Thanks. I was so single-minded on this that I didn't even think of going outside TextWrangler to sed. I haven't used the sed command much, so I wouldn't know where to start there.

I think I still wasn't clear enough: I want to remove the style tags and everything in between. I could certainly remove the opening and closing tags in two passes, but I'm having trouble finding a way to select the two tags and everything in between so they get removed.

If there is a sed command that makes this easier, I could attempt that. I imagine it's something like
sed [pattern] [filename] from the terminal?
>> I want to remove the style tags and everything in between
I understand. By 'everything in between', do you mean it can be spread over several lines?

About sed: yes, it's like that. You give it some patterns to search for, and when they are found it performs some actions (basically).
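The same search-then-act idea can be sketched as a small script that rewrites a file. This is a hypothetical Python helper (strip_style_blocks is an illustrative name, not part of sed or TextWrangler). One difference worth knowing: sed processes its input line by line by default, whereas this sketch reads the whole file at once, so a style block may span several lines.

```python
import re
import tempfile
from pathlib import Path


def strip_style_blocks(path: Path) -> None:
    """Hypothetical helper: remove every <style ...>...</style> block in a file."""
    text = path.read_text()
    # DOTALL lets "." cross line breaks; ".*?" stops at the first </style>.
    cleaned = re.sub(r'<style[^>]*>.*?</style>', '', text, flags=re.DOTALL)
    path.write_text(cleaned)


# Usage on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "page.html"
    page.write_text('<head>\n<style type="text/css">\nspan.c1 {color: red}\n</style>\n</head>\n')
    strip_style_blocks(page)
    result = page.read_text()

print(repr(result))  # '<head>\n\n</head>\n'
```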
In Text Wrangler:

Find: <style.</style> would find the whole style statement.
Oops, sorry. Ignore that :D
Rather: <style.+?style>
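Here is the suggested pattern applied to the sample line, with some surrounding markup added to show that only the style block is removed. Python's re module stands in for TextWrangler's grep; the <p> elements are made up for the demonstration.

```python
import re

line = ('<p>before</p><style type="text/css">/*<![CDATA[*/    span.c1 '
        '{BACKGROUND-COLOR: #c0c0c0}    /*]]>*/    </style><p>after</p>')

# "<style", then one or more characters (lazily), then the first "style>".
pattern = re.compile(r'<style.+?style>')
cleaned = pattern.sub('', line)

print(cleaned)  # <p>before</p><p>after</p>
```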
ASKER CERTIFIED SOLUTION
serialband

This solution is only available to members.
Awesome, that worked perfectly, and I don't think I would have ever gotten there on my own. I've been going over the grep references trying to figure out the syntax and just not getting it very well.

I usually try to narrate my way through expressions like this, but I guess I'm just not sure what grep is looking at. If I understand the components of the expression:
* = 0 or more of the previous character
[^>] = any character that is NOT >
? = 0 or 1 of the previous character

So I'm not sure how that strings together to tell it to find all the characters between the two style tags.
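A short sketch of two of those components in action, using Python's re module as a stand-in. [^>]* consumes non-">" characters and halts at the first ">", so it can never stretch past the opening tag. And a ? placed directly after a quantifier (as in +? or *?) is not "0 or 1 of the previous character": it makes that quantifier lazy.

```python
import re

tag = '<style type="text/css">span.c1 {color: red}</style>'

# [^>]* runs through the attributes and stops just before the first ">".
opening = re.match(r'<style[^>]*', tag).group(0)
print(opening)  # <style type="text/css"

# A quantifier followed by "?" becomes lazy: it matches as few as possible.
greedy = re.match(r'a+', 'aaa').group(0)
lazy = re.match(r'a+?', 'aaa').group(0)
print(greedy)  # aaa
print(lazy)    # a
```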
TextWrangler's grep seems to be implemented a little differently from command line grep.  Otherwise I would have been able to match it with <style*style>.
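One caveat on <style*style>: in regular expressions (POSIX grep included), * repeats the item just before it; it is not the match-anything wildcard it is in shell globs. So <style*style> means "<styl", zero or more "e", then "style>". A small check with Python's re module:

```python
import re

pattern = re.compile(r'<style*style>')

# A real style block does not match: there is no "style>" right after "<style".
block = bool(pattern.search('<style>body {}</style>'))
# These do match: "<styl", then zero or three "e", then "style>".
zero_e = bool(pattern.search('<stylstyle>'))
three_e = bool(pattern.search('<styleeestyle>'))

print(block, zero_e, three_e)  # False True True
```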
There are many different flavours of grep. InDesign, for instance, has a few commands that are not available elsewhere.

In TextWrangler, <style*style> does not find anything, but <style.+?style> finds one or more of any character (except a return) between <style and style>. Actually, it could be shortened to <s.+?e>.
The question mark makes the expression less greedy.
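The effect of "less greedy" shows up as soon as a page has two style blocks. In this sketch (Python's re module standing in for TextWrangler's grep, sample page made up), the greedy .+ runs to the last "style>" and swallows the paragraph in between, while the lazy .+? stops at the first one.

```python
import re

page = '<style>a {}</style><p>keep me</p><style>b {}</style>'

greedy = re.sub(r'<style.+style>', '', page)   # runs to the LAST "style>"
lazy = re.sub(r'<style.+?style>', '', page)    # stops at the FIRST "style>"

print(repr(greedy))  # ''
print(repr(lazy))    # '<p>keep me</p>'
```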
Then you should have gotten the points for your much simpler solution.  I only just installed TextWrangler yesterday for testing that.
No problem about the points. He used your suggestion.

Try out my solution just for fun and try to make it not work. :D
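Taking up the challenge: one input that trips up the shortened <s.+?e> is any tag starting with "s" (such as <strong>) appearing before the style block, because the match then starts at that tag and runs to the first "e>". The sample page is made up; Python's re module stands in for TextWrangler's grep.

```python
import re

page = '<strong>Note</strong><style>p {}</style>'

short = re.sub(r'<s.+?e>', '', page)           # starts matching at <strong>
full = re.sub(r'<style.+?style>', '', page)

print(short)  # p {}</style>
print(full)   # <strong>Note</strong>
```

The longer <style.+?style> anchors on the literal tag name, so it leaves the <strong> element alone.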
Sigurdur, I'm sorry, I never even saw your comments until just now when I got the last email. I was refreshing this page yesterday and only saw the comment from serialband.

However, your solution seems to require the string to be on a single line. In this one example that happens to be the case, but I have some other files I'll be working with in the coming days where it will not be.
I'm sorry I didn't see the solution earlier; both work and do exactly what I was looking for. I tried several variants (*, .*, +*, etc.), but I guess it was the ? that I needed.
Never mind the points. :D

The only thing that breaks the . is the return, but if the match has to include a line break, you could use something like this:

<style.*?\r*.*?style>

This is <style, then any character zero or more times, a return zero or more times, any character zero or more times, and then style> to close.
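A sketch of how far that pattern reaches, using Python's re module. Assumption: Python's "." stops at "\n" rather than the "\r" used in the thread for TextWrangler's line break, so \r is swapped for \n here. Because \r* (here \n*) appears only once, the pattern tolerates a single line break but not two. Python's DOTALL flag makes "." match line breaks too; PCRE-based engines generally offer the same through an inline (?s) modifier, but check TextWrangler's grep reference before relying on it.

```python
import re

one_break = '<style>\np {}</style>'
two_breaks = '<style>\np {}\n</style>'

# The thread's pattern, with \r replaced by \n for Python.
thread_pattern = re.compile(r'<style.*?\n*.*?style>')
one_ok = bool(thread_pattern.search(one_break))    # True
two_ok = bool(thread_pattern.search(two_breaks))   # False: \n* is used only once

# With DOTALL, "." also matches newlines, so any number of breaks is fine.
dotall = re.compile(r'<style.*?style>', re.DOTALL)
dotall_ok = bool(dotall.search(two_breaks))        # True

print(one_ok, two_ok, dotall_ok)  # True False True
```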