RegEx: how to extend this Regex

Hi all,
I have the following RegEx to find and replace URLs in a string.
----------------------------
Regex re = new Regex(@"(\[URL=|\[url=)*((?<!\[img=|\[IMG=)(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)(\])*");
----------------------------
I want to extend this regex: it should ONLY replace a URL if that URL is NOT inside an area surrounded by [html] and [/html].

Examples (this already works):
.... [url=....] ....      ==>     do not change this
.... http://www..../ ....     ==>     make it: .... [url=http://www....] ....

Examples (what I want additionally):
.... [html] ... http://www... [/html] ....    ==> do NOT change this!

Any ideas?


Smoerble Asked:

DominicCronin Commented:
What you need is called "lookaround", in other words the lookahead and lookbehind features. Many modern regex flavours support this. Which language are you using? It looks like Perl or .NET: in either case you're in luck.

The general gist of things is that lookaround expressions don't actually match characters themselves, but positions, in the same way as for example a word boundary match doesn't consume any characters either. Usually the syntax is (?=XXX) for a lookahead and (?<=XXX) for a lookbehind. Maybe an example is useful.

If you had the string:

aaaaa[html]bbbbbbb[/html]ccccccccc

then a regex of

(?<=\[html\])b

would match the first b, but not the rest.

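To match all the b's you could write something like

(?<=\[html\]).*?(?=\[/html\])

In other words, match characters as long as what comes before the first one is [html] and what comes after the last one is [/html]. The lazy .*? stops at the nearest [/html], so two separate blocks in the same string are not merged into one match.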
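As a rough C# illustration of the two patterns above (the class and method names are made up for this sketch, and RegexOptions.Singleline is an assumption so that a block may span several lines):

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Counts the b's that sit directly after "[html]".
    // The lookbehind checks the preceding text without consuming it.
    public static int CountBAfterOpenTag(string input)
    {
        return Regex.Matches(input, @"(?<=\[html\])b").Count;
    }

    // Returns the text between the first [html] and the nearest [/html].
    // Neither tag is part of the match, because lookarounds match positions only.
    public static string InnerText(string input)
    {
        Match m = Regex.Match(input, @"(?<=\[html\]).*?(?=\[/html\])",
                              RegexOptions.Singleline);
        return m.Value; // empty string when nothing matched
    }
}
```

With the sample string aaaaa[html]bbbbbbb[/html]ccccccccc, CountBAfterOpenTag finds exactly one b and InnerText returns bbbbbbb.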

You might have to look up the documentation for the specific language you're using to get the details.
Smoerble (Author) Commented:
I think here's a little misunderstanding:
I want everything EXCEPT the stuff between [html] and [/html].
And this regex needs to include the regEx above.

About the language: it's C#, yes.
Smoerble (Author) Commented:
Hmm... maybe a different approach: do it with several steps:

1) get all strings between [html] and [/html] (there might be more than one block), save it somehow
2) replace all URLs with the regEx from above
3) get all [html][/html] blocks in the modified string and replace them with the original strings saved in step 1.

Any idea what the code would look like? Is it possible? Is it clever?
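The three steps above could be sketched in C# roughly as follows. This is only an outline under some assumptions: the URL pattern is a simplified stand-in for the regex in the question, the class name is hypothetical, the input is assumed to contain no NUL characters (they are used as placeholder delimiters), and [html] blocks are assumed not to nest.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HtmlBlockProtector
{
    // One [html]...[/html] block; lazy so that separate blocks stay separate.
    static readonly Regex HtmlBlock = new Regex(@"\[html\].*?\[/html\]",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Simplified URL pattern (assumption): skips URLs already inside [url=...] or [img=...]
    // and never swallows brackets, whitespace, or the NUL placeholder delimiter.
    static readonly Regex Url = new Regex(
        @"(?<!\[(?:url|img)=)(?:https?|ftp)://[^\s\[\]\u0000]+",
        RegexOptions.IgnoreCase);

    // Placeholder of the form NUL + index + NUL.
    static readonly Regex Placeholder = new Regex(@"\u0000(\d+)\u0000");

    public static string Convert(string input)
    {
        var saved = new List<string>();

        // Step 1: save every block and leave a numbered placeholder in its place.
        string work = HtmlBlock.Replace(input, m =>
        {
            saved.Add(m.Value);
            return "\u0000" + (saved.Count - 1) + "\u0000";
        });

        // Step 2: wrap the remaining bare URLs.
        work = Url.Replace(work, m => "[url=" + m.Value + "]");

        // Step 3: restore the original blocks from the saved list.
        return Placeholder.Replace(work, m => saved[int.Parse(m.Groups[1].Value)]);
    }
}
```

Calling HtmlBlockProtector.Convert on a string wraps the bare URLs outside the blocks and returns every [html] block byte for byte as it was.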

DominicCronin Commented:
You might be able to do something with negative lookaround, using a ! instead of =:

(?<!...)
and
(?!...)

The reference for this behaviour in .NET is here:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpcongroupingconstructs.asp
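For illustration only, a negative lookahead could be used along these lines: accept a URL only if no [/html] follows it before the next [html] opens. The sketch assumes well-formed, non-nested blocks, uses a simplified URL pattern rather than the one from the question, and the class name is made up; it may also be slow on long inputs because the lookahead rescans the rest of the text for each candidate.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegativeLookaheadSketch
{
    // (?<!...)  : the URL is not already inside [url=...] or [img=...]
    // (?!...)   : there is no [/html] ahead that is reached before an [html],
    //             which would mean the URL sits inside an open block.
    static readonly Regex UrlOutsideHtml = new Regex(
        @"(?<!\[(?:url|img)=)(?:https?|ftp)://[^\s\[\]]+(?!(?:(?!\[html\]).)*\[/html\])",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static string Convert(string input)
    {
        return UrlOutsideHtml.Replace(input, m => "[url=" + m.Value + "]");
    }
}
```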

but in fact, that's probably not the way to go. The point is that if the [html]...[/html] blocks are matched by one part of your regex, they can no longer be matched by the other parts. This means you can do something like:

(.*?)(\[html\].*?\[/html\])

and only emit the text matched by the first group. (I haven't tested this, and with a memory like mine, it's bound to be buggy, but hopefully it's a pointer in the right direction.)
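One way this idea might look in C# is an alternation combined with a MatchEvaluator: the first alternative consumes whole [html] blocks and existing [url=...] or [img=...] tags, so the URL alternative never sees what is inside them. The group names, the class name, and the simplified URL pattern are assumptions of this sketch, and blocks are assumed not to nest.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationSketch
{
    // "skip": text that must be left untouched. It is tried first at each position,
    //         so anything inside it is consumed before the URL alternative can match.
    // "url":  a bare URL outside any skipped region.
    static readonly Regex BlockOrUrl = new Regex(
        @"(?<skip>\[html\].*?\[/html\]|\[(?:url|img)=[^\]]*\])" +
        @"|(?<url>(?:https?|ftp)://[^\s\[\]]+)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    public static string Convert(string input)
    {
        // Emit skipped regions unchanged; wrap only matches of the "url" group.
        return BlockOrUrl.Replace(input, m =>
            m.Groups["url"].Success ? "[url=" + m.Value + "]" : m.Value);
    }
}
```

Because everything happens in a single pass, no placeholders are needed and the text between matches is copied through unchanged by Regex.Replace.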

I would also suggest that you look at the \G assertion, which forces the match to begin where the last match finished.

See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconmiscellaneousconstructs.asp

Have a look at the examples in http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemtextregularexpressionsmatchclassnextmatchtopic.asp
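A tiny C# sketch of what \G does (the class and method names are invented for the example): each match has to start exactly where the previous one ended, so the chain of matches stops at the first character that does not fit.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorGDemo
{
    // Counts the digits at the very start of the input.
    // Without \G the pattern \d would also find digits later in the string;
    // with \G the matches must be contiguous, beginning at position 0.
    public static int CountLeadingDigits(string input)
    {
        return Regex.Matches(input, @"\G\d").Count;
    }
}
```

For the input 123abc456 this counts 3, whereas a plain \d would find all six digits.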


Hope this helps.




Smoerble (Author) Commented:
Well...
Sorry, I think I only understand some parts of your post. Can you please give me some pseudo-code?
DominicCronin Commented:
If you post the code you are using now, I'll see if I can show you how to fit this in.
DominicCronin Commented:
I don't think this should be deleted. Although the questioner didn't understand me, I think the explanations I've given and the links to relevant references should be quite adequate to help some other people facing this sort of problem.
Smoerble (Author) Commented:
Oh sorry, totally missed that one.
I took a completely different approach (a checkbox that says "do not translate URLs"), as I missed your question about my code.
So I will grant you the points anyway, sorry for the delay.
Question has a verified solution.