

RegEx: how to extend this Regex

Posted on 2004-10-26
Medium Priority
Last Modified: 2010-07-27
Hi all,
I have the following RegEx to find and replace URLs in a string.
Regex re = new Regex(@"(\[URL=|\[url=)*((?<!\[img=|\[IMG=)(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)(\])*");
I want to extend this regex: it should ONLY do the replacement if the URL is NOT inside an area surrounded by [html] and [/html].

Examples (this already works):
.... [url=....] ....      ==>     do not change this
.... http://www..../ ....     ==>     make it: .... [url=http://www....] ....

Examples (what I want additionally):
.... [html] ... http://www... [/html] ....    ==> do NOT change this!

Any ideas?

Question by:Smoerble

Expert Comment

ID: 12412784
What you need is called "lookaround". In other words, lookahead and lookbehind features. Many modern regex languages support this. Which language are you using? It looks like Perl or Dotnet: in either case you're in luck.

The general gist is that lookaround expressions don't match characters themselves, but positions, in the same way that, for example, a word-boundary match doesn't consume any characters. Usually the syntax is (?=XXX) for a lookahead and (?<=XXX) for a lookbehind. Maybe an example is useful.

If you had the string:


then a regex of


would match the first b, but not the rest.

To match all the b's, you could write something like


In other words, match characters as long as what comes before the first one is [html] and what comes after the last one is [/html].
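A minimal sketch of this kind of lookaround in C#/.NET syntax. The sample string and both patterns are assumptions chosen for illustration; they are not the snippets from the original comment.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Assumed sample input; not taken from the original post.
    public const string Sample = "aaa[html]bbb[/html]ccc";

    // Lookbehind only: matches a single 'b' that directly follows "[html]".
    public static string FirstB(string input)
    {
        return Regex.Match(input, @"(?<=\[html\])b").Value;
    }

    // Lookbehind plus lookahead: matches the whole run of b's between the tags.
    // The tags are only asserted, so they are not part of the match.
    public static string AllBs(string input)
    {
        return Regex.Match(input, @"(?<=\[html\])b+(?=\[/html\])").Value;
    }
}
```

With the assumed sample, FirstB returns only the first b, while AllBs returns the full run of b's without the surrounding tags.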

You might have to look up the documentation for the specific language you're using to get the details.

Author Comment

ID: 12415769
I think there's a small misunderstanding here:
I want everything EXCEPT the stuff between [html] and [/html].
And this regex needs to include the regEx above.

About the language: it's C#, yes.

Author Comment

ID: 12416605
Hmm... maybe a different approach: do it with several steps:

1) get all strings between [html] and [/html] (there might be more than one block), save it somehow
2) replace all URLs with the regEx from above
3) get all [html][/html] blocks in the modified string and replace them with the original strings saved in step 1.

Any idea what the code would have to look like? Possible? Clever?
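The three steps above could be sketched in C# roughly as follows. This is an untested-in-production sketch under assumptions: the class name HtmlBlockShield and the WrapUrl rule (leave a match alone if it already starts with [url=, otherwise wrap it as [url=...]) are hypothetical and only inferred from the examples in the question. It also assumes that [html] blocks are not nested and that every [html] has a matching [/html].

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HtmlBlockShield
{
    // One [html]...[/html] block, non-greedy, allowed to span line breaks.
    private static readonly Regex HtmlBlock = new Regex(
        @"\[html\].*?\[/html\]", RegexOptions.Singleline);

    // The URL regex from the question, unchanged.
    private static readonly Regex UrlRegex = new Regex(
        @"(\[URL=|\[url=)*((?<!\[img=|\[IMG=)(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)(\])*");

    // Assumed replacement rule: keep existing [url=...] tags, wrap bare URLs.
    private static string WrapUrl(Match m)
    {
        return m.Groups[1].Success ? m.Value : "[url=" + m.Groups[2].Value + "]";
    }

    public static string Convert(string input)
    {
        // Step 1: save every original [html] block in order of appearance.
        var saved = new List<string>();
        foreach (Match m in HtmlBlock.Matches(input))
        {
            saved.Add(m.Value);
        }

        // Step 2: run the URL replacement over the whole string.
        string modified = UrlRegex.Replace(input, WrapUrl);

        // Step 3: overwrite each (possibly modified) block with its saved original.
        int next = 0;
        return HtmlBlock.Replace(modified, m => saved[next++]);
    }
}
```

Step 3 relies on the URL replacement never adding or removing [html] or [/html] tags, so the blocks found in the modified string line up one-to-one with the saved ones.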


Accepted Solution

DominicCronin earned 2000 total points
ID: 12425309
You might be able to do something with negative lookaround, using a ! instead of =:
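As a hedged sketch of what a negative-lookaround solution could look like in C#: the URL pattern below is deliberately simplified (an assumption, not the regex from the question), and the class name is hypothetical. The negative lookahead rejects a URL if a closing [/html] can be reached from it without first passing an opening [html], which is taken here to mean "the URL sits inside a block". It assumes well-formed, non-nested blocks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegativeLookaheadDemo
{
    // (?<!...)  negative lookbehind: skip URLs already inside [url=...] or [img=...]
    //           (lowercase tags only in this simplified sketch).
    // (?<url>)  simplified URL: scheme, "://", then anything up to whitespace or a bracket.
    // (?!...)   negative lookahead: fail if [/html] follows before any [html] does.
    private static readonly Regex UrlOutsideHtml = new Regex(
        @"(?<!\[url=|\[img=)(?<url>(?:https?|ftp)://[^\s\[\]]+)(?!(?:(?!\[html\]).)*\[/html\])",
        RegexOptions.Singleline);

    public static string Convert(string input)
    {
        return UrlOutsideHtml.Replace(input, "[url=${url}]");
    }
}
```

The lookahead rescans the rest of the text for every candidate URL, so this can get slow on long inputs.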


The reference for this behaviour in dotnet is here:


but in fact, that's probably not the way to go. The point is that if you can get the [html]...[/html] blocks to be matched by one part of your regex, they won't be matched by the other parts of your regex. This means you can do something like:


and only emit the text matched by the first group. (I haven't tested this, and with a memory like mine, it's bound to be buggy, but hopefully it's a pointer in the right direction.)
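One way to read this suggestion, sketched in C# under assumptions: the first alternative swallows whole [html]...[/html] blocks so the URL alternative never sees their contents, and a MatchEvaluator decides what to emit. The URL alternative is the regex from the question with its groups named or made non-capturing; the class name, the group names html, tag and link, and the wrapping rule are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    private static readonly Regex HtmlOrUrl = new Regex(
        // Alternative 1: a complete [html]...[/html] block.
        @"(?<html>\[html\].*?\[/html\])" +
        "|" +
        // Alternative 2: the question's URL regex (groups renamed, same matching).
        @"(?<tag>\[URL=|\[url=)*" +
        @"(?<link>(?<!\[img=|\[IMG=)(?:http|ftp|https)://[\w-]+(?:\.[\w-]+)+" +
        @"(?:[\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)(?:\])*",
        RegexOptions.Singleline);

    public static string Convert(string input)
    {
        return HtmlOrUrl.Replace(input, m =>
        {
            // Blocks and already-tagged URLs are emitted unchanged.
            if (m.Groups["html"].Success || m.Groups["tag"].Success)
            {
                return m.Value;
            }
            // Assumed rule for bare URLs, taken from the question's example.
            return "[url=" + m.Groups["link"].Value + "]";
        });
    }
}
```

Because a block is consumed as a single match, the scan resumes after [/html], and a URL inside the block is never offered to the second alternative.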

I would also suggest that you look at the \G assertion, which forces the match to begin where the last match finished.

See http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconmiscellaneousconstructs.asp

Have a look at the examples in http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemtextregularexpressionsmatchclassnextmatchtopic.asp
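A small illustration of the \G anchor in C#. The input and the class name are made up for the example; it only shows the anchoring behaviour and is not a solution to the URL problem itself.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ContiguousMatchDemo
{
    // \G requires each match to start exactly where the previous one ended
    // (or at the start of the search for the first match), so matching stops
    // at the first item that is not a number followed by a comma.
    public static List<string> LeadingNumbers(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\G(\d+),"))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }
}
```

For the input "1,22,333,x,4," this collects 1, 22 and 333; the trailing 4 is skipped because the x breaks the chain of contiguous matches.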

Hope this helps.


Author Comment

ID: 12428216
Sorry, I think I only understand some parts of your post. Can you please give me some pseudo-code?

Expert Comment

ID: 12436646
If you post the code you are using now, I'll see if I can show you how to fit this in.

Expert Comment

ID: 12635067
I don't think this should be deleted. Although the questioner didn't understand me, I think the explanations I've given and the links to relevant references should be quite adequate to help some other people facing this sort of problem.

Author Comment

ID: 12642675
Oh sorry, totally missed that one.
I took a completely different approach (a checkbox that says "do not translate URLs"), as I missed your question about my code.
So I will grant you the points anyway, sorry for the delay.


Question has a verified solution.
