b0lsc0tt asked:

Alternative to lookbehind for vbscript, javascript etc

This is in relation to the question at http:Q_22860096.html

VBScript, Javascript and other "flavors" don't support lookbehinds in an expression.  In this case I want to add <br /> in front of any <img in the string.  Here is a sample:

<img>
this is some text<img>
this is more<a><img> with some like <img>
<br /><img>but don't duplicate <br /><img>

However, in the sample, there are some situations where the <br /> tag is already there.  I don't want a duplicate.

With a lookbehind this is a simple expression (see the question above).  However without one this seems impossible.  The expression needs to look for and see the <br /> without capturing it.  Can this be done without more complicated script that would depend on captured groups, etc?  I don't mind a complicated expression but it would need to work with the limits of ECMA flavor (i.e. Javascript, vbscript).  Mainly no lookbehinds, atomic grouping or conditionals (http://www.regular-expressions.info/refflavors.html).

Let me know if there are any questions or if anything isn't clear.  Thanks!

bol
b0lsc0tt

ASKER

Just to clarify, by duplicate I mean I don't want to have ...

<br /><br /><img

bol
SOLUTION
mreuring

Remove all <br>s that sit directly before an <img> first, then add a <br> before every <img>.
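A minimal sketch of that two-pass idea in JavaScript, for illustration only. The function name `addBreaksTwoPass` is made up here, and the pattern assumes the break is written as `<br>`, `<br/>` or `<br />`, optionally followed by whitespace.

```javascript
// Hypothetical helper: strip existing breaks first, then add exactly one.
function addBreaksTwoPass(html) {
  // Pass 1: remove one or more breaks (and any whitespace after them)
  // that sit directly in front of an <img tag.
  var stripped = html.replace(/(?:<br ?\/?>\s*)+<img/gi, "<img");
  // Pass 2: no <img has a break in front of it any more, so add one to each.
  return stripped.replace(/<img/gi, "<br /><img");
}

var sample = "<img>\nthis is some text<img>\n<br /><img>but don't duplicate <br /><img>";
console.log(addBreaksTwoPass(sample));
```

Because the first pass normalizes everything, the second pass needs no lookbehind at all.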

Thank you both.  Both suggestions will work and are great.

Martin, in the back of my mind I had the idea of using a captured group but I never pursued it.  Your script shows me that it would've been easier and cleaner than I had thought.  Nice post and it does provide a nice alternative to lookbehind.

Zvonko, thanks for that post and suggestion.  That was also a good way to get the same result as lookbehind.  I am interested to see which the Asker chooses in the other q.  ;)

I had been pursuing some great expression or solution.  I will keep this open for a bit still to see if others might have some different input or suggestions.  In some ways I had hoped to learn some great technique.  I definitely don't mean to take from either suggestion and appreciate them.  They will work for what I asked and are great.  Let me know if you have any other ideas and I welcome new ways to do this, especially if it uses a single expression (more complicated I am sure) and minimal script.

bol
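For readers without access to the hidden answer, here is one way the captured-group idea could look in JavaScript. This is a sketch and not the script that was posted; the name `addBreaksWithCapture` is hypothetical. The optional group captures an existing break, and a replacement function decides whether one still has to be added.

```javascript
// Hypothetical helper: group 1 holds an existing break (plus whitespace), if any.
function addBreaksWithCapture(html) {
  return html.replace(/(<br ?\/?>\s*)?<img/gi, function (match, existingBreak) {
    // A break was captured: leave the text exactly as it was.
    // No break was captured: put one in front of the tag.
    return existingBreak ? match : "<br />" + match;
  });
}

console.log(addBreaksWithCapture("text<img> more <br><img>"));
```

Unlike a plain string replacement, this variant leaves existing breaks untouched instead of rewriting them.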
SOLUTION
And if there may be whitespace between <br> and <img>, then perhaps like this:

<script language=vbscript>
  expertarea1 = "<br> <img>this is some text<br>  <img>this is more<a><img> with some like <img><br /><img>but don't duplicate <br /><img>"

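  Dim theExpr
  Set theExpr = New RegExp
  theExpr.Global = True        ' replace every match, not just the first
  theExpr.IgnoreCase = True    ' also match <BR> and <IMG
  ' Optional group: an existing break plus any whitespace in front of <img
  theExpr.Pattern = "(<br ?/?>\s*)?<img"
  ' With or without an existing break, the match becomes exactly one break plus the tag
  expertarea1 = theExpr.Replace(expertarea1, "<br /><img")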

  window.alert("for bol: " & Chr(13) & expertarea1)

</script>

That's more or less the same as I had, the most notable difference being the use of a string pattern instead of a literal expression. It doesn't do anything to change the principle though, and that seems to be what b0lsc0tt was looking for: a radically different way of thinking about this problem.
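For comparison, the same pattern written as a JavaScript regex literal might look like the sketch below. It is a straight port of the VBScript above rather than the earlier hidden post, and the function name `addBreaksOnePass` is invented for the example.

```javascript
// Hypothetical helper: same pattern as the VBScript version, as a regex literal.
function addBreaksOnePass(html) {
  // The optional group swallows an existing break and trailing whitespace,
  // so every <img ends up with exactly one "<br />" in front of it.
  return html.replace(/(<br ?\/?>\s*)?<img/gi, "<br /><img");
}

var expertarea1 = "<br> <img>this is some text<br>  <img>this is more<a><img> with some like <img><br /><img>but don't duplicate <br /><img>";
console.log(addBreaksOnePass(expertarea1));
```

Note that the forward slash has to be escaped inside a literal, which the string form does not need.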

The thing is, I was thinking about this last night, and there's not much of a difference to be made, especially considering that we're talking about an interpreted language.

Either you use a relatively simple regular expression, of which there are ample examples by now, in which you match the br and img tags and replace all instances with <br /><img...>.
Or you run through it semi-programmatically, in which case you don't need regular expressions at all.

The reason I think you shouldn't look for any more advanced RegExp is performance. An interpreted language has no chance of pre-caching the expression, so you take a performance hit re-creating the regular expression whenever the code runs. Regular expressions of course give you a whole lot of control, but if the rules are simple and finite, a programmatic approach will give you better performance...

As a result I came up with an alternative:
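var input = "<img>but don't duplicate <br /><img>";
var parts = input.split("<img");
var br = "<br />";
// Every part except the last one is followed by an <img tag.
for (var i = 0; i < parts.length - 1; i++) {
  var line = parts[i];
  // Append a break unless this part already ends with "<br />".
  if (line.slice(-br.length) !== br) parts[i] = line + br;
}
var output = parts.join("<img");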

This makes as much use of built-in functions as possible, keeping performance high. Also it's a radically different approach :) This would be my best-effort approach.
Thanks for the comments.

mreuring,
Without getting too off topic, do you know (by testing) or have support for the performance claim you mentioned?  I understand that the browser and even the use (i.e. javascript/browser, vbscript/ASP) will affect this, but I'm just curious if you have quick links or personal facts.  I don't (yet) have a great knowledge of the internals or engine, but I would've thought (call me biased) that a regex would be faster than making an array, a for loop, etc.

Thanks again for all comments.  I will give this a little more time; I wish to see if Ozo, Amit_G, etc will post and contribute too.  I hope no one minds, but input from a number of experts was my hope from the start.  The comments alone have made this worthwhile but I am greedy.  ;-)

bol
Once upon a time I did do some performance testing on the use of regular expressions in JavaScript. Times have changed, and so may the results of such tests.

The problem, performance-wise, wasn't in running a regular expression. In general the performance hit with regular expressions is in creating them. Since JavaScript is interpreted each time the code is read (unless there's some smart caching built into the VMs these days), that results in at least one performance hit for creating the regular expression.

If the regular expression is re-used on the same page a number of times, this one-time creation might result in better performance. However, that didn't seem to be a general trend either. So, at the time, I concluded that regular expressions in JavaScript were more useful as a way of keeping your code compact than as a performance improvement.

The way some Java APIs implemented regular expressions at the time supported this conclusion: they allowed, and suggested, pre-generating RegExp objects so that you need not generate them in your running code.

As the nature of regular expressions hasn't changed much, I assume these performance issues have not changed either, but the only way to be sure would be doing an extensive test (preferably across languages).
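To make the "pre-generate once" idea concrete, here is a small JavaScript sketch. The names `BREAK_BEFORE_IMG`, `addBreaksReused` and `addBreaksRebuilt` are made up for the illustration, and whether the difference is measurable depends on the engine, so treat it as a shape to test rather than a result.

```javascript
// Built once, when the script is loaded, and reused by every call.
var BREAK_BEFORE_IMG = /(<br ?\/?>\s*)?<img/gi;

function addBreaksReused(html) {
  // String.prototype.replace starts a global regex from the beginning,
  // so sharing the object between calls is safe here.
  return html.replace(BREAK_BEFORE_IMG, "<br /><img");
}

function addBreaksRebuilt(html) {
  // Builds a fresh RegExp object from a string on every call.
  var re = new RegExp("(<br ?/?>\\s*)?<img", "gi");
  return html.replace(re, "<br /><img");
}

console.log(addBreaksReused("a<img> b<br /><img>") === addBreaksRebuilt("a<img> b<br /><img>"));
```

Both functions give the same output; only the point at which the expression is created differs.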
ASKER CERTIFIED SOLUTION
If you read my previous two posts instead of just the last one, you should realise that I made the performance reference based on possibly using more complex expressions in an attempt to mimic lookbehinds. I'm not going to bother testing the actual difference, but I'm still inclined to think (even considering the alternatives you've posted on your site) that in this case any regular expression more complex than what we've already suggested is going to perform slower than a programmatic approach.

I'm still tempted to say that a regular expression, in a fairly simple problem such as the one we're dealing with here, is not going to result in the best performance possible. Regular expressions, for those of us blessed with a marginal degree of understanding, will make for clean/small/readable code, not performance. I don't know enough of the actual implementation behind regular expressions, but I don't think they're ever the best performing solution; nothing so generic ever seems to be.

I will concede that it won't likely make a huge difference, but this might be an entirely different matter when the input grows beyond a few meager lines of text into a huge page of several MB.
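Anyone who wants to check this on their own engine could start from a rough timing harness like the one below. It is only a sketch: the helper names, the generated input and the run count are assumptions, and the split version recognises only the exact string "<br />", so the two functions agree only on input written that way.

```javascript
function addBreaksRegex(html) {
  return html.replace(/(<br ?\/?>\s*)?<img/gi, "<br /><img");
}

function addBreaksSplit(html) {
  var br = "<br />";
  var parts = html.split("<img");
  // Every part except the last one is followed by an <img tag.
  for (var i = 0; i < parts.length - 1; i++) {
    if (parts[i].slice(-br.length) !== br) parts[i] += br;
  }
  return parts.join("<img");
}

// Builds a large test string by repeating one line (made-up sample data).
function buildInput(repeats) {
  var chunks = [];
  for (var i = 0; i < repeats; i++) chunks.push("some text<img> and more <br /><img>");
  return chunks.join("\n");
}

// Returns the elapsed milliseconds for `runs` calls of fn(input).
function timeIt(fn, input, runs) {
  var start = new Date().getTime();
  for (var i = 0; i < runs; i++) fn(input);
  return new Date().getTime() - start;
}

var bigInput = buildInput(20000);
console.log("regex: " + timeIt(addBreaksRegex, bigInput, 10) + " ms");
console.log("split: " + timeIt(addBreaksSplit, bigInput, 10) + " ms");
```

The numbers will vary between browsers and runs, so several runs on realistic input are needed before drawing any conclusion.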
stevenlevithan

From my personal experience only, significant performance issues involved with regular expressions are generally caused by developers who don't know how to write optimized regular expressions or don't deeply understand backtracking and other regex performance-related issues. Regex optimization is something of a dark art.

Check out this article: http://www.javaworld.com/javaworld/jw-09-2007/jw-09-optimizingregex.html

It is, in part, Java specific, and makes a few debatable claims, but on the whole it is a very good article for the basics of regex optimization (a topic which isn't written about often enough, or at least not by people who know what they're talking about). "Mastering Regular Expressions" by Jeffrey Friedl also contains some very good discussion of regex performance issues and techniques.
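As a small illustration of the backtracking point, the two patterns below accept exactly the same strings, but the first nests one quantifier inside another. This is a generic textbook example and not taken from the article; the variable names are made up.

```javascript
// Nested quantifiers: a run of "a"s can be divided between the inner and
// outer "+" in many ways, and a backtracking engine may try them all when
// the final "b" is missing.
var nestedPattern = /^(a+)+b$/;

// Same set of matching strings with a single quantifier, so there is only
// one way to consume the run of "a"s.
var flatPattern = /^a+b$/;

console.log(nestedPattern.test("aaab"), flatPattern.test("aaab"));
console.log(nestedPattern.test("aaa"), flatPattern.test("aaa"));
```

On short input both behave the same; the difference shows up on a long run of "a"s that never reaches a "b", which is best tried cautiously and with small lengths first.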
Thanks for the posts.  I am glad to see a little activity still and will keep this open for a bit still to see if it continues.

stevenlevithan,
Welcome to EE and I'm glad you contributed.  That is a very interesting article and has some great info.  I "heard" about the article on your site from another source (thanks! ;)) and was going to post it here when I closed this just as an FYI.  It definitely had helpful info on this topic and now it can even be one of the accepted answers.  I hope to see you around here more. :)

Let me know if anyone has a question from me.  Thanks for the comments so far which have provided the info I was hoping for.

bol
@b0lsc0tt, thanks! Just to be clear, when I referenced people who don't know what they're talking about when it comes to regex performance, I wasn't talking about anyone here. mreuring's points are certainly valid, but IMHO such issues are often less impactive than many people realize, especially with modern browsers and regex engines.
Thanks for all of the comments.  It was a very worthwhile question and I got great info and help.  Thank you all and I'll see you around. :)

bol
Hehe, it was a very good question for most of us I think :) I certainly enjoyed learning a bit more on this one. More experts should go around asking interesting questions :)