Solved

Alternative to lookbehind for vbscript, javascript etc

Posted on 2007-09-28
Last Modified: 2008-01-09
This is in relation to the question at http:Q_22860096.html

VBScript, Javascript and other "flavors" don't support lookbehinds in an expression.  In this case I want to add <br /> in front of any <img in the string.  Here is a sample:

<img>
this is some text<img>
this is more<a><img> with some like <img>
<br /><img>but don't duplicate <br /><img>

However, in the sample, there are some situations where the <br /> tag is already there.  I don't want a duplicate.

With a lookbehind this is a simple expression (see the question above).  However without one this seems impossible.  The expression needs to look for and see the <br /> without capturing it.  Can this be done without more complicated script that would depend on captured groups, etc?  I don't mind a complicated expression but it would need to work with the limits of ECMA flavor (i.e. Javascript, vbscript).  Mainly no lookbehinds, atomic grouping or conditionals (http://www.regular-expressions.info/refflavors.html).
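For comparison, here is a sketch of the lookbehind version the question refers to. It assumes an engine that implements the lookbehind assertions added in ECMAScript 2018 (a current Node.js or browser, for example); it would not have run in the JavaScript or VBScript engines this thread is about. The function name `addBreaks` and the joined sample string are illustrative only.

```javascript
// Assumption: the engine supports ES2018 lookbehind. VBScript's RegExp does not.
const sample = "<img>this is some text<img>this is more<a><img> with some like <img><br /><img>but don't duplicate <br /><img>";

// Match "<img" only when it is NOT immediately preceded by "<br />".
function addBreaks(html) {
  return html.replace(/(?<!<br \/>)<img/g, "<br /><img");
}

console.log(addBreaks(sample));
```

The rest of the thread is about getting the same result without the `(?<!...)` construct.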

Let me know if there are any questions or if anything isn't clear.  Thanks!

bol
0
Comment
Question by:b0lsc0tt
16 Comments
 
LVL 54

Author Comment

by:b0lsc0tt
ID: 19982653
Just to clarify, by duplicate I mean I don't want to have ...

<br /><br /><img

bol
0
 
LVL 17

Assisted Solution

by:mreuring
mreuring earned 660 total points
ID: 19983055
Well, you could simply match the br in an optional non-capturing group and replace every instance of img (including the br, when it exists) with the br/img combo:

var re = /(?:<br \/>)?(<img>)/g;
alert("<img>but don't duplicate <br /><img>".replace(re, "<br />$1"));
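As a sketch of how that pattern behaves on each line of the sample from the question: the function name `prependBreak` is hypothetical, and the pattern is loosened from `<img>` to `<img` so that tags with attributes would also match, which is an assumption beyond the snippet above.

```javascript
// Optionally consume an existing "<br />" so the replacement never doubles it.
function prependBreak(html) {
  return html.replace(/(?:<br \/>)?(<img)/g, "<br />$1");
}

const lines = [
  "<img>",
  "this is some text<img>",
  "this is more<a><img> with some like <img>",
  "<br /><img>but don't duplicate <br /><img>"
];

for (const line of lines) {
  console.log(prependBreak(line));
}
```

Because an existing `<br />` is part of the match, it is overwritten by the replacement rather than left in place, so the last line comes back unchanged.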

Hope it helps,


 Martin
0
 
LVL 63

Expert Comment

by:Zvonko
ID: 19983166
Remove all <br>s that sit directly before <img>s, then add a <br> before every <img>.
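A minimal JavaScript sketch of that two-pass idea, assuming the breaks are written exactly as `<br />` with nothing between them and the image tag (the function name `normalizeBreaks` is made up for illustration):

```javascript
// Pass 1: strip a "<br />" that sits directly in front of "<img".
// Pass 2: put exactly one "<br />" in front of every "<img".
function normalizeBreaks(html) {
  const stripped = html.replace(/<br \/><img/g, "<img");
  return stripped.replace(/<img/g, "<br /><img");
}

console.log(normalizeBreaks("<img>but don't duplicate <br /><img>"));
```

Neither pass needs to look behind: the first removes what the second is about to add.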

0

 
LVL 54

Author Comment

by:b0lsc0tt
ID: 19985328
Thank you both.  Both suggestions will work and are great.

Martin, in the back of my mind I had the idea of using a captured group but I never pursued it.  Your script shows me that it would've been easier and cleaner than I had thought.  Nice post and it does provide a nice alternate to lookbehind.

Zvonko, thanks for that post and suggestion.  That was also a good way to get the same result as lookbehind.  I am interested to see which the Asker chooses in the other q.  ;)

I had been pursuing some great expression or solution.  I will keep this open for a bit still to see if others might have some different input or suggestions.  In some ways I had hoped to learn some great technique.  I definitely don't mean to take from either suggestion and appreciate them.  They will work for what I asked and are great.  Let me know if you have any other ideas and I welcome new ways to do this, especially if it uses a single expression (more complicated I am sure) and minimal script.

bol
0
 
LVL 63

Assisted Solution

by:Zvonko
Zvonko earned 660 total points
ID: 19986834
Here is a single-line expression:

<script language=vbscript>
  expertarea1 = "<br><img>this is some text<br><img>this is more<a><img> with some like <img><br /><img>but don't duplicate <br /><img>"

  Dim theExpr
  Set theExpr = New RegExp
  theExpr.Global = True
  theExpr.IgnoreCase = True
  theExpr.Pattern = "(<br ?/?>)?<img"
  expertarea1 = theExpr.Replace(expertarea1, "<br /><img")

  window.alert("for bol: " & Chr(13) & expertarea1)

</script>


0
 
LVL 63

Expert Comment

by:Zvonko
ID: 19986839
And when there may be space chars between <br> and <img>, then perhaps like this:

<script language=vbscript>
  expertarea1 = "<br> <img>this is some text<br>  <img>this is more<a><img> with some like <img><br /><img>but don't duplicate <br /><img>"

  Dim theExpr
  Set theExpr = New RegExp
  theExpr.Global = True
  theExpr.IgnoreCase = True
  theExpr.Pattern = "(<br ?/?>\s*)?<img"
  expertarea1 = theExpr.Replace(expertarea1, "<br /><img")

  window.alert("for bol: " & Chr(13) & expertarea1)

</script>
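For readers working in JavaScript, a rough equivalent of the VBScript above might look like the following. The `g` and `i` flags stand in for `Global` and `IgnoreCase`; the function name is hypothetical. One deliberate difference: the `<img` text is captured and re-emitted as `$1`, so an upper-case `<IMG` keeps its case instead of being rewritten in lower case.

```javascript
// Accepts <br>, <br/>, <br /> (any case), optionally followed by whitespace.
function prependBreakLoose(html) {
  return html.replace(/(?:<br ?\/?>\s*)?(<img)/gi, "<br />$1");
}

const input = "<br> <img>this is some text<br>  <img>this is more<a><img> with some like <img><br /><img>but don't duplicate <br /><img>";
console.log(prependBreakLoose(input));
```

Every break variant in front of an image is normalized to `<br />`, and any whitespace between the two tags is dropped.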

0
 
LVL 17

Expert Comment

by:mreuring
ID: 19988123
That's more or less the same as I had, the most notable exception being the use of a string instead of a literal expression. It doesn't do anything to change the principle, though, and that seems to be what b0lsc0tt was looking for: a radically different way of thinking about this problem.

The thing is, I was thinking about this last night, and there's not much of a difference to be made, especially considering the fact that we're talking about an interpreted language.

Either you use a relatively simple regular expression, of which there are ample examples by now, in which you match the br and the img tags and replace all instances with <br /><img...>.
Or you run through it semi-programmatically, in which case you don't need to use regular expressions at all.

The reason I think you shouldn't look for any more advanced RegExp is performance. An interpreted language has no chance of pre-caching the expression, in essence creating a performance hit in re-creating the regular expression whenever the code is required. Regular expressions of course give you a whole lot of control, but if the rules are simple and finite, a programmatic approach will give you better performance...

As a result I came up with an alternative:
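var input = "<img>but don't duplicate <br /><img>";
var parts = input.split("<img");
var br = "<br />";
// Every part except the last is followed by "<img" once the parts are re-joined.
for (var i = 0; i < parts.length - 1; i++) {
  var line = parts[i];
  // Append the break only if this part does not already end with it.
  if (line.slice(-br.length) !== br) parts[i] = line + br;
}
var output = parts.join("<img");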

This makes as much use of built-in functions as possible, keeping performance high. Also it's a radically different approach :) This would be my best-effort approach.
0
 
LVL 54

Author Comment

by:b0lsc0tt
ID: 19995283
Thanks for the comments.

mreuring,
Without getting too off topic, do you know (by testing) or have support for the performance you mentioned? I understand that the browser and even the use (i.e. javascript/browser, vbscript/ASP) will affect this, but I'm just curious whether you have quick links or personal facts. I don't (yet) have a great knowledge of the internals or engine, but I would've thought (call me biased) that a regex would be faster than making an array, for loop, etc.

Thanks again for all comments.  I will give this a little more time; I wish to see if Ozo, Amit_G, etc will post and contribute too.  I hope no one minds but input from a number of experts was my hope from the start.  The comments alone have made this worthwhile but I am greedy.  ;-)

bol
0
 
LVL 17

Expert Comment

by:mreuring
ID: 19995436
Once upon a time I did do some performance testing on the use of regular expressions in JavaScript. Times have changed, and so may the results of such tests.

The problem, performance-wise, wasn't in running a regular expression, or using one. In general the performance hit with regular expressions is in creating them. Since JavaScript is interpreted each time the code is read (unless there's some smart caching built into the VMs these days), that results in at least one performance hit for creating the regular expression.

If the regular expression is re-used on the same page a number of times, this one-time creation might result in better performance. However, that didn't seem to be a general trend either. So, at the time, I concluded that regular expressions in JavaScript were more useful as a way of keeping your code compact than as a performance improvement.

The way some Java APIs implemented regular expressions at the time supported this conclusion by being able to, and suggesting that you, pre-generate RegExp objects so that you need not generate them in your running code.

As the nature of regular expressions hasn't changed much, I assume these performance issues have not changed either, but the only way to be sure would be doing an extensive test (preferably across languages).
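A rough way to try this for yourself is sketched below. Everything here is an assumption for illustration: the function names, the input size, and the run count are arbitrary, and `Date.now()` only gives coarse wall-clock timing, so the numbers will vary by engine and machine and should not be read as a verdict either way.

```javascript
// Regex approach from earlier in the thread.
function withRegex(html) {
  return html.replace(/(?:<br \/>)?<img/g, "<br /><img");
}

// Split/join approach, with no regular expression involved.
function withSplit(html) {
  const br = "<br />";
  const parts = html.split("<img");
  for (let i = 0; i < parts.length - 1; i++) {
    if (parts[i].slice(-br.length) !== br) parts[i] += br;
  }
  return parts.join("<img");
}

// Coarse timing helper: total milliseconds for `runs` calls.
function timeIt(fn, input, runs) {
  const start = Date.now();
  for (let i = 0; i < runs; i++) fn(input);
  return Date.now() - start;
}

// Larger input built by repeating a small fragment (size chosen arbitrarily).
const bigInput = "text<img>more <br /><img>".repeat(10000);

console.log("regex ms:", timeIt(withRegex, bigInput, 20));
console.log("split ms:", timeIt(withSplit, bigInput, 20));
```

Both functions should produce identical output, so any timing difference reflects the approach rather than the result.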
0
 
LVL 1

Accepted Solution

by:stevenlevithan
stevenlevithan earned 680 total points
ID: 20010562
Some browsers will cache the most recently used regular expressions automagically, and in any case, if you assign a regex object to a variable (whether it be a regex literal or created using the RegExp constructor), it won't need to be compiled again in the future. It's not like regex compilation is that expensive in JavaScript to begin with. The regexes shown in this thread are so simple they will typically take no more than a nanosecond to compile.
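A small sketch of the "assign it to a variable" point, using hypothetical names. Both the literal and the constructor form yield a regex object that is created once and reused by every later call:

```javascript
// Created once, when these statements run.
const BR_IMG_LITERAL = /(?:<br \/>)?<img/g;
const BR_IMG_CONSTRUCTED = new RegExp("(?:<br />)?<img", "g");

// String.prototype.replace resets lastIndex on a global regex before it
// starts, so sharing one regex object across calls is safe for replace().
function addBreakLiteral(html) {
  return html.replace(BR_IMG_LITERAL, "<br /><img");
}

function addBreakConstructed(html) {
  return html.replace(BR_IMG_CONSTRUCTED, "<br /><img");
}

console.log(addBreakLiteral("<img>one"));
console.log(addBreakLiteral("two<br /><img>"));
```

Note that the shared-object caveat is different for `exec()` and `test()`, which do carry `lastIndex` between calls on a global regex.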

The regexes posted so far will probably work well for your needs, based on the description of the problem thus far. When more complex lookbehind mimicking is necessary, there is often a way to pull it off (depending on the programming language you're using), but it will typically involve more than a simple regex. For JavaScript specifically, I've written up some ways to go about it at http://blog.stevenlevithan.com/archives/mimic-lookbehind-javascript
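One common way to go beyond a plain replacement string in JavaScript is to pass a function to `replace`. The sketch below is an illustration of that general idea, not a quotation of the linked article; the function and parameter names are made up. The text that a lookbehind would have checked is captured in an optional group, and the callback decides what to do with each match.

```javascript
// Capture the would-be lookbehind text in an optional group, then decide in
// the callback whether to leave the match alone or rewrite it.
function addBreakUnlessPresent(html) {
  return html.replace(/(<br \/>)?<img/g, function (match, precedingBreak) {
    // precedingBreak is undefined when the optional group did not take part.
    return precedingBreak ? match : "<br />" + match;
  });
}

console.log(addBreakUnlessPresent("<img>but don't duplicate <br /><img>"));
```

The callback form pays off when the decision is more involved than "insert or skip", since arbitrary logic can run per match.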
0
 
LVL 17

Expert Comment

by:mreuring
ID: 20010966
If you read my previous two posts instead of just the last one, you should realise that I made the performance reference based on possibly making use of more complex expressions in an attempt to mimic lookbehinds. I'm not going to bother testing the actual difference, but I'm still inclined to think (even considering the alternatives you've posted on your site) that in this case any regular expression more complex than what we've already suggested is going to perform slower than a programmatic approach.

I'm still tempted to say that a regular expression, for a problem as simple as the one we're dealing with here, is not going to result in the best performance possible. Regular expressions, for those of us blessed with a marginal degree of understanding, will make for clean/small/readable code, not performance. I don't know enough of the actual implementation behind regular expressions, but I don't think they're ever the best performing solution; nothing so generic ever seems to be.

I will concede that it won't likely make a huge difference, but this might be an entirely different matter when the input grows beyond a few meager lines of text into a huge page of several MB.
0
 
LVL 1

Expert Comment

by:stevenlevithan
ID: 20011572
From my personal experience only, significant performance issues involved with regular expressions are generally caused by developers who don't know how to write optimized regular expressions or don't deeply understand backtracking and other regex performance-related issues. Regex optimization is something of a dark art.

Check out this article: http://www.javaworld.com/javaworld/jw-09-2007/jw-09-optimizingregex.html

It is, in part, Java specific, and makes a few debatable claims, but on the whole it is a very good article for the basics of regex optimization (a topic which isn't written about often enough, or at least not by people who know what they're talking about). "Mastering Regular Expressions" by Jeffrey Friedl also contains some very good discussion of regex performance issues and techniques.
0
 
LVL 54

Author Comment

by:b0lsc0tt
ID: 20018119
Thanks for the posts.  I am glad to see a little activity still and will keep this open for a bit still to see if it continues.

stevenlevithan,
Welcome to EE and I'm glad you contributed.  That is a very interesting article and has some great info.  I "heard" about the article on your site from another source (thanks! ;)) and was going to post it here when I closed this just as an FYI.  It definitely had helpful info on this topic and now it can even be one of the accepted answers.  I hope to see you around here more. :)

Let me know if anyone has a question from me.  Thanks for the comments so far which have provided the info I was hoping for.

bol
0
 
LVL 1

Expert Comment

by:stevenlevithan
ID: 20018290
@b0lsc0tt, thanks! Just to be clear, when I referenced people who don't know what they're talking about when it comes to regex performance, I wasn't talking about anyone here. mreuring's points are certainly valid, but IMHO such issues are often less impactive than many people realize, especially with modern browsers and regex engines.
0
 
LVL 54

Author Comment

by:b0lsc0tt
ID: 20045056
Thanks for all of the comments.  It was a very worthwhile question and I got great info and help.  Thank you all and I'll see you around. :)

bol
0
 
LVL 17

Expert Comment

by:mreuring
ID: 20045284
Hehe, it was a very good question for most of us I think :) I certainly enjoyed learning a bit more on this one. More experts should go around asking interesting questions :)
0

Question has a verified solution.