Solved

How to make a nongreedy RegEX expression?

Posted on 2002-07-01
8
207 Views
Last Modified: 2010-05-02
I'm using regular expressions 5.5 and trying to javascripts that match a particular pattern. No problem doing that......it's just this VBScripting is so friggin greedy and when a match is found ALL scripts are removed.

What I need is some way to stop at the first </script> tag! Here is code I put together to illustrate:

-------------code-------------------
' Remove line breakes to create one line instead
' of multi lines

regEX.Pattern = "\n|\s+|\t"
sHTML = regEX.Replace(sHTML, " ")

' define the pattern I am matching
regEX.Pattern = "<script.*?>.*?</script>"
Set Matches = regEX.Execute(sHTML)   ' Execute search.
 
  For Each Match In Matches
If InStr(1, Match, "openwindow", vbTextCompare) = 0 Then
  sHTML = regEX.Replace(sHTML, "")
End If
   Next

-----------cut--------------

Like I said, above works fine if it wasn't for the fact it is tough as nails getting VBScript to stop matching at the first </script> tag like you easily can in perl. Here is 300 points for a solution!



0
Comment
Question by:Biffo
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 3
  • 2
8 Comments
 
LVL 5

Expert Comment

by:rpai
ID: 7123223
Suppose 'sGivenString' is the string  you have and you wish to stop matching at the first <script> tag.

sGivenString = "XXX YYY AAA <script> AAA BBB CCC DDD </script>"
s = Mid(sGivenString, 1, InStr(1, sGivenString, "<script>", vbTextCompare) - 1)
s = Replace(s, "AAA", "ZZZ", 1, , vbTextCompare)
Debug.Print s

Is this something that you are looking for?
0
 
LVL 5

Expert Comment

by:rpai
ID: 7123224
The above code would only replace the 'AAA' before the <script> tag with 'ZZZ'and not the one that exists within the script tag.
0
 
LVL 2

Author Comment

by:Biffo
ID: 7123249
I want to replace the entire script...everything in between <script>...</script> including the tags and globally in case there is more than one script with that matches my search pattern.



0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 5

Expert Comment

by:rpai
ID: 7123304
So maybe something like might help:-

sGivenString = "XXX YYY AAA <script> AAA BBB CCC DDD </script>"
s = Mid(sGivenString, InStr(1, sGivenString, "<script>", vbTextCompare) - 1, InStr(1, sGivenString, "</script>", vbTextCompare) - 1 )
s = Replace(s, s, "", 1, , vbTextCompare)
Debug.Print s
0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 7123827
For Each Match In Matches
If InStr(1, Match, "openwindow", vbTextCompare) = 0 Then
 sHTML = replace(sHTML,match,"")
End If
0
 
LVL 2

Author Comment

by:Biffo
ID: 7124328
All this isn't really helping me with my primary regex matching problems! And that problem is stop matching at the first </script> tag and not the last one encountered in the doc.

so I want change this:

---------------------------------------------
<head>
<title>Sample</title>
<script language="JavaScript1.1" type="text/javascript">
<!--

function whatever... {
//-->
</script>
</head>
<body>

<SCRIPT Language="Javascript">
more script whatever
</SCRIPT>

<script language="JavaScript1.1">
<!--
openwindow.....stuff here
//-->
</script>
<script language="JavaScript1.1" src="some menu perhaps"></script>
</body>
</html>
------------------------------------------------------------

To look like this:

-----------------------------------------------------
<head>
<title>Sample</title>
<script language="JavaScript1.1" type="text/javascript">
<!--

function whatever... {
//-->
</script>
</head>
<body>

<SCRIPT Language="Javascript">
more script whatever
</SCRIPT>

<script language="JavaScript1.1" src="some menu perhaps"></script>
</body>
</html>
---------------------------------------------------

And not like this which is the way current regex is doing it:

--------------------------------------------------
<head>
<title>Sample</title>
</head>
<body>


</body>
</html>
---------------------------------------------------


0
 
LVL 25

Accepted Solution

by:
clockwatcher earned 300 total points
ID: 7125076
I wouldn't really call that a greedy match.  That's global matching.  If that's what you're getting then you've got global set on your regular expression.  Typically, the following is considered a greedy match.
 
  this <script>is</script> an example of a <script>greedy </script> match

leaving you with:

  this  match

Since you aren't trying to do a global replace (and you're not just trying to replace the first match -- which would be easy), you'll need something like the following -- which is very similar to what we've already posted.  

html = "<head>" & vbcrlf & _
    "<title>Sample</title>" & vbcrlf & _
    "<script language=""JavaScript1.1"" type=""text/javascript"">" & vbcrlf & _
    "<!--" & vbcrlf & _
    "function whatever... { " & vbcrlf & _
    "//-->" & vbcrlf & _
    "</script>" & vbcrlf & _
    "</head>" & vbcrlf & _
    "<body>" & vbcrlf & _
    "<SCRIPT Language=""Javascript"">" & vbcrlf & _
    "more script whatever" & vbcrlf & _
    "</SCRIPT>" & vbcrlf & _
    "<script language=""JavaScript1.1"">" & vbcrlf & _
    "<!--" & vbcrlf & _
    "openwindow.....stuff here" & vbcrlf & _
    "//-->" & vbcrlf & _
    "</script>" & vbcrlf & _
    "<script language=""JavaScript1.1"" src=""some menu perhaps""></script>" & vbcrlf & _
    "</body>" & vbcrlf & _
    "</html>" & vbcrlf

wscript.echo "BEFORE: " & vbcrlf & vbcrlf & html

set r = new RegExp
r.pattern = "<script(?:.|\n)*?</script>"
r.ignorecase = true
r.multiline = true
r.global = true
set matches = r.execute(html)

r.pattern = "openwindow"

for each m in matches
  if r.test(m) then html=replace(html,m,"")
next

wscript.echo "AFTER: " & vbcrlf & vbcrlf & html
0
 
LVL 2

Author Comment

by:Biffo
ID: 7125130
Yah, clockwatcher, that seems to be my solution there. I was getting ready to split up the html and place scripts all on their own one line and then just match to end of line to keep from matching too much.

Job well done and here is your points......
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction In a recent article (http://www.experts-exchange.com/A_7811-A-Better-Concatenate-Function.html) for the Excel community, I showed an improved version of the Excel Concatenate() function.  While writing that article I realized that no o…
Article by: Martin
Here are a few simple, working, games that you can use as-is or as the basis for your own games. Tic-Tac-Toe This is one of the simplest of all games.   The game allows for a choice of who goes first and keeps track of the number of wins for…
Get people started with the utilization of class modules. Class modules can be a powerful tool in Microsoft Access. They allow you to create self-contained objects that encapsulate functionality. They can easily hide the complexity of a process from…
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…

751 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question