Solved

How to make a non-greedy regular expression?

Posted on 2002-07-01
Last Modified: 2010-05-02
I'm using VBScript Regular Expressions 5.5 and trying to remove JavaScript blocks that match a particular pattern. Matching is no problem; it's just that VBScript's regex is so greedy that when a match is found ALL scripts are removed.

What I need is some way to stop at the first </script> tag! Here is code I put together to illustrate:

-------------code-------------------
' Remove line breaks to create one line instead
' of multiple lines

regEX.Pattern = "\n|\s+|\t"
sHTML = regEX.Replace(sHTML, " ")

' define the pattern I am matching
regEX.Pattern = "<script.*?>.*?</script>"
Set Matches = regEX.Execute(sHTML)   ' Execute search.
 
For Each Match In Matches
  If InStr(1, Match, "openwindow", vbTextCompare) = 0 Then
    sHTML = regEX.Replace(sHTML, "")
  End If
Next

-----------cut--------------

Like I said, the above would work fine if it weren't so hard to get VBScript to stop matching at the first </script> tag, the way you easily can in Perl. Here are 300 points for a solution!



Question by:Biffo
8 Comments
 
LVL 5

Expert Comment

by:rpai
ID: 7123223
Suppose 'sGivenString' is the string you have and you wish to stop matching at the first <script> tag.

sGivenString = "XXX YYY AAA <script> AAA BBB CCC DDD </script>"
s = Mid(sGivenString, 1, InStr(1, sGivenString, "<script>", vbTextCompare) - 1)
s = Replace(s, "AAA", "ZZZ", 1, , vbTextCompare)
Debug.Print s

Is this something that you are looking for?
 
LVL 5

Expert Comment

by:rpai
ID: 7123224
The above code replaces only the 'AAA' before the <script> tag with 'ZZZ', not the one inside the script tag.
 
LVL 2

Author Comment

by:Biffo
ID: 7123249
I want to replace the entire script: everything between <script>...</script>, including the tags, and globally, in case more than one script matches my search pattern.



 
LVL 5

Expert Comment

by:rpai
ID: 7123304
So maybe something like this might help:

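sGivenString = "XXX YYY AAA <script> AAA BBB CCC DDD </script>"
' Locate the first <script> tag and the first </script> that follows it
startPos = InStr(1, sGivenString, "<script>", vbTextCompare)
endPos = InStr(startPos, sGivenString, "</script>", vbTextCompare)
' Keep the text before the block and the text after the closing tag
s = Left(sGivenString, startPos - 1) & Mid(sGivenString, endPos + Len("</script>"))
Debug.Print s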
 
LVL 25

Expert Comment

by:clockwatcher
ID: 7123827
' Use the string Replace function on the matched text, not regEX.Replace
For Each Match In Matches
  If InStr(1, Match, "openwindow", vbTextCompare) = 0 Then
    sHTML = Replace(sHTML, Match, "")
  End If
Next
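
A minimal sketch of why that one-line change matters, assuming Windows Script Host (run with cscript) and a made-up sample string. RegExp.Replace re-applies the pattern to the whole string, so with Global = True it strips every block no matter which match the loop is on; the built-in Replace function removes only the text of the current match. The sketch tests for > 0, so it removes the block that contains the keyword, which is the result the asker describes elsewhere in the thread.

```vbscript
Option Explicit
Dim re, html, m, viaRegex, viaString

' Hypothetical sample input, not taken from the original page
html = "<script>keep()</script><b>x</b><script>openwindow()</script>"

Set re = New RegExp
re.Pattern = "<script.*?</script>"
re.IgnoreCase = True
re.Global = True

' RegExp.Replace ignores the current match and removes every block
viaRegex = re.Replace(html, "")
WScript.Echo viaRegex    ' <b>x</b>

' The string Replace function removes only the matched text itself
viaString = html
For Each m In re.Execute(html)
    If InStr(1, m.Value, "openwindow", vbTextCompare) > 0 Then
        viaString = Replace(viaString, m.Value, "")
    End If
Next
WScript.Echo viaString   ' <script>keep()</script><b>x</b>
```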
 
LVL 2

Author Comment

by:Biffo
ID: 7124328
None of this really helps with my primary regex matching problem, which is to stop matching at the first </script> tag and not the last one encountered in the document.

So I want to change this:

---------------------------------------------
<head>
<title>Sample</title>
<script language="JavaScript1.1" type="text/javascript">
<!--

function whatever... {
//-->
</script>
</head>
<body>

<SCRIPT Language="Javascript">
more script whatever
</SCRIPT>

<script language="JavaScript1.1">
<!--
openwindow.....stuff here
//-->
</script>
<script language="JavaScript1.1" src="some menu perhaps"></script>
</body>
</html>
------------------------------------------------------------

To look like this:

-----------------------------------------------------
<head>
<title>Sample</title>
<script language="JavaScript1.1" type="text/javascript">
<!--

function whatever... {
//-->
</script>
</head>
<body>

<SCRIPT Language="Javascript">
more script whatever
</SCRIPT>

<script language="JavaScript1.1" src="some menu perhaps"></script>
</body>
</html>
---------------------------------------------------

And not like this, which is what my current regex produces:

--------------------------------------------------
<head>
<title>Sample</title>
</head>
<body>


</body>
</html>
---------------------------------------------------


 
LVL 25

Accepted Solution

by:
clockwatcher earned 300 total points
ID: 7125076
I wouldn't really call that a greedy match. That's global matching. If every script is being removed, then you've got Global set to True on your regular expression. Typically, a greedy match is one that runs from the first <script> all the way to the last </script>, so that this input:
 
  this <script>is</script> an example of a <script>greedy </script> match

leaving you with:

  this  match
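
The difference can be reproduced with a short sketch, assuming Windows Script Host (run with cscript). Global is left at False here so that only the quantifier, greedy `.*` versus non-greedy `.*?`, changes the result:

```vbscript
Option Explicit
Dim re, sample

sample = "this <script>is</script> an example of a <script>greedy </script> match"

Set re = New RegExp
re.IgnoreCase = True
re.Global = False                      ' replace the first match only

' Greedy: .* runs on to the LAST </script> in the string
re.Pattern = "<script.*</script>"
WScript.Echo re.Replace(sample, "")    ' this  match

' Non-greedy: .*? stops at the FIRST </script> after the opening tag
re.Pattern = "<script.*?</script>"
WScript.Echo re.Replace(sample, "")    ' this  an example of a <script>greedy </script> match
```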

Since you aren't trying to do a global replace (and you're not just trying to replace the first match -- which would be easy), you'll need something like the following -- which is very similar to what we've already posted.  

html = "<head>" & vbcrlf & _
    "<title>Sample</title>" & vbcrlf & _
    "<script language=""JavaScript1.1"" type=""text/javascript"">" & vbcrlf & _
    "<!--" & vbcrlf & _
    "function whatever... { " & vbcrlf & _
    "//-->" & vbcrlf & _
    "</script>" & vbcrlf & _
    "</head>" & vbcrlf & _
    "<body>" & vbcrlf & _
    "<SCRIPT Language=""Javascript"">" & vbcrlf & _
    "more script whatever" & vbcrlf & _
    "</SCRIPT>" & vbcrlf & _
    "<script language=""JavaScript1.1"">" & vbcrlf & _
    "<!--" & vbcrlf & _
    "openwindow.....stuff here" & vbcrlf & _
    "//-->" & vbcrlf & _
    "</script>" & vbcrlf & _
    "<script language=""JavaScript1.1"" src=""some menu perhaps""></script>" & vbcrlf & _
    "</body>" & vbcrlf & _
    "</html>" & vbcrlf

wscript.echo "BEFORE: " & vbcrlf & vbcrlf & html

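set r = new RegExp
' (?:.|\n)*? is non-greedy and also crosses line breaks, so each match
' ends at the first </script> after its opening tag
r.pattern = "<script(?:.|\n)*?</script>"
r.ignorecase = true
r.multiline = true
r.global = true   ' collect every script block, not just the first
set matches = r.execute(html)

' reuse the same object to test each block for the keyword
r.pattern = "openwindow"

for each m in matches
  ' the string Replace function removes only this block's text
  if r.test(m.value) then html = replace(html, m.value, "")
next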

wscript.echo "AFTER: " & vbcrlf & vbcrlf & html
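
The same idea can be wrapped in a reusable function. This is only a sketch under stated assumptions: the name RemoveScriptsContaining and the sample string are hypothetical, and it expects Windows Script Host (cscript). Instead of the string Replace function, it cuts each block out by position using the Match object's FirstIndex and Length properties, walking the matches backwards so that earlier positions stay valid after each cut.

```vbscript
Option Explicit

' Hypothetical helper: removes every <script>...</script> block
' whose text contains keyword (case-insensitive)
Function RemoveScriptsContaining(html, keyword)
    Dim reBlock, matches, i, m, result
    Set reBlock = New RegExp
    reBlock.Pattern = "<script(?:.|\n)*?</script>"   ' same pattern as above
    reBlock.IgnoreCase = True
    reBlock.Global = True

    result = html
    Set matches = reBlock.Execute(html)
    ' Walk backwards so earlier FirstIndex values remain valid
    For i = matches.Count - 1 To 0 Step -1
        Set m = matches(i)
        If InStr(1, m.Value, keyword, vbTextCompare) > 0 Then
            ' FirstIndex is zero-based; Left and Mid are one-based
            result = Left(result, m.FirstIndex) & _
                     Mid(result, m.FirstIndex + m.Length + 1)
        End If
    Next
    RemoveScriptsContaining = result
End Function

Dim sample
sample = "<p>a</p><script>openwindow()</script><script>keep()</script>"
WScript.Echo RemoveScriptsContaining(sample, "openwindow")
' <p>a</p><script>keep()</script>
```

Cutting by position touches only the exact block that matched, whereas Replace(html, m, "") would also remove any other identical copy of that text elsewhere in the page. For this thread's purpose either behaviour gives the same result.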
 
LVL 2

Author Comment

by:Biffo
ID: 7125130
Yes, clockwatcher, that seems to be my solution. I was getting ready to split up the HTML, put each script on its own single line, and then just match to the end of the line to keep from matching too much.

Job well done, and here are your points.