Solved

How do I make a non-greedy regular expression?

Posted on 2002-07-01
Last Modified: 2010-05-02
I'm using VBScript Regular Expressions 5.5 and trying to remove JavaScript blocks that match a particular pattern. Matching them is no problem. It's just that VBScript's matching is so friggin greedy that when a match is found, ALL scripts are removed.

What I need is some way to stop at the first </script> tag! Here is code I put together to illustrate:

-------------code-------------------
' Remove line breaks to create one line instead
' of multiple lines

regEX.Pattern = "\n|\s+|\t"
sHTML = regEX.Replace(sHTML, " ")

' define the pattern I am matching
regEX.Pattern = "<script.*?>.*?</script>"
Set Matches = regEX.Execute(sHTML)   ' Execute search.
 
For Each Match In Matches
  If InStr(1, Match, "openwindow", vbTextCompare) = 0 Then
    sHTML = regEX.Replace(sHTML, "")
  End If
Next

-----------cut--------------

Like I said, the above would work fine if it weren't so tough to get VBScript to stop matching at the first </script> tag, as you easily can in Perl. Here are 300 points for a solution!



Question by:Biffo
8 Comments
 
LVL 5

Expert Comment

by:rpai
ID: 7123223
Suppose 'sGivenString' is the string you have and you wish to stop matching at the first <script> tag.

sGivenString = "XXX YYY AAA <script> AAA BBB CCC DDD </script>"
s = Mid(sGivenString, 1, InStr(1, sGivenString, "<script>", vbTextCompare) - 1)
s = Replace(s, "AAA", "ZZZ", 1, , vbTextCompare)
Debug.Print s

Is this something that you are looking for?
 
LVL 5

Expert Comment

by:rpai
ID: 7123224
The above code would only replace the 'AAA' before the <script> tag with 'ZZZ', and not the one inside the script tag.
0
 
LVL 2

Author Comment

by:Biffo
ID: 7123249
I want to remove the entire script: everything between <script>...</script>, including the tags, and globally, in case more than one script matches my search pattern.




 
LVL 5

Expert Comment

by:rpai
ID: 7123304
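So maybe something like this might help:

sGivenString = "XXX YYY AAA <script> AAA BBB CCC DDD </script>"
' Mid takes a start position and a LENGTH, so compute both from the tag positions
startPos = InStr(1, sGivenString, "<script>", vbTextCompare)
endPos = InStr(startPos, sGivenString, "</script>", vbTextCompare) + Len("</script>")
s = Mid(sGivenString, startPos, endPos - startPos)
' Remove the extracted script block from the original string
sGivenString = Replace(sGivenString, s, "", 1, , vbTextCompare)
Debug.Print sGivenString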
 
LVL 25

Expert Comment

by:clockwatcher
ID: 7123827
For Each Match In Matches
  ' Remove only this match's text, not everything the pattern matches
  If InStr(1, Match, "openwindow", vbTextCompare) = 0 Then
    sHTML = Replace(sHTML, Match, "")
  End If
Next
 
LVL 2

Author Comment

by:Biffo
ID: 7124328
All this isn't really helping me with my primary regex matching problem! That problem is how to stop matching at the first </script> tag and not at the last one encountered in the doc.

So I want to change this:

---------------------------------------------
<head>
<title>Sample</title>
<script language="JavaScript1.1" type="text/javascript">
<!--

function whatever... {
//-->
</script>
</head>
<body>

<SCRIPT Language="Javascript">
more script whatever
</SCRIPT>

<script language="JavaScript1.1">
<!--
openwindow.....stuff here
//-->
</script>
<script language="JavaScript1.1" src="some menu perhaps"></script>
</body>
</html>
------------------------------------------------------------

To look like this:

-----------------------------------------------------
<head>
<title>Sample</title>
<script language="JavaScript1.1" type="text/javascript">
<!--

function whatever... {
//-->
</script>
</head>
<body>

<SCRIPT Language="Javascript">
more script whatever
</SCRIPT>

<script language="JavaScript1.1" src="some menu perhaps"></script>
</body>
</html>
---------------------------------------------------

And not like this, which is what the current regex produces:

--------------------------------------------------
<head>
<title>Sample</title>
</head>
<body>


</body>
</html>
---------------------------------------------------


 
LVL 25

Accepted Solution

by:
clockwatcher earned 300 total points
ID: 7125076
I wouldn't really call that a greedy match.  That's global matching.  If that's what you're getting then you've got global set on your regular expression.  Typically, the following is considered a greedy match.
 
  this <script>is</script> an example of a <script>greedy </script> match

leaving you with:

  this  match
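
The difference between the greedy and non-greedy forms can be seen with a minimal sketch like the one below. It assumes Windows Script Host (for wscript.echo) and the sample sentence above; the variable names re and s are illustrative only.

```vbscript
' Sketch: greedy vs. non-greedy quantifier on the sample sentence.
Dim re, s
Set re = New RegExp
re.IgnoreCase = True
re.Global = True

s = "this <script>is</script> an example of a <script>greedy </script> match"

' Greedy: .* runs to the LAST </script>, swallowing everything in between
re.Pattern = "<script>.*</script>"
WScript.Echo re.Replace(s, "")   ' this  match

' Non-greedy: .*? stops at the FIRST </script> after each <script>
re.Pattern = "<script>.*?</script>"
WScript.Echo re.Replace(s, "")   ' this  an example of a  match
```

With Global set, the non-greedy pattern still removes every script block, just one block per match instead of one giant match.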

Since you aren't trying to do a global replace (and you're not just trying to replace the first match -- which would be easy), you'll need something like the following -- which is very similar to what we've already posted.  

html = "<head>" & vbcrlf & _
    "<title>Sample</title>" & vbcrlf & _
    "<script language=""JavaScript1.1"" type=""text/javascript"">" & vbcrlf & _
    "<!--" & vbcrlf & _
    "function whatever... { " & vbcrlf & _
    "//-->" & vbcrlf & _
    "</script>" & vbcrlf & _
    "</head>" & vbcrlf & _
    "<body>" & vbcrlf & _
    "<SCRIPT Language=""Javascript"">" & vbcrlf & _
    "more script whatever" & vbcrlf & _
    "</SCRIPT>" & vbcrlf & _
    "<script language=""JavaScript1.1"">" & vbcrlf & _
    "<!--" & vbcrlf & _
    "openwindow.....stuff here" & vbcrlf & _
    "//-->" & vbcrlf & _
    "</script>" & vbcrlf & _
    "<script language=""JavaScript1.1"" src=""some menu perhaps""></script>" & vbcrlf & _
    "</body>" & vbcrlf & _
    "</html>" & vbcrlf

wscript.echo "BEFORE: " & vbcrlf & vbcrlf & html

set r = new RegExp
r.pattern = "<script(?:.|\n)*?</script>"
r.ignorecase = true
r.multiline = true
r.global = true
set matches = r.execute(html)

r.pattern = "openwindow"

for each m in matches
  if r.test(m) then html=replace(html,m,"")
next

wscript.echo "AFTER: " & vbcrlf & vbcrlf & html
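
The same idea can be wrapped in a reusable function, sketched below under a few assumptions: the function name RemoveScriptsContaining and its parameters are hypothetical, [\s\S] is used in place of (?:.|\n) because the dot does not match a newline in VBScript, and the keyword is compared as literal text with InStr rather than as a second regex.

```vbscript
' Sketch: remove only the <script> blocks whose text contains a keyword.
' RemoveScriptsContaining is a hypothetical helper, not a built-in.
Function RemoveScriptsContaining(html, keyword)
    Dim re, m, result
    Set re = New RegExp
    ' [\s\S]*? matches any character, including newlines, as few times as possible
    re.Pattern = "<script\b[\s\S]*?</script>"
    re.IgnoreCase = True
    re.Global = True      ' collect every script block, each as its own match

    result = html
    For Each m In re.Execute(html)
        ' Case-insensitive literal search inside this one block
        If InStr(1, m.Value, keyword, vbTextCompare) > 0 Then
            ' Remove just this block's text (first occurrence only)
            result = Replace(result, m.Value, "", 1, 1, vbBinaryCompare)
        End If
    Next
    RemoveScriptsContaining = result
End Function

' Usage, with html built as in the example above:
' WScript.Echo RemoveScriptsContaining(html, "openwindow")
```

Matching is done against the original string while the replacements are applied to a copy, so removing one block does not disturb the remaining matches.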
 
LVL 2

Author Comment

by:Biffo
ID: 7125130
Yah, clockwatcher, that seems to be my solution there. I was getting ready to split up the html and place scripts all on their own one line and then just match to end of line to keep from matching too much.

Job well done, and here are your points.

Question has a verified solution.
