
How to make a non-greedy RegEx expression?

Posted on 2002-07-01
Last Modified: 2010-05-02
I'm using VBScript Regular Expressions 5.5 and trying to remove JavaScript blocks that match a particular pattern. Matching them is no problem... it's just that VBScript is so friggin' greedy that when a match is found, ALL scripts are removed.

What I need is some way to stop at the first </script> tag! Here is code I put together to illustrate:

-------------code-------------------
' Remove line breaks to create one line instead
' of multiple lines

regEX.Pattern = "\n|\s+|\t"
sHTML = regEX.Replace(sHTML, " ")

' Define the pattern I am matching
regEX.Pattern = "<script.*?>.*?</script>"
Set Matches = regEX.Execute(sHTML)   ' Execute search.

For Each Match In Matches
  If InStr(1, Match, "openwindow", vbTextCompare) = 0 Then
    sHTML = regEX.Replace(sHTML, "")
  End If
Next

-----------cut--------------

Like I said, the above works fine, except that it is tough as nails getting VBScript to stop matching at the first </script> tag, like you easily can in Perl. Here are 300 points for a solution!
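A minimal sketch of what the loop above ends up doing, using Python's `re` module as a stand-in for VBScript's RegExp object. The pattern is copied from the question; the one-line HTML is made up, and the sketch assumes Global is on, as looping over `Matches` implies. The lazy `.*?` already stops at the first `</script>`, so each block is its own match. The trouble is that `regEX.Replace(sHTML, "")` re-applies the pattern to the whole string and removes every match, whichever one the loop happens to be inspecting.

```python
import re

# Made-up one-line HTML, as it would look after the whitespace-collapsing step
html = ('<script>var a = 1;</script> <p>text</p> '
        '<script>openwindow("x");</script> <p>more</p>')

# Same lazy pattern as the question; IGNORECASE mirrors a case-insensitive RegExp
pattern = re.compile(r"<script.*?>.*?</script>", re.IGNORECASE)

# Each script block is a separate match: the lazy .*? stops at the first </script>
blocks = pattern.findall(html)

# Equivalent of regEX.Replace(sHTML, "") with Global on: ALL matches are removed,
# regardless of which block passed the InStr test inside the loop
stripped = pattern.sub("", html)

for b in blocks:
    print("match:", b)
print("after Replace:", stripped)
```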



Question by:Biffo
LVL 5

Expert Comment

by:rpai
ID: 7123223
Suppose 'sGivenString' is the string  you have and you wish to stop matching at the first <script> tag.

sGivenString = "XXX YYY AAA <script> AAA BBB CCC DDD </script>"
s = Mid(sGivenString, 1, InStr(1, sGivenString, "<script>", vbTextCompare) - 1)
s = Replace(s, "AAA", "ZZZ", 1, , vbTextCompare)
Debug.Print s

Is this something that you are looking for?
 
LVL 5

Expert Comment

by:rpai
ID: 7123224
The above code would only replace the 'AAA' before the <script> tag with 'ZZZ', and not the one that exists within the script tag.
 
LVL 2

Author Comment

by:Biffo
ID: 7123249
I want to replace the entire script, everything in between <script>...</script> including the tags, and do it globally in case there is more than one script that matches my search pattern.



 
LVL 5

Expert Comment

by:rpai
ID: 7123304
So maybe something like this might help:

sGivenString = "XXX YYY AAA <script> AAA BBB CCC DDD </script>"
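' Find where the block starts and where its closing tag ends
iStart = InStr(1, sGivenString, "<script>", vbTextCompare)
iEnd = InStr(iStart, sGivenString, "</script>", vbTextCompare) + Len("</script>")
' Keep the text before and after the block, dropping the block itself
s = Left(sGivenString, iStart - 1) & Mid(sGivenString, iEnd)
Debug.Print s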
 
LVL 25

Expert Comment

by:clockwatcher
ID: 7123827
For Each Match In Matches
  If InStr(1, Match, "openwindow", vbTextCompare) = 0 Then
    sHTML = Replace(sHTML, Match, "")
  End If
Next
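The key change from the question's loop is the plain-string `Replace(sHTML, Match, "")`, which removes only that one block's exact text instead of re-running the pattern over the whole document. A rough Python equivalent, with `str.replace` standing in for VBScript's `Replace`. The helper name `remove_matching_blocks` and the `[\s\S]*?` pattern (so blocks spanning several lines still match) are illustrative assumptions. It removes the blocks that *contain* the marker, which is what the desired output later in the thread shows; the `= 0` test above keeps those instead, so flip the check to mirror that line literally.

```python
import re

# [\s\S] matches any character, including newlines, so multi-line blocks are caught
SCRIPT = re.compile(r"<script[^>]*>[\s\S]*?</script>", re.IGNORECASE)

def remove_matching_blocks(html: str, marker: str) -> str:
    """Remove only the script blocks whose text contains marker (case-insensitive)."""
    for block in SCRIPT.findall(html):
        # Case-insensitive containment check, like InStr(..., vbTextCompare) > 0
        if marker.lower() in block.lower():
            # Plain string replace of this block's exact text, like VBScript Replace()
            html = html.replace(block, "")
    return html

page = ("<script>keep();</script>\n"
        "<SCRIPT>openwindow('a');</SCRIPT>\n"
        "<p>body</p>")
result = remove_matching_blocks(page, "openwindow")
print(result)
```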
 
LVL 2

Author Comment

by:Biffo
ID: 7124328
All this isn't really helping me with my primary regex matching problem! That problem is stopping the match at the first </script> tag, not the last one encountered in the doc.

So I want to change this:

---------------------------------------------
<head>
<title>Sample</title>
<script language="JavaScript1.1" type="text/javascript">
<!--

function whatever... {
//-->
</script>
</head>
<body>

<SCRIPT Language="Javascript">
more script whatever
</SCRIPT>

<script language="JavaScript1.1">
<!--
openwindow.....stuff here
//-->
</script>
<script language="JavaScript1.1" src="some menu perhaps"></script>
</body>
</html>
------------------------------------------------------------

To look like this:

-----------------------------------------------------
<head>
<title>Sample</title>
<script language="JavaScript1.1" type="text/javascript">
<!--

function whatever... {
//-->
</script>
</head>
<body>

<SCRIPT Language="Javascript">
more script whatever
</SCRIPT>

<script language="JavaScript1.1" src="some menu perhaps"></script>
</body>
</html>
---------------------------------------------------

And not like this, which is what the current regex is doing:

--------------------------------------------------
<head>
<title>Sample</title>
</head>
<body>


</body>
</html>
---------------------------------------------------


 
LVL 25

Accepted Solution

by:clockwatcher (earned 300 total points)
ID: 7125076
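I wouldn't really call that a greedy match. That's global matching. If that's what you're getting, then you've got Global set on your regular expression, so regEX.Replace(sHTML, "") inside your loop removes every match in the string, not just the one the loop is looking at. Typically, a greedy match is what you get by running a pattern like <script>.*</script> against: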
 
  this <script>is</script> an example of a <script>greedy </script> match

leaving you with:

  this  match
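To see the distinction, a quick Python sketch runs a greedy and a lazy pattern over the example line above. Python's `re` is used as a stand-in here; for this input it backtracks the same way a VBScript 5.5 RegExp would, but treat that as an assumption rather than a guarantee for every pattern.

```python
import re

text = "this <script>is</script> an example of a <script>greedy </script> match"

# Greedy: .* runs to the LAST </script> on the line, swallowing the text between blocks
greedy = re.sub(r"<script>.*</script>", "", text)

# Lazy: .*? stops at the FIRST </script>, so each block is removed on its own
lazy = re.sub(r"<script>.*?</script>", "", text)

print(repr(greedy))
print(repr(lazy))
```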

Since you aren't trying to do a global replace (and you're not just trying to replace the first match, which would be easy), you'll need something like the following, which is very similar to what we've already posted.

html = "<head>" & vbcrlf & _
    "<title>Sample</title>" & vbcrlf & _
    "<script language=""JavaScript1.1"" type=""text/javascript"">" & vbcrlf & _
    "<!--" & vbcrlf & _
    "function whatever... { " & vbcrlf & _
    "//-->" & vbcrlf & _
    "</script>" & vbcrlf & _
    "</head>" & vbcrlf & _
    "<body>" & vbcrlf & _
    "<SCRIPT Language=""Javascript"">" & vbcrlf & _
    "more script whatever" & vbcrlf & _
    "</SCRIPT>" & vbcrlf & _
    "<script language=""JavaScript1.1"">" & vbcrlf & _
    "<!--" & vbcrlf & _
    "openwindow.....stuff here" & vbcrlf & _
    "//-->" & vbcrlf & _
    "</script>" & vbcrlf & _
    "<script language=""JavaScript1.1"" src=""some menu perhaps""></script>" & vbcrlf & _
    "</body>" & vbcrlf & _
    "</html>" & vbcrlf

wscript.echo "BEFORE: " & vbcrlf & vbcrlf & html

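' Lazy (?:.|\n)*? stops at the first </script>; the \n alternative is
' needed because "." does not match a newline in VBScript's RegExp
set r = new RegExp
r.pattern = "<script(?:.|\n)*?</script>"
r.ignorecase = true
r.multiline = true       ' only affects ^ and $; harmless here
r.global = true          ' collect every script block, not just the first
set matches = r.execute(html)

' Reuse the same (still case-insensitive) object to test each block
r.pattern = "openwindow"

' Remove only the blocks containing "openwindow"; replace() is a plain string replace
for each m in matches
  if r.test(m) then html=replace(html,m,"")
next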

wscript.echo "AFTER: " & vbcrlf & vbcrlf & html
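For comparison, the same filter can be done in one pass when the regex engine accepts a replacement callback. The Python sketch below uses that swapped-in technique, not what the answer above does: `re.sub` calls the illustrative `keep_or_drop` helper for each block and deletes the block only if it contains the marker. `[\s\S]*?` plays the role of `(?:.|\n)*?`, since in Python, as in VBScript, `.` does not match a newline by default. The sample is a trimmed copy of the `<body>` from the question.

```python
import re

# Same shape as the answer's pattern: from <script to the nearest </script>
SCRIPT_BLOCK = re.compile(r"<script[\s\S]*?</script>", re.IGNORECASE)

def drop_marked_scripts(html: str, marker: str = "openwindow") -> str:
    """Remove every <script>...</script> block whose text contains marker."""
    def keep_or_drop(m):
        block = m.group(0)
        # Return "" to delete the block, or the block itself to keep it unchanged
        return "" if marker.lower() in block.lower() else block
    return SCRIPT_BLOCK.sub(keep_or_drop, html)

sample = (
    '<body>\n'
    '<SCRIPT Language="Javascript">\nmore script whatever\n</SCRIPT>\n'
    '<script language="JavaScript1.1">\n<!--\nopenwindow.....stuff here\n//-->\n</script>\n'
    '<script language="JavaScript1.1" src="some menu perhaps"></script>\n'
    '</body>\n'
)
out = drop_marked_scripts(sample)
print(out)
```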
 
LVL 2

Author Comment

by:Biffo
ID: 7125130
Yeah, clockwatcher, that seems to be my solution. I was getting ready to split up the HTML, put each script on its own line, and then just match to the end of the line to keep from matching too much.

Job well done, and here are your points...