Solved

How to make a nongreedy RegEX expression?

Posted on 2002-07-01
Last Modified: 2010-05-02
I'm using VBScript Regular Expressions 5.5 and trying to remove JavaScript blocks that match a particular pattern. No problem doing that... it's just that VBScript's matching is so friggin greedy that when a match is found, ALL scripts are removed.

What I need is some way to stop at the first </script> tag! Here is code I put together to illustrate:

-------------code-------------------
' Remove line breaks to create one line instead
' of multiple lines

regEX.Pattern = "\n|\s+|\t"
sHTML = regEX.Replace(sHTML, " ")

' Define the pattern I am matching
regEX.Pattern = "<script.*?>.*?</script>"
Set Matches = regEX.Execute(sHTML)   ' Execute search.

For Each Match In Matches
  If InStr(1, Match, "openwindow", vbTextCompare) = 0 Then
    sHTML = regEX.Replace(sHTML, "")
  End If
Next

-----------cut--------------

Like I said, the above would work fine if it weren't so tough to get VBScript to stop matching at the first </script> tag, like you easily can in Perl. Here are 300 points for a solution!



Question by:Biffo
 
LVL 5

Expert Comment

by:rpai
ID: 7123223
Suppose 'sGivenString' is the string you have and you wish to stop matching at the first <script> tag.

sGivenString = "XXX YYY AAA <script> AAA BBB CCC DDD </script>"
s = Mid(sGivenString, 1, InStr(1, sGivenString, "<script>", vbTextCompare) - 1)
s = Replace(s, "AAA", "ZZZ", 1, , vbTextCompare)
Debug.Print s

Is this something that you are looking for?
 
LVL 5

Expert Comment

by:rpai
ID: 7123224
The above code would only replace the 'AAA' before the <script> tag with 'ZZZ', and not the one that exists within the script tag.
 
LVL 2

Author Comment

by:Biffo
ID: 7123249
I want to replace the entire script... everything in between <script>...</script>, including the tags, and globally in case there is more than one script that matches my search pattern.



 
LVL 5

Expert Comment

by:rpai
ID: 7123304
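So maybe something like this might help:

sGivenString = "XXX YYY AAA <script> AAA BBB CCC DDD </script>"
' Locate the opening tag and the position just past the closing tag
iStart = InStr(1, sGivenString, "<script>", vbTextCompare)
iEnd = InStr(iStart, sGivenString, "</script>", vbTextCompare) + Len("</script>")
' Keep everything before the opening tag and after the closing tag
s = Left(sGivenString, iStart - 1) & Mid(sGivenString, iEnd)
Debug.Print s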
 
LVL 25

Expert Comment

by:clockwatcher
ID: 7123827
For Each Match In Matches
  If InStr(1, Match, "openwindow", vbTextCompare) = 0 Then
    sHTML = Replace(sHTML, Match, "")
  End If
Next
 
LVL 2

Author Comment

by:Biffo
ID: 7124328
All this isn't really helping me with my primary regex matching problem! And that problem is to stop matching at the first </script> tag and not the last one encountered in the doc.

So I want to change this:

---------------------------------------------
<head>
<title>Sample</title>
<script language="JavaScript1.1" type="text/javascript">
<!--

function whatever... {
//-->
</script>
</head>
<body>

<SCRIPT Language="Javascript">
more script whatever
</SCRIPT>

<script language="JavaScript1.1">
<!--
openwindow.....stuff here
//-->
</script>
<script language="JavaScript1.1" src="some menu perhaps"></script>
</body>
</html>
------------------------------------------------------------

To look like this:

-----------------------------------------------------
<head>
<title>Sample</title>
<script language="JavaScript1.1" type="text/javascript">
<!--

function whatever... {
//-->
</script>
</head>
<body>

<SCRIPT Language="Javascript">
more script whatever
</SCRIPT>

<script language="JavaScript1.1" src="some menu perhaps"></script>
</body>
</html>
---------------------------------------------------

And not like this, which is the way the current regex is doing it:

--------------------------------------------------
<head>
<title>Sample</title>
</head>
<body>


</body>
</html>
---------------------------------------------------


 
LVL 25

Accepted Solution

by:clockwatcher (earned 1200 total points)
ID: 7125076
I wouldn't really call that a greedy match.  That's global matching.  If that's what you're getting then you've got global set on your regular expression.  Typically, the following is considered a greedy match.
 
  this <script>is</script> an example of a <script>greedy </script> match

leaving you with:

  this  match
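
To see the difference for yourself, here is a small sketch that runs the sentence above through a greedy and a lazy version of the pattern. The variable names `re` and `s` are mine, not from the question. It assumes VBScript 5.5 or later (where the `*?` quantifier is available) and Windows Script Host for `WScript.Echo`.

```vbscript
Dim re, s
s = "this <script>is</script> an example of a <script>greedy </script> match"

Set re = New RegExp
re.Global = True
re.IgnoreCase = True

' Greedy: .* runs to the LAST </script>, so one big match is removed
re.Pattern = "<script>.*</script>"
WScript.Echo re.Replace(s, "")   ' prints: this  match

' Lazy: .*? stops at the FIRST </script>, so each block is its own match
re.Pattern = "<script>.*?</script>"
WScript.Echo re.Replace(s, "")   ' prints: this  an example of a  match
```

Note that both runs have Global turned on. The only thing that changes between them is the `?` after the `*`.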

Since you aren't trying to do a global replace (and you're not just trying to replace the first match -- which would be easy), you'll need something like the following -- which is very similar to what we've already posted.  

html = "<head>" & vbcrlf & _
    "<title>Sample</title>" & vbcrlf & _
    "<script language=""JavaScript1.1"" type=""text/javascript"">" & vbcrlf & _
    "<!--" & vbcrlf & _
    "function whatever... { " & vbcrlf & _
    "//-->" & vbcrlf & _
    "</script>" & vbcrlf & _
    "</head>" & vbcrlf & _
    "<body>" & vbcrlf & _
    "<SCRIPT Language=""Javascript"">" & vbcrlf & _
    "more script whatever" & vbcrlf & _
    "</SCRIPT>" & vbcrlf & _
    "<script language=""JavaScript1.1"">" & vbcrlf & _
    "<!--" & vbcrlf & _
    "openwindow.....stuff here" & vbcrlf & _
    "//-->" & vbcrlf & _
    "</script>" & vbcrlf & _
    "<script language=""JavaScript1.1"" src=""some menu perhaps""></script>" & vbcrlf & _
    "</body>" & vbcrlf & _
    "</html>" & vbcrlf

wscript.echo "BEFORE: " & vbcrlf & vbcrlf & html

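set r = new RegExp
' Lazy match: stop at the first </script> after each opening tag
r.pattern = "<script(?:.|\n)*?</script>"
r.ignorecase = true
r.multiline = true
r.global = true
set matches = r.execute(html)

' Reuse the same RegExp object to test each block that was found
r.pattern = "openwindow"

for each m in matches
  ' Remove only the blocks that mention openwindow
  if r.test(m) then html=replace(html,m,"")
next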

wscript.echo "AFTER: " & vbcrlf & vbcrlf & html
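
If you'd rather do it in one pass, a possible alternative is to fold the openwindow check into the pattern itself with a negative lookahead, which the 5.5 regex engine supports. This is only a sketch and has not been run against real pages. The function name `StripOpenWindowScripts` and the `sample` string are made up for illustration.

```vbscript
' Hypothetical helper: removes every <script>...</script> block whose
' body mentions "openwindow" and leaves the other blocks alone.
Function StripOpenWindowScripts(html)
  Dim re
  Set re = New RegExp
  re.IgnoreCase = True
  re.Global = True
  ' (?:(?!</script>)[\s\S])*? walks forward one character at a time but
  ' refuses to step over a closing tag, so "openwindow" has to be found
  ' inside the same block that the match started in.
  re.Pattern = "<script(?:(?!</script>)[\s\S])*?openwindow[\s\S]*?</script>"
  StripOpenWindowScripts = re.Replace(html, "")
End Function

Dim sample
sample = "<script>a()</script><p>keep</p><script>openwindow()</script>"
WScript.Echo StripOpenWindowScripts(sample)
' prints: <script>a()</script><p>keep</p>
```

The trade-off is readability: the two-step version above is easier to follow and to adjust, while this one avoids the loop and the second pattern.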
 
LVL 2

Author Comment

by:Biffo
ID: 7125130
Yah, clockwatcher, that seems to be my solution there. I was getting ready to split up the HTML, place each script on its own line, and then just match to the end of the line to keep from matching too much.

Job well done and here are your points......