Solved

Using REGEXP with VBScript to ignore HTML

Posted on 2003-03-26
Last Modified: 2012-08-13
Using regular expressions, I am trying to replace a pattern within a string, but the replacement should have no effect on text between "<" and ">". The result should be HTML whose tags are unaffected by the changes, with only the text between the tags altered.
I am unfamiliar with the syntax and use of the RegExp object, so your help would be appreciated.
Question by:kine
 
LVL 14

Expert Comment

by:avner
ID: 8209483
I don't think you can do that. The only way to do it is to get the data into a string, then test it char by char to verify you are not inside an HTML tag.
0
 

Author Comment

by:kine
ID: 8209568
I take your point, but I think that I am getting close to my objective by using:
dim objregx
set objregx = New RegExp
objregx.Pattern =  ">\w\b"& wordtochange&"\b\w<"
objregx.Global = True
back = objregx.Replace(stringtobechanged,replacementstring)

Set objregx = nothing

Any ideas?
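
A note on the pattern above, shown in JavaScript because the regular expression syntax for `\w` and `\b` is the same. In `\w\b` followed by a word, the `\b` sits between two word characters, where there is never a word boundary, so the pattern cannot match anything. The word "cat" and the sample strings below are made up for the demonstration.

```javascript
// Hypothetical inputs, for illustration only.
const word = "cat";

// Same shape as the pattern in the post: ">\w\b" + word + "\b\w<"
const original = new RegExp(">\\w\\b" + word + "\\b\\w<");

// "\w" then "\b" then "c" needs a boundary between two word characters,
// which cannot exist, so none of these match.
const neverMatches =
  original.test(">xcatx<") || original.test(">cat<") || original.test("> cat <");
console.log(neverMatches);

// Allowing any run of non-tag characters around the word does match.
const looser = new RegExp(">[^<>]*\\b" + word + "\\b[^<>]*<");
console.log(looser.test("<p>the cat sat</p>"));
```

The looser form matches the whole text run, though, so a plain Replace with it would overwrite everything between ">" and "<".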
0
 
LVL 14

Accepted Solution

by:
avner earned 375 total points
ID: 8209652
I looked into it a little more and came up with this:


<html>
<head>
<title>about:blank</title>
<script language="javascript1.2">
<!-- copyright(c) avcoh@yahoo.com
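function changeTextOnly(obj, sFrom, sTo)
{
var sText = obj.innerText;

// Match one run of text between ">" and "<" that contains sFrom.
// No "g" flag, so every test() rescans from the start of the string and
// runs that hold sFrom more than once are fully processed.
// sTo must not contain sFrom, otherwise the loop never ends.
var iRegExp1 = new RegExp("(>[^<]*("+sFrom+")[^<]*<)");

          while(iRegExp1.test(sText))
               {
                    // RegExp.$1 holds the run matched by the last test()
                    sText= sText.split(RegExp.$1).join(RegExp.$1.replace(sFrom,sTo));
               }

obj.innerText=sText;
}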
-->
</script>
<style>

</style>
</head>
<body>
<button onclick="changeTextOnly(document.getElementById('aa'), '654', '___')">replace 654</button>
<textarea id="aa" cols="50" rows="20"><html><test attr="654">111-654-222</test><test attr="654">3-654-4</test></html></textarea>
</body>
</html>
0
 
LVL 14

Expert Comment

by:avner
ID: 8209676
Instead of join and split, you can also use:

function changeTextOnly(obj, sFrom, sTo)
{
var sText = obj.innerText;

// No "g" flag, so every test() rescans from the start of the string.
// sTo must not contain sFrom, otherwise the loop never ends.
var iRegExp1 = new RegExp("(>[^<]*("+sFrom+")[^<]*<)");

          while(iRegExp1.test(sText))
               {
                    sText= sText.replace(RegExp.$1,RegExp.$1.replace(sFrom,sTo));
               }

obj.innerText=sText;
}
0
 

Author Comment

by:kine
ID: 8210200
Thanks, that looks good. I will give it a spin and let you know how it works.
0
 
LVL 7

Expert Comment

by:markhoy
ID: 8210606
0
 
LVL 7

Expert Comment

by:markhoy
ID: 8226536
0
 

Author Comment

by:kine
ID: 8237487
Thanks for those links. It does seem pretty easy to remove the HTML tags, but it's not so simple when you just want to ignore them and alter specific text between them. I did try to convert the great script suggested by avner from client-side JavaScript to server-side VBScript. Unfortunately, I could not get it to work as well as it does in the above script.
0
 
LVL 14

Expert Comment

by:avner
ID: 8237498
kine, please post your VBScript and I'll try to look at it.
0
 
LVL 7

Expert Comment

by:markhoy
ID: 8238046
A very convoluted way (using the replace html tags method) would be to replace all the HTML blocks with markers e.g #1#, and then to use your regexp on the remaining text, and then to replace the markers with the original HTML tags...
0
 

Author Comment

by:kine
ID: 8238507
Yeah, markhoy, a friend of mine says the same. My biggest problem (amongst many) is the variable number of spaces before and after the piece of text that I wish to alter. This is the pattern:
objregx.Pattern =  ">[^>]*\b[ ]{1}"&  wordtoreplace &"\b[ ]{1}[^<]*<"
Regular expressions look as if they will be simple to understand; if only that were the case.
0
 

Author Comment

by:kine
ID: 8238645
I see one of the problems that I'm having: the code is replacing everything between ">" and "<" when it sees the first error, so the other misspellings and the correct words are being overwritten.
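
The effect described above can be reproduced in a few lines of JavaScript. This is an illustrative sketch with made-up sample text and a made-up helper name (`markWord`), not a port of the ASP page: the first replace swaps the whole matched run for the link, while the helper only rewrites the word inside each run of text between ">" and "<".

```javascript
// Hypothetical sample input and link markup.
const html = "<p>teh cat and teh dog</p>";
const link = "<a href='#'>teh</a>";

// Replacing the whole match throws away everything between ">" and "<",
// including the ">" and "<" themselves.
const lossy = html.replace(/>[^<>]*\bteh\b[^<>]*</g, link);
console.log(lossy);

// Hypothetical helper: visit each text run and replace only the word in it.
function markWord(source, word, wrap) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const wordRe = new RegExp("\\b" + escaped + "\\b", "gi");
  return source.replace(/>([^<>]*)</g, (whole, text) =>
    ">" + text.replace(wordRe, (m) => wrap(m)) + "<");
}

const fixed = markWord(html, "teh", (m) => "<a href='#'>" + m + "</a>");
console.log(fixed);
```

Like the original pattern, this only looks at text that sits between a ">" and a "<", so text before the first tag or after the last one is not touched.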
0
 

Author Comment

by:kine
ID: 8238688
Here is the whole ugly thing

<html>
<head>
<title>spell checker</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

</head>
<!--#include file="dictionary/diction1.inc" -->
'all the words in the world ordered by their initial letters
<%
'

test= request.form("html")
back = test
'back=replace(back,">","> ")
'back=replace(back,"<"," <")
back=replace(back,"&nbsp;"," &nbsp; ")
  Dim objRegE
  Set objRegE = New RegExp
  objRegE.Pattern = "[0-9]?"
  objRegE.IgnoreCase = True
 objRegE.Global = true    
 test= objRegE.Replace(test, "")
  Set objRegE = Nothing


' Clears the string of "<>" and contents
Function clearTags(str)    
    Dim re    
    Set re = New RegExp    
    re.Pattern = "<[^>]+>"
    re.Global = true
    clearTags = re.Replace(str," ")
End Function

test = clearTags(test)

test=replace(test,"&nbsp;"," ")    
test=replace(test,Chr(9),"")
test=replace(test,Chr(10),"")
test=replace(test,Chr(11),"")
test=replace(test,Chr(12),"")
test=replace(test,Chr(13),"")
test=replace(test,"-"," ")
test=replace(test,".","")
test=replace(test,",","")
test=replace(test,"?","")
test=replace(test,"!","")
test=replace(test,";","")
test=replace(test,":","")
test=replace(test,"\","")
test=replace(test,"|","")
test=replace(test,"("," ")
test=replace(test,")"," ")
test=replace(test,"[","")
test=replace(test,"]","")
test=replace(test,"{","")
test=replace(test,Chr(34),"")
test=replace(test,"&lt;","")
'test=replace(test,">","")
'test=replace(test,"<","")
test=replace(test,"=","")
test=replace(test,">>","")
test=replace(test,"+","")
test=replace(test,"_","")
test=replace(test,"'s","")
'test=replace(test,Chr(32),"")


test=replace(test,"    "," ")
'replaces some other characters
test=replace(test,"   "," ")
test=replace(test,"  "," ")

'creates an array from the string          

test2=split(test," ")
i=0

'Go through the contents of the array one by one, trimming spaces and getting the initial letter and the first two letters;
'these are used to find the dictionary variables
do while i < ubound(test2)
lal=test2(i)
lal=trim(lal)
lala=lcase(left(lal,2))
initials=lcase(left(lal,1))
'response.write ubound(test2)-i &" ; lal="& lal &" test3 ="
'use the array.inc to find which variable in the diction.inc to check
%>
<!--#include file="dictionary/array.inc" -->

<%
' run through the dictionary variable and see if the current array item has a match
  Dim objRegExp
  Set objRegExp = New RegExp
  objRegExp.Pattern = "\b"& lal &"\b"
  objRegExp.IgnoreCase = True  
  objRegExp.Global = false
  Dim strStringToSearch
  strStringToSearch = dict  
  loo=objRegExp.Test(strStringToSearch)
  Set objRegExp = Nothing    


  Dim objRegEx
  Set objRegEx = New RegExp
  objRegEx.Pattern = "\b"& lal &"\b"
  objRegEx.IgnoreCase = True
  objRegEx.Global = false    
  zoo=objRegEx.Test(passed)
  Set objRegEx = Nothing
  if zoo = false then
'if InStr(passed,lal)=0 then

'boo=inStr (dict,test2(i))
if loo=false then

test3="<a href='#'>"& lal &"</a>"
'response.write test3 & lal &"<br>"



dim objregx
set objregx = New RegExp
'>[^<] *("+sFrom+")[^<]  *<)","[^<(.+?)>]>
'\b"&  lal &"\b
objregx.Pattern =  ">[^<>]*\b "&  lal &" \b[^<>]*<"
'objregx.Pattern =  ">[^>]*[ ]{1,5}\b"&  lal &"\b[ ]{1,5}[^<]*<"
objregx.Global = True
back = objregx.Replace(back,test3)

Set objregx = nothing




end if
end if
i=i+1
passed =passed & lal & " " 
'response.write loo & lal &"<br>"

response.write  test3 &" loo = "& loo&"<br>"
loop

response.write back
'response.write now
'response.write "<br>"& passed
%>


</body>
</html>
0
 
LVL 14

Expert Comment

by:avner
ID: 8243287
kine, why can't you use the JavaScript code?

I'm unable to test your code since I am not running IIS.
0
 

Author Comment

by:kine
ID: 8244054
Learning experience, I'm afraid. I have set myself a series of tasks to be completed with specific methods. I'm getting closer anyway. I will post the results up here when it's done.
0
 
LVL 14

Expert Comment

by:avner
ID: 8244129
Let us know if you need additional specific help with RegExp.
0
 
LVL 7

Expert Comment

by:markhoy
ID: 8244454
Try this:

Dim myRegExp
Set myRegExp = New RegExp
myRegExp.Pattern = "<[^\>]*>"
myRegExp.Global = True
myRegExp.IgnoreCase = True
0
 
LVL 7

Expert Comment

by:markhoy
ID: 8244463
Finding text that is NOT in HTML tags:

First the function:
--------------
Function RegExpReplace(strInput, strPattern, strReplace)
    ' Use <?> to indicate the match you wish to replace

    ' Create and setup several variables:
    Dim regEx, Match, Matches, Position, strReturn
    Position = 1
    strReturn = "" 

    ' Set up the regular expression:
    Set regEx = New RegExp
    regEx.Pattern = strPattern
    regEx.IgnoreCase = True
    regEx.Global = True

    ' Get all the matches for it:
    Set Matches = regEx.Execute(strInput)

    ' Go through the Matches collection
    ' and build the output string:
    For Each Match in Matches
        strReturn = strReturn & Mid(strInput, Position, Match.FirstIndex+1-Position)
        strReturn = strReturn & Replace(strReplace, "<?>", Match.Value)
        Position = Len(Match.Value) + Match.FirstIndex + 1
    Next

    ' Add any text after the last match
    strReturn = strReturn & Mid(strInput, Position, Len(strInput))

    RegExpReplace = strReturn
End Function
--------------
This was grabbed right from the previous post which came from this article: http://www.aspfaqs.com/aspfaqs/ShowFAQ.asp?FAQID=66 

But the example in the previous article didn't quite work for me...
--------------
strHTML = RegExpReplace(strHTML, "strong(?![^<]+>)", "*<?>*")
--------------
...it kept replacing text in html tags as well as text not in html tags. So, I banged around with it and came up with this modification:
--------------
strHTML = RegExpReplace(strHTML, "(?![^<]+>)" + strSearch + "(?![^<]+>)", strReplace)
--------------
strSearch is the text that I'm looking for and strReplace is what I want to replace it with. You might be able to modify this for your needs?



0
 
LVL 7

Expert Comment

by:markhoy
ID: 8245895
0
 
LVL 14

Expert Comment

by:avner
ID: 8426372
kine, do you need any further help with this question or can it be closed ?
0
 

Author Comment

by:kine
ID: 8434558
Yeah, it's done and dusted; your code showed the way. Cheers.
0
