kine
asked on
Using REGEXP with VBScript to ignore HTML
Using regular expressions, I am trying to replace a pattern within a string, but the replacement should have no effect on text between "<" and ">". The result should be HTML whose tags are unaffected by the changes, while the text between the tags is altered.
I am unfamiliar with the syntax and use of the RegExp object, so your help would be appreciated.
I don't think you can do that. The only way to do it is to get the data into a string and then test it character by character to verify that you are not inside an HTML tag.
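For what it's worth, the character-by-character idea could look roughly like the sketch below. It is in JavaScript rather than VBScript, the function name replaceOutsideTags is made up for illustration, and it assumes that no ">" appears inside an attribute value.

```javascript
// Hypothetical helper: replace whole-word occurrences of `word`
// only in the text that lies outside <...> tags.
function replaceOutsideTags(html, word, replacement) {
  let out = "";
  let text = "";      // text collected since the last tag ended
  let inTag = false;  // true while we are between "<" and ">"
  // Escape regex metacharacters so `word` is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp("\\b" + escaped + "\\b", "g");
  // Apply the replacement to the buffered text and append it to the output.
  const flush = () => {
    out += text.replace(re, () => replacement);
    text = "";
  };
  for (const ch of html) {
    if (!inTag && ch === "<") {
      flush();          // a tag starts: emit the text seen so far
      inTag = true;
      out += ch;
    } else if (inTag) {
      out += ch;        // tag characters are copied through untouched
      if (ch === ">") inTag = false;
    } else {
      text += ch;       // ordinary text, eligible for replacement
    }
  }
  flush();              // text after the last tag
  return out;
}
```

Calling replaceOutsideTags('<a title="teh">teh cat</a> teh', "teh", "the") returns '<a title="teh">the cat</a> the': the attribute value is left alone and both text occurrences are changed.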
ASKER
I take your point, but I think I am getting close to my objective by using:
dim objregx
set objregx = New RegExp
objregx.Pattern = ">\w\b" & wordtochange & "\b\w<"
objregx.Global = True
back = objregx.Replace(stringtobechanged, replacementstring)
Set objregx = nothing
Any ideas?
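Instead of join and split, you can also use:
function changeTextOnly(obj, sFrom, sTo)
{
    // Assumes obj.innerText holds the HTML source (e.g. a textarea);
    // for a rendered element you would need innerHTML to see the tags.
    var sText = obj.innerText;
    // No "g" flag: test() on a global regex keeps lastIndex between calls and can skip matches.
    var iRegExp1 = new RegExp("(>[^<]*(" + sFrom + ")[^<]*<)");
    // Caution: this loops forever if sTo itself contains sFrom.
    while (iRegExp1.test(sText))
    {
        sText = sText.replace(RegExp.$1, RegExp.$1.replace(sFrom, sTo));
    }
    obj.innerText = sText;
}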
ASKER
Thanks, that looks good. I will give it a spin and let you know how it works.
Have a look at these if you haven't already:
http://www.aspfaqs.com/aspfaqs/ShowFAQ.asp?FAQID=155
http://www.aspfaqs.com/aspfaqs/ShowFAQ.asp?FAQID=99
http://www.aspfaqs.com/aspfaqs/ShowCategory.asp?CatID=16
ASKER
Thanks for those links. It does seem pretty easy to remove the HTML tags, but it's not so simple when you just want to ignore them and alter specific text between them. I tried to convert the great script suggested by avner from client-side JavaScript to server-side VBScript. Unfortunately, I could not get it to work as well as it does in the above script.
kine, please post your VBScript and I'll try to look at it.
A very convoluted way (using the replace-HTML-tags method) would be to replace all the HTML tags with markers, e.g. #1#, then to use your regexp on the remaining text, and finally to replace the markers with the original HTML tags.
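A rough sketch of that marker idea, in JavaScript rather than VBScript. The function name replaceWithMarkers is made up for illustration, and instead of #1# it uses NUL-delimited numbers as markers, on the assumption that NUL characters never occur in the page text. A pattern that matches digits would still corrupt the markers, so treat this as a starting point only.

```javascript
// Hypothetical helper: hide the tags behind markers, replace in the
// remaining text, then put the original tags back.
function replaceWithMarkers(html, pattern, replacement) {
  const tags = [];
  // Step 1: swap every tag for a numbered marker and remember the tag.
  const masked = html.replace(/<[^>]+>/g, (tag) => {
    tags.push(tag);
    return "\u0000" + (tags.length - 1) + "\u0000";
  });
  // Step 2: run the real replacement on text that no longer contains tags.
  const changed = masked.replace(pattern, replacement);
  // Step 3: restore each marker to the tag it stands for.
  return changed.replace(/\u0000(\d+)\u0000/g, (_, i) => tags[Number(i)]);
}
```

For example, replaceWithMarkers('<b class="colour">colour</b> and colour', /\bcolour\b/g, "color") returns '<b class="colour">color</b> and color'.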
ASKER
Yeah, markhov, a friend of mine says the same. My biggest problem (amongst many) is the variable number of spaces before and after the piece of text that I wish to alter. This is the pattern:
objregx.Pattern = ">[^>]*\b[ ]{1}"& wordtoreplace &"\b[ ]{1}[^<]*<"
Regular expressions look as if they will be simple to understand; if only that were the case.
ASKER
I see one of the problems that I'm having: the code replaces everything between ">" and "<" when it sees the first error, so the other misspellings and the correct words are being overwritten.
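The overwriting happens because Replace substitutes the entire match, and the match here is the whole run from ">" to "<". One way around it, sketched in JavaScript under the assumption that a two-level replace is acceptable: match each run of text between tags, then rewrite only the word inside that run. The name linkWord is made up for illustration, and like the pattern above it ignores text before the first tag or after the last one.

```javascript
// Hypothetical helper: wrap each whole-word occurrence of `word` in a link,
// leaving the rest of the text between the tags exactly as it was.
function linkWord(html, word) {
  // Escape regex metacharacters so `word` is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const wordRe = new RegExp("\\b" + escaped + "\\b", "gi");
  // Outer pattern: one run of text between ">" and "<".
  return html.replace(/>[^<>]+</g, (segment) =>
    // Inner replace: only the word changes; the segment is otherwise returned as is.
    segment.replace(wordRe, (m) => "<a href='#'>" + m + "</a>")
  );
}
```

With this, linkWord("<p>teh quick teh fox</p>", "teh") gives "<p><a href='#'>teh</a> quick <a href='#'>teh</a> fox</p>", so the words around each misspelling survive.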
ASKER
Here is the whole ugly thing
<html>
<head>
<title>spell checker</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<!--#include file="dictionary/diction1.inc" -->
'all the words in the world ordered by their initial letters
<%
'
test= request.form("html")
back = test
'back=replace(back,">","> ")
'back=replace(back,"<"," <")
back=replace(back," " ," ")
Dim objRegE
Set objRegE = New RegExp
objRegE.Pattern = "[0-9]?"
objRegE.IgnoreCase = True
objRegE.Global = true
test= objRegE.Replace(test, "")
Set objRegE = Nothing
' Clears the string of "<>" and contents
Function clearTags(str)
Dim re
Set re = New RegExp
re.Pattern = "<[^>]+>"
re.Global = true
clearTags = re.Replace(str," ")
End Function
test = clearTags(test)
test=replace(test," " ," ")
test=replace(test,Chr(9)," ")
test=replace(test,Chr(10), "")
test=replace(test,Chr(11), "")
test=replace(test,Chr(12), "")
test=replace(test,Chr(13), "")
test=replace(test,"-"," ")
test=replace(test,".","")
test=replace(test,",","")
test=replace(test,"?","")
test=replace(test,"!","")
test=replace(test,";","")
test=replace(test,":","")
test=replace(test,"\","")
test=replace(test,"|","")
test=replace(test,"("," ")
test=replace(test,")"," ")
test=replace(test,"[","")
test=replace(test,"]","")
test=replace(test,"{","")
test=replace(test,Chr(34), "")
test=replace(test,"<"," ")
'test=replace(test,">","")
'test=replace(test,"<","")
test=replace(test,"=","")
test=replace(test,">>","")
test=replace(test,"+","")
test=replace(test,"_","")
test=replace(test,"'s","")
'test=replace(test,Chr(32) ,"")
test=replace(test," "," ")
'replaces some other characters
test=replace(test," "," ")
test=replace(test," "," ")
'creates an array from the string
test2=split(test," ")
i=0
'Go through the contents of the array one by one, trimming spaces and getting the initial letter and the first two characters,
'these to be used to find the dictionary variables
do while i < ubound(test2)
lal=test2(i)
lal=trim(lal)
lala=lcase(left(lal,2))
initials=lcase(left(lal,1))
'response.write ubound(test2)-i &" ; lal="& lal &" test3 ="
'use the array.inc to find which variable in the diction.inc to check
%>
<!--#include file="dictionary/array.inc" -->
<%
' run through the dictionary variable and see if the current array item has a match
Dim objRegExp
Set objRegExp = New RegExp
objRegExp.Pattern = "\b"& lal &"\b"
objRegExp.IgnoreCase = True
objRegExp.Global = false
Dim strStringToSearch
strStringToSearch = dict
loo=objRegExp.Test(strStringToSearch)
Set objRegExp = Nothing
Dim objRegEx
Set objRegEx = New RegExp
objRegEx.Pattern = "\b"& lal &"\b"
objRegEx.IgnoreCase = True
objRegEx.Global = false
zoo=objRegEx.Test(passed)
Set objRegEx = Nothing
if zoo = false then
'if InStr(passed,lal)=0 then
'boo=inStr (dict,test2(i))
if loo=false then
test3="<a href='#'>"& lal &"</a>"
'response.write test3 & lal &"<br>"
dim objregx
set objregx = New RegExp
'>[^<] *("+sFrom+")[^<] *<)","[^<(.+?)>]>
'\b"& lal &"\b
objregx.Pattern = ">[^<>]*\b "& lal &" \b[^<>]*<"
'objregx.Pattern = ">[^>]*[ ]{1,5}\b"& lal &"\b[ ]{1,5}[^<]*<"
objregx.Global = True
back = objregx.Replace(back,test3)
Set objregx = nothing
end if
end if
i=i+1
passed =passed & lal & " "
'response.write loo & lal &"<br>"
response.write test3 &" loo = "& loo&"<br>"
loop
response.write back
'response.write now
'response.write "<br>"& passed
%>
</body>
</html>
kine, why can't you use the JavaScript code?
I'm unable to test your code since I am not running IIS.
ASKER
It's a learning experience, I'm afraid. I have set myself a series of tasks to be completed with specific methods. I'm getting closer anyway. I will post the results up here when it's done.
Let us know if you need additional specific help with RegExp.
Try this:
Dim myRegExp
Set myRegExp = New RegExp
myRegExp.Pattern = "<[^\>]*>"
myRegExp.Global = True
myRegExp.IgnoreCase = True
Finding text that is NOT in HTML tags:
First the function:
--------------
Function RegExpReplace(strInput, strPattern, strReplace)
' Use <?> to indicate the match you wish to replace
' Create and setup several variables:
Dim regEx, Match, Matches, Position, strReturn
Position = 1
strReturn = ""
' Set up the regular expression:
Set regEx = New RegExp
regEx.Pattern = strPattern
regEx.IgnoreCase = True
regEx.Global = True
' Get all the matches for it:
Set Matches = regEx.Execute(strInput)
' Go through the Matches collection
' and build the output string:
For Each Match in Matches
strReturn = strReturn & Mid(strInput, Position, Match.FirstIndex+1-Position)
strReturn = strReturn & Replace(strReplace, "<?>", Match.Value)
Position = Len(Match.Value) + Match.FirstIndex + 1
Next
' Add any text after the last match
strReturn = strReturn & Mid(strInput, Position, Len(strInput))
RegExpReplace = strReturn
End Function
--------------
This was grabbed right from the previous post which came from this article: http://www.aspfaqs.com/aspfaqs/ShowFAQ.asp?FAQID=66
But the example in the previous article didn't quite work for me...
--------------
strHTML = RegExpReplace(strHTML, "strong(?![^<]+>)", "*<?>*")
--------------
...it kept replacing text in html tags as well as text not in html tags. So, I banged around with it and came up with this modification:
--------------
strHTML = RegExpReplace(strHTML, "(?![^<]+>)" + strSearch + "(?![^<]+>)", strReplace)
--------------
strSearch is the text that I'm looking for and strReplace is what I want to replace it with. You might be able to modify this for your needs?
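To illustrate the lookahead trick on its own, here is a small JavaScript sketch (the function name replaceTextOnly is made up for illustration). Note one deliberate change: it uses (?![^<]*>) with * instead of +. With +, a match that is immediately followed by ">" (such as "strong" in "<strong>") is not recognised as being inside a tag, which may be why the article's example kept replacing text in tags. The sketch also assumes that text outside tags never contains a bare ">".

```javascript
// Hypothetical helper: replace `search` only where it is not inside a tag.
function replaceTextOnly(html, search, replacement) {
  // Escape regex metacharacters so `search` is matched literally.
  const escaped = search.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // (?![^<]*>) rejects a match when the next angle bracket after it is ">",
  // i.e. when the match sits between "<" and ">".
  const re = new RegExp(escaped + "(?![^<]*>)", "gi");
  // A function replacement avoids special handling of "$" in `replacement`.
  return html.replace(re, () => replacement);
}
```

For example, replaceTextOnly('<strong class="strong">strong tea</strong>', "strong", "weak") returns '<strong class="strong">weak tea</strong>'.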
Also:
https://www.experts-exchange.com/questions/20433094/regex-text-outside-of-a-href-tags.html
And if you are happy to use Perl, see here:
http://www.perlmonks.org/index.pl?node_id=246935
kine, do you need any further help with this question, or can it be closed?
ASKER
Yeah, it's done and dusted; your code showed the way. Cheers.