• Status: Solved
  • Priority: Medium
  • Security: Public

VBScript Regular Expression Parse HTML file for some key items

Good morning experts... I need some regular expression help.
I need to parse the following HTML into 4 variables:

Title = the text between the <title> tags
KeyWords = content value of the keywords meta tag
Description = content value of the description meta tag
PageContent = everything between the <body> tags


<html>
    <head>
        <title>This is my page title</title>
        <meta content="these are my keywords" name="keywords" />
        <meta content="this is the page description" name="description" />
    </head>
    <body>
       This is the HTML document content.
    </body>
</html>

Asked by: midcompweb
1 Solution
 
vsudip Commented:
 
ddrudik Commented:

<%
Set regEx = New RegExp
regEx.Global = True
regEx.IgnoreCase = True
sourcestring = "your source string"
regEx.Pattern = "([^<>]*)</title>[\S\s]*?<meta content=""([^""]*)"" name=""keywords"" />[\S\s]*?<meta content=""([^""]*)"" name=""description"" />[\S\s]*?<body>([\S\s]*?)</body>"
Set Matches = regEx.Execute(sourcestring)
  For z = 0 to Matches.Count-1
    results = results & "Matches(" & z & ") = " & chr(34) & Server.HTMLEncode(Matches(z)) & chr(34) & chr(13)
    For zz = 0 to Matches(z).SubMatches.Count-1
      results = results & "Matches(" & z & ").SubMatches(" & zz & ") = " & chr(34) & Server.HTMLEncode(Matches(z).SubMatches(zz)) & chr(34) & chr(13)
    next
    results=Left(results,Len(results)-1) & chr(13)
  next
Response.Write "<pre>" & results & "</pre>"
%>

 
midcompweb (Author) Commented:
Excellent, ddrudik. One small question: is there any way to know which of the results belongs to which tag? The tags may not always be in the same order in the source string, and sometimes they could be completely missing from the source.

Thanks again for the quick response, guys.
 
ddrudik Commented:
The way to do that would be to run each regex match separately, each inside its own If...Then statement, so a missing tag simply leaves its variable empty.

Set regEx = New RegExp
regEx.Global = True
regEx.IgnoreCase = True
regEx.MultiLine = True
teststring = "<your string>"
' Test returns a Boolean, so it is assigned without Set.
' Each pattern has a single capture group, so the value is SubMatches(0).
regEx.Pattern = "<title>([\S\s]*?)</title>"
Test = regEx.Test(teststring)
if Test=True then
  Set Matches = regEx.Execute(teststring)
  title = Matches(0).SubMatches(0)
end if
regEx.Pattern = "<meta content=""([^""]*)"" name=""keywords"" />"
Test = regEx.Test(teststring)
if Test=True then
  Set Matches = regEx.Execute(teststring)
  keywords = Matches(0).SubMatches(0)
end if
regEx.Pattern = "<meta content=""([^""]*)"" name=""description"" />"
Test = regEx.Test(teststring)
if Test=True then
  Set Matches = regEx.Execute(teststring)
  description = Matches(0).SubMatches(0)
end if
regEx.Pattern = "<body>([\S\s]*?)</body>"
Test = regEx.Test(teststring)
if Test=True then
  Set Matches = regEx.Execute(teststring)
  body = Matches(0).SubMatches(0)
end if
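
The four blocks above repeat the same test-then-extract steps, so they could probably be folded into one small function. What follows is only a sketch: FirstSubMatch is a hypothetical helper name that does not appear in the original answer, and it assumes every pattern passed in contains at least one capture group.

```vbscript
' Hypothetical helper (not part of the original answer).
' Returns the first capture group of the first match,
' or an empty string when the pattern does not match at all.
Function FirstSubMatch(pattern, source)
  Dim re, found
  Set re = New RegExp
  re.IgnoreCase = True
  re.Global = False          ' only the first match is needed
  re.Pattern = pattern
  FirstSubMatch = ""         ' default when the tag is missing
  Set found = re.Execute(source)
  If found.Count > 0 Then
    FirstSubMatch = found(0).SubMatches(0)
  End If
End Function

' teststring is the same source string used above.
title       = FirstSubMatch("<title>([\S\s]*?)</title>", teststring)
keywords    = FirstSubMatch("<meta content=""([^""]*)"" name=""keywords"" />", teststring)
description = FirstSubMatch("<meta content=""([^""]*)"" name=""description"" />", teststring)
body        = FirstSubMatch("<body>([\S\s]*?)</body>", teststring)
```

Because each value is looked up on its own, the order of the tags in the source does not matter, and a missing tag leaves its variable as an empty string.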
 
midcompweb (Author) Commented:
Exactly what I was looking for. I did a little cleanup on the code, so I will post the working example, but thank you SO much for your help!
Set regEx = New RegExp
	regEx.Global = True
	regEx.IgnoreCase = True
	regEx.MultiLine = True
	teststring = PageContent
	regEx.Pattern = "<title>([\S\s]*?)</title>"
	Test = regEx.Test(teststring)
	if Test=True then
	  Set Matches = regEx.Execute(teststring)
	  title = Matches(0).SubMatches(0)
	end if
	regEx.Pattern = "<meta content=""([^""]*)"" name=""keywords"" />"
	Test = regEx.Test(teststring)
	if Test=True then
	  Set Matches = regEx.Execute(teststring)
	  keywords = Matches(0).SubMatches(0)
	end if
	regEx.Pattern = "<meta content=""([^""]*)"" name=""description"" />"
	Test = regEx.Test(teststring)
	if Test=True then
	  Set Matches = regEx.Execute(teststring)
	  description = Matches(0).SubMatches(0)
	end if
	regEx.Pattern = "<body>([\S\s]*?)</body>"
	Test = regEx.Test(teststring)
	if Test=True then
	  Set Matches = regEx.Execute(teststring)
	  body= Matches(0).SubMatches(0)
	end if
	
	' Line breaks keep the four values from running together in the output.
	Response.Write "Title=" & title & "<br>"
	Response.Write "keywords=" & keywords & "<br>"
	Response.Write "description=" & description & "<br>"
	Response.Write "body=" & body
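
One case the meta patterns above would miss is a tag written with name before content, since they expect content to come first. If that can happen in the source pages (an assumption; the sample HTML only shows content first), a variation along these lines might help. MetaContent is a hypothetical function name, and the sketch assumes the meta name contains no regex metacharacters and that attribute values use double quotes.

```vbscript
' Hypothetical helper: returns the content value of a named meta tag,
' accepting either attribute order, or "" when the tag is missing.
Function MetaContent(metaName, source)
  Dim re, found
  Set re = New RegExp
  re.IgnoreCase = True
  re.Global = False
  ' Alternative 1: name="..." content="(value)"
  ' Alternative 2: content="(value)" name="..."
  re.Pattern = "<meta\s+(?:name=""" & metaName & """\s+content=""([^""]*)""" & _
               "|content=""([^""]*)""\s+name=""" & metaName & """)\s*/?>"
  MetaContent = ""
  Set found = re.Execute(source)
  If found.Count > 0 Then
    ' Only one of the two groups takes part in a match; the other is Empty,
    ' so concatenating them yields whichever value was captured.
    MetaContent = found(0).SubMatches(0) & found(0).SubMatches(1)
  End If
End Function

keywords    = MetaContent("keywords", teststring)
description = MetaContent("description", teststring)
```

This still will not cope with extra attributes or single-quoted values; for arbitrary HTML a real parser is likely a safer choice than a regular expression.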

 
midcompweb (Author) Commented:
Thanks again :) Regular expressions have baffled me for a long time :)
 
ddrudik Commented:
Thanks for the question and the points.