Solved

VBScript Regular Expression Parse HTML file for some key items

Posted on 2008-10-08
Last Modified: 2012-05-05
Good morning experts...  I am in need of some Regular Expression help.
I need to parse the following HTML into 4 variables...

Title = Between the <title> tags
KeyWords = content value of keywords metatag
Description = content value of description metatag
PageContent = everything between the <body> tags


<html>

    <head>

        <title>This is my page title</title>

        <meta content="these are my keywords" name="keywords" />

        <meta content="this is the page description" name="description" />

    </head>

    <body>

       This is the HTML document content.

    </body>

</html>

Question by:midcompweb

Expert Comment

by:vsudip
ID: 22669300

Expert Comment

by:ddrudik
ID: 22669315

<%
' Capture title, keywords, description and body with a single pattern.
' This only matches when the tags appear in exactly this order and format.
Set regEx = New RegExp
regEx.Global = True
regEx.IgnoreCase = True
sourcestring = "your source string"
regEx.Pattern = "([^<>]*)</title>[\S\s]*?<meta content=""([^""]*)"" name=""keywords"" />[\S\s]*?<meta content=""([^""]*)"" name=""description"" />[\S\s]*?<body>([\S\s]*?)</body>"
Set Matches = regEx.Execute(sourcestring)
  ' List every match followed by each of its captured groups.
  For z = 0 to Matches.Count-1
    results = results & "Matches(" & z & ") = " & chr(34) & Server.HTMLEncode(Matches(z)) & chr(34) & chr(13)
    For zz = 0 to Matches(z).SubMatches.Count-1
      results = results & "Matches(" & z & ").SubMatches(" & zz & ") = " & chr(34) & Server.HTMLEncode(Matches(z).SubMatches(zz)) & chr(34) & chr(13)
    next
    results=Left(results,Len(results)-1) & chr(13)
  next
Response.Write "<pre>" & results
%>


Author Comment

by:midcompweb
ID: 22669371
Excellent, ddrudik. One small question: is there any way to know which of the results belongs to which tag?  The tags may not always be in the same order in the source string, and sometimes they could be completely missing from the source.

Thanks again for quick response guys

Accepted Solution

by:
ddrudik earned 500 total points
ID: 22669463
The way to do that is to run each regex match separately, each inside its own If...Then statement, so that a missing tag simply leaves its variable unset.

Set regEx = New RegExp
regEx.Global = True
regEx.IgnoreCase = True
regEx.MultiLine = True
teststring = "<your string>"
' Each field has its own pattern, so the order of the tags does not matter.
regEx.Pattern = "<title>([\S\s]*?)</title>"
Test = regEx.Test(teststring)   ' Test returns a Boolean, so no Set
if Test=True then
  Set Matches = regEx.Execute(teststring)
  title = Matches(0).SubMatches(0)   ' SubMatches is zero-based
end if
regEx.Pattern = "<meta content=""([^""]*)"" name=""keywords"" />"
Test = regEx.Test(teststring)
if Test=True then
  Set Matches = regEx.Execute(teststring)
  keywords = Matches(0).SubMatches(0)
end if
regEx.Pattern = "<meta content=""([^""]*)"" name=""description"" />"
Test = regEx.Test(teststring)
if Test=True then
  Set Matches = regEx.Execute(teststring)
  description = Matches(0).SubMatches(0)
end if
regEx.Pattern = "<body>([\S\s]*?)</body>"
Test = regEx.Test(teststring)
if Test=True then
  Set Matches = regEx.Execute(teststring)
  body= Matches(0).SubMatches(0)
end if
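
Editorial sketch, not part of the original answer: since the same test-then-extract step repeats for every field, it could be folded into one helper. The function name FirstSubMatch is hypothetical, and the sketch assumes every pattern passed in contains at least one capture group.

```vbscript
' Hypothetical helper: returns the first capture group of the first match,
' or an empty string when the pattern does not match at all.
' Assumes the pattern contains at least one capture group.
Function FirstSubMatch(source, pattern)
  Dim re, found
  Set re = New RegExp
  re.Global = False        ' only the first match is needed
  re.IgnoreCase = True
  re.Pattern = pattern
  FirstSubMatch = ""       ' default when the tag is missing
  Set found = re.Execute(source)
  If found.Count > 0 Then
    FirstSubMatch = found(0).SubMatches(0)
  End If
End Function

' Usage: each field is looked up independently, so tag order is irrelevant.
teststring = "<html><head><title>This is my page title</title></head><body>Hi</body></html>"
title = FirstSubMatch(teststring, "<title>([\S\s]*?)</title>")
keywords = FirstSubMatch(teststring, "<meta content=""([^""]*)"" name=""keywords"" />")
```

With this shape, a missing tag yields an empty string instead of an unset variable, which may be easier to check later in the page.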

Author Comment

by:midcompweb
ID: 22669559
Exactly what I was looking for. I did a little cleanup on the code, so I will post the working example, but thank you SO much for your help!
Set regEx = New RegExp
regEx.Global = True
regEx.IgnoreCase = True
regEx.MultiLine = True
teststring = PageContent

regEx.Pattern = "<title>([\S\s]*?)</title>"
Test = regEx.Test(teststring)
if Test=True then
  Set Matches = regEx.Execute(teststring)
  title = Matches(0).SubMatches(0)
end if

regEx.Pattern = "<meta content=""([^""]*)"" name=""keywords"" />"
Test = regEx.Test(teststring)
if Test=True then
  Set Matches = regEx.Execute(teststring)
  keywords = Matches(0).SubMatches(0)
end if

regEx.Pattern = "<meta content=""([^""]*)"" name=""description"" />"
Test = regEx.Test(teststring)
if Test=True then
  Set Matches = regEx.Execute(teststring)
  description = Matches(0).SubMatches(0)
end if

regEx.Pattern = "<body>([\S\s]*?)</body>"
Test = regEx.Test(teststring)
if Test=True then
  Set Matches = regEx.Execute(teststring)
  body= Matches(0).SubMatches(0)
end if

Response.Write "Title=" & title
Response.Write "keywords=" & keywords
Response.Write "description=" & description
Response.Write "body=" & body
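
Editorial sketch, not from the thread: the meta patterns above only match when content comes before name and the tag ends in " />". If the attribute order can vary as well, one possible approach is to find each meta tag first and then read its name and content attributes separately. The function name MetaContent is hypothetical, and the sketch assumes double-quoted attribute values and a metaName made of plain letters (it is inserted into a pattern unescaped).

```vbscript
' Hypothetical helper: returns the content attribute of the first <meta> tag
' whose name attribute equals metaName, regardless of attribute order.
' Assumes double-quoted attribute values; metaName must not contain
' regex metacharacters because it is placed in the pattern as-is.
Function MetaContent(source, metaName)
  Dim tagRe, nameRe, contentRe, tags, tag, found
  MetaContent = ""                       ' default when no such tag exists

  Set tagRe = New RegExp                 ' finds every <meta ...> tag
  tagRe.Global = True
  tagRe.IgnoreCase = True
  tagRe.Pattern = "<meta\b[^>]*>"

  Set nameRe = New RegExp                ' checks name="metaName"
  nameRe.IgnoreCase = True
  nameRe.Pattern = "\bname\s*=\s*""" & metaName & """"

  Set contentRe = New RegExp             ' captures content="..."
  contentRe.IgnoreCase = True
  contentRe.Pattern = "\bcontent\s*=\s*""([^""]*)"""

  Set tags = tagRe.Execute(source)
  For Each tag In tags
    If nameRe.Test(tag.Value) Then
      Set found = contentRe.Execute(tag.Value)
      If found.Count > 0 Then
        MetaContent = found(0).SubMatches(0)
      End If
      Exit Function                      ' stop at the first matching tag
    End If
  Next
End Function

' Usage: attribute order differs from the sample page in the question.
html = "<meta name=""keywords"" content=""these are my keywords"">"
keywords = MetaContent(html, "keywords")
description = MetaContent(html, "description")   ' tag missing, stays ""
```

Regular expressions remain a rough tool for HTML; this only loosens the attribute-order and closing-slash assumptions and does not handle single-quoted or unquoted values.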


Author Closing Comment

by:midcompweb
ID: 31504241
Thanks again  :)  Regular expressions have baffled me for a long time  :)

Expert Comment

by:ddrudik
ID: 22669580
Thanks for the question and the points.