I am trying to get the Title and Meta data from a site.
To get the data I am using XMLHTTP.
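For context, the fetch looks roughly like the sketch below. The URL is a placeholder, and the `MSXML2.ServerXMLHTTP` ProgID is an assumption about the setup; the real script may create the object differently.

```vbscript
' Minimal sketch of the fetch step (classic ASP).
' strURL and the ProgID are assumptions, not the exact production code.
Dim xmlHTTP, strURL
strURL = "http://www.example.com/"   ' hypothetical address

Set xmlHTTP = Server.CreateObject("MSXML2.ServerXMLHTTP")
xmlHTTP.open "GET", strURL, False    ' False = synchronous request
xmlHTTP.send

' xmlHTTP.responseText now holds the page source as a string
```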
I am then using a Regular Expression to extract the data.
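Function stripHTMLTags(strPattern, strText)
    Dim re, matches, match, str2
    Set re = New RegExp
    re.Pattern = strPattern
    re.IgnoreCase = True
    re.Global = True
    ' Collect every match of the pattern in the text
    Set matches = re.Execute(strText)
    For Each match In matches
        str2 = str2 & match.Value
    Next
    ' Return the concatenated matches
    stripHTMLTags = str2
End Function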
I can extract the title with:
response.write "<br>" & stripHTMLTags("<title>(.*)?\<\/title>", xmlHTTP.responseText)
This works on some sites but not on others.
The meta data is proving a little more difficult: on some pages the extraction works, and on others it does not.
If I use:
response.write "<br>" & stripHTMLTags("<head>(.*)?\<\/head>", xmlHTTP.responseText)
...to extract the entire header block I get nothing.
Also, the following will produce results with some pages, and nothing with others:
response.write "<br>" & stripHTMLTags("<meta(.*)?\>", xmlHTTP.responseText)
I wanted to grab the <head>...</head> block and assign it to a variable so that I don't have to search the entire page source each time I look for data, which should make the script quicker. I could then pass that variable in place of xmlHTTP.responseText.
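To illustrate the caching idea only, here is a rough sketch that cuts out the head block with a plain string search instead of the regular expression. The helper name `getHeadBlock` and the variable `strHead` are hypothetical, and this does not explain why the pattern itself fails.

```vbscript
' Hypothetical helper: returns the text from "<head" up to and including "</head>",
' or an empty string if either tag is missing.
' Caveat: "<head" would also match a "<header" tag if one came first.
Function getHeadBlock(strHtml)
    Dim startPos, endPos
    getHeadBlock = ""
    startPos = InStr(1, strHtml, "<head", vbTextCompare)
    If startPos = 0 Then Exit Function
    endPos = InStr(startPos, strHtml, "</head>", vbTextCompare)
    If endPos = 0 Then Exit Function
    getHeadBlock = Mid(strHtml, startPos, endPos - startPos + Len("</head>"))
End Function

' Extract the head once, then search only that smaller string
Dim strHead
strHead = getHeadBlock(xmlHTTP.responseText)
response.write "<br>" & stripHTMLTags("<title>(.*)?\<\/title>", strHead)
```

The later lookups would then run against strHead rather than the full response.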
Anyway, any idea why the <head> part produces nothing, and the rest work intermittently?