Grab some text from an HTML file via Windows scripting

Hello experts,

I would like to grab a specific line from an intranet web page (HTML) and write it to a text file via a Windows batch script. The HTML is a simple page with the usual tables, hrefs, etc.

Is there a way to do this? Could you please provide a simple example?

Thank you in advance
bozer asked:

Scott Fell, EE MVE, Developer & EE Moderator, commented:
Send an XMLHTTP POST request to the page to grab its HTML source. From there, you just parse out what you need.

http://support.microsoft.com/kb/290591
       
I modified the code from that article. DataToSend isn't needed unless you have to pass form variables or a query string. I also changed Response.Write xmlhttp.responsexml.xml to theHTML = xmlhttp.responsetext, so the page's HTML is stored in a variable instead of being written out.

   
DataToSend = ""
Dim xmlhttp
' CreateObject (without the Server. prefix) works both in ASP and in a standalone .vbs file
Set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
xmlhttp.Open "POST", "http://www.somesite/somepage.html", False
xmlhttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
xmlhttp.send DataToSend
' The page's HTML source, as a string (no Response.ContentType needed, nothing is written out)
theHTML = xmlhttp.responseText
Set xmlhttp = Nothing

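Outside ASP, for example in a .vbs file run by cscript from a batch file, there is no Response object, and a static page is normally fetched with a plain GET rather than a POST. A minimal sketch under those assumptions; FetchPage is a hypothetical helper name, and it returns an empty string when the request fails or the status is not 200:

```vbscript
Option Explicit

' Hypothetical helper: fetch a page with a synchronous GET request.
' Returns the HTML source, or "" on any error or non-200 status.
Function FetchPage(url)
    Dim http
    FetchPage = ""
    On Error Resume Next
    Set http = CreateObject("MSXML2.ServerXMLHTTP")
    http.Open "GET", url, False   ' False = wait for the response
    http.send
    If Err.Number = 0 Then
        If http.Status = 200 Then FetchPage = http.responseText
    End If
    Err.Clear
    On Error GoTo 0
End Function

' Example (placeholder URL from the post above):
' WScript.Echo FetchPage("http://www.somesite/somepage.html")
```

The error handling is kept inside the function so that a network failure simply yields an empty string the caller can test for.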


At this point, all of the HTML (what you would see with View Source) is in the variable theHTML. Now you need to parse out what you want.

Let's say our variable now contains this:
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>some site</title>
</head>
<body>
  <h1>This is the title</h1>
  <h2>This is a sub title</h2>
  <p>about our sub title about our sub title about our sub title about our sub title</p>
  <h2>This is a sub title</h2>
  <p>about our sub title about our sub title about our sub title about our sub title</p>
  <h2>This is a sub title</h2>
  <p>about our sub title about our sub title about our sub title about our sub title</p>
</body>
</html>


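You only want what is in the first paragraph. To do this, we look in the variable theHTML for the first <p> tag and grab everything up to the next </p> tag.
Step 1: find the position of the first <p> tag and of the next </p> tag. (End is a reserved word in VBScript, so it cannot be used as a variable name.)
startPos = InStr(theHTML, "<p>")
endPos = InStr(theHTML, "</p>")


Step 2: get just that paragraph. Mid takes a start position and a length (not an end position), and the paragraph text begins 3 characters after startPos, just past "<p>".
theParagraph = Mid(theHTML, startPos + 3, endPos - startPos - 3)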

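The two steps can be wrapped in a small reusable function that also copes with a missing tag (InStr returns 0) and only looks for the closing tag after the opening one. A minimal sketch; ExtractBetween is a hypothetical name, and the match is plain text (case-insensitive), so a tag with attributes such as <p class="intro"> will not match "<p>":

```vbscript
Option Explicit

' Hypothetical helper: return the text between the first startTag and
' the next endTag after it, or "" if either tag is missing.
Function ExtractBetween(text, startTag, endTag)
    Dim startPos, endPos, contentStart
    ExtractBetween = ""
    startPos = InStr(1, text, startTag, vbTextCompare)
    If startPos = 0 Then Exit Function
    contentStart = startPos + Len(startTag)
    ' Search for the closing tag only after the opening tag.
    endPos = InStr(contentStart, text, endTag, vbTextCompare)
    If endPos = 0 Then Exit Function
    ' Mid's third argument is a length, not an end position.
    ExtractBetween = Mid(text, contentStart, endPos - contentStart)
End Function

WScript.Echo ExtractBetween("<h1>Title</h1><p>First</p><p>Second</p>", "<p>", "</p>")
' prints: First
```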


Putting it all together
 
    
DataToSend = ""
Dim xmlhttp
Set xmlhttp = CreateObject("MSXML2.ServerXMLHTTP")
xmlhttp.Open "POST", "http://www.somesite/somepage.html", False
xmlhttp.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
xmlhttp.send DataToSend
theHTML = xmlhttp.responseText
Set xmlhttp = Nothing

' "end" is a reserved word in VBScript, so use startPos/endPos instead
startPos = InStr(theHTML, "<p>")
endPos = InStr(theHTML, "</p>")

' Mid(string, start, length): skip the 3 characters of "<p>" and stop before "</p>"
If startPos > 0 And endPos > startPos Then
    theParagraph = Mid(theHTML, startPos + 3, endPos - startPos - 3)
End If

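If the line you need is not in the first paragraph, the same InStr/Mid approach can step through the tag pairs one at a time. A sketch with a hypothetical ExtractNth function, where n counts from 1 and an empty string means there are fewer than n pairs:

```vbscript
Option Explicit

' Hypothetical helper: return the contents of the n-th startTag...endTag
' pair (n starts at 1), or "" if there are fewer than n pairs.
Function ExtractNth(text, startTag, endTag, n)
    Dim pos, i, contentStart, endPos
    ExtractNth = ""
    If n < 1 Then Exit Function
    pos = 1
    For i = 1 To n
        pos = InStr(pos, text, startTag, vbTextCompare)
        If pos = 0 Then Exit Function
        contentStart = pos + Len(startTag)
        endPos = InStr(contentStart, text, endTag, vbTextCompare)
        If endPos = 0 Then Exit Function
        ' Continue the next search just after this closing tag.
        pos = endPos + Len(endTag)
    Next
    ExtractNth = Mid(text, contentStart, endPos - contentStart)
End Function
```

For the sample page above, ExtractNth(theHTML, "<p>", "</p>", 2) would return the text of the second paragraph.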

bozer (Author) commented:
Hello Scott,

Thank you for the detailed reply. But this looks exactly like how I was doing similar operations with Classic ASP. I want to do everything from a Windows batch script so I don't have to worry about web servers, libraries, etc.

I think a plain batch script can read from a text file, so what I basically want is to do the same with the static HTML page. Or perhaps the batch script could save the HTML as a text file first and then get the text from that file.
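Because an HTML file is just text, the idea above does not need a separate conversion step: a script can read the file line by line with the FileSystemObject and keep the first line that contains some marker text. A minimal sketch, assuming the wanted text sits on a single line of the source; FindLine is a hypothetical name, and it returns an empty string when no line matches:

```vbscript
Option Explicit

Const ForReading = 1

' Hypothetical helper: return the first line of the file that contains
' marker (case-insensitive), with surrounding spaces trimmed, or "".
' OpenTextFile raises a runtime error if the file does not exist.
Function FindLine(path, marker)
    Dim fso, ts, lineText
    FindLine = ""
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(path, ForReading)
    Do Until ts.AtEndOfStream
        lineText = ts.ReadLine
        If InStr(1, lineText, marker, vbTextCompare) > 0 Then
            FindLine = Trim(lineText)
            Exit Do
        End If
    Loop
    ts.Close
End Function

' Run as: cscript //nologo findline.vbs <file> <marker>
If WScript.Arguments.Count = 2 Then
    WScript.Echo FindLine(WScript.Arguments(0), WScript.Arguments(1))
End If
```

From a batch file this could be called as, for example, cscript //nologo findline.vbs C:\folder\file.html "<p>" > C:\folder\result.txt (findline.vbs is a hypothetical file name); cscript writes WScript.Echo output to the console, so the redirect saves the line to a text file.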

Scott Fell, EE MVE, Developer & EE Moderator, commented:
That is VBScript, and you can run a .vbs file from a batch file. I run some scheduled tasks in the very same way.

If the HTML file is on your own server, you can read it with the FileSystemObject (FSO):
http://msdn.microsoft.com/en-us/library/aa711216(v=vs.71).aspx
http://www.vb6.us/tutorials/using-fso-file-system-object-vb6
http://technet.microsoft.com/en-us/library/ee198716.aspx

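Const ForReading = 1
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFile = objFSO.OpenTextFile("C:\folder\file.html", ForReading)
' objFile is a TextStream object, not a string: ReadAll returns the file's contents
theHTML = objFile.ReadAll
objFile.Close

' "end" is a reserved word in VBScript; Mid's third argument is a length
startPos = InStr(theHTML, "<p>")
endPos = InStr(theHTML, "</p>")
If startPos > 0 And endPos > startPos Then
    theParagraph = Mid(theHTML, startPos + 3, endPos - startPos - 3)
    WScript.Echo theParagraph
End If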



VBScript is the language Classic ASP uses. I don't use VB6 or VB.NET, but I believe this part is identical. Instead of Response.Write, you use WScript.Echo.

Save this as a .vbs file and call it from your batch file.
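Since the original goal is to write the extracted text to a text file, the pieces above can be combined into one script that takes the input and output paths as arguments. A minimal sketch; the script name extract.vbs and the function ExtractToFile are hypothetical, and it uses the same simple "<p>"/"</p>" search:

```vbscript
Option Explicit

Const ForReading = 1
Const ForWriting = 2

' Hypothetical helper: read inPath, take the text between the first
' startTag and the following endTag, and write it to outPath
' (overwriting it). Returns True if something was found and written.
Function ExtractToFile(inPath, outPath, startTag, endTag)
    Dim fso, ts, html, startPos, endPos, contentStart
    ExtractToFile = False
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(inPath, ForReading)
    ' ReadAll fails on an empty file, so check for that first
    If ts.AtEndOfStream Then html = "" Else html = ts.ReadAll
    ts.Close
    startPos = InStr(1, html, startTag, vbTextCompare)
    If startPos = 0 Then Exit Function
    contentStart = startPos + Len(startTag)
    endPos = InStr(contentStart, html, endTag, vbTextCompare)
    If endPos = 0 Then Exit Function
    Set ts = fso.OpenTextFile(outPath, ForWriting, True)  ' True = create if missing
    ts.WriteLine Mid(html, contentStart, endPos - contentStart)
    ts.Close
    ExtractToFile = True
End Function

' Run as: cscript //nologo extract.vbs <input.html> <output.txt>
If WScript.Arguments.Count = 2 Then
    If ExtractToFile(WScript.Arguments(0), WScript.Arguments(1), "<p>", "</p>") Then
        WScript.Quit 0
    Else
        WScript.Quit 1
    End If
End If
```

From a batch file it could be called as, for example, cscript //nologo extract.vbs C:\folder\file.html C:\folder\result.txt. The script exits with code 1 when nothing is found, which the batch file can check with if errorlevel 1.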

bozer (Author) commented:
Apologies, everyone; I had to attend to some urgent issues. I will test your recommendations as soon as possible.

bozer (Author) commented:
Thanks, Scott. Playing around with the file-operations code you provided helped me solve my problem.
Windows Server 2008

Question has a verified solution.
