How to extract specific text from HTML and save it as a .doc file

How can I extract the following kind of text from about 5,000 HTML pages, save it all in one .doc file, and possibly also export it to a database?
Eighth Maria Company
PO Box 10234
Traverse City, MI 40685
Contact: McLori Steele
Phone: 2901/978-0678 / Fax:
WWW: http://
Bob Lamberson (Software Engineer) commented:
I have used this method in the past.  
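Option Explicit

Dim httpClient
Dim fso
Dim Folder
Dim str1
Dim startPos, endPos, title

Folder = "C:\Updates\clientfolders\ClientName\"

Set fso = CreateObject("Scripting.FileSystemObject")

' AspHTTP.Conn is a third-party COM component (not part of Windows); it must be installed and registered
Set httpClient = CreateObject("AspHTTP.Conn")

WScript.Echo "Starting download for Client - Click OK"

httpClient.URL = "http://www.<your target url>"   ' replace this with your target site url
httpClient.SaveFileTo = Folder & "Client2.txt"
str1 = httpClient.GetURL

' The str1 variable now holds all the HTML from the web page.
' Parse through it using any markers you can. For example, this gets the
' string "PMA Member Listing" out of <H1>PMA Member Listing</H1>:
startPos = InStr(1, str1, "<H1>", vbTextCompare)             ' vbTextCompare: case-insensitive search
endPos = InStr(startPos + 1, str1, "</H1>", vbTextCompare)   ' closing tag uses a forward slash
If startPos > 0 And endPos > 0 Then
    startPos = startPos + Len("<H1>")               ' skip past the 4-character opening tag
    title = Mid(str1, startPos, endPos - startPos)  ' Mid takes a length, not an end position
    WScript.Echo title
End If
' Then just continue looping through the string, assigning the strings you want to variables.
' Sorry I haven't got more time to code this part, but you should be able to get it from here.

' Release the objects (plain strings do not need to be set to Nothing)
Set httpClient = Nothing
Set fso = Nothing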

Also take a look at
There are several other objects that can be used for this purpose too.

Hope this is helpful.
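For comparison, here is a minimal Python sketch (rather than VBScript) of the same marker-based idea: find the start marker, skip past it, and take everything up to the end marker. The function names `text_between` and `all_between` are made up for illustration. Note that the offset must skip exactly the length of the opening marker (4 characters for `<H1>`), and that the closing tag is `</H1>` with a forward slash.

```python
def text_between(page, start_marker, end_marker, pos=0):
    """Return (text, next_pos) for the first text between the two markers
    at or after pos, or (None, -1) if either marker is missing."""
    start = page.find(start_marker, pos)
    if start == -1:
        return None, -1
    start += len(start_marker)  # skip past the opening marker itself
    end = page.find(end_marker, start)
    if end == -1:
        return None, -1
    # Slice end is a position, so no length arithmetic is needed.
    return page[start:end], end + len(end_marker)


def all_between(page, start_marker, end_marker):
    """Collect every occurrence by resuming the search after each end marker."""
    results = []
    pos = 0
    while True:
        text, pos = text_between(page, start_marker, end_marker, pos)
        if text is None:
            break
        results.append(text)
    return results
```

`all_between` is the "continue looping through the string" step: each call resumes the search just after the previous end marker, so `all_between(page, "<h2>", "</h2>")` would collect every `<h2>` heading on a page.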

Bob Lamberson (Software Engineer) commented:
Hi micazone,
In the source code, you have to find a consistent "marker" of some kind that relates to each field, then parse each item out, probably into a variable. You can then easily write the data to a text file with a FileSystemObject (fso) script, or use ADO to put it in a database.
A link to a page with the source code would help someone give you more specific details on the process.
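Once the fields are parsed out, writing them somewhere is the easy part. The Python sketch below assumes each page has already been turned into a dict with the hypothetical keys `company`, `address`, `contact`, `phone`, `fax` and `www`. It lays each record out like the sample listing in the question and stores the same records in a table named `members` (also hypothetical). Python's built-in `sqlite3` module stands in here for ADO, and plain text stands in for the .doc file; a text file can be opened in Word and saved as .doc.

```python
import sqlite3

# Hypothetical field names; adjust to whatever your parser produces.
FIELDS = ("company", "address", "contact", "phone", "fax", "www")


def format_record(rec):
    """Lay out one record like the sample listing: name, address lines, labelled fields."""
    lines = [rec.get("company", ""), *rec.get("address", "").splitlines()]
    lines.append(f"Contact: {rec.get('contact', '')}")
    lines.append(f"Phone: {rec.get('phone', '')} / Fax: {rec.get('fax', '')}")
    lines.append(f"WWW: {rec.get('www', '')}")
    # Drop trailing spaces left behind by empty fields (e.g. "Fax: ").
    return "\n".join(line.rstrip() for line in lines)


def write_text(records, path):
    """Write all records into ONE text file, separated by blank lines."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n\n".join(format_record(r) for r in records))
        out.write("\n")


def write_db(records, conn):
    """Insert all records into a 'members' table, creating it if missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS members "
        "(company TEXT, address TEXT, contact TEXT, phone TEXT, fax TEXT, www TEXT)"
    )
    # Parameterised insert: values are passed separately, never pasted into the SQL.
    conn.executemany(
        "INSERT INTO members VALUES (?, ?, ?, ?, ?, ?)",
        [tuple(r.get(f, "") for f in FIELDS) for r in records],
    )
    conn.commit()
```

For 5,000 pages, the idea is to build the full list of records first, then call `write_text(records, "members.txt")` once and `write_db(records, sqlite3.connect("members.db"))` once, so the output file is opened only a single time and all rows go in with one `executemany` call.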

micazone (Author) commented:
Can anyone help with a VBScript for this purpose? It would be a great help.

Bob Lamberson (Software Engineer) commented:
Can you post the source code or a link to one of the pages you are trying to extract the data from?
micazone (Author) commented:
<H1>PMA Member Listing</H1>

<h2>E. Dianne Publishing</h2>

<table border=0><tr><td valign="top"><p><b>Address:</b></td><td><p>PO Box 284<br>
Murphy, OR 97533</td></tr></table>


<b>Contact:</b> Glen Allport<br>
<b>Phone:</b>  / <b>Fax:</b> <br>

      var emailAddress = "";
      var index = "";
      var encodedAddress="";
      for (i=0;i<emailAddress.length;i++){
            index = "&#"+emailAddress.charCodeAt(i);
            encodedAddress = encodedAddress+index;
      }
      document.write("<a href='mailto:"+encodedAddress+"'></a>");

<b>WWW:</b> <a href="../..//default.htm">http://</a>

<p><b>Number of titles in print: </b><p>

<p><b>Categories: </b>
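Given that source, the fields can be picked out by their labels. Below is a Python sketch (not VBScript) using regular expressions keyed to the markers in this particular page: `<h2>` for the company name, and `<b>Address:</b>`, `<b>Contact:</b>`, `<b>Phone:</b>`, `<b>Fax:</b>` and `<b>WWW:</b>` for the rest. The names `PATTERNS`, `clean`, `parse_member` and `parse_folder` are hypothetical, and the sketch assumes all 5,000 pages are saved locally with exactly this layout; if the layout varies, the patterns will need adjusting. The e-mail link is written by JavaScript when the page loads and the address is empty in the posted source, so this sketch does not try to extract it.

```python
import html
import re
from pathlib import Path

# Hypothetical patterns keyed to the markers in the posted page source.
# re.S lets "." cross line breaks (the address spans two lines);
# re.I makes <h2>/<H2> and <br>/<BR> match alike.
PATTERNS = {
    "company": r"<h2>(.*?)</h2>",
    "address": r"<b>Address:</b>.*?<p>(.*?)</td>",
    "contact": r"<b>Contact:</b>(.*?)<br>",
    "phone": r"<b>Phone:</b>(.*?)/\s*<b>Fax:</b>",
    "fax": r"<b>Fax:</b>(.*?)<br>",
    "www": r"<b>WWW:</b>\s*<a[^>]*>(.*?)</a>",
}


def clean(fragment):
    """Turn <br> into newlines, strip other tags, decode entities, trim each line."""
    fragment = re.sub(r"<br\s*/?>", "\n", fragment, flags=re.I)
    fragment = re.sub(r"<[^>]+>", "", fragment)
    lines = (line.strip() for line in html.unescape(fragment).splitlines())
    return "\n".join(line for line in lines if line)


def parse_member(page):
    """Extract one member record from a page; missing fields become ''."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = re.search(pattern, page, flags=re.I | re.S)
        record[field] = clean(match.group(1)) if match else ""
    return record


def parse_folder(folder, glob_pattern="*.htm*"):
    """Parse every saved page in a folder, in file-name order."""
    return [
        parse_member(path.read_text(encoding="utf-8", errors="replace"))
        for path in sorted(Path(folder).glob(glob_pattern))
    ]
```

Regular expressions are fragile for HTML in general; they are workable here only on the assumption that every listing page comes from the same template. A real HTML parser (for example Python's `html.parser` module) is the more robust choice if the markup varies from page to page.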
micazone (Author) commented:
Is anyone able to solve this problem?