How to extract selected text from HTML and save it as a doc

How can I extract the following type of text from about 5,000 HTML pages, save it all in one doc file, and possibly also export it to a database?
Eighth Maria Company
Address:
PO Box 10234
Traverse City, MI 40685
Contact: McLori Steele
Phone: 2901/978-0678 / Fax:
E-mail: 12hour@charter.net
WWW: http:// www.123.com
micazone Asked:
 
Bob Lamberson (Software Engineer) Commented:
I have used this method in the past.  
 
Dim httpClient
Dim fso
Dim MyFile
Dim Folder
Dim str1

Folder = "C:\Updates\clientfolders\ClientName\"

Set fso = CreateObject("Scripting.FileSystemObject")

Set httpClient = CreateObject("AspHTTP.Conn")



WScript.Echo("Starting download for Client - Click OK")

httpClient.URL="http://www.<your target url>"       ' replace this with your target site url
httpclient.SaveFileTo = Folder & "Client2.txt"
str1 = httpClient.GetURL

The str1 variable now holds all the HTML from the web page.
Parse through it using any markers you can find. For example,
Mid(str1, InStr(1, str1, "<H1>") + 4, InStr(1, str1, "</H1>") - InStr(1, str1, "<H1>") - 4)
would get you the string "PMA Member Listing". Note that "<H1>" is 4 characters long, the closing tag is "</H1>", and the third argument of Mid is a length, not an end position.
Then just continue looping through the string, assigning the strings you want to variables.
Sorry I haven't got more time to code this part, but you should be able to get it from here.

' Release the COM objects. Folder and str1 hold plain strings and MyFile was never assigned, so they need no cleanup.
Set httpClient = Nothing
Set fso = Nothing

Also take a look at http://www.serverobjects.com/comp/asphttp3.htm
There are several other objects that can be used for this purpose too.
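
For anyone who wants to see the marker idea end to end, here is a minimal sketch of it. It is written in Python rather than VBScript, purely so it can be run and checked easily; the same find-start, find-end, cut-out logic maps onto InStr and Mid. The helper name between and the sample page string are made up for illustration (the values are borrowed from the question), so treat this as a starting point, not a finished parser.

```python
def between(text, start_marker, end_marker, pos=0):
    """Return (value, next_pos) for the text between two markers.

    Returns (None, pos) when either marker is missing, so the caller
    can tell "field absent" apart from "field empty".
    """
    start = text.find(start_marker, pos)
    if start == -1:
        return None, pos
    start += len(start_marker)          # skip past the opening marker itself
    end = text.find(end_marker, start)  # search only after the opening marker
    if end == -1:
        return None, pos
    return text[start:end].strip(), end + len(end_marker)


# Hypothetical page standing in for the HTML held in str1.
page = ("<H1>PMA Member Listing</H1><hr>"
        "<h2>Eighth Maria Company</h2>"
        "<b>Contact:</b> McLori Steele<br>")

# Walk forward through the page, carrying the position along.
title, pos = between(page, "<H1>", "</H1>")
company, pos = between(page, "<h2>", "</h2>", pos)
contact, pos = between(page, "<b>Contact:</b>", "<br>", pos)
print(title, company, contact, sep=" | ")
```

Carrying the position forward means each search starts where the last one ended, which is what "continue looping through the string" amounts to.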

Hope this is helpful.

Bob
 
Bob Lamberson (Software Engineer) Commented:
Hi micazone,
In the source code, you have to find a consistent "marker" of some kind that relates to each field, then parse each item out, probably into a variable. You can then easily write the values to a text file with the FileSystemObject (fso), or use ADO to put the data in a database.
A link to a page with source code would help someone give you more specifics for the process.
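
Until a real page is posted, here is a rough sketch of the whole flow (parse each field by its label, write a text file, load a database), assuming the pages use the labels shown in the question (Address:, Contact:, Phone:, Fax:, E-mail:, WWW:). It is in Python with SQLite rather than VBScript with fso and ADO, only so that it can be run as-is. The function names, the members table, and the sample markup are all hypothetical, and the tag stripping is deliberately crude.

```python
import os
import re
import sqlite3
import tempfile

LABELS = ("Address", "Contact", "Phone", "Fax", "E-mail", "WWW")  # labels from the question
SPLITTER = re.compile(r"(%s):" % "|".join(re.escape(label) for label in LABELS))


def parse_record(html):
    """Turn one page into a dict: the company name plus one entry per label found."""
    text = re.sub(r"<[^>]+>", " ", html)      # crude tag stripping, enough for simple pages
    text = re.sub(r"\s+", " ", text).strip()  # collapse line breaks and runs of spaces
    parts = SPLITTER.split(text)              # [name, label, value, label, value, ...]
    record = {"Name": parts[0].strip()}
    for label, value in zip(parts[1::2], parts[2::2]):
        # Drop the lone "/" that separates Phone from Fax on the page.
        record[label] = re.sub(r"(?:^|\s+)/$", "", value.strip())
    return record


def save_text(records, path):
    """Write one 'Label: value' line per field, with a blank line between records."""
    with open(path, "w", encoding="utf-8") as out:
        for rec in records:
            for key, value in rec.items():
                out.write(f"{key}: {value}\n")
            out.write("\n")


def save_db(records, db_path):
    """Insert the records into a SQLite table named members (hypothetical schema)."""
    columns = ("Name",) + LABELS
    conn = sqlite3.connect(db_path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE IF NOT EXISTS members (%s)"
                     % ", ".join('"%s"' % c for c in columns))
        conn.executemany(
            "INSERT INTO members VALUES (%s)" % ", ".join("?" * len(columns)),
            [tuple(rec.get(c, "") for c in columns) for rec in records],
        )
    return conn


# Hypothetical markup built from the values in the question.
page = """<h2>Eighth Maria Company</h2>
<b>Address:</b> PO Box 10234<br>Traverse City, MI 40685<br>
<b>Contact:</b> McLori Steele<br>
<b>Phone:</b> 2901/978-0678 / <b>Fax:</b><br>
<b>E-mail:</b> 12hour@charter.net<br>
<b>WWW:</b> http:// www.123.com"""

records = [parse_record(page)]
out_dir = tempfile.mkdtemp()                     # scratch folder for the demo
save_text(records, os.path.join(out_dir, "members.txt"))
conn = save_db(records, ":memory:")              # pass a file path to keep the data
print(conn.execute("SELECT Name, Contact, Phone FROM members").fetchall())
```

For the real job you would loop over the 5,000 files, call parse_record on each, and pass the full list to save_text and save_db once. Anything the page prints before the company name (a page heading, for instance) would end up in Name, so that part needs adjusting once the actual source is known.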

Bob
 
micazone (Author) Commented:
Can anyone help with a VBScript for this purpose? It would be a great help.
 
Bob Lamberson (Software Engineer) Commented:
micazone,
Can you post the source code or a link to one of the pages you are trying to extract the data from?
Bob
 
micazone (Author) Commented:
<H1>PMA Member Listing</H1>
<hr>

<h2>E. Dianne Publishing</h2>

<table border=0><tr><td valign="top"><p><b>Address:</b></td><td><p>PO Box 284<br>
Murphy, OR 97533</td></tr></table>

<p>

<b>Contact:</b> Glen Allport<br>
<b>Phone:</b>  / <b>Fax:</b> <br>

<b>E-mail:</b>
<script>
      var emailAddress = "edh_40internetcds.com";
      var index = "";
      var encodedAddress="";
      for (i=0;i<emailAddress.length;i++){
            index = "&#"+emailAddress.charCodeAt(i);
            encodedAddress = encodedAddress+index;
      }
      document.write("<a href='mailto:"+encodedAddress+"'>edh@internetcds.com</a>");
</script>
<br>

<b>WWW:</b> <a href="../..//default.htm">http://</a>
</p>

<p>
<p><b>Number of titles in print: </b><p>

<p>
<p><b>Categories: </b>
 
micazone (Author) Commented:
Is no one able to solve this problem?