Solved

How to extract selected text from HTML and save it as a doc

Posted on 2004-10-21
Last Modified: 2012-05-05
How can I extract the following type of text from about 5,000 HTML pages, save it all in one doc file, and possibly also export it to a database?
Eighth Maria Company
Address:
PO Box 10234
Traverse City, MI 40685
Contact: McLori Steele
Phone: 2901/978-0678 / Fax:
E-mail: 12hour@charter.net
WWW: http:// www.123.com
Question by:micazone
    6 Comments
     
    LVL 12

    Expert Comment

    by:BobLamberson
    Hi micazone,
    In the source code, you have to find a consistent "marker" of some kind that relates to each field, then parse each item out, probably into a variable. You can then easily write the values to a text file with an FSO (FileSystemObject) script, or use ADO to put the data in a database.
    A link to a page with source code would help someone give you more specifics for the process.
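
The two storage options mentioned above (a text file via FSO, or a database via ADO) could look roughly like the sketch below. It assumes the fields have already been parsed into variables. The paths under C:\Temp, the Access file members.mdb, and the Members table with its three text columns are all hypothetical and must already exist. The Jet provider is 32-bit only, so on 64-bit Windows the script would need the 32-bit cscript.exe.

```vbscript
Option Explicit

' Hypothetical values standing in for fields already parsed out of one page
Dim company, contact, email
company = "E. Dianne Publishing"
contact = "Glen Allport"
email = "edh@internetcds.com"

' --- Option 1: append the record to a text file with FileSystemObject ---
Const ForAppending = 8
Dim fso, outFile
Set fso = CreateObject("Scripting.FileSystemObject")
' Third argument True = create the file if it does not exist yet
Set outFile = fso.OpenTextFile("C:\Temp\members.txt", ForAppending, True)
outFile.WriteLine company
outFile.WriteLine "Contact: " & contact
outFile.WriteLine "E-mail: " & email
outFile.WriteLine ""
outFile.Close

' --- Option 2: insert the record into a database with ADO ---
Const adVarWChar = 202      ' Unicode string parameter type
Const adParamInput = 1
Const adCmdText = 1
Dim conn, cmd
Set conn = CreateObject("ADODB.Connection")
' Assumed: an existing Access database containing a Members table
conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Temp\members.mdb"

Set cmd = CreateObject("ADODB.Command")
Set cmd.ActiveConnection = conn
cmd.CommandType = adCmdText
' Parameters keep quotes in the data from breaking the SQL statement
cmd.CommandText = "INSERT INTO Members (Company, Contact, Email) VALUES (?, ?, ?)"
cmd.Parameters.Append cmd.CreateParameter("Company", adVarWChar, adParamInput, 255, company)
cmd.Parameters.Append cmd.CreateParameter("Contact", adVarWChar, adParamInput, 255, contact)
cmd.Parameters.Append cmd.CreateParameter("Email", adVarWChar, adParamInput, 255, email)
cmd.Execute

conn.Close
Set cmd = Nothing
Set conn = Nothing
Set fso = Nothing
```

A plain text file like this can be opened directly in Word and saved as a .doc afterwards.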

    Bob
     

    Author Comment

    by:micazone
    Can anyone help with a VBScript for this purpose? It would be a great help.
     
    LVL 12

    Expert Comment

    by:BobLamberson
    micazone,
    Can you post the source code or a link to one of the pages you are trying to extract the data from?
    Bob
     

    Author Comment

    by:micazone
    <H1>PMA Member Listing</H1>
    <hr>

    <h2>E. Dianne Publishing</h2>

    <table border=0><tr><td valign="top"><p><b>Address:</b></td><td><p>PO Box 284<br>
    Murphy, OR 97533</td></tr></table>

    <p>

    <b>Contact:</b> Glen Allport<br>
    <b>Phone:</b>  / <b>Fax:</b> <br>

    <b>E-mail:</b>
    <script>
          var emailAddress = "edh_40internetcds.com";
          var index = "";
          var encodedAddress="";
          for (i=0;i<emailAddress.length;i++){
                index = "&#"+emailAddress.charCodeAt(i);
                encodedAddress = encodedAddress+index;
          }
          document.write("<a href='mailto:"+encodedAddress+"'>edh@internetcds.com</a>");
    </script>
    <br>

    <b>WWW:</b> <a href="../..//default.htm">http://</a>
    </p>

    <p>
    <p><b>Number of titles in print: </b><p>

    <p>
    <p><b>Categories: </b>
     

    Author Comment

    by:micazone
    Is no one able to solve this problem?
     
    LVL 12

    Accepted Solution

    by:
    I have used this method in the past.  
     
    Dim httpClient
    Dim fso
    Dim MyFile
    Dim Folder
    Dim str1

    Folder = "C:\Updates\clientfolders\ClientName\"

    Set fso = CreateObject("Scripting.FileSystemObject")

    Set httpClient = CreateObject("AspHTTP.Conn")



    WScript.Echo("Starting download for Client - Click OK")

    httpClient.URL="http://www.<your target url>"       ' replace this with your target site url
    httpclient.SaveFileTo = Folder & "Client2.txt"
    str1 = httpClient.GetURL

    The str1 variable now holds all the HTML from the web page.
    Parse through it using any markers you can find. For example,
    Mid(str1, InStr(1, str1, "<H1>") + 4, InStr(1, str1, "</H1>") - InStr(1, str1, "<H1>") - 4)
    would get you the string "PMA Member Listing". (Mid takes a start position and a length, "<H1>" is 4 characters long, and the closing tag is "</H1>".)
    Then just continue looping through the string, assigning the strings you want to variables.
    Sorry I haven't got more time to code this part, but you should be able to get it from here.

    ' Release the objects (Folder and str1 are plain strings and need no cleanup)
    Set httpClient = Nothing
    Set fso = Nothing
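
For the parsing step left open above, here is a minimal sketch of the marker approach applied to the sample page posted earlier in this thread. It assumes the pages have already been saved as .htm/.html files in a hypothetical folder C:\Temp\pages and that every page uses the same markup as the sample. GetBetween is a helper name made up for this example, and the "_40" to "@" replacement is a guess based on the sample's emailAddress variable.

```vbscript
Option Explicit

' Return the text between the first startMarker and the next endMarker,
' or "" when either marker is missing. Matching ignores case.
Function GetBetween(text, startMarker, endMarker)
    Dim p1, p2
    GetBetween = ""
    p1 = InStr(1, text, startMarker, vbTextCompare)
    If p1 = 0 Then Exit Function
    p1 = p1 + Len(startMarker)
    p2 = InStr(p1, text, endMarker, vbTextCompare)
    If p2 = 0 Then Exit Function
    GetBetween = Trim(Mid(text, p1, p2 - p1))
End Function

Const ForReading = 1
Const PAGE_FOLDER = "C:\Temp\pages"        ' hypothetical folder of saved pages
Const OUTPUT_FILE = "C:\Temp\members.txt"  ' hypothetical output file

Dim fso, outFile, file, stream, html, address, email
Set fso = CreateObject("Scripting.FileSystemObject")
Set outFile = fso.CreateTextFile(OUTPUT_FILE, True)   ' True = overwrite

For Each file In fso.GetFolder(PAGE_FOLDER).Files
    ' Skip non-HTML files and empty files (ReadAll fails on an empty file)
    If LCase(Left(fso.GetExtensionName(file.Name), 3)) = "htm" And file.Size > 0 Then
        Set stream = file.OpenAsTextStream(ForReading)
        html = stream.ReadAll
        stream.Close

        ' The address spans two lines separated by <br>; flatten it to one line
        address = GetBetween(html, "<b>Address:</b></td><td><p>", "</td>")
        address = Replace(address, "<br>", ", ", 1, -1, vbTextCompare)
        address = Replace(Replace(address, vbCr, ""), vbLf, "")

        ' Assumption: the script on the page stores "@" as "_40"
        email = GetBetween(html, "var emailAddress = """, """")
        email = Replace(email, "_40", "@")

        outFile.WriteLine GetBetween(html, "<h2>", "</h2>")
        outFile.WriteLine "Address: " & address
        outFile.WriteLine "Contact: " & GetBetween(html, "<b>Contact:</b>", "<br>")
        outFile.WriteLine "E-mail: " & email
        outFile.WriteLine ""
    End If
Next

outFile.Close
Set fso = Nothing
```

Any page whose markup differs from the sample will simply produce empty fields, so it is worth spot-checking the output file before relying on it.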

    Also take a look at http://www.serverobjects.com/comp/asphttp3.htm
    There are several other objects that can be used for this purpose, too.
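
As one example of such an alternative (not necessarily one Bob had in mind), the MSXML2.XMLHTTP object that ships with Windows can fetch a page without installing a third-party component. This is only a sketch, and the URL is a placeholder.

```vbscript
Option Explicit

Dim http, html
Set http = CreateObject("MSXML2.XMLHTTP")

' Placeholder URL; the third argument False makes the request synchronous
http.Open "GET", "http://www.example.com/page.htm", False
http.Send

If http.Status = 200 Then
    html = http.responseText   ' the page source as a string, ready for parsing
    WScript.Echo "Downloaded " & Len(html) & " characters"
Else
    WScript.Echo "Request failed: " & http.Status & " " & http.statusText
End If

Set http = Nothing
```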

    hope this is helpful

    Bob