How to extract selected text from HTML and save as doc

Posted on 2004-10-21
Medium Priority
Last Modified: 2012-05-05
How can I extract the following type of text from about 5,000 HTML pages, save it all in one doc file, and possibly also export it to a database?
Eighth Maria Company
PO Box 10234
Traverse City, MI 40685
Contact: McLori Steele
Phone: 2901/978-0678 / Fax:
E-mail: 12hour@charter.net
WWW: http:// www.123.com
Question by:micazone
LVL 12

Expert Comment

by:Bob Lamberson
ID: 12378088
Hi micazone,
In the source code, you have to find a consistent "marker" of some kind that relates to each field, then parse each item out, probably into a variable. You can then easily write to a text file with an FSO (FileSystemObject) script, or use ADO to put the data in a database.
A link to a page with source code would help someone give you more specifics for the process.
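
As a rough illustration of the marker idea, here is a minimal sketch, not a finished solution. The GetBetween helper, the members.txt file name, and the SaveToDatabase routine (with its Jet provider string and a Members table containing Company and Contact columns) are all hypothetical names chosen for the example; the sample HTML string simply stands in for one downloaded page.

```vbscript
Option Explicit

Const ForAppending = 8

' Hypothetical helper: returns the text between two markers,
' or "" if either marker is not found.
Function GetBetween(text, startMarker, endMarker)
    Dim p, q
    GetBetween = ""
    p = InStr(1, text, startMarker, vbTextCompare)
    If p = 0 Then Exit Function
    p = p + Len(startMarker)
    q = InStr(p, text, endMarker, vbTextCompare)
    If q = 0 Then Exit Function
    GetBetween = Trim(Mid(text, p, q - p))
End Function

' Assumption: an Access database with a table Members(Company, Contact)
' already exists at dbPath and the Jet OLE DB provider is installed.
Sub SaveToDatabase(dbPath, companyName, contactName)
    Dim conn
    Set conn = CreateObject("ADODB.Connection")
    conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & dbPath
    ' Double any single quotes so the values do not break the SQL string.
    conn.Execute "INSERT INTO Members (Company, Contact) VALUES ('" & _
        Replace(companyName, "'", "''") & "', '" & _
        Replace(contactName, "'", "''") & "')"
    conn.Close
    Set conn = Nothing
End Sub

Dim html, company, contact, fso, out

' Stand-in for the HTML of one page.
html = "<h2>E. Dianne Publishing</h2>" & vbCrLf & _
       "<b>Contact:</b> Glen Allport<br>"

company = GetBetween(html, "<h2>", "</h2>")
contact = GetBetween(html, "<b>Contact:</b>", "<br>")

' Append one tab-separated line per page to a text file.
Set fso = CreateObject("Scripting.FileSystemObject")
Set out = fso.OpenTextFile("members.txt", ForAppending, True)
out.WriteLine company & vbTab & contact
out.Close

' Uncomment once the (hypothetical) database exists:
' SaveToDatabase "C:\data\members.mdb", company, contact

Set out = Nothing
Set fso = Nothing
```

Run it with cscript from a folder you can write to. The markers only work if every page uses the same tags around each field, which is why seeing the real page source matters.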


Author Comment

ID: 12378670
Can anyone help with a VBScript for this purpose? It would be a great help.
LVL 12

Expert Comment

by:Bob Lamberson
ID: 12381346
Can you post the source code or a link to one of the pages you are trying to extract the data from?



Author Comment

ID: 12381463
<H1>PMA Member Listing</H1>

<h2>E. Dianne Publishing</h2>

<table border=0><tr><td valign="top"><p><b>Address:</b></td><td><p>PO Box 284<br>
Murphy, OR 97533</td></tr></table>


<b>Contact:</b> Glen Allport<br>
<b>Phone:</b>  / <b>Fax:</b> <br>

      var emailAddress = "edh_40internetcds.com";
      var index = "";
      var encodedAddress="";
      for (i=0;i<emailAddress.length;i++){
            index = "&#"+emailAddress.charCodeAt(i);
            encodedAddress = encodedAddress+index;
      document.write("<a href='mailto:"+encodedAddress+"'>edh@internetcds.com</a>");

<b>WWW:</b> <a href="../..//default.htm">http://</a>

<p><b>Number of titles in print: </b><p>

<p><b>Categories: </b>

Author Comment

ID: 12411844
Is no one able to solve this problem?
LVL 12

Accepted Solution

Bob Lamberson earned 750 total points
ID: 12417055
I have used this method in the past.  
Dim httpClient
Dim fso
Dim MyFile
Dim Folder
Dim str1

Folder = "C:\Updates\clientfolders\ClientName\"

Set fso = CreateObject("Scripting.FileSystemObject")

Set httpClient = CreateObject("AspHTTP.Conn")

WScript.Echo("Starting download for Client - Click OK")

httpClient.URL="http://www.<your target url>"       ' replace this with your target site url
httpclient.SaveFileTo = Folder & "Client2.txt"
str1 = httpClient.GetURL

The str1 variable now holds all the HTML from the web page.
Parse through it using any markers you can. For example,
Mid(str1, InStr(1, str1, "<H1>") + 4, InStr(1, str1, "</H1>") - InStr(1, str1, "<H1>") - 4) would get you the string "PMA Member Listing" (the third argument of Mid is a length, so the start position is subtracted from the position of the closing tag).
Then just continue looping through the string, assigning the strings you want to variables.
Sorry I haven't got more time to code this part, but you should be able to get it from here.

' Only the object variables need to be released; Folder and str1 are plain strings.
Set httpClient = Nothing
Set fso = Nothing
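
The parsing step described above could be sketched roughly as follows for the page source posted earlier in this thread. This is only a sketch under assumptions: it supposes the pages have already been saved as .htm or .html files in a hypothetical C:\pages folder, that every page uses the same markup as the posted sample, and that one tab-separated text file (which Word can open) is an acceptable "doc". GetBetween is a hypothetical helper, not part of any library.

```vbscript
Option Explicit

Const ForReading = 1

' Hypothetical helper: returns the text between two markers,
' or "" if either marker is not found.
Function GetBetween(text, startMarker, endMarker)
    Dim p, q
    GetBetween = ""
    p = InStr(1, text, startMarker, vbTextCompare)
    If p = 0 Then Exit Function
    p = p + Len(startMarker)
    q = InStr(p, text, endMarker, vbTextCompare)
    If q = 0 Then Exit Function
    GetBetween = Trim(Mid(text, p, q - p))
End Function

Dim fso, pagesFolder, pageFile, stream, out
Dim html, ext, company, address, contact

Set fso = CreateObject("Scripting.FileSystemObject")
Set pagesFolder = fso.GetFolder("C:\pages")                   ' assumed input folder
Set out = fso.CreateTextFile("C:\pages\members.txt", True)    ' assumed output file

For Each pageFile In pagesFolder.Files
    ext = LCase(fso.GetExtensionName(pageFile.Name))
    If ext = "htm" Or ext = "html" Then
        ' Read the whole page into a string (empty files are skipped over).
        html = ""
        Set stream = pageFile.OpenAsTextStream(ForReading)
        If Not stream.AtEndOfStream Then html = stream.ReadAll
        stream.Close

        ' Markers taken from the sample page posted in this thread.
        company = GetBetween(html, "<h2>", "</h2>")
        contact = GetBetween(html, "<b>Contact:</b>", "<br>")
        address = GetBetween(html, "<b>Address:</b></td><td><p>", "</td>")

        ' Flatten the address: turn <br> into ", " and drop line breaks.
        address = Replace(address, "<br>", ", ", 1, -1, vbTextCompare)
        address = Replace(address, vbCr, "")
        address = Replace(address, vbLf, "")

        out.WriteLine company & vbTab & address & vbTab & contact
    End If
Next

out.Close
Set out = Nothing
Set pagesFolder = Nothing
Set fso = Nothing
```

Pages where a marker is missing produce an empty field rather than an error, so it is worth scanning the output for blank columns before trusting it.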

Also take a look at http://www.serverobjects.com/comp/asphttp3.htm
There are several other objects that can be used for this purpose, too.

Hope this is helpful.
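
One such alternative, if the AspHTTP component is not installed, is the MSXML2.XMLHTTP object that ships with Windows. The following is a minimal sketch only; the URL is a placeholder, and whether it works against a given site depends on that server.

```vbscript
Option Explicit

Dim http, url, html

url = "http://www.example.com/members/page1.htm"   ' placeholder URL, replace with your target page

Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", url, False     ' False = synchronous, wait for the response
http.Send

If http.Status = 200 Then
    ' responseText holds the page HTML, ready to be parsed with InStr and Mid.
    html = http.responseText
    WScript.Echo "Downloaded " & Len(html) & " characters"
Else
    WScript.Echo "Request failed with status " & http.Status
End If

Set http = Nothing
```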

