JoseDavila asked:

Use VB to Search for HTML String and Return Values to text file

I need a VB app that takes an entered URL, searches that page's HTML for a specific string, and writes the matching values to a text file.

Bob Lamberson

Hi JoseDavila,
Will the URL be entered on a form, in a text box?

Bob

JoseDavila (ASKER)

The URL will be submitted in the form with a text box.
JoseDavila,

Open a form in VB and add two text boxes (Text1 and Text2) and a command button named cmdSearch.

Add this code to a form and run it.
Enter the search string in the second text box and click on search.
You will find a textfile.txt in c:\ that contains the found string.
Option Explicit


Private Sub cmdSearch_Click()
   If InStr(1, Text1.Text, Text2.Text) > 0 Then
      Open "C:\textfile.txt" For Output As #1
      Write #1, Text2.Text
      Close #1
   End If
End Sub

Private Sub Form_Load()
   Text1.Text = "http://www.bluesquirrel.com/products/PopUpStopper/"
End Sub

Bob
Thank you for the quick response, but I actually need Text2 compared against the page's HTML source code, not the URL.

anv

check this

' Requires a project reference to the Microsoft HTML Object Library.
Dim doc As HTMLDocument
Dim a_link As HTMLAnchorElement
Dim txt As String

    ' List the links.
    ' Assumes the document is displayed in a WebBrowser control (WebBrowser1) placed on the form.
    On Error Resume Next
    Set doc = WebBrowser1.Document

    For Each a_link In doc.links
        txt = txt & a_link.href & vbCrLf
    Next a_link

    txtLinks.Text = txt

JoseDavila,
This object will bring back the html source of a page.

http://www.serverobjects.com/comp/asphttp3.htm

Bob
David Lee
The code below loads a page and then searches it for text matching a given search string.  You don't mention what you want to do once a match is found, so I opted to pop up a dialog box indicating the item was found.  What I've provided here is a complete VB form (.frm) file.  To use it, copy and paste the code into Notepad and save the file with a .frm extension, then open the form in VB.  The web site I used is USA Today, so edit the URL and change it to the site you are interested in.  Run the program, type the text you want to search for into the text box, and click Go.  The text search is not case sensitive, because InStr is called with vbTextCompare.  On clicking Go, the code grabs a copy of the HTML document and loops through the elements on the page looking for the search text.

Hope this helps.


VERSION 5.00
Object = "{EAB22AC0-30C1-11CF-A7EB-0000C05BAE0B}#1.1#0"; "shdocvw.dll"
Begin VB.Form Form1
   Caption         =   "Form1"
   ClientHeight    =   3885
   ClientLeft      =   60
   ClientTop       =   450
   ClientWidth     =   7560
   LinkTopic       =   "Form1"
   ScaleHeight     =   3885
   ScaleWidth      =   7560
   StartUpPosition =   3  'Windows Default
   Begin VB.TextBox txtSearch
      Height          =   375
      Left            =   120
      TabIndex        =   2
      Top             =   2640
      Width           =   7335
   End
   Begin SHDocVwCtl.WebBrowser WebBrowser1
      Height          =   2295
      Left            =   120
      TabIndex        =   1
      Top             =   120
      Width           =   7335
      ExtentX         =   12938
      ExtentY         =   4048
      ViewMode        =   0
      Offline         =   0
      Silent          =   0
      RegisterAsBrowser=   0
      RegisterAsDropTarget=   1
      AutoArrange     =   0   'False
      NoClientEdge    =   0   'False
      AlignLeft       =   0   'False
      NoWebView       =   0   'False
      HideFileNames   =   0   'False
      SingleClick     =   0   'False
      SingleSelection =   0   'False
      NoFolders       =   0   'False
      Transparent     =   0   'False
      ViewID          =   "{0057D0E0-3573-11CF-AE69-08002B2E1262}"
      Location        =   ""
   End
   Begin VB.CommandButton cmdGo
      Caption         =   "Go"
      Height          =   495
      Left            =   120
      TabIndex        =   0
      Top             =   3240
      Width           =   1455
   End
End
Attribute VB_Name = "Form1"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = False
Private Sub cmdGo_Click()
    Dim objDocument As Object, _
        objElements As Object, _
        objItem As Object
    If txtSearch.Text <> "" Then
        WebBrowser1.Navigate "www.usatoday.com"
        While WebBrowser1.ReadyState <> READYSTATE_COMPLETE
            DoEvents
        Wend
        Set objDocument = WebBrowser1.Document
        Set objElements = objDocument.All
        For Each objItem In objElements
            If InStr(1, objItem.innerHTML, txtSearch.Text, vbTextCompare) > 0 Then
                MsgBox "Item found.", vbInformation, "Found It"
            End If
        Next
    End If
End Sub
Instead of using a Web Browser control, you might prefer the Microsoft Internet Transfer control (Inet).

Private Sub Form_Load()
    Dim strHTML As String
    strHTML = Inet1.OpenURL("https://www.experts-exchange.com", icString)
    MsgBox strHTML
End Sub

-Burbble
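
If the goal is a text file rather than a message box, a minimal sketch along these lines might work. It assumes an Internet Transfer control named Inet1 is already on the form; SavePageSource and the output path are made-up names for illustration only. Print # is used instead of Write # because Write # wraps strings in quotation marks.

```vb
' Sketch only: fetch a page with the Internet Transfer control and
' save its HTML source to a text file.
' Assumes a control named Inet1 on the form (hypothetical setup).
Private Sub SavePageSource(ByVal strURL As String, ByVal strPath As String)
    Dim strHTML As String
    Dim intFile As Integer

    ' OpenURL with icString returns the page body as a string
    strHTML = Inet1.OpenURL(strURL, icString)

    ' FreeFile avoids clashing with a file number that is already open
    intFile = FreeFile
    Open strPath For Output As #intFile
    Print #intFile, strHTML
    Close #intFile
End Sub
```

It could be called from a button click, for example: SavePageSource "https://www.experts-exchange.com", "C:\page.txt"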
BlueDevilFan,

I would like the output written to a text file, not shown in a message box.
I would like the search to look through the HTML source for # and write the matching entry to a text file. Example:
<td bgcolor="red"> 4.793E+01 <B>#</B></td>
JoseDavila,

Can you be more specific about what you want returned?  Also, do you want to search for just the first # sign, or all pound signs on the page?  
Search for all pound signs on the page and return the number in the following example (<td bgcolor="red"> 4.793E+01 <B>#</B></td>)
Can you give me a URL for a page I can test against?  If not, can you copy and paste the HTML for a sample page here?
That is all I am able to provide.
>> I would like the output to be in a textfile not a message box.

I was just giving an alternative solution to using a Web Browser Control for retrieving a file from a URL -- the Microsoft Internet Transfer Control's .OpenURL method.

I don't entirely understand how you want to parse it, though...

-Burbble
So I need to load the web page that needs parsing and have my code find every place where # appears in the page.  (Example of HTML source from the table: <td bgcolor="red"> 4.793E+01 <B>#</B></td>)  Once the code has found a # character, I would like it to record the number (which in this example is 4.793E+01) and write the output to a text file.
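
One way to approach the parsing step, sketched under assumptions: the function below works on the page source as a plain string and collects the text that sits between the closing ">" of the cell's opening tag and each "<B>#</B>" marker. ExtractFlaggedValues is a hypothetical name, and the sketch assumes every flagged cell is laid out like the single sample given; a different layout would need a different rule.

```vb
' Sketch only: return every value that sits just before a "<B>#</B>" marker.
' Assumes the value is the text between the last ">" before the marker
' and the marker itself, as in the sample cell.
Private Function ExtractFlaggedValues(ByVal strHTML As String) As Collection
    Const MARKER As String = "<B>#</B>"
    Dim colValues As New Collection
    Dim lngMarker As Long
    Dim lngStart As Long

    lngMarker = InStr(1, strHTML, MARKER, vbTextCompare)
    Do While lngMarker > 0
        If lngMarker > 1 Then
            ' Search backwards for the ">" that ends the preceding tag;
            ' InStrRev returns 0 if there is none, so the value then
            ' starts at the first character of the string
            lngStart = InStrRev(strHTML, ">", lngMarker - 1)
            colValues.Add Trim$(Mid$(strHTML, lngStart + 1, lngMarker - lngStart - 1))
        End If
        ' Continue searching after the marker just handled
        lngMarker = InStr(lngMarker + Len(MARKER), strHTML, MARKER, vbTextCompare)
    Loop

    Set ExtractFlaggedValues = colValues
End Function
```

For the sample cell above, the collection would hold one item, 4.793E+01. Each item could then be written to the output file with Print #.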
Does anybody have input on how to make this code work?  I need to know where the item was found in the HTML source and write the 10 characters to its left to a text file.

Private Sub cmdGo_Click()
    Dim objDocument As Object, _
        objElements As Object, _
        objItem As Object
    Dim myResults As String

    If txtSearch.Text <> "" Then
        WebBrowser1.Navigate "http://www.yahoo.com/"
            DoEvents
        Wend
        Set objDocument = WebBrowser1.Document
        Set objElements = objDocument.All
        For Each objItem In objElements
            If InStr(1, objItem.innerHTML, txtSearch.Text, vbTextCompare) > 0 Then
                myResults = 'Need Input as to where the item was found and record the text 10 character to left & vbClrf
                Open "C:\output.txt" For Output As #1
                Write #1, myResults
                Close #1
            End If
        Next
    End If
End Sub
You've left out part of the code, JoseDavila.  The way it is now it probably won't work, because you're not letting the web page load before it moves on to searching for the text you want.  Also, the way you've set the code up to write the output to the text file, you're only going to get the last iteration, not every iteration.  Hang on a minute and I'll modify my post to search for what you've asked for.  Note though that since I don't have an example to test with, I can't say for certain it'll work as you want.
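
A rough sketch of the two fixes described above (this is not the accepted solution): the wait loop is restored so the page finishes loading, and the output file is opened once, before the search loop, so every match is kept. It assumes the same WebBrowser1, txtSearch, and cmdGo controls as the earlier form, and it searches the body's HTML as one string instead of element by element, which is a simplification chosen here.

```vb
Private Sub cmdGo_Click()
    Dim strHTML As String
    Dim lngPos As Long
    Dim intFile As Integer

    If txtSearch.Text = "" Then Exit Sub

    WebBrowser1.Navigate "http://www.yahoo.com/"
    ' Wait until the page has finished loading before reading it
    Do While WebBrowser1.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    ' Simplification: search the body HTML once as a single string
    strHTML = WebBrowser1.Document.body.innerHTML

    ' Open the file once, outside the loop, so earlier matches
    ' are not overwritten by later ones
    intFile = FreeFile
    Open "C:\output.txt" For Output As #intFile

    lngPos = InStr(1, strHTML, txtSearch.Text, vbTextCompare)
    Do While lngPos > 0
        Print #intFile, "Found at position " & lngPos
        ' Move past this match and look for the next one
        lngPos = InStr(lngPos + Len(txtSearch.Text), strHTML, txtSearch.Text, vbTextCompare)
    Loop

    Close #intFile
End Sub
```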
ASKER CERTIFIED SOLUTION
David Lee

This solution is only available to members of Experts Exchange.

hi

for following

>> For Each objItem In objElements
>>            If InStr(1, objItem.innerHTML, txtSearch.Text, vbTextCompare) > 0 Then
>>                myResults = 'Need Input as to where the item was found and record the text 10 character to left & vbClrf
>>                Open "C:\output.txt" For Output As #1
>>                Write #1, myResults
>>                Close #1
>>            End If
>>        Next

try this

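Dim ind As Long
Dim startPos As Long

For Each objItem In objElements
    ind = InStr(1, objItem.innerHTML, txtSearch.Text, vbTextCompare)
    If ind > 0 Then
        ' Clamp the start so Mid does not fail when the match is in the first 10 characters
        startPos = ind - 10
        If startPos < 1 Then startPos = 1
        myResults = ind & " " & Mid(objItem.innerHTML, startPos, ind - startPos)
        ' Append so each match is added instead of overwriting the file; Print # ends the line itself
        Open "C:\output.txt" For Append As #1
        Print #1, myResults
        Close #1
    End If
Next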