Solved

VB.NET 2008 HTML parsing and ListBox

Posted on 2009-05-04
Last Modified: 2013-11-26
I'm fairly new to programming. What I want my program to do is visit a web page that has a bunch of links containing usernames, pull the usernames from the page, and add them to a ListBox.
Here's a sample of the HTML:

<table style="margin-left:auto;margin-right:auto;"><tr><td style="font-size:8pt;text-align:center;" class="color">
<div><b><a href="http://www.website.com/username1">
chronic</a>
</b></div>

<a href="http://www.website.com/username1">
    <img class="pic1" onmouseover="showInfo5('chronic', '', '');this.className='pic2';" onmouseout="this.className='pic1';return nd();" src="http://www.website.com/file/pic/user/chronic_75.jpg" alt="" height="75" width="56" />
</a>
</td>

</tr>
</table></td>
<td style="text-align:center;vertical-align;middle;">
<table style="margin-left:auto;margin-right:auto;"><tr><td style="font-size:8pt;text-align:center;" class="color">
<div><b><a href="http://www.website.com/username2">
dantheman2108</a>
</b></div>

I would like it to pull the text that follows <a href="http://www.website.com/ and put each username into a ListBox. Any ideas or suggestions on how I would go about this? I've been using the WebBrowser control.

Question by:j0eh4x
13 Comments
 

Accepted Solution

by:
oobayly earned 500 total points
ID: 24298160
Use the HttpWebRequest class to get the HTML of the page, then use a Regex to extract the usernames:
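    ' Requires Imports System.Text.RegularExpressions at the top of the file
    Dim re As New Regex("<a href=""http://www\.website\.com/(?<Username>[^""/]+)""")
    Dim usernames As New List(Of String)()
    For Each m As Match In re.Matches(htmlText)
      ' Add the captured text (Group.Value), not the Group object itself
      usernames.Add(m.Groups("Username").Value)
    Next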
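For reference, here is a self-contained console sketch of the corrected pattern run against a trimmed copy of the sample HTML from the question. The module name, the ExtractUsernames function, and the use of RegexOptions.IgnoreCase are illustrative choices, and www.website.com is the question's placeholder host, so substitute the real site. Each profile is linked twice in the sample (once on the name, once on the thumbnail), so each username is captured twice.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module UsernameRegexDemo
    ' Named group "Username" captures the path after the site root, up to the
    ' closing quote, with no further slashes. www.website.com is a placeholder.
    Private ReadOnly UserLink As New Regex( _
        "<a href=""http://www\.website\.com/(?<Username>[^""/]+)""", _
        RegexOptions.IgnoreCase)

    Public Function ExtractUsernames(ByVal htmlText As String) As List(Of String)
        Dim usernames As New List(Of String)()
        For Each m As Match In UserLink.Matches(htmlText)
            usernames.Add(m.Groups("Username").Value)
        Next
        Return usernames
    End Function

    Sub Main()
        ' Trimmed copy of the sample HTML from the question
        Dim html As String = _
            "<div><b><a href=""http://www.website.com/username1"">" & vbLf & _
            "chronic</a></b></div>" & vbLf & _
            "<a href=""http://www.website.com/username1"">" & vbLf & _
            "<img class=""pic1"" src=""http://www.website.com/file/pic/user/chronic_75.jpg"" /></a>" & vbLf & _
            "<div><b><a href=""http://www.website.com/username2"">" & vbLf & _
            "dantheman2108</a></b></div>"

        For Each name As String In ExtractUsernames(html)
            Console.WriteLine(name)
        Next
    End Sub
End Module
```

Running it prints username1 twice, then username2. The img tag's src attribute is not matched because the pattern requires a preceding <a href=.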


Author Comment

by:j0eh4x
ID: 24299422
I'm having trouble with this code.

Under For, it says "Statement cannot appear outside of a method body", and under usernames it says "Declaration expected".

  Dim re As New Regex("<a href""http://www.website.com/(<?Usernamer/+?)""")
    For Each m As Match In re.Matches(htmlText)
      usernames.Add(m.Groups("Username"))
    Next


Expert Comment

by:oobayly
ID: 24299597
As the compile errors suggest, you need to place the For loop inside a method or event handler. You also haven't declared the List to be populated. Finally, you misspelt Username in the regex:
' Requires Imports System.Text.RegularExpressions at the top of the file
Private Function PopulateUsernames(ByVal htmlText As String) As List(Of String)
  Dim re As New Regex("<a href=""http://www\.website\.com/(?<Username>[^""/]+)""")
  Dim usernames As New List(Of String)()
  For Each m As Match In re.Matches(htmlText)
    usernames.Add(m.Groups("Username").Value)
  Next
  Return usernames
End Function


Author Comment

by:j0eh4x
ID: 24307234
On m.Groups("Username") I get the error "Value of type 'System.Text.RegularExpressions.Group' cannot be converted to 'String'." Here is my code:
Imports System.net
Imports System.Text.RegularExpressions
 
Public Class Form1
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim myReq As HttpWebRequest = _
         WebRequest.Create("http://www.website.com/")
    End Sub
    Private Function PopulateUsernames(ByVal htmlText As String) As List(Of String)
        Dim re As New Regex("<a href""http://www.website.com/(<?Username>/+?)""")
        Dim usernames As New List(Of String)()
        For Each m As Match In re.Matches(htmlText)
            usernames.Add(m.Groups("Username"))
        Next
        Return usernames
    End Function
End Class


Expert Comment

by:oobayly
ID: 24309209
Sorry about that last error: m.Groups("Username") returns a Group object rather than a String, so use the group's Value property:

usernames.Add(m.Groups("Username").Value)
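Some background on why .Value matters: m.Groups("Username") returns a System.Text.RegularExpressions.Group, and its Value property holds the captured text. According to the GroupCollection documentation, asking for a group name that is not defined in the pattern (or that did not take part in the match) returns a Group whose Success is False and whose Value is an empty string, rather than raising an error. A misspelt group name, or the (<?Username>...) form, which does not define a named group at all, can therefore fail silently. A small sketch; UsernameOrNothing is a hypothetical helper name:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupValueDemo
    ' Returns the captured username, or Nothing when the named group did not
    ' participate in the match (or does not exist in the pattern at all).
    Public Function UsernameOrNothing(ByVal m As Match) As String
        Dim g As Group = m.Groups("Username")
        If g.Success Then
            Return g.Value
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim input As String = "<a href=""http://www.website.com/username1"">"

        ' Correct named-group syntax: (?<Username>...)
        Dim m1 As Match = Regex.Match(input, "/(?<Username>[^/""]+)""")
        ' Group defined under a different (misspelt) name
        Dim m2 As Match = Regex.Match(input, "/(?<Usernamer>[^/""]+)""")

        Console.WriteLine(UsernameOrNothing(m1))            ' username1
        Console.WriteLine(UsernameOrNothing(m2) Is Nothing) ' True
    End Sub
End Module
```

Checking Group.Success before using Value makes that kind of mistake visible instead of quietly adding empty strings.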


Author Comment

by:j0eh4x
ID: 24310217
How would I add the usernames to the ListBox?

Expert Comment

by:oobayly
ID: 24312343
Assuming your listbox is called listBox1:
'' Inside the Loop
listBox1.Items.Add(m.Groups("Username").Value)
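Because every profile in the sample HTML is linked twice (the name and the thumbnail), adding each match straight to the ListBox will list every user twice. A minimal sketch that keeps only the first occurrence of each username, assuming the same placeholder host as above; DistinctUsernames is a hypothetical name, and HashSet(Of T) requires .NET 3.5, which Visual Studio 2008 can target:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module DistinctUsernamesDemo
    ' Same hypothetical pattern as above; www.website.com is a placeholder host
    Private ReadOnly UserLink As New Regex( _
        "<a href=""http://www\.website\.com/(?<Username>[^""/]+)""", _
        RegexOptions.IgnoreCase)

    ' Returns each username once, in the order it first appears in the HTML.
    Public Function DistinctUsernames(ByVal htmlText As String) As List(Of String)
        ' Assumption: usernames are case-insensitive on the site
        Dim seen As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)
        Dim result As New List(Of String)()
        For Each m As Match In UserLink.Matches(htmlText)
            Dim name As String = m.Groups("Username").Value
            ' HashSet.Add returns False when the name was already seen
            If seen.Add(name) Then
                result.Add(name)
            End If
        Next
        Return result
    End Function

    Sub Main()
        Dim html As String = _
            "<a href=""http://www.website.com/username1"">chronic</a>" & _
            "<a href=""http://www.website.com/username1""><img /></a>" & _
            "<a href=""http://www.website.com/username2"">dantheman2108</a>"
        Console.WriteLine(String.Join(", ", DistinctUsernames(html).ToArray()))
    End Sub
End Module
```

This prints "username1, username2". In the form, the result could then be loaded in one call, for example ListBox1.Items.AddRange(DistinctUsernames(htmlText).ToArray()), where ListBox1 is the assumed control name. Switch to StringComparer.Ordinal if the site treats usernames as case-sensitive.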


Author Comment

by:j0eh4x
ID: 24317634
oobayly, I greatly appreciate your help. I have one last dumb question.
Attached is the final program. In previous programs I've put my code in button Click handlers; how do I trigger this Private Function PopulateUsernames?
Imports System.Net
Imports System.Text.RegularExpressions
 
Public Class Form1
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim myReq As HttpWebRequest = _
         WebRequest.Create("http://www.website.com/browse/")
    End Sub
    Private Function PopulateUsernames(ByVal htmlText As String) As List(Of String)
        Dim re As New Regex("<a href""http://www.website.com/(<?Username>/+?)""")
        Dim usernames As New List(Of String)()
        For Each m As Match In re.Matches(htmlText)
            usernames.Add(m.Groups("Username").Value)
            '' Inside the Loop
            ListBox.Items.Add(m.Groups("Username").Value)
        Next
        Return usernames
 
    End Function
End Class


Expert Comment

by:oobayly
ID: 24317752
Instead of using HttpWebRequest, use a WebClient to download the HTML as a string, then pass that string to PopulateUsernames. These three lines should go in the Load event handler:
Dim client As New WebClient()
Dim htmlText As String = client.DownloadString("http://www.website.com/")
PopulateUsernames(htmlText)


Author Comment

by:j0eh4x
ID: 24317913
I tried the following code but got nothing in the ListBox:

Imports System.Net
Imports System.Text.RegularExpressions
 
Public Class Form1
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim client As New WebClient()
        Dim htmlText As String = client.DownloadString("http://www.stlpunk.com/browse")
        PopulateUsernames(htmlText)
    End Sub
    Private Function PopulateUsernames(ByVal htmlText As String) As List(Of String)
        Dim re As New Regex("<a href""http://www.stlpunk.com/(<?Username>/+?)""")
        Dim usernames As New List(Of String)()
        For Each m As Match In re.Matches(htmlText)
            usernames.Add(m.Groups("Username").Value)
            '' Inside the Loop
            ListBox.Items.Add(m.Groups("Username").Value)
        Next
        Return usernames
 
    End Function
 
End Class


Expert Comment

by:oobayly
ID: 24318368
There's nothing obviously wrong with the code, so all I can suggest is adding some breakpoints and verifying that the HTML returned is valid and that some matches are being returned.
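One thing worth checking while debugging is the pattern itself. As posted in this thread, it is missing the = after href, and (<?Username>/+?) is not .NET named-group syntax: it matches an optional <, the literal text Username>, then one or more slashes. Against a normal link such as <a href="http://www.stlpunk.com/...">, that pattern finds nothing, which would leave the ListBox empty even when the HTML downloads correctly. The sketch below compares match counts; the corrected pattern and the sample link (including the username "chronic" in the path) are assumptions based on the question's HTML, not on the live site.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternDiagnostic
    ' Pattern as posted earlier in the thread (after VB string unescaping):
    '   <a href"http://www.stlpunk.com/(<?Username>/+?)"
    Public Const PostedPattern As String = _
        "<a href""http://www.stlpunk.com/(<?Username>/+?)"""

    ' Hypothetical corrected pattern: "=" after href, (?<name>...) group
    ' syntax, escaped dots, and a run of non-quote, non-slash characters.
    Public Const FixedPattern As String = _
        "<a href=""http://www\.stlpunk\.com/(?<Username>[^""/]+)"""

    Public Function CountMatches(ByVal pattern As String, ByVal html As String) As Integer
        Return Regex.Matches(html, pattern).Count
    End Function

    Sub Main()
        ' Assumed shape of a profile link, modelled on the question's sample
        Dim html As String = "<a href=""http://www.stlpunk.com/chronic"">chronic</a>"
        Console.WriteLine("Posted pattern: " & CountMatches(PostedPattern, html).ToString()) ' 0
        Console.WriteLine("Fixed pattern:  " & CountMatches(FixedPattern, html).ToString())  ' 1
    End Sub
End Module
```

Pasting a fragment of the real downloaded HTML into html is a quick way to confirm which pattern fits the live page (for example, whether its profile links end in a trailing slash, which [^"/]+ followed by a quote would not accept).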

Author Comment

by:j0eh4x
ID: 24318641
I used the following code to pause after receiving the HTML, then set a TextBox's Text to the htmlText variable to make sure it's receiving the HTML OK. It's still not populating the ListBox, though:

Imports System.Net
Imports System.Text.RegularExpressions
 
Public Class Form1
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Dim client As New WebClient()
        Dim htmlText As String = client.DownloadString("http://www.stlpunk.com/browse/mode_recent/")
        Dim timeOut As DateTime = Now.AddMilliseconds(5000)'pause for 5 seconds
        Do
            Application.DoEvents()
        Loop Until Now > timeOut
        TextBox1.Text = htmlText ' display html
        PopulateUsernames(htmlText)
 
 
    End Sub
    Private Function PopulateUsernames(ByVal htmlText As String) As List(Of String)
        Dim re As New Regex("<a href""http://www.stlpunk.com/(<?Username>/+?)""")
        Dim usernames As New List(Of String)()
        For Each m As Match In re.Matches(htmlText)
            usernames.Add(m.Groups("Username").Value)
            '' Inside the Loop
            ListBox.Items.Add(m.Groups("Username").Value)
        Next
        Return usernames
 
    End Function
 
End Class


Expert Comment

by:oobayly
ID: 24319439
By breakpoint I mean a debugging breakpoint, so that you can inspect the HTML returned by the WebClient.
You don't need to block the Load event with that loop, as DownloadString blocks until it returns a string (or throws an exception).

http://msdn.microsoft.com/en-us/library/ktf38f66(VS.71).aspx
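Since DownloadString either returns the whole page or throws, the call can be wrapped so that a failed request is reported instead of crashing the Load event. A minimal sketch; TryDownload is a hypothetical helper, and writing to the console stands in for whatever error display the form uses (for example a MessageBox):

```vbnet
Imports System
Imports System.Net

Module DownloadDemo
    ' Returns the page HTML, or Nothing if the request fails.
    Public Function TryDownload(ByVal url As String) As String
        ' WebClient is IDisposable, so Using releases it when done
        Using client As New WebClient()
            Try
                ' Blocks until the full response is read or an error occurs
                Return client.DownloadString(url)
            Catch ex As WebException
                Console.WriteLine("Download failed: " & ex.Message)
                Return Nothing
            End Try
        End Using
    End Function

    Sub Main(ByVal args As String())
        ' Pass the page URL on the command line
        If args.Length = 0 Then
            Console.WriteLine("Usage: DownloadDemo <url>")
            Return
        End If
        Dim html As String = TryDownload(args(0))
        If html IsNot Nothing Then
            Console.WriteLine("Downloaded " & html.Length.ToString() & " characters")
        End If
    End Sub
End Module
```

In Form1_Load, the same idea would be to call TryDownload and only call PopulateUsernames when the result is not Nothing.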