Grabbing content in HTML tags from a webpage

I would like to grab content from a web page, including HTML tags that are nested inside one another.

For example:

<html>
<head>
</head>
<body>
<div><p>Open para here</p></div>
<div class="class1">
<p>Title here<img src="xxx.gif"/></p><p>text here</p><div>some content here</div>
</div>
<div class="class1">
<p>Title here<img src="xxx.gif"/></p><p>text here</p><div>some content here1</div>
</div>
<div class="class1">
<p>Title here<img src="xxx.gif"/></p><p>text here</p><div>some content here2</div>
</div>
</body>
</html>

I'd like the code to grab all the div tags such as <div class="class1">, and to grab each one whole, including any divs nested inside it.

I've thought of using RegExp to do this, but I'm not sure what kind of pattern would match nested tags correctly.

Any ideas?

Thanks
Asked by: jaycangel

vinnyd79 Commented:
I'm not sure what you are looking for. Here is an example that downloads the file, reads it, and adds the data between the div tags to an array. Is this what you are looking to do?


Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Private Sub Command1_Click()
Dim TmpFile As String, ff As Integer, fData As String
' positions are Long because a page can easily exceed 32,767 characters
Dim pos1 As Long, pos2 As Long, cnt As Long
Dim arrTags() As String

' download file to temp dir (URLDownloadToFile returns 0 on success)
TmpFile = Environ("Temp") & "\TmpFile.tmp"
If URLDownloadToFile(0, "http://www.experts-exchange.com", TmpFile, 0, 0) <> 0 Then
    MsgBox "Download failed"
    Exit Sub
End If

' read file data into fData string (Binary mode reads the raw bytes as-is)
ff = FreeFile
Open TmpFile For Binary Access Read As #ff
fData = Input$(LOF(ff), ff)
Close #ff

' delete file
Kill TmpFile

cnt = 0
' look for the first div tag (case-insensitive)
pos1 = InStr(1, fData, "<div", vbTextCompare)
Do While pos1 <> 0
    ' find the first closing tag after this opening tag
    pos2 = InStr(pos1, fData, "</div>", vbTextCompare)
    If pos2 = 0 Then Exit Do    ' no closing tag left, stop
    ' add to array (6 = Len("</div>"))
    ReDim Preserve arrTags(cnt)
    arrTags(cnt) = Mid$(fData, pos1, pos2 + 6 - pos1)
    cnt = cnt + 1
    ' look for the next div tag
    pos1 = InStr(pos1 + 1, fData, "<div", vbTextCompare)
Loop

' loop through array (cnt - 1 is safe even when no tags were found)
Dim x As Long
For x = 0 To cnt - 1
    MsgBox arrTags(x)
Next x

End Sub

jaycangel (Author) Commented:
Would that preserve nested tags?

The problem I had was working out how you would deal with:

<div><div>hello</div></div>
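
To make the problem concrete, here is a minimal sketch of what happens when each "<div" is paired with the first "</div>" that follows it, as the snippet above does. The procedure name DemoNaiveMatch is made up for illustration.

```vb
' Hypothetical demo (not from the thread): pairing "<div" with the first
' "</div>" that follows it cuts a nested block short.
Private Sub DemoNaiveMatch()
    Dim html As String, pos1 As Long, pos2 As Long

    html = "<div><div>hello</div></div>"
    pos1 = InStr(1, html, "<div", vbTextCompare)
    pos2 = InStr(pos1, html, "</div>", vbTextCompare)

    ' The first closing tag belongs to the inner div, so the outer
    ' div loses its own closing tag.
    Debug.Print Mid$(html, pos1, pos2 + 6 - pos1)   ' prints <div><div>hello</div>
End Sub
```

The outer div's closing tag is missing from the result, which is why the opening and closing tags need to be counted rather than simply paired in order.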

vinnyd79 Commented:
How do you want to deal with that? Do you want the above to be returned as a single item in the array?
jaycangel (Author) Commented:
I'd like it to separate each top-level div into its own array element, with the nested divs kept inside it. So

<div><div>hello1</div></div>
<div><div>hello2</div></div>

would become
array[0] = <div><div>hello1</div></div>
array[1] = <div><div>hello2</div></div>

I can then iterate through the array to find the div I'm looking for.

vinnyd79 Commented:
I have an idea on how this could be done. I'll try to put together an example this afternoon.

jaycangel (Author) Commented:
Thanks :-)

vinnyd79 Commented:
OK, give this a try.


Option Explicit
Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
Dim StartTagStack() As Long  'in declarations area; positions of <div tags
Dim EndTagStack() As Long    'in declarations area; positions of </div> tags


Private Sub Command1_Click()
Dim TmpFile As String, ff As Integer, fData As String
Dim cnt As Long, sCnt As Long, eCnt As Long, x As Long, r As Long
Dim arrTags() As String

' initialize vars to 0
cnt = 0: sCnt = 0: eCnt = 0

' redim arrays to 0 to hold positions of tags
ReDim StartTagStack(sCnt)
ReDim EndTagStack(eCnt)

' download file to temp dir (URLDownloadToFile returns 0 on success)
TmpFile = Environ("Temp") & "\TmpFile.tmp"
If URLDownloadToFile(0, "http://www.somewebpage.com", TmpFile, 0, 0) <> 0 Then
    MsgBox "Download failed"
    Exit Sub
End If

''' TmpFile = "C:\Tester.txt"  ' file used for testing

' read file data into fData string (Binary mode reads the raw bytes as-is)
ff = FreeFile
Open TmpFile For Binary Access Read As #ff
fData = Input$(LOF(ff), ff)
Close #ff

' delete file
Kill TmpFile

' start looking for tags
For x = 1 To Len(fData)
    If LCase$(Mid$(fData, x, 4)) = "<div" Then
        ' found <div tag, add position found to StartTagStack
        ReDim Preserve StartTagStack(sCnt)
        StartTagStack(sCnt) = x
        ' increment counter to redim array when another start tag is found
        sCnt = sCnt + 1
    End If

    ' only count a closing tag while a start tag is open,
    ' so a stray </div> before any <div is ignored
    If sCnt > 0 And LCase$(Mid$(fData, x, 6)) = "</div>" Then
        ' found </div> tag, add position found to EndTagStack
        ReDim Preserve EndTagStack(eCnt)
        EndTagStack(eCnt) = x
        ' increment counter to redim array when another end tag is found
        eCnt = eCnt + 1
        ' if the # of Start Tags = # of End Tags then we have a complete
        ' tag that will contain any nested tags
        If sCnt = eCnt Then
            ' write to arrTags array
            ReDim Preserve arrTags(cnt)
            ' use first item in StartTagStack array and last item in EndTagStack array
            ' to get the string needed to add to array arrTags (6 = Len("</div>"))
            arrTags(cnt) = Mid$(fData, StartTagStack(0), EndTagStack(eCnt - 1) + 6 - StartTagStack(0))
            ' increment array counter for next set of tags found
            cnt = cnt + 1
            ' Clear the stacks
            sCnt = 0
            eCnt = 0
            ReDim StartTagStack(sCnt)
            ReDim EndTagStack(eCnt)
        End If
    End If
Next x

' loop through array (cnt - 1 is safe even when no tags were found)
For r = 0 To cnt - 1
    MsgBox arrTags(r)
Next r

End Sub
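
As a possible follow-up toward the original goal of picking out the class1 divs, the same balancing idea can be packaged as a reusable function. This is only a sketch under stated assumptions: GetTopLevelDivs, HasClass and ShowClass1Divs are hypothetical names, a single depth counter stands in for the two position arrays, and HasClass assumes the attribute is written exactly as class="name" with double quotes and a single class name.

```vb
' Hypothetical helper (not from the thread): fills results() with each outermost
' <div>...</div> block, nested divs included, and returns how many were found.
Public Function GetTopLevelDivs(ByVal html As String, ByRef results() As String) As Long
    Dim x As Long, depth As Long, startPos As Long, cnt As Long
    Dim lowerHtml As String

    lowerHtml = LCase$(html)    ' lower-case once so the scan is case-insensitive
    x = 1
    Do While x <= Len(lowerHtml)
        If Mid$(lowerHtml, x, 4) = "<div" Then
            If depth = 0 Then startPos = x    ' start of an outermost div
            depth = depth + 1
            x = x + 4
        ElseIf Mid$(lowerHtml, x, 6) = "</div>" Then
            If depth > 0 Then                 ' ignore a stray closing tag
                depth = depth - 1
                If depth = 0 Then             ' outermost div is now closed
                    ReDim Preserve results(cnt)
                    results(cnt) = Mid$(html, startPos, x + 6 - startPos)
                    cnt = cnt + 1
                End If
            End If
            x = x + 6
        Else
            x = x + 1
        End If
    Loop
    GetTopLevelDivs = cnt
End Function

' Hypothetical helper: True if the block's opening tag contains class="<className>".
' Assumes double quotes and a single class name (an assumption, not a general parser).
Public Function HasClass(ByVal block As String, ByVal className As String) As Boolean
    Dim openTag As String
    openTag = Left$(block, InStr(1, block, ">"))    ' opening tag only
    HasClass = (InStr(1, openTag, "class=""" & className & """", vbTextCompare) > 0)
End Function

' Example use on a small sample string
Public Sub ShowClass1Divs()
    Dim divs() As String, n As Long, i As Long
    Dim sample As String

    sample = "<div><p>Open para here</p></div>" & _
             "<div class=""class1""><p>Title</p><div>some content here</div></div>"
    n = GetTopLevelDivs(sample, divs)
    For i = 0 To n - 1
        If HasClass(divs(i), "class1") Then Debug.Print divs(i)
    Next i
End Sub
```

Returning the count separately means the caller never has to call UBound on an array that was never dimensioned, which would raise an error when the page contains no divs at all.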
 
jaycangel (Author) Commented:
Thanks, that looks like a big piece of code. Will try it out.

Thank you

vinnyd79 Commented:
Did you get a chance to try the code out? It worked perfectly with my test file, but if it doesn't work correctly for you, let me know and I will try to determine what the problem is.

Vinny