  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 237

Grabbing content in HTML tags from a webpage

I would like to grab content from a webpage, including nested HTML tags.

For example

<html>
<head>
</head>
<body>
<div><p>Open para here</p></div>
<div class="class1">
<p>Title here<img src="xxx.gif"/></p>text here</p><div>some content here</div>
</div>
<div class="class1">
<p>Title here<img src="xxx.gif"/></p>text here</p><div>some content here1</div>
</div>
<div class="class1">
<p>Title here<img src="xxx.gif"/></p>text here</p><div>some content here2</div>
</div>
</body>

I'd like the code to grab all the div tags such as <div class="class1">, each one together with whatever is nested inside it.

I've thought of using RegExp to do this, but am not sure what kind of regexp would be able to grab tags nested correctly.

Any ideas?

Thanks
jaycangel
 
vinnyd79 commented:
I'm not sure what you are looking for. Here is an example that downloads the file, reads it, and adds the data between the div tags to an array. Is this what you are looking to do?


Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Private Sub Command1_Click()
Dim TmpFile As String, ff As Integer, fData As String
' Long rather than Integer: positions overflow an Integer once the page exceeds 32,767 characters
Dim pos1 As Long, pos2 As Long, cnt As Long
Dim arrTags() As String

' download file to temp dir
TmpFile = Environ("Temp") & "\TmpFile.tmp"
URLDownloadToFile 0, "http://www.experts-exchange.com", TmpFile, 0, 0

' read file data into fData string
ff = FreeFile
Open TmpFile For Input As #ff
fData = Input$(LOF(ff), ff)
Close #ff

' delete file
Kill TmpFile

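cnt = 0
' case-insensitive search from position 1, so <DIV> and a tag at the very start are found too
pos1 = InStr(1, fData, "<div", vbTextCompare)
Do While pos1 <> 0
    ' find the first closing tag after this opening tag
    pos2 = InStr(pos1 + 1, fData, "</div>", vbTextCompare)
    If pos2 = 0 Then Exit Do    ' unclosed div: stop
    ' add everything from <div up to and including </div> to the array
    ReDim Preserve arrTags(cnt)
    arrTags(cnt) = Mid$(fData, pos1, pos2 + 6 - pos1)
    cnt = cnt + 1
    ' look for the next div tag
    pos1 = InStr(pos1 + 1, fData, "<div", vbTextCompare)
Loop

' loop through array (cnt - 1 so nothing is shown when no tags were found)
Dim x As Long
For x = 0 To cnt - 1
    MsgBox arrTags(x)
Next x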

End Sub
 
jaycangel (author) commented:
Would that preserve nested tags?

The problem I had was working out how to deal with

<div><div>hello</div></div>
 
vinnyd79 commented:
How do you want to deal with that?  Do you want the above to be returned as a single item in the array?
 
jaycangel (author) commented:
I'd like it to separate the divs that sit on the same level into array elements, each keeping its nested divs. So

<div><div>hello1</div></div>
<div><div>hello2</div></div>

would become
array[0] = <div><div>hello1</div></div>
array[1] = <div><div>hello2</div></div>

I can then iterate through each to find the div I'm looking for
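
The "iterate through each to find the div" step could look something like the sketch below. It is only an illustration, not code from this thread: the function name HasClass and the array name blocks are made up, and it assumes the class attribute is double-quoted and holds a single class name, as in the sample page.

```vb
' Hypothetical helper (not from the thread): True when the block's opening
' <div ...> tag contains class="<className>" (double-quoted, single class).
Public Function HasClass(ByVal block As String, ByVal className As String) As Boolean
    Dim tagEnd As Long, openTag As String

    ' the opening tag runs from the start of the block to the first ">"
    tagEnd = InStr(1, block, ">")
    If tagEnd = 0 Then Exit Function          ' malformed block: returns False
    openTag = Left$(block, tagEnd)

    ' case-insensitive test for class="className" inside the opening tag only
    HasClass = InStr(1, openTag, "class=""" & className & """", vbTextCompare) > 0
End Function

' Usage sketch: blocks() is assumed to hold one top-level div per element
Public Sub ShowClass1Blocks(blocks() As String)
    Dim i As Long
    For i = LBound(blocks) To UBound(blocks)
        If HasClass(blocks(i), "class1") Then Debug.Print blocks(i)
    Next i
End Sub
```

Only the opening tag is inspected, so a nested div that happens to carry the same class does not cause a false match on its parent.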
 
vinnyd79 commented:
I have an idea on how this could be done. I'll try to put together an example this afternoon.
 
jaycangel (author) commented:
Thanks :-)
 
vinnyd79 commented:
OK, give this a try.


Option Explicit
Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
Dim StartTagStack() As String  'in declarations area
Dim EndTagStack() As String    'in declarations area


Private Sub Command1_Click()
Dim TmpFile As String, ff As Integer, fData As String
Dim cnt As Long, sCnt As Long, eCnt As Long, x As Long, r As Long
Dim arrTags() As String

' initialize vars to 0
cnt = 0: sCnt = 0: eCnt = 0

' redim arrays to 0 to hold positions of tags
ReDim StartTagStack(sCnt)
ReDim EndTagStack(eCnt)

' download file to temp dir
TmpFile = Environ("Temp") & "\TmpFile.tmp"
URLDownloadToFile 0, "http://www.somewebpage.com", TmpFile, 0, 0

''' TmpFile = "C:\Tester.txt"  ' file used for testing

' read file data into fData string
ff = FreeFile
Open TmpFile For Input As #ff
fData = Input$(LOF(ff), ff)
Close #ff

' delete file
Kill TmpFile

' start looking for tags
For x = 1 To Len(fData)
    If LCase(Mid$(fData, x, 4)) = "<div" Then
        ' found <div tag, add position found to StartTagStack
        ReDim Preserve StartTagStack(sCnt)
        StartTagStack(sCnt) = x
        ' increment counter to redim array when another start tag is found
        sCnt = sCnt + 1
    End If
   
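    ' only count a closing tag while an opening tag is pending; a stray </div>
    ' would otherwise read an empty start position and raise a type mismatch
    If sCnt > 0 And LCase(Mid$(fData, x, 6)) = "</div>" Then
        ' found </div> tag, add position found to EndTagStack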
        ReDim Preserve EndTagStack(eCnt)
        EndTagStack(eCnt) = x
        ' increment counter to redim array when another end tag is found
        eCnt = eCnt + 1
        ' if the # of Start Tags = # of End Tags then we have a complete
        ' tag that will contain any nested tags
        If UBound(StartTagStack) = UBound(EndTagStack) Then
            ' write to arrTags array
            ReDim Preserve arrTags(cnt)
            ' use first item in StartTagStack array and last item in EndTagStack array
            ' to get the string needed to add to array arrTags
            arrTags(cnt) = Mid$(fData, CLng(StartTagStack(LBound(StartTagStack))), CLng(EndTagStack(UBound(EndTagStack))) + 6 - CLng(StartTagStack(LBound(StartTagStack))))
            ' increment array counter for next set of tags found
            cnt = cnt + 1
            ' Clear the stacks
            sCnt = 0
            eCnt = 0
            ReDim StartTagStack(sCnt)
            ReDim EndTagStack(eCnt)
        End If
    End If
Next x

' loop through array (cnt - 1 so UBound is never called on an empty array)
For r = 0 To cnt - 1
    MsgBox arrTags(r)
Next r

End Sub
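
The same idea (a block is complete once the closing tags catch up with the opening tags) can also be sketched with a single depth counter instead of two position arrays. This is a variant for illustration only, not the code posted above: the function name TopLevelDivs is hypothetical, it returns a Collection rather than an array, and it uses plain string matching, so "<div" inside a comment or script, or a tag name that merely begins with "div", would be counted as well.

```vb
' Hypothetical helper (not from the thread): returns the top-level <div> blocks
' found in html as a Collection of strings; nested divs stay inside their parent.
Public Function TopLevelDivs(ByVal html As String) As Collection
    Dim result As New Collection
    Dim lowered As String
    Dim pos As Long, nextOpen As Long, nextClose As Long
    Dim depth As Long, blockStart As Long

    lowered = LCase$(html)   ' search a lower-cased copy so <DIV> also matches
    pos = 1
    depth = 0

    Do
        nextOpen = InStr(pos, lowered, "<div")
        nextClose = InStr(pos, lowered, "</div>")
        If nextClose = 0 Then Exit Do          ' no closing tag left: stop

        If nextOpen <> 0 And nextOpen < nextClose Then
            ' an opening tag comes first: remember where the outermost one starts
            If depth = 0 Then blockStart = nextOpen
            depth = depth + 1
            pos = nextOpen + 4
        Else
            ' a closing tag comes first: ignore it if nothing is open
            If depth > 0 Then
                depth = depth - 1
                If depth = 0 Then
                    ' back at the top level: copy the whole block, </div> included
                    result.Add Mid$(html, blockStart, nextClose + 6 - blockStart)
                End If
            End If
            pos = nextClose + 6
        End If
    Loop

    Set TopLevelDivs = result
End Function

' Usage sketch: print each top-level block of a page already read into a string
Public Sub ShowTopLevelDivs(ByVal fData As String)
    Dim item As Variant
    For Each item In TopLevelDivs(fData)
        Debug.Print item
    Next item
End Sub
```

Because it jumps from tag to tag with InStr rather than testing every character position, this version also does less work on large pages.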
 
 
jaycangel (author) commented:
Thanks, that looks like a big piece of code. Will try it out.

Thank you
 
vinnyd79 commented:
Did you get a chance to try the code out? It worked perfectly with my test file, but if it doesn't work correctly for you, let me know and I will try to determine what the problem is.

Vinny