jaycangel

Grabbing content in HTML tags from a webpage

I would like to grab content from a webpage, including nested HTML tags.

For example

<html>
<head>
</head>
<body>
<div><p>Open para here</p></div>
<div class="class1">
<p>Title here<img src="xxx.gif"/></p><p>text here</p><div>some content here</div>
</div>
<div class="class1">
<p>Title here<img src="xxx.gif"/></p><p>text here</p><div>some content here1</div>
</div>
<div class="class1">
<p>Title here<img src="xxx.gif"/></p><p>text here</p><div>some content here2</div>
</div>
</body>
</html>

I'd like the code to grab all the div tags like <div class="class1">, with any divs nested inside them matched correctly.

I've thought of using a regular expression to do this, but I'm not sure what kind of pattern could match nested tags correctly.

Any ideas?

Thanks
vinnyd79

I'm not sure what you are looking for. Here is an example that downloads the file, reads it, and adds the data between the div tags to an array. Is this what you are looking to do?


Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Private Sub Command1_Click()
Dim TmpFile As String, ff As Integer, fData As String
' Long rather than Integer: pages are often longer than 32,767 characters
Dim pos1 As Long, pos2 As Long, cnt As Long
Dim arrTags() As String

' download file to temp dir (URLDownloadToFile returns 0 on success)
TmpFile = Environ("Temp") & "\TmpFile.tmp"
If URLDownloadToFile(0, "https://www.experts-exchange.com", TmpFile, 0, 0) <> 0 Then
    MsgBox "Download failed"
    Exit Sub
End If

' read file data into fData string (binary read avoids "Input past end of file")
ff = FreeFile
Open TmpFile For Binary Access Read As #ff
fData = Space$(LOF(ff))
Get #ff, , fData
Close #ff

' delete file
Kill TmpFile

cnt = 0
pos1 = 0
Do
    ' look for the next opening div tag (case-insensitive)
    pos1 = InStr(pos1 + 1, fData, "<div", vbTextCompare)
    If pos1 = 0 Then Exit Do
    ' find the first closing tag after it; nested divs are not handled here
    pos2 = InStr(pos1 + 1, fData, "</div>", vbTextCompare)
    If pos2 = 0 Then Exit Do
    pos2 = pos2 + 6
    ' add to array
    ReDim Preserve arrTags(cnt)
    arrTags(cnt) = Mid$(fData, pos1, pos2 - pos1)
    cnt = cnt + 1
Loop

' loop through array (skipped when no div tags were found)
Dim x As Long
For x = 0 To cnt - 1
    MsgBox arrTags(x)
Next x

End Sub
jaycangel

ASKER

Would that preserve nested tags?

The problem I had was working out how you would deal with

<div><div>hello</div></div>
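
To make the concern concrete, here is a small VB6 sketch that pairs an opening div tag with the first closing tag after it, which is how a plain InStr search behaves. The function name FirstCloseMatch is hypothetical and used only for illustration.

```vb
' Hypothetical helper: returns the text from the first "<div" up to and
' including the first "</div>" that follows it, ignoring nesting.
Public Function FirstCloseMatch(ByVal html As String) As String
    Dim startPos As Long, endPos As Long

    startPos = InStr(1, html, "<div", vbTextCompare)
    If startPos = 0 Then Exit Function          ' no opening tag: returns ""

    endPos = InStr(startPos + 1, html, "</div>", vbTextCompare)
    If endPos = 0 Then Exit Function            ' no closing tag: returns ""

    FirstCloseMatch = Mid$(html, startPos, endPos + Len("</div>") - startPos)
End Function
```

Called with "<div><div>hello</div></div>", this returns "<div><div>hello</div>": the outer div is cut off at the inner closing tag, so the nesting is lost.
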
How do you want to deal with that?  Do you want the above to be returned as a single item in the array?
I'd like it to separate the divs into an array, one entry per div on the same level. So

<div><div>hello1</div></div>
<div><div>hello2</div></div>

would become
array[0] = <div><div>hello1</div></div>
array[1] = <div><div>hello2</div></div>

I can then iterate through each to find the div I'm looking for
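
One way the behaviour described above could be approached is to scan for opening and closing div tags while keeping a nesting depth counter, and to cut a block each time the depth returns to zero. The VB6 sketch below is an assumption about how that might look, not the solution posted in this thread. SplitTopLevelDivs and DemoSplit are hypothetical names, and the matching is plain text only, so "<div" inside comments or scripts, or another tag name that starts with "div", would confuse it.

```vb
' Hypothetical helper: splits html into its top-level <div>...</div> blocks,
' keeping nested divs inside their parent. Fills blocks() (0-based) and
' returns the number of blocks found.
Public Function SplitTopLevelDivs(ByVal html As String, ByRef blocks() As String) As Long
    Dim pos As Long, nextOpen As Long, nextClose As Long
    Dim depth As Long, startPos As Long, cnt As Long

    pos = 1
    Do
        nextOpen = InStr(pos, html, "<div", vbTextCompare)
        nextClose = InStr(pos, html, "</div>", vbTextCompare)
        If nextClose = 0 Then Exit Do           ' nothing left to close

        If nextOpen <> 0 And nextOpen < nextClose Then
            ' an opening tag comes first: go one level deeper
            If depth = 0 Then startPos = nextOpen
            depth = depth + 1
            pos = nextOpen + 4
        Else
            ' a closing tag comes first: come back up one level
            pos = nextClose + 6
            If depth > 0 Then
                depth = depth - 1
                If depth = 0 Then
                    ' back at the top level: store the whole block
                    ReDim Preserve blocks(cnt)
                    blocks(cnt) = Mid$(html, startPos, pos - startPos)
                    cnt = cnt + 1
                End If
            End If
        End If
    Loop

    SplitTopLevelDivs = cnt
End Function

' Example use with the two-line input from the post above.
Public Sub DemoSplit()
    Dim arr() As String, n As Long, i As Long
    n = SplitTopLevelDivs("<div><div>hello1</div></div>" & vbCrLf & _
                          "<div><div>hello2</div></div>", arr)
    For i = 0 To n - 1
        Debug.Print i & ": " & arr(i)
    Next i
End Sub
```

The function returns a count instead of relying on UBound, so the caller's loop is simply skipped when no div is found. Each returned entry still contains its nested divs, so it can be searched afterwards, for example for class="class1" in its opening tag.
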
I have an idea on how this could be done. I'll try to put together an example this afternoon.
Thanks :-)
ASKER CERTIFIED SOLUTION
vinnyd79

This solution is only available to members.
Thanks, that looks like a big piece of code. Will try it out.

Thank you
Did you get a chance to try the code out? It worked perfectly with my test file, but if it doesn't work correctly for you, let me know and I will try to determine what the problem is.

Vinny