jaycangel asked:
Grabbing content in HTML tags from a webpage
I would like to grab content from a webpage, including HTML tags that are nested inside other tags.
For example:
<html>
<head>
</head>
<body>
<div><p>Open para here</p></div>
<div class="class1">
<p>Title here<img src="xxx.gif"/></p><p>text here</p><div>some content here</div>
</div>
<div class="class1">
<p>Title here<img src="xxx.gif"/></p><p>text here</p><div>some content here1</div>
</div>
<div class="class1">
<p>Title here<img src="xxx.gif"/></p><p>text here</p><div>some content here2</div>
</div>
</body>
</html>
I'd like the code to grab every div like <div class="class1">, with each one ending at its own matching </div> so that the nested tags inside it are kept intact.
I've thought of using a regular expression (RegExp) for this, but I'm not sure what kind of pattern could match correctly nested tags.
Any ideas?
Thanks
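The difficulty with a regular expression is easy to reproduce. The short sketch below uses Python's re module purely for illustration, on two sibling divs that each contain one nested div: a lazy pattern stops at the first </div> it meets, while a greedy pattern runs on to the last one, so neither returns the correctly nested elements. A typical regex has no way to keep count of how deep it is inside nested tags.

```python
import re

# Two sibling divs, each with one nested div. Kept on one line because
# "." does not match newlines unless re.DOTALL is given.
html = "<div><div>hello1</div></div><div><div>hello2</div></div>"

# Lazy: each match ends at the FIRST </div>, cutting off the outer close tag.
lazy = re.findall(r"<div.*?</div>", html)

# Greedy: one match runs to the LAST </div>, swallowing both siblings.
greedy = re.findall(r"<div.*</div>", html)

print(lazy)    # ['<div><div>hello1</div>', '<div><div>hello2</div>']
print(greedy)  # ['<div><div>hello1</div></div><div><div>hello2</div></div>']
```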
ASKER
Will that preserve nested tags?
The problem I had was working out how you would deal with
<div><div>hello</div></div>
How do you want to deal with that? Do you want the above to be returned as a single item in the array?
ASKER
I'd like it to separate each top-level div into its own array element. So
<div><div>hello1</div></div>
<div><div>hello2</div></div>
would become
array[0] = <div><div>hello1</div></div>
array[1] = <div><div>hello2</div></div>
I can then iterate through each element to find the div I'm looking for.
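The splitting described above can be sketched with a depth counter. This Python version is an illustration, not the code posted later in this thread: it uses a small regular expression only to locate individual <div ...> and </div> tags, and ordinary code to count the nesting, which is the part a single regex cannot do. The function name split_top_level_divs is hypothetical, and the sketch assumes reasonably well-formed markup with no div tags inside comments, scripts, or attribute values.

```python
import re

# Matches one opening <div ...> tag or one closing </div> tag, case-insensitively.
# \b stops it from matching longer tag names such as <divider>.
DIV_TAG = re.compile(r"<div\b[^>]*>|</div\s*>", re.IGNORECASE)


def split_top_level_divs(html):
    """Return each outermost <div>...</div> block, with its nested divs intact."""
    blocks = []
    depth = 0   # how many divs are currently open
    start = 0   # where the current top-level block began
    for tag in DIV_TAG.finditer(html):
        if tag.group().startswith("</"):
            if depth == 0:
                continue                 # stray close tag with no open div: ignore it
            depth -= 1
            if depth == 0:               # this </div> closes the top-level div
                blocks.append(html[start:tag.end()])
        else:
            if depth == 0:
                start = tag.start()      # first tag of a new top-level block
            depth += 1
    return blocks


page = "<div><div>hello1</div></div>\n<div><div>hello2</div></div>"
for i, block in enumerate(split_top_level_divs(page)):
    print(i, block)
```

Each returned string can then be checked in a simple loop, for example for class="class1".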
I have an idea on how this could be done. I'll try to put together an example this afternoon.
ASKER
Thanks :-)
ASKER CERTIFIED SOLUTION
ASKER
Thanks, that looks like a big piece of code. Will try it out.
Thank you
Did you get a chance to try the code out? It worked perfectly with my test file, but if it doesn't work correctly for you, let me know and I will try to work out what the problem is.
Vinny
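Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Private Sub Command1_Click()
    Dim TmpFile As String, ff As Integer, fData As String
    ' Long, not Integer: character positions in a page can exceed 32767
    Dim pos1 As Long, pos2 As Long, cnt As Long
    Dim depth As Long, nextOpen As Long, nextClose As Long
    Dim arrTags() As String
    Dim x As Long

    ' download the page to a temp file (URLDownloadToFile returns 0 on success)
    TmpFile = Environ("Temp") & "\TmpFile.tmp"
    If URLDownloadToFile(0, "https://www.experts-exchange.com", TmpFile, 0, 0) <> 0 Then
        MsgBox "Download failed"
        Exit Sub
    End If

    ' read the file data into fData, then delete the file
    ff = FreeFile
    Open TmpFile For Input As #ff
    fData = Input$(LOF(ff), ff)
    Close #ff
    Kill TmpFile

    cnt = 0
    ' find the first top-level <div (case-insensitive)
    pos1 = InStr(1, fData, "<div", vbTextCompare)
    Do While pos1 > 0
        ' scan forward, counting nested divs, until the </div> that
        ' closes the div starting at pos1 is reached
        depth = 1
        pos2 = pos1 + 4
        Do While depth > 0
            nextOpen = InStr(pos2, fData, "<div", vbTextCompare)
            nextClose = InStr(pos2, fData, "</div>", vbTextCompare)
            If nextClose = 0 Then Exit Do      ' no closing tag left
            If nextOpen > 0 And nextOpen < nextClose Then
                depth = depth + 1              ' a nested div opens first
                pos2 = nextOpen + 4
            Else
                depth = depth - 1              ' a div closes first
                pos2 = nextClose + 6           ' just past "</div>"
            End If
        Loop
        If depth > 0 Then Exit Do              ' unbalanced markup: stop here

        ' store the whole top-level block, nested divs included
        ReDim Preserve arrTags(cnt)
        arrTags(cnt) = Mid$(fData, pos1, pos2 - pos1)
        cnt = cnt + 1

        ' resume after this block so its nested divs are not added again
        pos1 = InStr(pos2, fData, "<div", vbTextCompare)
    Loop

    ' loop through the array (skipped entirely when no divs were found)
    For x = 0 To cnt - 1
        MsgBox arrTags(x)
    Next x
End Sub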
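Where an HTML parser is available, it avoids scanning the string by hand. As a sketch (Python rather than VB6; the class name DivCollector is hypothetical), the standard library's html.parser reports each start and end tag, so the same depth counter can pick out only the divs whose class attribute contains class1 and collect their text. Collecting text rather than raw markup is an illustrative choice.

```python
from html.parser import HTMLParser


class DivCollector(HTMLParser):
    """Collect the text of every <div> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # div depth inside the current match (0 = outside any match)
        self.current = []   # text fragments of the match being built
        self.results = []   # one string per matching div, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # a div nested inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:  # the </div> that closes the matching div
                self.results.append(" ".join(self.current))

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.current.append(data.strip())


page = (
    '<body><div><p>Open para here</p></div>'
    '<div class="class1"><p>Title here<img src="xxx.gif"/></p><p>text here</p>'
    '<div>some content here</div></div>'
    '<div class="class1"><p>Title here<img src="xxx.gif"/></p><p>text here</p>'
    '<div>some content here1</div></div></body>'
)
collector = DivCollector("class1")
collector.feed(page)
collector.close()
print(collector.results)
```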