Solved

Anchor Text elements in the VB6 WebBrowser Control - Advanced

Posted on 2007-08-09
Last Modified: 2013-12-26
I am programming in VB6, not .NET, so I would like a solution that works in VB6.


Here is my code:

Set HTMLdoc = frmBrowser.WebBrowser1.Document
pageSource = frmBrowser.WebBrowser1.Document.body.innerHTML

If InStr(pageSource, Text1) <> 0 Then
    ' text found
    For Each HTMLlinks In HTMLdoc.links
        strResult = HTMLlinks.href Like "*/target-text/*"
        If strResult = True Then
            MsgBox HTMLlinks.href        ' HERE IS MY QUESTION
        End If
    Next HTMLlinks
End If

The code works as intended: it returns only the links that I need from the HTML source.

The problem I am experiencing is that I also want to extract the Anchor Text from the same link.

Is there a way to do this through a HTMLlinks.property?

All I can figure to do right now is to extract the target URLs and then use Split commands to whittle down the innerHTML until I reach the anchor text of the matching link. This is problematic if the page references the same URL more than once, so I would need to build the extraction code into the procedure above to make sure that I am getting the correct Anchor Text from the correct link.

So, is there a way that I can accomplish this using the webbrowser control elements in vb6?

I hope so.

Thanks.

Bill
Question by:plattservicesinc
7 Comments
 
LVL 3

Expert Comment

by:VR4
ID: 19667749
Why don't you just iterate through the anchors?
For Each HTMLanchor In frmBrowser.WebBrowser1.Document.anchors
    MsgBox htmlanchor.href
Next

You could also go through all elements and do something like
If WebBrowser1.Document.body.All.Item(X).nodeName = "A" Then
...
End If

If the above is not it, what is a specific example of the link and the info you are trying to get? Use a specific URL.
 

Author Comment

by:plattservicesinc
ID: 19678504
I hate to think that my brain does not work right, but this dom stuff is getting the better of me.

I tried your code here:

For Each HTMLanchor In frmBrowser.WebBrowser1.Document.anchors
    MsgBox htmlanchor.href
Next

This just engages me in an endless search for data. My process complete cue never fires. I tried changing it to:

For Each HTMLanchor In frmBrowser.WebBrowser1.Document.body.all
    MsgBox htmlanchor.href
Next

Then the code shows me the links in the href tags, but I already had that. Next I tried to work with your second code:

if WebBrowser1.Document.body.All.Item(X).nodeName="A" then
...
end if

For some reason, I cannot figure out how to define (X). I even tried iterating X in a For statement, which resulted in another hung process with zero results.

Please show more detail so that my silly brain can wrap itself around this solution.

 
LVL 3

Expert Comment

by:VR4
ID: 19688702
X is a loop variable that you increment ...

Why don't you give me an example of the page and the actual tag and info you are searching for.
I'll then give you a sample based on that.
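
In the meantime, here is a rough, untested sketch of what "X as a loop variable" could look like. It assumes the control is named WebBrowser1 and that the page has finished loading before the routine runs; the Sub name ListAnchors is made up for illustration. It prints to the Immediate window instead of using MsgBox so that a page with many links does not produce a long chain of dialogs.

```vb
' Hypothetical helper: walks every element in the loaded document and
' prints the href and visible text of each <a> tag that has a link target.
Private Sub ListAnchors()
    Dim doc As Object      ' late-bound HTML document
    Dim elem As Object     ' current element
    Dim X As Long          ' loop index into the All collection

    Set doc = WebBrowser1.Document
    For X = 0 To doc.All.length - 1
        Set elem = doc.All.Item(X)
        If elem.nodeName = "A" Then
            ' Skip <a> tags that carry no href value
            If Len(elem.href) > 0 Then
                Debug.Print elem.href & " -> " & elem.innerText
            End If
        End If
    Next X
End Sub
```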
 

Author Comment

by:plattservicesinc
ID: 19688863
Ok. Here is a sample page:

http://www.sitepronews.com/archives/2007/apr/27.html

Within this page, there are maybe 100 links, but only one of them goes to http://thephantomwriters.com/

I have been able to document the single link on the page that points to the thephantomwriters.com domain, but I need to also retrieve the actual "link text" that is connected to the link itself, in this case, "article ghost writing and article distribution".

I tell the software which target domain to search for in the page, so the software knows in advance that it is only going to pull links that point to my domain: thephantomwriters.com

What I want the software to do is to give me three outputs for this page:

Source Link: http://www.sitepronews.com/archives/2007/apr/27.html
Target Link: http://thephantomwriters.com/ghostwriting
Keywords in Link: article ghost writing and article distribution

I have so far been able to get the first two fields, but I am having problems with the third. By using Split commands, I have been able to extract the info, but not consistently. In about 90% of the cases, the software cannot see the Keywords In The Link.

Right now, I must manually verify the information given to me by the software.

My hope is that there will be a way to use the DOM to extract this info more reliably.

This sample page is only that, a sample. The information is always placed on the page in different ways, so my system must be flexible enough to see the variations in the link layouts on the page. That is another reason why I thought the DOM might be able to help me extract this data.

Thank you for your assistance.
 
LVL 3

Accepted Solution

by:
VR4 earned 500 total points
ID: 19689632
Ok, I know you already have part of it, but here goes all of it  :)
WB is my name for the WebBrowser control on the form.


Found = False
For L = 0 To WB.Document.links.length - 1
    If InStr(WB.Document.links(L).href, "http://thephantomwriters.com/") > 0 Then  ' or you could check Left(....,
        SourceSt = WB.LocationURL
        TargetSt = WB.Document.links(L).href
        KeywrdSt = WB.Document.links(L).innerText     ' or you could get the HTML with WB.Document.links(L).innerHTML
        Found = True
        Exit For
    End If
Next L

P.S. The code above is just typed in; unfortunately I have no time to test it. It should work.
 
LVL 3

Expert Comment

by:VR4
ID: 19689635
By the way you can loop with For Each just as well.
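
Something along these lines, again just typed in and untested. It assumes the same WB control as above; the Sub name ListTargetLinks and its targetDomain parameter are made up for illustration. Unlike the indexed loop, this version does not stop at the first match, so a page that links to the target more than once reports every link together with its own anchor text.

```vb
' Hypothetical helper: prints source URL, target URL and anchor text
' for every link on the loaded page whose href contains targetDomain.
Private Sub ListTargetLinks(ByVal targetDomain As String)
    Dim lnk As Object   ' For Each over a late-bound collection needs Object or Variant

    For Each lnk In WB.Document.links
        ' vbTextCompare makes the domain match case-insensitive
        If InStr(1, lnk.href, targetDomain, vbTextCompare) > 0 Then
            Debug.Print "Source Link: " & WB.LocationURL
            Debug.Print "Target Link: " & lnk.href
            Debug.Print "Keywords in Link: " & Trim$(lnk.innerText)
        End If
    Next lnk
End Sub
```

You would call it once the page has loaded, for example: ListTargetLinks "thephantomwriters.com"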
 

Author Comment

by:plattservicesinc
ID: 19689915
Thank you. Your solution was right on the mark with the innerText. Thank you.

Question has a verified solution.
