• Status: Solved

IHTMLDocument2

Hi

I have written an IE BHO that uses the IHTMLDocument2 interface to alter the HTML. I need to be able to get the complete modified source, including all the tags: <!DOCTYPE....<HTML>....</HTML>. Calling get_body and then get_innerHTML only gives me the <BODY>.....</BODY> section; that's no good, I need the lot.

I either need to have it as a string (BSTR) or be able to save it to a file.

I have already seen the few questions on here that suggest get_body; please don't suggest that... it doesn't work how I want it to.

Ta

Paul
purpleblob Commented:
I've done this in C#, but using the IHTMLDocument3 interface. However, the following C++ should work.

Call QueryInterface on your IHTMLDocument2 to ask for IHTMLDocument3. Hopefully you'll get an IHTMLDocument3 interface back, and from this you should be able to get what you want.

Here's the source (note: it's in the most basic form, i.e. no ATL etc., and for brevity all error handling has been removed).

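IHTMLDocument3* pHTMLDoc3 = NULL;
pHTMLDoc2->QueryInterface(IID_IHTMLDocument3, reinterpret_cast<void**>(&pHTMLDoc3));

// The document element is the root <html> element.
IHTMLElement* pDocElem = NULL;
pHTMLDoc3->get_documentElement(&pDocElem);

pHTMLDoc3->Release();

// outerHTML includes the element's own tags as well as its contents.
BSTR bstrHTML = NULL;
pDocElem->get_outerHTML(&bstrHTML);
pDocElem->Release();

// ... use bstrHTML, then free it.
SysFreeString(bstrHTML);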

I've not tested this C++ code, but I converted it to C# (as I've been using the browser recently in C#) and it seems to produce everything from <html> to </html>.
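
The snippet above leaves out error handling for brevity. A rough sketch of the same approach with each HRESULT checked could look like the following. The function name GetDocumentOuterHtml is made up for illustration and is not part of MSHTML; the code has the same untested status as the original and assumes a Windows build with the MSHTML headers available.

```cpp
#ifdef _WIN32                // MSHTML and COM exist only on Windows
#include <windows.h>
#include <oleauto.h>         // SysFreeString
#include <mshtml.h>          // IHTMLDocument2, IHTMLDocument3, IHTMLElement

// Hypothetical helper: on success *pbstrHtml receives the markup of the
// root <html> element. The caller frees it with SysFreeString.
HRESULT GetDocumentOuterHtml(IHTMLDocument2* pDoc2, BSTR* pbstrHtml)
{
    if (!pDoc2 || !pbstrHtml) return E_POINTER;
    *pbstrHtml = NULL;

    // Ask the document for IHTMLDocument3, which exposes documentElement.
    IHTMLDocument3* pDoc3 = NULL;
    HRESULT hr = pDoc2->QueryInterface(IID_IHTMLDocument3,
                                       reinterpret_cast<void**>(&pDoc3));
    if (FAILED(hr)) return hr;

    IHTMLElement* pRoot = NULL;
    hr = pDoc3->get_documentElement(&pRoot);
    pDoc3->Release();
    if (FAILED(hr)) return hr;
    if (!pRoot) return E_FAIL;   // no root element available

    // outerHTML covers the element's own tags plus everything inside it.
    hr = pRoot->get_outerHTML(pbstrHtml);
    pRoot->Release();
    return hr;
}
#endif
```

Note that this yields the <html> element only; a <!DOCTYPE> declaration, if the page has one, sits outside that element and is not part of its outerHTML.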

 
_corey_ Commented:
I'm pretty sure that'll work for IHTMLDocument2?
 
purpleblob Commented:
IHTMLDocument2 doesn't include the get_documentElement method. However, there's another interface called HTMLDocument which also supports this method and can be used in place of IHTMLDocument3.
 
gimmeadrink Commented:
What about this...

HRESULT IHTMLDocument2::get_all(IHTMLElementCollection **p);

It's a zero-based collection of all the elements in a web page.
(See http://msdn.microsoft.com/workshop/author/dhtml/reference/collections/all.asp for more info.)

The other option is 'toString' (http://msdn.microsoft.com/workshop/browser/mshtml/reference/ifaces/document2/tostring.asp).

I have used both of these (although a while ago) in Visual Basic; they should act the same way in VC++ or whatever you are using.

HTH

 
_corey_ Commented:
toString actually does that? I voiced a thought about that in an earlier question but never actually tried it.
 
gimmeadrink Commented:
Hmmm, I might double-check... it was a while ago that I used it ;)

I'll get back to ya.
 
makerp (Author) Commented:
I have solved this by getting the IPersistFile interface from the IHTMLDocument2 object...

Thanks anyway
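
No code was posted for the IPersistFile route, so the following is only a guess at what such a call might look like, not the author's actual code. It queries the document for IPersistFile and calls Save; the helper name SaveDocumentToFile is hypothetical, and a Windows build with the MSHTML headers is assumed.

```cpp
#ifdef _WIN32                // MSHTML and COM exist only on Windows
#include <windows.h>
#include <objidl.h>          // IPersistFile
#include <mshtml.h>          // IHTMLDocument2

// Hypothetical helper: writes the loaded document to `path` through the
// document object's IPersistFile interface.
HRESULT SaveDocumentToFile(IHTMLDocument2* pDoc, LPCOLESTR path)
{
    if (!pDoc || !path) return E_POINTER;

    IPersistFile* pFile = NULL;
    HRESULT hr = pDoc->QueryInterface(IID_IPersistFile,
                                      reinterpret_cast<void**>(&pFile));
    if (FAILED(hr)) return hr;   // the object does not expose IPersistFile

    // fRemember = FALSE: save a copy without making `path` the
    // document's current file.
    hr = pFile->Save(path, FALSE);
    pFile->Release();
    return hr;
}
#endif
```

A call such as SaveDocumentToFile(pHTMLDoc2, L"C:\\temp\\page.html") would then write the file; the path here is just an example. If a string is needed rather than a file, IPersistStreamInit::Save onto a memory stream is a similar option worth trying.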
 
makerp (Author) Commented:
Sorry about the delay. Since the update of the look and feel of EE, the Flash, adverts and animated GIFs annoy me slightly.

None of the solutions proposed here are really suitable because of the way IE alters the raw HTML in its object model. Anyway, I will split the points.