ArthurDent99


Capture XML data only from TWebBrowser XML/XSL browsed website

I am running Delphi 7 Enterprise on Windows XP SP2 with Internet Explorer 6.

I have a program that needs to access some XML data downloaded from a certain website.  The website sends XML data with an XSL stylesheet, which Internet Explorer renders into formatted HTML for viewing.  Most of the methods I've tried so far have been accessing the rendered HTML instead of the underlying XML data directly.  I want to work with the XML data directly, probably with TXMLDocument (unless someone can point me to something better).

Ironically, if you right-click on the WebBrowser and select 'View Source', a Notepad window pops up with JUST the XML data inside, which is exactly what I want. However, my program will be browsing through several hundred web pages automatically, and having it click 'View Source', save the file, reload the file, and then process the XML would be too slow. I need a faster way to get at the XML than saving and reloading a file for every page.

For example, here is a website that also sends XML data with an XSL stylesheet:
http://www.comptechdoc.org/independent/web/xml/guide/langlist.xml

It's important that I can work with the XML document directly, because in my case, the XML document contains data which is not rendered by the XSL stylesheet, so extracting my data from the rendered HTML would not give me all the data I need.

I've tried saving the contents of TWebBrowser into a StringStream, then loading the StringStream into a string, but not only does that return the rendered HTML and not the XML alone, it also returns it as UTF-16 with #0 characters after every character.  Not fun to read.
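The #0 after every character is what UTF-16LE text looks like when its bytes are read into an 8-bit Delphi 7 string. As a minimal sketch, assuming the saved stream really holds UTF-16LE (as that output suggests), the bytes can be read into a WideString instead. StreamToWideString is a hypothetical helper name. This only fixes the encoding: the text is still the XSL-rendered HTML, not the underlying XML.

```delphi
unit Utf16Stream;

interface

uses
  Classes;

function StreamToWideString(Stream: TStream): WideString;

implementation

// Hypothetical helper: reads the whole stream as UTF-16LE text.
// Assumes the browser saved its content as UTF-16LE, as the #0 bytes suggest.
function StreamToWideString(Stream: TStream): WideString;
var
  CharCount: Integer;
begin
  Stream.Position := 0;
  // Two bytes per UTF-16 code unit; an odd trailing byte is ignored.
  CharCount := Stream.Size div SizeOf(WideChar);
  SetLength(Result, CharCount);
  if CharCount > 0 then
    Stream.ReadBuffer(Pointer(Result)^, CharCount * SizeOf(WideChar));
  // Drop a leading byte-order mark (U+FEFF) if one was written.
  if (Length(Result) > 0) and (Result[1] = WideChar($FEFF)) then
    Delete(Result, 1, 1);
end;

end.
```

Assigning the result to a Delphi 7 string converts it to the system ANSI code page, so the #0 bytes disappear.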

Is there a way to get the XML data from TWebBrowser into TXMLDocument, preferably without saving a temporary file?
2266180

Why not simply use Indy or ICS to get the XML data?
For example: IdHTTP1.Get('http://www.comptechdoc.org/independent/web/xml/guide/langlist.xml');
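The same idea as a small unit, sketched under the assumption that Indy's TIdHTTP and Delphi's XMLDoc unit are available; FetchXml is a hypothetical helper name. Because no browser is involved, no XSL transform is applied, so the body returned by Get is the XML itself, and LoadXMLData parses it from a string without a temporary file.

```delphi
unit XmlFetch;

interface

uses
  IdHTTP, XMLIntf, XMLDoc;

function FetchXml(Http: TIdHTTP; const Url: string): IXMLDocument;

implementation

// Hypothetical helper: downloads Url and parses the raw response body.
// No XSL stylesheet is applied, so this is the server's XML as sent.
function FetchXml(Http: TIdHTTP; const Url: string): IXMLDocument;
var
  Body: string;
begin
  Body := Http.Get(Url);
  // LoadXMLData parses from memory; nothing is written to disk.
  Result := LoadXMLData(Body);
end;

end.
```

For example, FetchXml(IdHTTP1, 'http://www.comptechdoc.org/independent/web/xml/guide/langlist.xml').DocumentElement.NodeName gives the root element's name. In a console program, call CoInitialize (ActiveX unit) first, since TXMLDocument uses MSXML through COM.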
ArthurDent99

ASKER

In my case, the XML data I am accessing is returned in response to a POST. I just tried IdHTTP1.Post, and it does indeed return just the XML data as a string, which I can then load into XMLDocument1 and process. So if no one can answer how to make TWebBrowser do the same thing, I might go ahead and accept your answer, but I'd still prefer to get the same result from TWebBrowser.

By default, IdHTTP1.Post submits data as HTTP/1.0, while WebBrowser submits it as HTTP/1.1. Also, WebBrowser sends cookies, while IdHTTP1.Post doesn't. Most importantly, WebBrowser gives the user visual feedback on progress and allows user interaction, while IdHTTP1 works invisibly.
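The first two differences can likely be narrowed on the Indy side. This is a sketch, assuming an Indy version whose TIdHTTP has the hoKeepOrigProtocol option (Indy 10 does; check IdHTTP.pas in older installs). Without that option Indy downgrades POST requests to HTTP/1.0. A TIdCookieManager gives Indy its own cookie store across requests; it does not share Internet Explorer's cookies, and nothing here addresses the visual-feedback point. PrepareHttp and PostForXml are hypothetical names.

```delphi
unit XmlPost;

interface

uses
  Classes, IdHTTP, IdCookieManager, XMLIntf, XMLDoc;

procedure PrepareHttp(Http: TIdHTTP);
function PostForXml(Http: TIdHTTP; const Url: string; Fields: TStrings): IXMLDocument;

implementation

// Hypothetical helper: makes TIdHTTP's POSTs closer to the browser's.
procedure PrepareHttp(Http: TIdHTTP);
begin
  // Request HTTP/1.1 and keep it for POST; without hoKeepOrigProtocol,
  // Indy falls back to HTTP/1.0 when posting.
  Http.ProtocolVersion := pv1_1;
  Http.HTTPOptions := Http.HTTPOptions + [hoKeepOrigProtocol];
  // Indy's own cookie store: cookies set by earlier responses are sent on
  // later requests. It is separate from Internet Explorer's cookies.
  Http.AllowCookies := True;
  if Http.CookieManager = nil then
    Http.CookieManager := TIdCookieManager.Create(Http);
end;

// Hypothetical helper: posts 'name=value' lines and parses the XML reply.
function PostForXml(Http: TIdHTTP; const Url: string; Fields: TStrings): IXMLDocument;
begin
  PrepareHttp(Http);
  Result := LoadXMLData(Http.Post(Url, Fields));
end;

end.
```

Fields is an ordinary TStringList of 'name=value' lines matching the form the site expects; the field names and URL depend on that site.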

Anyone got any ideas how to make this work?
ASKER CERTIFIED SOLUTION
Russell Libby

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ovDocument.XMLDocument.XML was exactly what I was looking for!  I was able to browse to a page manually, then hit a button to call ovDocument.XMLDocument.XML, and the XML parsed beautifully.  Thank you very much!
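The accepted answer itself is members-only, so the following is only a reconstruction from the asker's description of ovDocument.XMLDocument.XML, not Russell's code. It assumes that, for an XML page rendered with XSL, IE's document object exposes the original MSXML DOM through an XMLDocument property whose XML property returns the source text. The unit and function names are hypothetical.

```delphi
unit BrowserXml;

interface

uses
  Variants, ComObj, SHDocVw, XMLDoc;

function RawXmlFromDocument(const Doc: OleVariant): string;
procedure LoadBrowserXml(Browser: TWebBrowser; Target: TXMLDocument);

implementation

// Returns the untransformed XML behind an XML+XSL page shown in IE, or ''.
// Assumes IE's document object exposes XMLDocument (an MSXML DOM) for such
// pages; ordinary HTML pages may not, which late binding reports as EOleError.
function RawXmlFromDocument(const Doc: OleVariant): string;
begin
  Result := '';
  if VarIsEmpty(Doc) or VarIsNull(Doc) then
    Exit;
  try
    Result := Doc.XMLDocument.XML;  // late-bound, like ovDocument in the thread
  except
    on EOleError do
      Result := '';
  end;
end;

// Loads the raw XML of the page currently in Browser into Target.
procedure LoadBrowserXml(Browser: TWebBrowser; Target: TXMLDocument);
var
  RawXml: string;
begin
  if Browser.Document = nil then
    Exit;  // nothing loaded yet
  RawXml := RawXmlFromDocument(Browser.Document);
  if RawXml <> '' then
    Target.LoadFromXML(RawXml);
end;

end.
```

Call LoadBrowserXml(WebBrowser1, XMLDocument1) from a button, as the asker did, or from WebBrowser1's OnDocumentComplete event when browsing automatically; on framed pages that event fires once per frame.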

Just for future reference, where can I find documentation for OleVariant?

The OleVariant is just the container, nothing special, although it can hold any of 13 different data types. What I believe you are interested in is the DOM documentation for the browser control. MSDN online is a good resource for that (e.g., search on IWebBrowser or IWebBrowser2, IHTMLDocument2, etc.).

A jumping point for you:
http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/webbrowser/reference/ifaces/iwebbrowser2/iwebbrowser2.asp

Regards,
Russell