Solved

Capture XML data only from TWebBrowser XML/XSL browsed website

Posted on 2006-07-19
Last Modified: 2010-08-05
I am running Delphi 7 Enterprise on Windows XP SP2 with Internet Explorer 6.

I have a program that needs to access some XML data downloaded from a certain website.  The website sends XML data with an XSL stylesheet, which Internet Explorer renders into formatted HTML for viewing.  Most of the methods I've tried so far have been accessing the rendered HTML instead of the underlying XML data directly.  I want to work with the XML data directly, probably with TXMLDocument (unless someone can point me to something better).

Ironically, if you right-click on the WebBrowser and select 'View Source', a Notepad window pops up with JUST the XML data inside, which is exactly what I want; however, my program will be browsing through several hundred web pages automatically, and having it click on 'View Source', then saving the file, then re-loading the file, then processing the XML would take too long in the long run.  I need something a little speedier than having to save, then reload, a file every time I want to work with the XML.

For an example, here is a website that also sends XML data with an XSL stylesheet:
http://www.comptechdoc.org/independent/web/xml/guide/langlist.xml

It's important that I can work with the XML document directly, because in my case, the XML document contains data which is not rendered by the XSL stylesheet, so extracting my data from the rendered HTML would not give me all the data I need.

I've tried saving the contents of TWebBrowser into a StringStream, then loading the StringStream into a string, but not only does that return the rendered HTML and not the XML alone, it also returns it as UTF-16 with #0 characters after every character.  Not fun to read.

Is there a way to get the XML data from TWebBrowser into TXMLDocument?  Preferably without saving a temporary file?
Question by:ArthurDent99

Expert Comment

by:ciuly
ID: 17143756
Why not simply use Indy or ICS to get the XML data? For example:
IdHTTP1.Get('http://www.comptechdoc.org/independent/web/xml/guide/langlist.xml');
 

Author Comment

by:ArthurDent99
ID: 17148306
In my case, the XML data I am accessing is in response to a Post... I just tried using IdHTTP1.Post, and it does indeed return just the XML data as a string, which I can then put into XMLDocument1 and process.  So, if no one can answer how to make TWebBrowser do the same thing, I might go ahead and accept your answer....  but I'd still prefer to get the same result out of using WebBrowser.

IdHTTP1.Post only submits data in HTTP/1.0 format, while WebBrowser submits it in HTTP/1.1 format.  Also, WebBrowser is submitting cookies, while IdHTTP1.Post isn't.  Most importantly, the WebBrowser is providing a visual feedback of progress for the user and can also allow user interaction, while the IdHTTP1 works invisibly.
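For readers who stay with Indy, the two protocol differences above can usually be narrowed through component settings. The following is only a sketch under assumptions: PostForXml, AUrl and AFields are hypothetical names, and the hoKeepOrigProtocol option is assumed to be present in the installed Indy version. Cookies handled this way belong to Indy alone and are not shared with the ones TWebBrowser (Internet Explorer) already holds.

```delphi
uses
  SysUtils, Classes, IdHTTP, IdCookieManager;

// Hypothetical helper: POSTs the given form fields and returns the response body.
// AUrl and AFields are placeholders supplied by the caller.
function PostForXml(const AUrl: string; AFields: TStrings): string;
var
  Http: TIdHTTP;
begin
  Http := TIdHTTP.Create(nil);
  try
    // Ask for HTTP/1.1 and tell Indy not to fall back to 1.0 for POST
    // (assumes this Indy version defines hoKeepOrigProtocol).
    Http.ProtocolVersion := pv1_1;
    Http.HTTPOptions := Http.HTTPOptions + [hoKeepOrigProtocol];
    // Let Indy store and resend cookies; the manager is owned by Http,
    // so it is released together with it.
    Http.AllowCookies := True;
    Http.CookieManager := TIdCookieManager.Create(Http);
    Result := Http.Post(AUrl, AFields);
  finally
    Http.Free;
  end;
end;
```

This does not address the visual feedback or user interaction that the WebBrowser provides.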

Anyone got any ideas how to make this work?

Accepted Solution

by:
Russell Libby earned 500 total points
ID: 17151414
It's pretty straightforward to do. Below is an example project (source / dfm) using the above link. It was written in D5, so if you are on D6 or later, don't forget to include Variants in your uses clause.

Regards,
Russell

--

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  StdCtrls, OleCtrls, SHDocVw, ComObj, ActiveX;  { Include Variants for D6 and up}

type
  TForm1            =  class(TForm)
     Button1:       TButton;
     Edit1:         TEdit;
     WebBrowser1:   TWebBrowser;
     Memo1:         TMemo;
     procedure      Button1Click(Sender: TObject);
     procedure      WebBrowser1DocumentComplete(Sender: TObject; const pDisp: IDispatch; var URL: OleVariant);
     procedure      WebBrowser1NavigateComplete2(Sender: TObject; const pDisp: IDispatch; var URL: OleVariant);
  private
     // Private declarations
     FDispatch:      IDispatch;
  public
     // Public declarations
  end;

var
  Form1:            TForm1;

implementation

{$R *.DFM}

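procedure TForm1.Button1Click(Sender: TObject);
begin
  // Forget the previous top-level browser so the next navigation records a new one
  FDispatch:=nil;
  WebBrowser1.Navigate(Edit1.Text);
end;

procedure TForm1.WebBrowser1DocumentComplete(Sender: TObject; const pDisp: IDispatch; var URL: OleVariant);
var  ovDocument:    OleVariant;
begin
  // DocumentComplete also fires for each frame; only act on the top-level document
  if Assigned(pDisp) and (pDisp = FDispatch) then
  begin
     ovDocument:=WebBrowser1.Document;
     // XMLDocument is available when IE is displaying an XML source; XML returns its text.
     // On a plain HTML page this late-bound call will raise an exception.
     Memo1.Text:=ovDocument.XMLDocument.XML;
  end;
end;

procedure TForm1.WebBrowser1NavigateComplete2(Sender: TObject; const pDisp: IDispatch; var URL: OleVariant);
begin
  // The first NavigateComplete2 after Navigate belongs to the top-level browser
  if (FDispatch = nil) then FDispatch:=pDisp;
end;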

end.


--- dfm ---
object Form1: TForm1
  Left = 255
  Top = 114
  Width = 654
  Height = 540
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  Position = poScreenCenter
  Scaled = False
  PixelsPerInch = 96
  TextHeight = 13
  object Button1: TButton
    Left = 12
    Top = 12
    Width = 75
    Height = 21
    Caption = 'Go'
    TabOrder = 0
    OnClick = Button1Click
  end
  object Edit1: TEdit
    Left = 92
    Top = 12
    Width = 541
    Height = 21
    TabOrder = 1
    Text =
      'http://www.comptechdoc.org/independent/web/xml/guide/langlist.xm' +
      'l'
  end
  object WebBrowser1: TWebBrowser
    Left = 12
    Top = 40
    Width = 621
    Height = 225
    TabOrder = 2
    OnNavigateComplete2 = WebBrowser1NavigateComplete2
    OnDocumentComplete = WebBrowser1DocumentComplete
    ControlData = {
      4C0000002F400000411700000000000000000000000000000000000000000000
      000000004C000000000000000000000001000000E0D057007335CF11AE690800
      2B2E126208000000000000004C0000000114020000000000C000000000000046
      8000000000000000000000000000000000000000000000000000000000000000
      00000000000000000100000000000000000000000000000000000000}
  end
  object Memo1: TMemo
    Left = 8
    Top = 272
    Width = 625
    Height = 221
    Font.Charset = ANSI_CHARSET
    Font.Color = clWindowText
    Font.Height = -11
    Font.Name = 'Courier New'
    Font.Style = []
    ParentFont = False
    ScrollBars = ssBoth
    TabOrder = 3
  end
end
 

Author Comment

by:ArthurDent99
ID: 17172438
ovDocument.XMLDocument.XML was exactly what I was looking for!  I was able to browse to a page manually, then hit a button to call ovDocument.XMLDocument.XML, and the XML parsed beautifully.  Thank you very much!
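The step from the browser into TXMLDocument that the question asked about could look roughly like the sketch below. ListTopLevelNodes, ABrowser and AOutput are hypothetical names chosen for illustration; the sketch assumes the page currently shown is an XML document (so that XMLDocument exists) and that it runs inside a VCL forms application where COM is already initialised.

```delphi
uses
  SysUtils, Classes, Variants, SHDocVw, XMLIntf, XMLDoc;

// Hypothetical helper: reads the raw XML behind the page shown in ABrowser,
// parses it without a temporary file, and lists the root element's child names.
procedure ListTopLevelNodes(ABrowser: TWebBrowser; AOutput: TStrings);
var
  ovDocument: OleVariant;
  XmlText: WideString;
  Doc: IXMLDocument;
  i: Integer;
begin
  // Nothing has been loaded yet
  if ABrowser.Document = nil then
    Exit;
  ovDocument := ABrowser.Document;
  // Late-bound call; raises an exception if the page is not XML
  XmlText := ovDocument.XMLDocument.XML;
  // LoadXMLData parses the string in memory and returns an interface,
  // so no component on the form and no explicit Free are needed
  Doc := LoadXMLData(XmlText);
  for i := 0 to Doc.DocumentElement.ChildNodes.Count - 1 do
    AOutput.Add(Doc.DocumentElement.ChildNodes[i].NodeName);
end;
```

It could be called from a button handler as ListTopLevelNodes(WebBrowser1, Memo1.Lines) once the page has finished loading.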

Just for future information, where do you find documentation for OleVariant?

Expert Comment

by:Russell Libby
ID: 17176416

The OleVariant is just the container, nothing special, although it can hold any of 13 different data types. What I believe you are interested in is the DOM documentation for the browser control. The MSDN online is a good resource for that (e.g., search on IWebBrowser or IWebBrowser2, IHTMLDocument2, etc.).
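To illustrate the difference between the OleVariant container and the documented interfaces, here is a small sketch that asks the same Document object for IHTMLDocument2 from Delphi's MSHTML unit instead of going through a variant. DescribeDocument, ABrowser and AOutput are hypothetical names; with the typed interface the compiler checks member names, whereas a late-bound OleVariant call is only resolved at run time.

```delphi
uses
  SysUtils, Classes, SHDocVw, MSHTML;

// Hypothetical helper: reports two documented IHTMLDocument2 properties
// of the page currently shown in ABrowser.
procedure DescribeDocument(ABrowser: TWebBrowser; AOutput: TStrings);
var
  Doc: IHTMLDocument2;
begin
  // Supports returns False if no document is loaded or the interface is missing
  if Supports(ABrowser.Document, IHTMLDocument2, Doc) then
  begin
    AOutput.Add('Title: ' + Doc.title);
    AOutput.Add('URL: ' + Doc.url);
  end;
end;
```

The XMLDocument property used in the accepted solution is not a member of IHTMLDocument2 in that unit, which is why the example project reaches it through the OleVariant.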

A jumping point for you:
http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/webbrowser/reference/ifaces/iwebbrowser2/iwebbrowser2.asp

Regards,
Russell

