Slavak

Parsing an HTML document

What is the best way to parse an HTML document?

For example, I download an HTML page from a server and want to read one of its tables and show it in my string grid.

Can I use the WebBrowser.Document interface for this task?
Peter_

TWebBrowser can be used. Try something like this:

Start by adding the MSHTML unit (MSHTML.pas, imported from mshtml.dll) to your uses clause.

Cast the WebBrowser.Document as IHTMLDocument2 or IHTMLDocument3.

var
 MyDocument: IHTMLDocument2;
 Tables: IHTMLElementCollection;
 Table: IHTMLTable;
 Row: IHTMLTableRow;
 Cell: IHTMLTableCell;
 I, J, K: Integer;

MyDocument := (WebBrowser.Document as IHTMLDocument2);

You can access all table elements within the document with:

Tables := (MyDocument.all.tags('TABLE') as IHTMLElementCollection);

Then look for the element you want. This should be easy if it has a name or id, because you can use:

Table := (Tables.item('NameOfMyTable', 0) as IHTMLTable);

Or you have to loop over each one:

for I := 0 to Tables.length - 1 do
begin
 // item() returns an IDispatch, so cast it to the table interface
 Table := Tables.item(I, 0) as IHTMLTable;
 ...
end;

When you have found the table, get hold of its rows property and handle each row:

for J := 0 to Table.rows.length - 1 do
begin
 Row := Table.rows.item(J, 0) as IHTMLTableRow;
 for K := 0 to Row.cells.length - 1 do
 begin
  Cell := Row.cells.item(K, 0) as IHTMLTableCell;
  // TStringGrid.Cells is indexed [column, row]
  StringGrid1.Cells[K, J] := (Cell as IHTMLElement).innerText;
 end;
end;

Haven't tried running this, but it should be about right...
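
One thing the loop above leaves out is sizing the grid: as far as I know, TStringGrid only displays cells that fall within its RowCount and ColCount. Here is a minimal, untested sketch of a helper that sizes the grid and fills it in one go. The names FillGridFromTable and HtmlTableGrid are made up for this example, and the uses clause assumes an older Delphi where the grid unit is called Grids (newer versions call it Vcl.Grids).

```pascal
unit HtmlTableGrid; // hypothetical unit name

interface

uses
  Grids, MSHTML; // assumption: pre-XE2 unit names; use Vcl.Grids on newer Delphi

// Hypothetical helper: copies the text of every cell of Table into Grid.
procedure FillGridFromTable(const Table: IHTMLTable; Grid: TStringGrid);

implementation

procedure FillGridFromTable(const Table: IHTMLTable; Grid: TStringGrid);
var
  Row: IHTMLTableRow;
  R, C: Integer;
begin
  // Nothing to show for a missing or empty table.
  if (Table = nil) or (Table.rows.length = 0) then
    Exit;

  // Drop the fixed header row/column so cell [0, 0] maps to the first table cell.
  Grid.FixedRows := 0;
  Grid.FixedCols := 0;
  Grid.RowCount := Table.rows.length;

  for R := 0 to Table.rows.length - 1 do
  begin
    Row := Table.rows.item(R, 0) as IHTMLTableRow;

    // Widen the grid if this row has more cells than the grid has columns.
    if Row.cells.length > Grid.ColCount then
      Grid.ColCount := Row.cells.length;

    for C := 0 to Row.cells.length - 1 do
      // Cells is indexed [column, row].
      Grid.Cells[C, R] := (Row.cells.item(C, 0) as IHTMLElement).innerText;
  end;
end;

end.
```

You would call it as FillGridFromTable(Table, StringGrid1) once the page has finished loading, for example from the browser's OnDocumentComplete event. It ignores rowSpan and colSpan, so merged cells will not line up.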

There are also plenty of different parsers on the net.
If you want a parser that does not rely on MSHTML being installed, I recommend TurboPower Internet Professional; it has a good parser (and a browser component). It's on SourceForge at http://sourceforge.net/projects/tpipro/

ASKER CERTIFIED SOLUTION
Eddie Shipman

oops,

this:

StringGrid1.Cells[i+1, j+1]

should be:

StringGrid1.Cells[j+1, i+1]