Solved

HTML Table To StringGRID?

Posted on 2011-05-12
Last Modified: 2012-05-11
Hi,

I have an HTML file that contains only one HTML table. I need to load this table into a TStringGrid component. Can anyone help me with this task?
Question by:MissManal
LVL 25

Expert Comment

by:epasquier
ID: 35745948
Hmm. A CSV file is much, much, much (and maybe another much) easier to handle than an HTML table.

For one thing, there could be a lot of HTML garbage around your table, and other tags inside the cells. And I will not even mention nested tables, which would complicate the task.

So, first, I will assume you can isolate the HTML fragment containing ONLY the <TABLE>...</TABLE> text in a string.
Please provide such a sample so that we can test. In the meantime, I will code a procedure that fills the string grid:

Procedure FillStringGridWithHTMLTable(aStrGrid:TStringGrid;HTMLTable:String);
begin
...
end;

Author Comment

by:MissManal
ID: 35746042
I can't convert the HTML table to CSV, so please, is there any way to extract the HTML table from the HTML page and then load it into a string grid?
LVL 25

Expert Comment

by:epasquier
ID: 35746109
And here it is. Note that PosEx lives in the StrUtils unit, so StrUtils must be in the uses clause:
Procedure FillStringGridWithHTMLTable(aStrGrid:TStringGrid;HTMLTable:String);
Var
 P,TagStart:Integer;
 Function GetNextTag(Var Tag:String):Boolean;
 Var
  P2:Integer;
 begin
  Result:=False;
  Repeat
   P:=PosEx('<',HTMLTable,P);
   if P<1 then Exit;
   TagStart:=P;
   P2:=PosEx('>',HTMLTable,P);
   if P2<1 then Exit;
   Tag:=Trim(Copy(HTMLTable,P+1,P2-P-1));
   P:=P2+1;
   P2:=Pos(' ',Tag);
   if P2>0 then Tag:=Copy(Tag,1,P2-1);
   Tag:=UpperCase(Tag);
   Result:=(Tag='TR') Or (Tag='/TR') Or (Tag='TD') Or (Tag='/TD') Or (Tag='/TABLE');
  Until Result;
 end;
 procedure AddCell(L,C:Integer;Text:String);
 begin
  if L>=aStrGrid.RowCount then aStrGrid.RowCount:=aStrGrid.RowCount+1;
  if C>=aStrGrid.ColCount then
   begin
    aStrGrid.ColCount:=aStrGrid.ColCount+1;
    aStrGrid.Cells[aStrGrid.ColCount-1,0]:='Col'+IntToStr(aStrGrid.ColCount);
   end;
  try
   aStrGrid.Cells[C,L]:=Text;
  except
   // should not happen
  end;
 end;

Var
 L,C,CellStart:Integer;
 Tag:String;
begin
 P:=1;
 L:=1;
 C:=0;
 aStrGrid.RowCount:=2;
 aStrGrid.ColCount:=1;
 aStrGrid.Cells[0,0]:='Col1';
 while GetNextTag(Tag) do
  begin
   if Tag='TR' then C:=0;
   if Tag='/TR' then Inc(L);
   if Tag='TD' then CellStart:=P;
   if Tag='/TD' then
    begin
     AddCell(L,C,Copy(HTMLTable,CellStart,TagStart-CellStart));
     Inc(C);
    end;
   if Tag='/TABLE' then Exit;
  end;
end;


Author Comment

by:MissManal
ID: 35749559
Please give us the link to the source, since you removed it.

Author Comment

by:MissManal
ID: 35749640
I tried the code, but it does not seem to work as I expected.

What I need is a function that extracts a table from an HTML file into a string grid. The table size varies from one HTML page to another.

For example, the code should extract the table from here:
http://news.bbc.co.uk/sport1/hi/football/eng_prem/table/default.stm


and also from here:
http://web.wi.mit.edu/young/expression/table.html

and here as well:
http://www.independent.co.uk/news/education/schools/gcse-results-comprehensive-school-results-table-2060948.html

So the code should work with any table size, and it should automatically ignore the non-table HTML code and extract only the table.
LVL 25

Accepted Solution

by:
epasquier earned 2000 total points
ID: 35750594
Do you have any knowledge of HTML and the techniques used by today's websites?

Have you understood what I said about the fact that, aside from your stated question, you have a complex task in extracting the <TABLE>...</TABLE> data?
HTML is not a structured, data-oriented language; it is presentation-oriented. XML was invented for that particular reason, and since then HTML has been allowed to grow in a direction where data retrieval was about the least concern. All the more so with dynamic websites.
Your websites are chock-full of scripts and nested tables...

Here is the best I could do with those. Please note that I first had to capture the resulting HTML code of the tables (i.e. what was generated by the scripts) by copying from a browser and doing a 'special paste as HTML code' with the help of an advanced text editor. I put the results in a memo, after adding some line breaks before all <td> tags to make it **readable**. As you will see, the code generated by some scripts is a pile of crap that no human can grasp.

Yet my function can get some sense out of it.
I added a CleanInnerText function to get rid of all the presentation tags that can be inside the cells, and to remove all unnecessary line breaks and spaces.

Please understand that there is NO WAY you could create a generic function that takes a 'table' as you see it on a website and works on all of them. There are simply too many ways to code 'tables', and the simplest one, where there is only ONE <TABLE> with some <TR><TD></TD><TD></TD>..</TR> lines in it, has become the least common of all the table flavours one can find on the Internet.
And even then, a table is rarely alone on a page, so you have to identify it first. I doubt you could create a program that magically finds out what your brain wishes if you don't help it a bit.
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, StdCtrls, ExtCtrls, Grids;

type
  TForm1 = class(TForm)
    Splitter1: TSplitter;
    StringGrid1: TStringGrid;
    Panel1: TPanel;
    Button1: TButton;
    rb1: TRadioButton;
    rb2: TRadioButton;
    Panel2: TPanel;
    Memo1: TMemo;
    Memo2: TMemo;
    procedure rb2Click(Sender: TObject);
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
   Memo:TMemo;
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

Uses StrUtils;

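// Strips every tag from a cell's inner HTML and collapses whitespace
Function CleanInnerText(T:String):String;
Var
 P,P2:Integer;
begin
 P:=1;
 // Remove every <...> tag
 Repeat
  P:=PosEx('<',T,P);
  if P=0 then break;
  P2:=PosEx('>',T,P);
  if P2=0 then break; // unterminated tag: stop here instead of looping forever
  Delete(T,P,P2-P+1);
 Until False;
 // Turn line breaks into spaces
 T:=StringReplace(T,#13,' ',[rfReplaceAll]);
 T:=StringReplace(T,#10,' ',[rfReplaceAll]);
 // Collapse runs of spaces until the length stops changing
 P:=Length(T);
 Repeat
  P2:=P;
  T:=StringReplace(T,'  ',' ',[rfReplaceAll]);
  P:=Length(T);
 Until P=P2;
 Result:=Trim(T);
end;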

Procedure FillStringGridWithHTMLTable(aStrGrid:TStringGrid;HTMLTable:String);
Var
 P,TagStart:Integer;
 Function GetNextTag(Var Tag:String):Boolean;
 Var
  P2:Integer;
 begin
  Result:=False;
  Repeat
   P:=PosEx('<',HTMLTable,P);
   if P<1 then Exit;
   TagStart:=P;
   P2:=PosEx('>',HTMLTable,P);
   if P2<1 then Exit;
   Tag:=Trim(Copy(HTMLTable,P+1,P2-P-1));
   P:=P2+1;
   P2:=Pos(' ',Tag);
   if P2>0 then Tag:=Copy(Tag,1,P2-1);
   Tag:=UpperCase(Tag);
   if Tag='TH' then Tag:='TD';
   if Tag='/TH' then Tag:='/TD';
   Result:=(Tag='TR') Or (Tag='/TR') Or (Tag='TD') Or (Tag='/TD') Or (Tag='/TABLE');
  Until Result;
 end;
 procedure AddCell(L,C:Integer;Text:String);
 begin
  // Grow the grid up to row L; rows without cells can leave a gap of more than one row
  if L>=aStrGrid.RowCount then aStrGrid.RowCount:=L+1;
  if C>=aStrGrid.ColCount then
   begin
    aStrGrid.ColCount:=aStrGrid.ColCount+1;
    aStrGrid.Cells[aStrGrid.ColCount-1,0]:='Col'+IntToStr(aStrGrid.ColCount);
   end;
  aStrGrid.Cells[C,L]:=CleanInnerText(Text);
 end;

Var
 L,C,CellStart:Integer;
 Tag:String;
begin
 P:=1;
 L:=1;          // row 0 is the fixed header row
 C:=0;
 CellStart:=0;  // 0 means "not inside a cell"
 aStrGrid.RowCount:=2;
 aStrGrid.ColCount:=1;
 aStrGrid.Cells[0,0]:='Col1';
 while GetNextTag(Tag) do
  begin
   if Tag='TR' then C:=0;
   if Tag='/TR' then Inc(L);
   if Tag='TD' then CellStart:=P;
   // Ignore a closing tag that has no matching opening tag
   if (Tag='/TD') and (CellStart>0) then
    begin
     AddCell(L,C,Copy(HTMLTable,CellStart,TagStart-CellStart));
     Inc(C);
     CellStart:=0;
    end;
   if Tag='/TABLE' then Exit;
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
 if Memo1.Visible then Memo:=Memo1 Else Memo:=Memo2;
 FillStringGridWithHTMLTable(StringGrid1,Memo.Text);
end;

procedure TForm1.rb2Click(Sender: TObject);
begin
 if Sender=rb1 then Memo:=Memo1 Else Memo:=Memo2;
 Memo.Visible:=True;
 if Sender=rb1 then Memo2.Visible:=False Else Memo1.Visible:=False;
 FillStringGridWithHTMLTable(StringGrid1,Memo.Text);
end;

end.
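
The answer assumes that the <TABLE>...</TABLE> fragment has already been isolated from the rest of the page. As a rough sketch of that step only, the console program below picks the n-th top-level table out of a page by counting opening and closing table tags, so a nested table stays inside its parent. ExtractTableFragment and the sample page are hypothetical and not part of the original answer. The sketch also assumes the markup is reasonably well formed; it does not know about comments or scripts that happen to contain the text "<table".

```pascal
program ExtractTableDemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, StrUtils;

// Hypothetical helper: returns the Index-th (1-based) top-level
// <table>...</table> fragment of Html, or '' if there is no such table.
function ExtractTableFragment(const Html: string; Index: Integer): string;
var
  Upper: string;
  P, NextOpen, NextClose, Depth, StartPos, Found: Integer;
begin
  Result := '';
  Upper := UpperCase(Html); // same length as Html, so positions match
  P := 1;
  Depth := 0;
  StartPos := 0;
  Found := 0;
  repeat
    NextOpen := PosEx('<TABLE', Upper, P);
    NextClose := PosEx('</TABLE', Upper, P);
    if NextClose = 0 then Exit; // no complete table left
    if (NextOpen > 0) and (NextOpen < NextClose) then
    begin
      // An opening tag comes first: remember where the outermost one starts
      if Depth = 0 then StartPos := NextOpen;
      Inc(Depth);
      P := NextOpen + 6; // skip past '<TABLE'
    end
    else
    begin
      // A closing tag comes first: move P just behind its '>'
      P := PosEx('>', Upper, NextClose);
      if P = 0 then Exit; // malformed closing tag
      Inc(P);
      if Depth > 0 then
      begin
        Dec(Depth);
        if Depth = 0 then
        begin
          Inc(Found);
          if Found = Index then
          begin
            Result := Copy(Html, StartPos, P - StartPos);
            Exit;
          end;
        end;
      end;
    end;
  until False;
end;

const
  // Made-up sample page: one table with a nested table, then a second table
  Page = '<html><body><p>intro</p>' +
         '<table><tr><td>A<table><tr><td>inner</td></tr></table></td></tr></table>' +
         '<TABLE id="t2"><TR><TD>B</TD></TR></TABLE></body></html>';
begin
  WriteLn(ExtractTableFragment(Page, 1)); // outer table, nested one included
  WriteLn(ExtractTableFragment(Page, 2)); // the table with id="t2"
end.
```

The returned string could then be passed to FillStringGridWithHTMLTable. Which index holds the table you actually want still has to be found by looking at the page, which is the point made in the answer above. Note also that FillStringGridWithHTMLTable stops at the first </TABLE> it meets, so a fragment containing a nested table is only read up to the end of the inner table.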


HTMLTable.zip

Author Closing Comment

by:MissManal
ID: 35750738
:)