escaper

asked on

About getting data from a webpage.

I want to grab some data from certain web pages. The data is mostly tagged with <tr></tr> in the HTML. I think I can use the TWebBrowser control. The feature should work like this:
first, the WebBrowser control opens the web page; then I select the text with the mouse to highlight it; then I click a button, and the program automatically generates a text file that records the tag path of the selected text. Afterwards, the program can auto-grab the data using that text file.

But I really don't know how to implement it.
Could you give me some code?
Any comments would be very much appreciated.
Thank you.
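A rough sketch of that "record the tag path of the selection" idea, using the TWebBrowser DOM: the page has to be fully loaded before you call it, and the function name here is made up for illustration.

uses SysUtils, SHDocVw, MSHTML;

// Walk up from the highlighted text to the document root, building
// something like 'HTML/BODY/TABLE/TBODY/TR/TD'.
function GetSelectionTagPath(WB: TWebBrowser): string;
var
  Doc: IHTMLDocument2;
  Range: IHTMLTxtRange;
  Elem: IHTMLElement;
begin
  Result := '';
  Doc := WB.Document as IHTMLDocument2;                 // the loaded page
  if not Supports(Doc.selection.createRange, IHTMLTxtRange, Range) then
    Exit;                                               // no text selection
  Elem := Range.parentElement;                          // element holding the selection
  while Elem <> nil do
  begin
    if Result = '' then
      Result := Elem.tagName
    else
      Result := Elem.tagName + '/' + Result;
    Elem := Elem.parentElement;
  end;
end;

The path this returns could be written to a text file and followed again later to pull the same cell out automatically.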
LMuadDIb

There are several ways to scrape a web page.
This link will give you all you need to do it:

http://delphi.about.com/od/internetintranet/l/aa062502a.htm

I would try to grab the whole table myself.
But it can be tricky if your web page has several tables and yours is not the first one, or if there are embedded tables.
After grabbing the web page, grab the HTML table and then loop through the table rows.

Using expressions is fine for certain things and can be quite fast, but it is not great for all scraping.
If you grab the table and parse the rows, you will more than likely have to remove all the HTML code afterwards, unless you grab just the web page text; but then you will have to somehow locate the table without any HTML code.
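If you go the TWebBrowser route instead of parsing strings, you can also walk the table through the DOM. A rough sketch, assuming the page is already loaded in a TWebBrowser named WebBrowser1, that the table you want is the first one on the page, and with a made-up procedure name:

uses Classes, SHDocVw, MSHTML;

procedure DumpFirstTable(WB: TWebBrowser; Dest: TStrings);
var
  Doc: IHTMLDocument2;
  Tables, Cells: IHTMLElementCollection;
  Table: IHTMLTable;
  Row: IHTMLTableRow;
  r, c: Integer;
  Line: string;
begin
  Doc := WB.Document as IHTMLDocument2;            // the loaded page
  Tables := Doc.all.tags('TABLE') as IHTMLElementCollection;
  if Tables.length = 0 then Exit;

  Table := Tables.item(0, 0) as IHTMLTable;        // first table on the page
  for r := 0 to Table.rows.length - 1 do
  begin
    Row := Table.rows.item(r, r) as IHTMLTableRow;
    Cells := Row.cells;
    Line := '';
    for c := 0 to Cells.length - 1 do
      Line := Line + (Cells.item(c, c) as IHTMLElement).innerText + #9;
    Dest.Add(Line);                                // one tab-separated line per row
  end;
end;

Call it with something like DumpFirstTable(WebBrowser1, Memo1.Lines) once the page has finished loading.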

I made a component that does exactly what you want, but it is not available to anyone at this time. I can show you some code on some specifics, though.

If you don't want to use expressions (or can't),
use a keyword approach: search the retrieved text of the web page for a keyword (this could be header text, a link, etc., but it must be a phrase or piece of HTML code that is unique and will always appear on that web page). Then search for a second keyword (like "<table") after the first keyword is found, then copy from that second keyword until a third keyword is found (like "</table>").

You can grab the whole table this way, or instead of keywords just count the tables found in the web page and loop through them until you find the table you need.

There are 3rd-party components that can help parse HTML code (some free), but for simple scraping you can do the above; if you need speed, then use a 3rd-party component.
Here is some code to help grab text from the web page.
I had to remove some code and change some variables, so the code might not be perfect...

Keyword approach:

var
  ir, tbleFound: string;
  cnt, TablePos: Integer;
begin
  // KeywordScanFind/From/To are assumed to be set up elsewhere
  ir := StrListWebPage.Text; // downloaded web page loaded into a TStringList
  cnt := 0;
  TablePos := 3;             // which occurrence of the keyword we want

  while Pos(KeywordScanFind, ir) > 0 do
  begin
    // skip past the anchor keyword
    Delete(ir, 1, Pos(KeywordScanFind, ir) + Length(KeywordScanFind) - 1);
    // jump to the start keyword (e.g. '<table')
    Delete(ir, 1, Pos(KeywordScanFrom, ir) - 1);
    // copy everything up to the end keyword (e.g. '</table>')
    tbleFound := Copy(ir, 1, Pos(KeywordScanTo, ir) - 1);

    Inc(cnt);                // count this hit
    if cnt = TablePos then
      Break;                 // tbleFound now holds the table we want

    // step past the end keyword before searching again
    Delete(ir, 1, Pos(KeywordScanTo, ir) + Length(KeywordScanTo) - 1);
  end;
end;

The table approach is basically the same, except you do not need the KeywordScanFrom keyword.
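For that table-counting variant, something along these lines (the function name is made up, and as mentioned above, embedded tables will still throw the count off):

uses SysUtils;  // for LowerCase

function GrabNthTable(PageText: string; TablePos: Integer): string;
var
  cnt, p: Integer;
begin
  Result := '';
  cnt := 0;
  p := Pos('<table', LowerCase(PageText));
  while p > 0 do
  begin
    Inc(cnt);
    Delete(PageText, 1, p - 1);                    // jump to this '<table'
    if cnt = TablePos then
    begin
      p := Pos('</table>', LowerCase(PageText));
      if p > 0 then
        Result := Copy(PageText, 1, p + Length('</table>') - 1);
      Exit;
    end;
    Delete(PageText, 1, Length('<table'));         // step past it and keep looking
    p := Pos('<table', LowerCase(PageText));
  end;
end;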
escaper

ASKER

Hi LMuadDIb,
Thank you for your comments. There isn't any keyword; in fact, the data I want is only numbers and variable text, so I don't think I can use a keyword for searching. The table looks like this:

goods_name   description   price
shoe         big           17.99
shirt        middle        17.99

Now, I know the URL of the web page. The name will be stable, but the price changes frequently, and my question is how to grab the price and the description.

That is why I said I should select the text first, to record the path, and then read it back.

So I think the most difficult problem is how to record the path when I select the text.

Thank you.
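One way to use the fact that the goods name is stable: treat the name itself as the keyword, grab the <tr> it sits in, and strip the tags out of that row. A rough sketch with made-up helper names:

uses SysUtils;

// Strip the HTML tags out of a fragment, leaving just the text.
function StripTags(const S: string): string;
var
  i: Integer;
  InTag: Boolean;
begin
  Result := '';
  InTag := False;
  for i := 1 to Length(S) do
    if S[i] = '<' then
      InTag := True
    else if S[i] = '>' then
    begin
      InTag := False;
      Result := Result + ' ';   // keeps neighbouring cells from running together
    end
    else if not InTag then
      Result := Result + S[i];
end;

// Find the <tr> ... </tr> that contains GoodsName and return its text.
function GrabRowByName(const PageText, GoodsName: string): string;
var
  Lower: string;
  pName, pStart, pEnd: Integer;
begin
  Result := '';
  Lower := LowerCase(PageText);
  pName := Pos(LowerCase(GoodsName), Lower);
  if pName = 0 then Exit;
  pStart := pName;              // back up to the '<tr' that opens the row
  while (pStart > 1) and (Copy(Lower, pStart, 3) <> '<tr') do
    Dec(pStart);
  pEnd := pName;                // move forward to the closing '</tr>'
  while (pEnd < Length(Lower)) and (Copy(Lower, pEnd, 5) <> '</tr>') do
    Inc(pEnd);
  Result := Trim(StripTags(Copy(PageText, pStart, pEnd - pStart)));
end;

GrabRowByName(StrListWebPage.Text, 'shoe') would then give back something like 'shoe   big   17.99', from which the description and price can be split off. If the name also appears elsewhere on the page (in a link, for example), the search would need to be narrowed first.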

escaper

ASKER

I mean that I want to record the position manually the first time, and then have the program get the price automatically. I also want the recording to be easy, so I think it would be better if, when I select the price or the description, the program could record the position automatically.

Thank you.
ASKER CERTIFIED SOLUTION
LMuadDIb

SOLUTION
I forgot to mention that if the "HTTPGet1.URL" parameter does not have an extension (.htm, .html, etc.), then you have to add a trailing "/" (without the quotes, of course), for example: HTTPGet1.URL := 'http://www.yahoo.com/'; otherwise the page won't load!
What's wrong? Isn't this what you wanted? Tell us what it is that you want exactly so we can help you!