escaper

asked on

About getting data from a webpage.

I want to grab some data from certain web pages. The data is mostly tagged with <tr></tr> in the HTML. I think I can use the TWebBrowser control. The feature should work like this:
first, the WebBrowser control opens the web page; then I select the text with the mouse to highlight it; then I click a button, and the program automatically generates a text file that records the tag path of the selected text. Afterwards, the program can auto-grab the data using that text file.

But I really don't know how to implement it.
Could you give me some code?
Any comments would be very much appreciated.
Thank you.
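A rough sketch of that "record the tag path of the selection" idea, using the TWebBrowser DOM: the page has to be fully loaded before you call it, and the function name here is made up for illustration.

uses SysUtils, SHDocVw, MSHTML;

// Walk up from the highlighted text to the document root, building
// something like 'HTML/BODY/TABLE/TBODY/TR/TD'.
function GetSelectionTagPath(WB: TWebBrowser): string;
var
  Doc: IHTMLDocument2;
  Range: IHTMLTxtRange;
  Elem: IHTMLElement;
begin
  Result := '';
  Doc := WB.Document as IHTMLDocument2;                 // the loaded page
  if not Supports(Doc.selection.createRange, IHTMLTxtRange, Range) then
    Exit;                                               // no text selection
  Elem := Range.parentElement;                          // element holding the selection
  while Elem <> nil do
  begin
    if Result = '' then
      Result := Elem.tagName
    else
      Result := Elem.tagName + '/' + Result;
    Elem := Elem.parentElement;
  end;
end;

The path this returns could be written to a text file and followed again later to pull the same cell out automatically.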
LMuadDIb

There are several ways to scrape a web page.
This link will give you all you need to do it:

http://delphi.about.com/od/internetintranet/l/aa062502a.htm

I would try to grab the whole table myself.
But it can be tricky if your web page has several tables and yours is not the first one, or if there are embedded tables.
After grabbing the web page, grab the HTML table and then loop through the table rows.

Using expressions is fine for certain things and can be quite fast, but it is not great for all scraping.
If you grab the table and parse the rows, you will more than likely have to remove all the HTML code afterwards, unless you grab just the web page text; but then you will have to somehow locate the table without any HTML code.
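If you go the TWebBrowser route instead of parsing strings, you can also walk the table through the DOM. A rough sketch, assuming the page is already loaded in a TWebBrowser named WebBrowser1, that the table you want is the first one on the page, and with a made-up procedure name:

uses Classes, SHDocVw, MSHTML;

procedure DumpFirstTable(WB: TWebBrowser; Dest: TStrings);
var
  Doc: IHTMLDocument2;
  Tables, Cells: IHTMLElementCollection;
  Table: IHTMLTable;
  Row: IHTMLTableRow;
  r, c: Integer;
  Line: string;
begin
  Doc := WB.Document as IHTMLDocument2;            // the loaded page
  Tables := Doc.all.tags('TABLE') as IHTMLElementCollection;
  if Tables.length = 0 then Exit;

  Table := Tables.item(0, 0) as IHTMLTable;        // first table on the page
  for r := 0 to Table.rows.length - 1 do
  begin
    Row := Table.rows.item(r, r) as IHTMLTableRow;
    Cells := Row.cells;
    Line := '';
    for c := 0 to Cells.length - 1 do
      Line := Line + (Cells.item(c, c) as IHTMLElement).innerText + #9;
    Dest.Add(Line);                                // one tab-separated line per row
  end;
end;

Call it with something like DumpFirstTable(WebBrowser1, Memo1.Lines) once the page has finished loading.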

I made a component that does exactly what you want, but it is not available to anyone at this time. I can show you some code on some specifics, though.

If you don't want to use expressions (or can't),
use a keyword approach: search the retrieved text of the web page for a keyword (this could be header text, a link, etc., but it must be a phrase or piece of HTML code that is unique and will always appear on that web page). Then search for a second keyword (like "<table") after the first keyword is found, then copy from that second keyword until a third keyword is found (like "</table>").

You can grab the whole table this way, or instead of keywords just count the tables found in the web page and loop through them until you find the table you need.

There are 3rd-party components that can help parse HTML code (some free), but for simple scraping you can do the above; if you need speed, then use a 3rd-party component.
Here is some code to help grab text from the web page.
I had to remove some code and change some variables, so the code might not be perfect...

Keyword approach:

var
  ir, tbleFound: string;
  cnt, TablePos: Integer;
begin
  // KeywordScanFind/From/To are assumed to be set up elsewhere
  ir := StrListWebPage.Text; // downloaded web page loaded into a TStringList
  cnt := 0;
  TablePos := 3;             // which occurrence of the keyword we want

  while Pos(KeywordScanFind, ir) > 0 do
  begin
    // skip past the anchor keyword
    Delete(ir, 1, Pos(KeywordScanFind, ir) + Length(KeywordScanFind) - 1);
    // jump to the start keyword (e.g. '<table')
    Delete(ir, 1, Pos(KeywordScanFrom, ir) - 1);
    // copy everything up to the end keyword (e.g. '</table>')
    tbleFound := Copy(ir, 1, Pos(KeywordScanTo, ir) - 1);

    Inc(cnt);                // count this hit
    if cnt = TablePos then
      Break;                 // tbleFound now holds the table we want

    // step past the end keyword before searching again
    Delete(ir, 1, Pos(KeywordScanTo, ir) + Length(KeywordScanTo) - 1);
  end;
end;

The table approach is basically the same, except you do not need the KeywordScanFrom keyword.
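For that table-counting variant, something along these lines (the function name is made up, and as mentioned above, embedded tables will still throw the count off):

uses SysUtils;  // for LowerCase

function GrabNthTable(PageText: string; TablePos: Integer): string;
var
  cnt, p: Integer;
begin
  Result := '';
  cnt := 0;
  p := Pos('<table', LowerCase(PageText));
  while p > 0 do
  begin
    Inc(cnt);
    Delete(PageText, 1, p - 1);                    // jump to this '<table'
    if cnt = TablePos then
    begin
      p := Pos('</table>', LowerCase(PageText));
      if p > 0 then
        Result := Copy(PageText, 1, p + Length('</table>') - 1);
      Exit;
    end;
    Delete(PageText, 1, Length('<table'));         // step past it and keep looking
    p := Pos('<table', LowerCase(PageText));
  end;
end;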
escaper

ASKER

Hi LMuadDIb,
Thank you for your comments. There isn't any keyword; in fact, the data I want is only numbers and variable text, so I don't think I can use a keyword for searching. The table looks like this:

goods_name   description   price
shoe         big           17.99
shirt        middle        17.99

Now, I know the URL of the web page. The name will be stable, but the price changes frequently, and my question is how to grab the price and the description.

That is why I said I should select the text first, to record the path, and then read it back.

So I think the most difficult problem is how to record the path when I select the text.

Thank you.
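One way to use the fact that the goods name is stable: treat the name itself as the keyword, grab the <tr> it sits in, and strip the tags out of that row. A rough sketch with made-up helper names:

uses SysUtils;

// Strip the HTML tags out of a fragment, leaving just the text.
function StripTags(const S: string): string;
var
  i: Integer;
  InTag: Boolean;
begin
  Result := '';
  InTag := False;
  for i := 1 to Length(S) do
    if S[i] = '<' then
      InTag := True
    else if S[i] = '>' then
    begin
      InTag := False;
      Result := Result + ' ';   // keeps neighbouring cells from running together
    end
    else if not InTag then
      Result := Result + S[i];
end;

// Find the <tr> ... </tr> that contains GoodsName and return its text.
function GrabRowByName(const PageText, GoodsName: string): string;
var
  Lower: string;
  pName, pStart, pEnd: Integer;
begin
  Result := '';
  Lower := LowerCase(PageText);
  pName := Pos(LowerCase(GoodsName), Lower);
  if pName = 0 then Exit;
  pStart := pName;              // back up to the '<tr' that opens the row
  while (pStart > 1) and (Copy(Lower, pStart, 3) <> '<tr') do
    Dec(pStart);
  pEnd := pName;                // move forward to the closing '</tr>'
  while (pEnd < Length(Lower)) and (Copy(Lower, pEnd, 5) <> '</tr>') do
    Inc(pEnd);
  Result := Trim(StripTags(Copy(PageText, pStart, pEnd - pStart)));
end;

GrabRowByName(StrListWebPage.Text, 'shoe') would then give back something like 'shoe   big   17.99', from which the description and price can be split off. If the name also appears elsewhere on the page (in a link, for example), the search would need to be narrowed first.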

escaper

ASKER

I mean that I want to record the position manually the first time, and then have the program get the price automatically. I also want the recording to be easy, so I think it would be better if, when I select the price or the description, the program could record the position automatically.

Thank you.
ASKER CERTIFIED SOLUTION
LMuadDIb

SOLUTION
I forgot to mention that if the "HTTPGet1.URL" parameter does not have an extension (.htm, .html, etc.), then you have to add a trailing "/" (without the quotes, of course), for example: HTTPGet1.URL := 'http://www.yahoo.com/'; otherwise the page won't load!
What's wrong? Isn't this what you wanted? Tell us what it is that you want exactly so we can help you!