  • Status: Solved

Parsing HTML - Complete answer please -

Hi there,

Can somebody give me a complete answer for the following?

The page is:

http://www.dealtime.co.uk/xFS?GKW=ixus+40&ATV=&RST=PP-&FN=Digital_Cameras&KW=22780686&FD=0&x=16&y=18

The goals are:

1. Find out how many shops are selling this product
2. Shop names
3. Shop prices

If somebody can give me a full working answer, I will give an extra 500 points.
Asked by: bilgehanyildirim
 
Russell Libby (Software Engineer, Advisory) commented:

I took a look at the HTML for the link you specified, and I can tell you it's not pretty and will be very difficult (but not impossible) to parse. There are a few other things, though, that make this task pretty much impossible.

1.) The data is split across multiple pages (each page containing a subset of the data required). This will require parsing out the total page count and the links for the pages, and then downloading and parsing those additional pages.
2.) In some cases, the shop names are not text, but are actually images for the shop logo.

Very tough question... listening at this point.

Regards,
Russell
 
bilgehanyildirim (Author) commented:
Thank you very much for your time.

1.) If you mean that some products span more than one page, do not worry about that. I actually only need the first 5 shops.
2.) At the end of every row, underneath the Buy Now button, you can see the shop name in text.

I do realise that this is a tough question, but this is quite important for me. I can raise the points to 2000 if it is any help :))))))))))))
 
paulb1989 commented:
I just whipped up a little exe that reads the store names and prices from the first page (so you get the first 10).

It needs the Indy components but could easily be changed to use another way to get the HTML content.

http://www.burtonsoftware.co.uk/Downloads/tmp.rar
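
For readers without Indy, one other way to get the HTML content is the Windows URLDownloadToFile API from the UrlMon unit. The following is only a minimal sketch, assuming Delphi on Windows; the FetchHtml name and the temporary-file approach are illustrative assumptions and are not taken from the downloadable exe.

```pascal
program FetchDemo;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes, UrlMon;

// Hypothetical helper (not from the original exe): downloads AUrl to a
// temporary file and returns the file's text, or '' if the download fails.
function FetchHtml(const AUrl: string): string;
var
  TempPath, TempFile: array[0..MAX_PATH] of Char;
  Lines: TStringList;
begin
  Result := '';
  GetTempPath(MAX_PATH, TempPath);
  GetTempFileName(TempPath, 'htm', 0, TempFile);
  try
    if URLDownloadToFile(nil, PChar(AUrl), TempFile, 0, nil) = S_OK then
    begin
      Lines := TStringList.Create;
      try
        Lines.LoadFromFile(TempFile);
        Result := Lines.Text;
      finally
        Lines.Free;
      end;
    end;
  finally
    // Always remove the temporary file, whether or not the download worked.
    DeleteFile(TempFile);
  end;
end;

begin
  // Print the size of the downloaded page as a quick smoke check.
  Writeln(Length(FetchHtml('http://www.dealtime.co.uk/xFS?GKW=ixus+40&ATV=&RST=PP-&FN=Digital_Cameras&KW=22780686&FD=0&x=16&y=18')));
end.
```

The returned string could then be parsed in place of the result of an Indy Get call.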

 
bilgehanyildirim (Author) commented:
YEEESSSS!!!! Thank you very much, Paul.

Just tell me how I can give you the 1500 points!!! Shall I open 3 new questions, or is there another way around it?

I really thank you very much!!!
 
paulb1989 commented:
I think you have to open more questions.
 
paulb1989 commented:
Hmmm, my account info shows that 2000 points were added, not 500.
 
bilgehanyildirim (Author) commented:
This question was for just 500 points. I don't know how it became 2000. I do not think these are my points. I will ask 3 more questions.

http://www.experts-exchange.com/Programming/Programming_Languages/Delphi/Q_21404604.html
http://www.experts-exchange.com/Programming/Programming_Languages/Delphi/Q_21404608.html
http://www.experts-exchange.com/Programming/Programming_Languages/Delphi/Q_21404610.html

After these I will have one more question for you, Paul, if you don't mind :)
 
paulb1989 commented:
I think I get why: it's because you gave me an A grade. It's explained here - http://www.experts-exchange.com/help.jsp#14

Sure, fire away with your question!
 
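paulb1989 commented:
// Needs StrUtils (for PosEx) in the uses clause, plus a TIdHTTP (IdHTTP1)
// and a TMemo (Memo1) on the form.
procedure TForm1.Button1Click(Sender: TObject);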
const
  RowID1 = '<tr><td><table border="0"';
  RowID2 = '</script>';
  RowID3 = '</a>';

  PriceID1 = '<font rip-style-borderwidth-backup=""';
  PriceID2 = 'color="#000000">';
  PriceID3 = '</font>';

  CountPerPage = 10;
var
  HTML, Name, Price: String;
  s, e, s2, e2, c, pc, count: Integer;
begin
  Memo1.Lines.Clear;
  Memo1.Lines.Add('Loading...');

  HTML := IdHTTP1.Get('http://www.dealtime.co.uk/xFS?GKW=ixus+40&ATV=&RST=PP-&FN=Digital_Cameras&KW=22780686&FD=0&x=16&y=18');

  Memo1.Lines.Clear;

  s := Pos('<b>We found', HTML);
  s := PosEx('<font color="#cc0000">', HTML, s) + Length('<font color="#cc0000">');
  e := PosEx('matches</font>', HTML, s);

  // Fall back to 0 instead of raising an exception if the count cannot be parsed.
  count := StrToIntDef(Trim(Copy(HTML, s, e - s)), 0);
  Delete(HTML, 1, e);

  Memo1.Lines.Add('Shop Count: ' + IntToStr(count));
  Memo1.Lines.Add('');

  c := 0;
  pc := 0;

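  // Find the first shop row. Check every search result so that s stays 0
  // (and the loop below ends) when a marker is missing.
  s := Pos(RowID1, HTML);
  if s > 0 then
    s := PosEx(RowID2, HTML, s + Length(RowID1));
  if s > 0 then
    s := PosEx(RowID2, HTML, s + Length(RowID2));
  if s > 0 then
    s := s + Length(RowID2);
  // Use c < count: with c <= count the loop could run once more than there are shops.
  while (s > 0) and (c < count) and (pc < CountPerPage) do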
  begin
    Inc(c);
    Inc(pc);

    e := PosEx(RowID3, HTML, s);

    Name := Trim(Copy(HTML, s, e - s));

    if Copy(Name, 1, 4) = '<img' then
    begin
      s2 := Pos('alt="', Name) + 5;
      e2 := PosEx('"', Name, s2);
      Name := Trim(Copy(Name, s2, e2 - s2));
    end;

    if Name = '' then
      Name := 'Unknown';

    s2 := PosEx(PriceID1, HTML, s) + Length(PriceID1);
    s2 := PosEx(PriceID2, HTML, s2) + Length(PriceID2);
    e2 := PosEx(PriceID3, HTML, s2);

    Price := Trim(Copy(HTML, s2, e2 - s2));

    if Price = '' then
      Price := 'Unknown';

    Price := StringReplace(Price, '&#163;', '£', []);

    Memo1.Lines.Add('Shop ' + IntToStr(c) + ': ' + Name + ' - ' + Price);

    Delete(HTML, 1, e2);

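    // Find the next shop row; s stays 0 when no further row exists,
    // which ends the loop.
    s := Pos(RowID1, HTML);
    if s > 0 then
      s := PosEx(RowID2, HTML, s + Length(RowID1));
    if s > 0 then
      s := PosEx(RowID2, HTML, s + Length(RowID2));
    if s > 0 then
      s := s + Length(RowID2);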
  end;
end;
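
The repeated Pos/PosEx/Copy steps above could be folded into one small helper. This is only a sketch, not part of the posted solution: TextBetween is a hypothetical name, and the sample markup is made up for illustration and is not the real dealtime page.

```pascal
program TextBetweenDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

// Hypothetical helper: returns the trimmed text between StartTag and EndTag,
// searching Source from position From. On success From is moved just past
// EndTag; if either tag is missing the result is '' and From is set to 0.
function TextBetween(const Source, StartTag, EndTag: string;
  var From: Integer): string;
var
  s, e: Integer;
begin
  Result := '';
  s := PosEx(StartTag, Source, From);
  if s = 0 then
  begin
    From := 0;
    Exit;
  end;
  Inc(s, Length(StartTag));
  e := PosEx(EndTag, Source, s);
  if e = 0 then
  begin
    From := 0;
    Exit;
  end;
  Result := Trim(Copy(Source, s, e - s));
  From := e + Length(EndTag);
end;

const
  // Made-up sample markup for illustration only.
  Sample = '<td><a>Shop A</a><font>100</font></td>' +
           '<td><a>Shop B</a><font>120</font></td>';
var
  P: Integer;
  ShopName, Price: string;
begin
  P := 1;
  while P > 0 do
  begin
    ShopName := TextBetween(Sample, '<a>', '</a>', P);
    if P = 0 then
      Break;
    Price := TextBetween(Sample, '<font>', '</font>', P);
    if P = 0 then
      Break;
    Writeln(ShopName, ' - ', Price);
  end;
end.
```

Because the helper reports a missing tag by setting the position to 0, the calling loop ends cleanly when no further rows exist, and the HTML string does not need to be shortened with Delete on each pass.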