Parsing HTML - complete answer, please

Hi there,

Can somebody give me a complete answer to the following?

The page is:

The goals are:

1. Find out how many shops are selling this product
2. The shop names
3. The shop prices

If somebody can give me a full working answer, I will give an extra 500 points.
paulb1989 Commented:
I just whipped up a little exe that reads the store names and prices from the first page (so you get the first 10).

It needs the Indy components, but it could easily be changed to fetch the HTML content some other way.
Russell Libby (Software Engineer, Advisory) Commented:

I took a look at the HTML for the link you specified, and I can tell you it's not pretty and will be very difficult (but not impossible) to parse. There are a few other things, though, that make this task pretty much impossible:

1.) The data is split across multiple pages (each page contains a subset of the required data). This requires parsing out the total page count and the links to the pages, then downloading and parsing those additional pages.
2.) In some cases, the shop names are not text but images of the shop logo.
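Both obstacles can at least be reduced to small helpers. The sketch below (Delphi-style Object Pascal that should also compile with Free Pascal) is one possible approach, not a tested solution for this page: `ShopNameFromCell` falls back to an image's `alt` text when the name cell holds a logo, and `PageCount` works out how many result pages a given match count spans. Both function names and the sample markup are hypothetical, and the code assumes each logo `<img>` carries a double-quoted `alt` attribute, which the real page may not do.

```pascal
program ShopNames;

{$IFDEF FPC}
  {$MODE DELPHI}
{$ELSE}
  {$APPTYPE CONSOLE}
{$ENDIF}

uses
  SysUtils, StrUtils;

// Returns the shop name held in a table cell. If the cell is a logo
// (<img ...>), the image's alt text is used instead; '' if it has none.
// Assumption: alt text is always double-quoted.
function ShopNameFromCell(const Cell: string): string;
var
  Cleaned: string;
  s, e: Integer;
begin
  Cleaned := Trim(Cell);
  if not SameText(Copy(Cleaned, 1, 4), '<img') then
  begin
    Result := Cleaned;
    Exit;
  end;
  Result := '';
  s := Pos('alt="', Cleaned);
  if s = 0 then
    Exit;
  Inc(s, Length('alt="'));
  e := PosEx('"', Cleaned, s);
  if e > 0 then
    Result := Trim(Copy(Cleaned, s, e - s));
end;

// Pages needed to show TotalMatches results at PerPage per page
// (integer ceiling division).
function PageCount(TotalMatches, PerPage: Integer): Integer;
begin
  Result := (TotalMatches + PerPage - 1) div PerPage;
end;

begin
  WriteLn(ShopNameFromCell('  Example Shop  '));                     // Example Shop
  WriteLn(ShopNameFromCell('<img src="logo.gif" alt="Logo Shop">')); // Logo Shop
  WriteLn(PageCount(23, 10));                                        // 3
end.
```

Knowing the page count only tells you how many extra pages to fetch; the links to those pages would still have to be parsed out of the HTML.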

Very tough question... listening at this point.

bilgehanyildirim (Author) Commented:
Thank you very much for your time.

1.) If you mean that some products have more than one page, don't worry about that. I actually just need the first 5 shops.
2.) At the end of every row, underneath the Buy Now button, you can see every shop name as text.
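With only the first 5 shops needed, the job reduces to repeatedly pulling out the text between two known markers and stopping after five hits. A minimal sketch of that marker-based technique follows (Delphi-style Object Pascal, Free Pascal compatible). `TextBetween`, `FirstItems`, `TNameList` and the `<b class="shop">` markup are hypothetical stand-ins; the real page would need its own start and end markers, such as the text under the Buy Now button.

```pascal
program FirstShops;

{$IFDEF FPC}
  {$MODE DELPHI}
{$ELSE}
  {$APPTYPE CONSOLE}
{$ENDIF}

uses
  SysUtils, StrUtils;

type
  TNameList = array of string;

// Returns the trimmed text between the next StartMarker/EndMarker pair at
// or after FromPos and moves FromPos past EndMarker. When no complete pair
// is left, returns '' and sets FromPos to 0.
function TextBetween(const Src, StartMarker, EndMarker: string;
  var FromPos: Integer): string;
var
  s, e: Integer;
begin
  Result := '';
  s := PosEx(StartMarker, Src, FromPos);
  if s > 0 then
  begin
    Inc(s, Length(StartMarker));
    e := PosEx(EndMarker, Src, s);
    if e > 0 then
    begin
      Result := Trim(Copy(Src, s, e - s));
      FromPos := e + Length(EndMarker);
      Exit;
    end;
  end;
  FromPos := 0;
end;

// Collects at most MaxItems delimited values, stopping early when the
// source runs out of matches.
function FirstItems(const Src, StartMarker, EndMarker: string;
  MaxItems: Integer): TNameList;
var
  p, n: Integer;
  Item: string;
begin
  SetLength(Result, 0);
  p := 1;
  n := 0;
  while (p > 0) and (n < MaxItems) do
  begin
    Item := TextBetween(Src, StartMarker, EndMarker, p);
    if p = 0 then
      Break;
    SetLength(Result, n + 1);
    Result[n] := Item;
    Inc(n);
  end;
end;

const
  // Hypothetical sample markup with six shops
  Sample =
    '<b class="shop">Shop A</b><b class="shop">Shop B</b>' +
    '<b class="shop">Shop C</b><b class="shop">Shop D</b>' +
    '<b class="shop">Shop E</b><b class="shop">Shop F</b>';

var
  Names: TNameList;
  i: Integer;

begin
  Names := FirstItems(Sample, '<b class="shop">', '</b>', 5);
  for i := 0 to High(Names) do
    WriteLn(i + 1, ': ', Names[i]); // 1: Shop A ... 5: Shop E
end.
```

Keeping the position in a `var` parameter avoids deleting the already-parsed part of the string on every iteration, which is what the posted procedure does with `Delete`.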

I do realise that this is a tough question, but it is quite important to me. I can raise the points to 2000 if that helps :)

bilgehanyildirim (Author) Commented:
YEEESSSS!!!! Thank you very much, Paul.

Just tell me how I can give you the 1500 points! Shall I open 3 new questions, or is there another way?

Thank you so much!!!
I think you have to open more questions.
Hmm, my account info shows that 2000 points were added, not 500.
bilgehanyildirim (Author) Commented:
This question was for just 500 points. I don't know how it became 2000. I don't think the extra points came from me. I will ask 3 more questions.

After these, I will have one more question for you, Paul, if you don't mind :)
I think I see why: it's because you gave me an A grade. It's explained here:

Sure, fire away with your question!
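// Requires StrUtils (for PosEx) in the unit's uses clause, plus a TIdHTTP
// component named IdHTTP1 and a TMemo named Memo1 on the form.
procedure TForm1.Button1Click(Sender: TObject);
const
  // Markers around the shop name in each result row
  RowID1 = '<tr><td><table border="0"';
  RowID2 = '</script>';
  RowID3 = '</a>';

  // Markers around the price in each result row
  PriceID1 = '<font rip-style-borderwidth-backup=""';
  PriceID2 = 'color="#000000">';
  PriceID3 = '</font>';

  CountPerPage = 10; // results listed on the first page
var
  HTML, Name, Price: String;
  s, e, s2, e2, c, count: Integer;

  // Position where the next shop name starts (just after the second
  // RowID2 that follows RowID1), or 0 when no further row exists.
  function NextRowStart(const Src: String): Integer;
  var
    p: Integer;
  begin
    Result := 0;
    p := Pos(RowID1, Src);
    if p = 0 then Exit;
    p := PosEx(RowID2, Src, p + Length(RowID1));
    if p = 0 then Exit;
    p := PosEx(RowID2, Src, p + Length(RowID2));
    if p = 0 then Exit;
    Result := p + Length(RowID2);
  end;

begin
  HTML := IdHTTP1.Get(''); // page URL omitted in the original post

  // Total number of matches, from "We found <n> matches"
  s := Pos('<b>We found', HTML);
  s := PosEx('<font color="#cc0000">', HTML, s) + Length('<font color="#cc0000">');
  e := PosEx('matches</font>', HTML, s);

  count := StrToInt(Trim(Copy(HTML, s, e - s)));
  Delete(HTML, 1, e);

  Memo1.Lines.Add('Shop Count: ' + IntToStr(count));

  c := 0;
  s := NextRowStart(HTML);

  // Stop when no row is left, every match is read, or the page is exhausted
  while (s > 0) and (c < count) and (c < CountPerPage) do
  begin
    // The shop name runs up to the closing </a>
    e := PosEx(RowID3, HTML, s);
    Name := Trim(Copy(HTML, s, e - s));

    // Some shops show a logo instead of text: use the image's alt text
    if Copy(Name, 1, 4) = '<img' then
    begin
      s2 := Pos('alt="', Name) + 5;
      e2 := PosEx('"', Name, s2);
      Name := Trim(Copy(Name, s2, e2 - s2));
    end;

    if Name = '' then
      Name := 'Unknown';

    s2 := PosEx(PriceID1, HTML, s) + Length(PriceID1);
    s2 := PosEx(PriceID2, HTML, s2) + Length(PriceID2);
    e2 := PosEx(PriceID3, HTML, s2);

    Price := Trim(Copy(HTML, s2, e2 - s2));

    if Price = '' then
      Price := 'Unknown';

    // Decode the HTML entity for the pound sign
    Price := StringReplace(Price, '&#163;', '£', []);

    Inc(c);
    Memo1.Lines.Add('Shop ' + IntToStr(c) + ': ' + Name + ' - ' + Price);

    // Discard the parsed part and look for the next row
    Delete(HTML, 1, e2);
    s := NextRowStart(HTML);
  end;
end;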
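The procedure reads the match count with `StrToInt`, which raises `EConvertError` when the text it copies out is not a number, for example after the page layout changes and the markers are no longer found. A more defensive variant is sketched below, assuming the count still appears as `<b>We found <font color="#cc0000">N matches</font>`; `MatchCount` and the sample strings are hypothetical, and it returns -1 instead of raising.

```pascal
program MatchCountDemo;

{$IFDEF FPC}
  {$MODE DELPHI}
{$ELSE}
  {$APPTYPE CONSOLE}
{$ENDIF}

uses
  SysUtils, StrUtils;

const
  // Markers taken from the posted procedure; the live page may differ
  CountStart = '<b>We found';
  CountFont = '<font color="#cc0000">';
  CountEnd = 'matches</font>';

// Returns N from "<b>We found <font color="#cc0000">N matches</font>",
// or -1 when a marker is missing or the text between them is not a number.
function MatchCount(const HTML: string): Integer;
var
  s, e: Integer;
begin
  Result := -1;
  s := Pos(CountStart, HTML);
  if s = 0 then Exit;
  s := PosEx(CountFont, HTML, s);
  if s = 0 then Exit;
  Inc(s, Length(CountFont));
  e := PosEx(CountEnd, HTML, s);
  if e = 0 then Exit;
  if not TryStrToInt(Trim(Copy(HTML, s, e - s)), Result) then
    Result := -1;
end;

begin
  WriteLn(MatchCount('<b>We found <font color="#cc0000">42 matches</font></b>')); // 42
  WriteLn(MatchCount('<p>No results</p>'));                                     // -1
end.
```

Checking for -1 before the parsing loop lets the button handler report a layout change instead of stopping with an exception.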