Solved

Extract "Link-Title & URL"

Posted on 2004-10-02
Last Modified: 2010-04-16
Hello all,

  I am trying to find a way to extract the URLs and their link titles
from a web page.

  Example page that I am trying to extract from:
http://directory.google.com/Top/Arts/Music/Bands_and_Artists/H/

I do not want the links located below the 2nd green table.
I just want all the links in the
[Category] area.

This is what I need to do.

  I need the links and their link titles to be brought into a component,
either a StringGrid, a ListView, or similar (I will let you all decide).
The cells need to show the following:

(Example URL and title, taken from the first link on the Google page above.)

In the Cells:



cellTitle  =  H2O
CellURL  = http://directory.google.com/Top/Arts/Music/Bands_and_Artists/H/H2O/

It needs to be able to extract all the "Category" URLs from the current page
and list each URL title with its URL beside it,

with the option to [Delete] a [URL & Title] pair from the list,

and then export the list to a TMemo.

Any ideas on how to do something like this?
I will give [25 points apiece, up to 3 people] to whoever provides links to actual information on this subject.
I will give [500 points] to whoever provides fully working code.

Thank you all
carrzkiss


Question by:Wayne Barron
LVL 6

Expert Comment

by:vadim_ti
ID: 12209788
Something like this:

Unit source:

unit main;

(*
  MSXML library
  Microsoft.XMLHTTP
*)


interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, msxml, msxmldom, StdCtrls, ComObj, StrUtils;

type
  TForm1 = class(TForm)
    Button1: TButton;
    SiteAddrEdt: TEdit;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.Button1Click(Sender: TObject);
var
  xmlObj: IXMLHttpRequest;
  S, T, eCode, ttl, url: String;
  N1, N2, N3, N4, NStart: Integer;
begin
  xmlObj := CoXMLHTTPRequest.Create;
  Screen.Cursor := crHourGlass;
  try
    memo1.Clear;
    xmlObj.open('GET', SiteAddrEdt.text, false, '', '');
    xmlObj.setRequestHeader('Content-type', 'text/http');
    try
      xmlObj.send('');
    except
      on e:eOLEException do begin
        eCode := format('%8.8x', [e.ErrorCode]);
        if pos('800C', eCode) <> 0 then
          ShowMessage('URL PROBLEM')
        else
          ShowMessage(format('source=%s'#13'helpfile=%s'#13'errorCode=%s',
            [e.Source, e.HelpFile, eCode]));
        Abort;
      end;
      on e:exception do begin
        ShowMessage(e.ClassName);
        raise;
      end
    end;
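    T := xmlObj.responseText;
    // Search a lowercased copy so that HREF and href both match; the results
    // are copied from T so that titles and URLs keep their original casing.
    S := lowercase(T);
    NStart := 1;
    while true do begin
      // N1: first character of the URL, just after href="
      N1 := PosEx('href="', s, NStart);
      if N1 = 0 then
        break;
      Inc(N1, Length('href="'));
      NStart := N1;

      // N2: closing quote of the href attribute
      N2 := PosEx('"', s, NStart);
      if N2 = 0 then
        break;
      NStart := N2+1;

      // N3: first character after the end of the opening <a ...> tag
      N3 := PosEx('>', s, NStart);
      if N3 = 0 then
        break;
      Inc(N3);
      NStart := N3;

      // N4: start of the closing </a> tag; the title lies between N3 and N4
      N4 := PosEx('</a>', s, NStart);
      if N4 = 0 then
        break;
      NStart := N4 + length('</a>');
      ttl := Copy(T, N3, N4-N3);
      url := Copy(T, N1, N2-N1);
      // Skip the directory help link
      if pos('dirhelp.html', url) <> 0 then
        continue;
      // Stop at the first link whose title contains white text; on this page
      // that presumably marks the section below the category area
      if pos('color=#ffffff', ttl) <> 0 then
        break;
      Memo1.Lines.Add(ttl+'  :  '+url);
    end;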
  finally
    Screen.Cursor := crDefault;
  end;
end;

end.

Form source:
object Form1: TForm1
  Left = 192
  Top = 133
  Width = 388
  Height = 293
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  DesignSize = (
    380
    266)
  PixelsPerInch = 96
  TextHeight = 13
  object Button1: TButton
    Left = 152
    Top = 232
    Width = 75
    Height = 25
    Anchors = [akBottom]
    Caption = 'Button1'
    TabOrder = 0
    OnClick = Button1Click
  end
  object SiteAddrEdt: TEdit
    Left = 45
    Top = 11
    Width = 289
    Height = 21
    Anchors = [akLeft, akTop, akRight]
    TabOrder = 1
    Text = 'http://directory.google.com/Top/Arts/Music/Bands_and_Artists/H/'
  end
  object Memo1: TMemo
    Left = 24
    Top = 56
    Width = 329
    Height = 161
    Anchors = [akLeft, akTop, akRight, akBottom]
    TabOrder = 2
  end
end
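
Since the question asks for a list with a delete option rather than a plain memo, the scanning loop above could also be factored into a routine that fills two parallel string lists. This is only a minimal sketch, not part of the posted answer: ExtractLinks, Titles and Urls are hypothetical names, the second sample link is made up, and it assumes a StrUtils unit that provides PosEx.

```pascal
program ExtractLinksDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, StrUtils;

// Hypothetical helper: scans Html for <a href="...">title</a> pairs and
// appends each title to Titles and each URL to Urls at the same index.
procedure ExtractLinks(const Html: string; Titles, Urls: TStrings);
var
  Lower: string;
  N1, N2, N3, N4, NStart: Integer;
begin
  Lower := LowerCase(Html);  // search case-insensitively, copy from Html
  NStart := 1;
  while True do
  begin
    N1 := PosEx('href="', Lower, NStart);
    if N1 = 0 then Break;
    Inc(N1, Length('href="'));          // first character of the URL
    N2 := PosEx('"', Lower, N1);        // closing quote of the URL
    if N2 = 0 then Break;
    N3 := PosEx('>', Lower, N2 + 1);    // end of the opening <a ...> tag
    if N3 = 0 then Break;
    Inc(N3);
    N4 := PosEx('</a>', Lower, N3);     // start of the closing tag
    if N4 = 0 then Break;
    NStart := N4 + Length('</a>');
    Urls.Add(Copy(Html, N1, N2 - N1));
    Titles.Add(Copy(Html, N3, N4 - N3));
  end;
end;

var
  Titles, Urls: TStringList;
  I: Integer;
begin
  Titles := TStringList.Create;
  Urls := TStringList.Create;
  try
    // Made-up sample markup; the second link is purely illustrative
    ExtractLinks('<a href="H2O/">H2O</a> <A HREF="Example/">Example Band</A>',
      Titles, Urls);
    // Deleting an entry means removing the same index from both lists
    Titles.Delete(0);
    Urls.Delete(0);
    for I := 0 to Titles.Count - 1 do
      WriteLn(Titles[I], #9, Urls[I]);
  finally
    Urls.Free;
    Titles.Free;
  end;
end.
```

In a form, the same two lists could feed a TListView or TStringGrid, and the remaining entries could be written to the memo with Memo1.Lines.Add once the unwanted ones have been deleted.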
 
LVL 31

Author Comment

by:Wayne Barron
ID: 12209999
Hello [vadim_ti]

  I am receiving the following error:

[Error] Unit1.pas(67): Undeclared identifier: 'PosEx'

It occurs on this line:

      N1 := PosEx('href="', s, NStart);

I tried declaring it as a variable:

Var
  PosEx : Integer;

And it then gives me this error:

   [Error] Unit1.pas(67): Missing operator or semicolon

It gives that error on every line that uses [PosEx].

Thank you
 
LVL 6

Accepted Solution

by:
vadim_ti earned 2000 total points
ID: 12210470
Did you include StrUtils in your uses clause?

Anyway, here is PosEx from StrUtils:
function PosEx(const SubStr, S: string; Offset: Cardinal = 1): Integer;
var
  I,X: Integer;
  Len, LenSubStr: Integer;
begin
  if Offset = 1 then
    Result := Pos(SubStr, S)
  else
  begin
    I := Offset;
    LenSubStr := Length(SubStr);
    Len := Length(S) - LenSubStr + 1;
    while I <= Len do
    begin
      if S[I] = SubStr[1] then
      begin
        X := 1;
        while (X < LenSubStr) and (S[I + X] = SubStr[X + 1]) do
          Inc(X);
        if (X = LenSubStr) then
        begin
          Result := I;
          exit;
        end;
      end;
      Inc(I);
    end;
    Result := 0;
  end;
end;
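
To see how the Offset parameter behaves, a small console test like the following may help. It is a sketch that assumes PosEx comes from StrUtils (or from the function pasted above in its place); CountOccurrences is a hypothetical helper added only for illustration.

```pascal
program PosExDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, StrUtils;

// Hypothetical helper: counts occurrences of SubStr in S by restarting
// the PosEx search just past each match.
function CountOccurrences(const SubStr, S: string): Integer;
var
  P: Integer;
begin
  Result := 0;
  if SubStr = '' then Exit;
  P := PosEx(SubStr, S, 1);
  while P > 0 do
  begin
    Inc(Result);
    P := PosEx(SubStr, S, P + Length(SubStr));
  end;
end;

begin
  WriteLn(PosEx('ab', 'abcabcab', 1));         // prints 1
  WriteLn(PosEx('ab', 'abcabcab', 2));         // prints 4
  WriteLn(CountOccurrences('ab', 'abcabcab')); // prints 3
end.
```

The link-extraction loop relies on the same idea: each search starts where the previous match ended, so the page is scanned once from beginning to end.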


LVL 31

Author Comment

by:Wayne Barron
ID: 12210532
Thanks,

   That got it to work.
Let me do some checking on it, and see if it is what I am needing.

I was hoping for something along the lines of a "List" but this might do as well.
I will get back with you within the next day or so.

Thank You once again.

carrzkiss
 
LVL 31

Author Comment

by:Wayne Barron
ID: 12214041
Hello [vadim_ti];

  Thank you so very much. This code works great for what I need.

Take Care

Carrzkiss
 
LVL 6

Expert Comment

by:vadim_ti
ID: 12214065
Hi [carrzkiss]

Thanks.
I just want to understand something:
why do you want to delete lines from the memo?
I think it would be better not to include the unwanted lines in the first place; it is simpler and faster.

Good luck,
Vadim
 
LVL 31

Author Comment

by:Wayne Barron
ID: 12214282
I am open to suggestions.
By all means, if you know of a way to make it faster for me, that would be great.

Wayne
 
LVL 6

Expert Comment

by:vadim_ti
ID: 12215571
I do not know exactly what you want to do,
but you can filter on the URL or the title directly, before adding the new line to the memo.

1) Do not include URLs beginning with \

          if (url <> '') and (url[1] = '\') then
               continue;

Add this to the code before the line is added to the memo.

2) The memo line is built this way:

     Memo1.Lines.Add(ttl+'  :  '+url);

If you do not want the colon and, let's say, you want the title on one line and the URL on the next line after 2 tabs,
you can do

     Memo1.Lines.Add(ttl);
     Memo1.Lines.Add(#9#9 + url);

I think you can do any type of filtering or formatting at the stage where the memo (or something else) is built.
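
One way to keep such rules in one place is to move them into small functions that the loop calls before adding anything. The following is a rough sketch under that assumption; KeepLink and FormatLink are hypothetical names that do not appear in the code posted earlier, and the sample URLs are only illustrative.

```pascal
program LinkFilterDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns True when a link should be listed.
function KeepLink(const Url: string): Boolean;
begin
  Result := (Url <> '')
    and (Url[1] <> '\')                   // rule 1 above
    and (Pos('dirhelp.html', Url) = 0);   // help link skipped by the posted code
end;

// Hypothetical helper: builds one list entry in the "title  :  url" layout.
function FormatLink(const Url, Title: string): string;
begin
  Result := Title + '  :  ' + Url;
end;

begin
  if KeepLink('H2O/') then
    WriteLn(FormatLink('H2O/', 'H2O'));
  if not KeepLink('\Top\') then
    WriteLn('skipped');
end.
```

Inside the extraction loop this would become a single check, if not KeepLink(url) then continue; placed before the Memo1.Lines.Add call, so new rules only need to be added in one function.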