Wayne Barron
asked on
Extract "Link-Title & URL"
Hello All;
I am trying to find a way to extract the URLs and their link titles from a web page.
Example page that I am trying to extract from:
http://directory.google.com/Top/Arts/Music/Bands_and_Artists/H/
I do not want the links located below the 2nd green table.
I just want all the links in the [Category] area.
This is what I need to do:
I need the links and their link titles to be brought into a component,
either a StringGrid, a ListView, or something else (I will let you all decide).
In the cells I need the following to show
(example URL and title taken from the 1st link on the Google page above):
cellTitle = H2O
CellURL = http://directory.google.com/Top/Arts/Music/Bands_and_Artists/H/H2O/
It needs to be able to extract all the "Category" URLs from the current page.
It should list all the URL titles, with each URL listed beside its title,
with the option to [Delete] a [URL & Title] entry from the list,
and then export the list to a TMemo.
Any ideas on how to do something like this?
I will give [25 points apiece, for up to 3 people] to whoever provides links to actual information on this subject.
I will give [500 points] to whoever provides fully working code.
Thank you all
carrzkiss
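The list behaviour described in the question (title beside URL, delete an entry, export to a memo) can be sketched without any visual component. This is only a minimal sketch under assumptions: TLinkList and its methods are hypothetical names, not part of the VCL or of any code posted in this thread, and a TStringList stands in for Memo1.Lines. The same three operations would map onto a StringGrid or ListView.

```pascal
program LinkListSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical helper: two parallel lists, one entry per extracted link.
  TLinkList = class
  private
    FTitles: TStringList;
    FUrls: TStringList;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Add(const Title, Url: string);
    procedure Delete(Index: Integer);
    function Count: Integer;
    procedure ExportTo(Dest: TStrings);
  end;

constructor TLinkList.Create;
begin
  inherited Create;
  FTitles := TStringList.Create;
  FUrls := TStringList.Create;
end;

destructor TLinkList.Destroy;
begin
  FTitles.Free;
  FUrls.Free;
  inherited Destroy;
end;

procedure TLinkList.Add(const Title, Url: string);
begin
  FTitles.Add(Title);
  FUrls.Add(Url);
end;

procedure TLinkList.Delete(Index: Integer);
begin
  // Remove the title and its URL together so the lists stay aligned.
  FTitles.Delete(Index);
  FUrls.Delete(Index);
end;

function TLinkList.Count: Integer;
begin
  Result := FTitles.Count;
end;

procedure TLinkList.ExportTo(Dest: TStrings);
var
  I: Integer;
begin
  // Dest could be Memo1.Lines in a VCL application.
  Dest.Clear;
  for I := 0 to FTitles.Count - 1 do
    Dest.Add(FTitles[I] + ' : ' + FUrls[I]);
end;

var
  Links: TLinkList;
  Lines: TStringList;
  I: Integer;
begin
  Links := TLinkList.Create;
  Lines := TStringList.Create;
  try
    Links.Add('H2O', 'http://directory.google.com/Top/Arts/Music/Bands_and_Artists/H/H2O/');
    Links.Add('Unwanted', 'http://example.com/');
    Links.Delete(1);          // drop the second entry
    Links.ExportTo(Lines);
    for I := 0 to Lines.Count - 1 do
      WriteLn(Lines[I]);
  finally
    Lines.Free;
    Links.Free;
  end;
end.
```

Keeping titles and URLs in separate lists avoids having to split a combined "title : url" string later, which matters because URLs themselves contain colons.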
ASKER
Hello [vadim_ti]
I am receiving the following error:
[Error] Unit1.pas(67): Undeclared identifier: 'PosEx'
It occurs on this line:
N1 := PosEx('href="', s, NStart);
I tried declaring it as a variable:
Var
PosEx : Integer;
And it then gives me this error:
[Error] Unit1.pas(67): Missing operator or semicolon
It gives that error on every line that uses [PosEx].
Thank you
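For context on the error above: PosEx is a function, so declaring a variable with the same name only hides it, and each call then fails to parse. In Delphi 7 and later PosEx is declared in the StrUtils unit, which the unit source later in this thread lists in its uses clause. For a compiler that lacks it, a stand-in could look like the sketch below. The name PosFrom and the sample string are hypothetical, chosen only for illustration.

```pascal
program PosFromSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical stand-in for StrUtils.PosEx: returns the 1-based index of the
// first occurrence of SubStr in S at or after Offset, or 0 if there is none.
function PosFrom(const SubStr, S: string; Offset: Integer): Integer;
begin
  Result := 0;
  if (Offset < 1) or (Offset > Length(S)) then
    Exit;
  // Search only the tail of S, then shift the hit back to an index into S.
  Result := Pos(SubStr, Copy(S, Offset, MaxInt));
  if Result > 0 then
    Inc(Result, Offset - 1);
end;

const
  Sample = '<a href="/a/">A</a> <a href="/b/">B</a>';
var
  First, Second: Integer;
begin
  First := PosFrom('href="', Sample, 1);
  Second := PosFrom('href="', Sample, First + 1);
  WriteLn(First);
  WriteLn(Second);
end.
```

Copying the tail of the string on every call is simple but not fast on large pages, so where StrUtils.PosEx is available it is the better choice.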
ASKER CERTIFIED SOLUTION
ASKER
Thanks,
That got it to work.
Let me do some checking on it, and see if it is what I am needing.
I was hoping for something along the lines of a "List" but this might do as well.
I will get back with you within the next day or so.
Thank You once again.
carrzkiss
ASKER
Hello [vadim_ti];
Thank you so very much. This code works great for what I need it for.
Take Care
Carrzkiss
Hi [carrzkiss]
Thanks.
I only want to understand something:
why do you want to delete lines from the memo?
I think it would be better not to include the unwanted lines in the first place; it is simpler and faster.
good luck
Vadim
ASKER
I am open to suggestions.
By all means, if you know of a way to make it faster for me, that would be great.
Wayne
I do not know exactly what you want to do,
but you can filter on the URL or title directly, before adding the new line to the memo.
1) To leave out URLs beginning with \,
add this to the code before the line is added to the memo:
if (url <> '') and (url[1] = '\') then
  continue;
2) The memo line is built this way:
Memo1.Lines.Add(ttl+' : '+url);
If you do not want the colon and, let's say, you want the title on one line and the URL on the next line after 2 tabs,
you can do:
Memo1.Lines.Add(ttl);
Memo1.Lines.Add(#9#9 + url);
I think you can do any type of filtering / formatting at the stage where the memo (or something else) is built.
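The two suggestions above can be combined into one small routine that decides whether a link is kept and how it is written out. This is a rough sketch only: KeepLink and AddLink are hypothetical names, the TwoLines switch is an assumption added for illustration, and a TStringList stands in for Memo1.Lines.

```pascal
program LinkFilterSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical predicate: reject URLs that begin with a backslash,
// mirroring the filter in point 1. Empty URLs are kept, as in that filter.
function KeepLink(const Url: string): Boolean;
begin
  Result := not ((Url <> '') and (Url[1] = '\'));
end;

// Hypothetical helper: add one link to Dest in either of the two layouts
// from point 2. Dest could be Memo1.Lines in a VCL application.
procedure AddLink(Dest: TStrings; const Ttl, Url: string; TwoLines: Boolean);
begin
  if not KeepLink(Url) then
    Exit;
  if TwoLines then
  begin
    Dest.Add(Ttl);
    Dest.Add(#9#9 + Url);   // URL on its own line after two tabs
  end
  else
    Dest.Add(Ttl + ' : ' + Url);
end;

var
  Lines: TStringList;
  I: Integer;
begin
  Lines := TStringList.Create;
  try
    AddLink(Lines, 'H2O', 'http://directory.google.com/Top/Arts/Music/Bands_and_Artists/H/H2O/', False);
    AddLink(Lines, 'Skipped', '\local\path', False);
    AddLink(Lines, 'H2O', 'http://directory.google.com/Top/Arts/Music/Bands_and_Artists/H/H2O/', True);
    for I := 0 to Lines.Count - 1 do
      WriteLn(Lines[I]);
  finally
    Lines.Free;
  end;
end.
```

Filtering at the point where the line is built means nothing has to be deleted from the memo afterwards.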
unit source:
unit main;
(*
  MSXML library
  Microsoft.XMLHTTP
*)
interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, msxml, msxmldom, StdCtrls, ComObj, StrUtils;

type
  TForm1 = class(TForm)
    Button1: TButton;
    SiteAddrEdt: TEdit;
    Memo1: TMemo;
    procedure Button1Click(Sender: TObject);
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.Button1Click(Sender: TObject);
var
  xmlObj: IXMLHttpRequest;
  S, T, eCode, ttl, url: String;
  N1, N2, N3, N4, NStart: Integer;
begin
  xmlObj := CoXMLHTTPRequest.Create;
  Screen.Cursor := crHourGlass;
  try
    memo1.Clear;
    // Synchronous GET of the page named in the edit box
    xmlObj.open('GET', SiteAddrEdt.text, false, '', '');
    xmlObj.setRequestHeader('C
    try
      xmlObj.send('');
    except
      on e:eOLEException do begin
        eCode := format('%8.8x', [e.ErrorCode]);
        if pos('800C', eCode) <> 0 then
          ShowMessage('URL PROBLEM')
        else
          ShowMessage(format('source
            [e.Source, e.HelpFile, eCode]));
        Abort;
      end;
      on e:exception do begin
        ShowMessage(e.ClassName);
        raise;
      end
    end;
    T := xmlObj.responseText;
    // Search a lowercased copy so the match is case-insensitive,
    // but copy the title and URL from the original text
    S := lowercase(T);
    NStart := 1;
    while true do begin
      // N1..N2 delimit the URL inside href="..."
      N1 := PosEx('href="', s, NStart);
      if N1 = 0 then
        break;
      Inc(N1, Length('href="'));
      NStart := N1;
      N2 := PosEx('"', s, NStart);
      if N2 = 0 then
        break;
      NStart := N2+1;
      // N3..N4 delimit the link title between '>' and '</a>'
      N3 := PosEx('>', s, NStart);
      if N3 = 0 then
        break;
      Inc(N3);
      NStart := N3;
      N4 := PosEx('</a>', s, NStart);
      if N4 = 0 then
        break;
      NStart := N4 + length('</a>');
      ttl := Copy(T, N3, N4-N3);
      url := Copy(T, N1, N2-N1);
      // Skip the directory help link
      if pos('dirhelp.html', url) <> 0 then
        continue;
      // Stop at the first link whose title contains white font markup
      if pos('color=#ffffff', ttl) <> 0 then
        break;
      Memo1.Lines.Add(ttl+' : '+url);
    end;
  finally
    Screen.Cursor := crDefault;
  end;
end;

end.
form source:
object Form1: TForm1
  Left = 192
  Top = 133
  Width = 388
  Height = 293
  Caption = 'Form1'
  Color = clBtnFace
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OldCreateOrder = False
  DesignSize = (
    380
    266)
  PixelsPerInch = 96
  TextHeight = 13
  object Button1: TButton
    Left = 152
    Top = 232
    Width = 75
    Height = 25
    Anchors = [akBottom]
    Caption = 'Button1'
    TabOrder = 0
    OnClick = Button1Click
  end
  object SiteAddrEdt: TEdit
    Left = 45
    Top = 11
    Width = 289
    Height = 21
    Anchors = [akLeft, akTop, akRight]
    TabOrder = 1
    Text = 'http://directory.google.com/Top/Arts/Music/Bands_and_Artists/H/'
  end
  object Memo1: TMemo
    Left = 24
    Top = 56
    Width = 329
    Height = 161
    Anchors = [akLeft, akTop, akRight, akBottom]
    TabOrder = 2
  end
end
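The scanning loop in Button1Click can also be separated from the download and from the form, which makes it easier to try on a fixed string. The following is a sketch under assumptions, not the posted solution: ExtractLinks is a hypothetical name, the sample HTML is invented, and the page-specific checks (dirhelp.html, color=#ffffff) are left out. It uses the same href="..." and </a> string search as the unit above, so it has the same limits: it expects double-quoted href values and does not parse HTML in general.

```pascal
program ExtractLinksSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, StrUtils;

// Hypothetical routine: append the title and URL of every
// <a href="...">...</a> found in Html to the two lists.
procedure ExtractLinks(const Html: string; Titles, Urls: TStrings);
var
  Lower: string;
  N1, N2, N3, N4, NStart: Integer;
begin
  // Search a lowercased copy, copy results from the original text.
  Lower := LowerCase(Html);
  NStart := 1;
  while True do
  begin
    N1 := PosEx('href="', Lower, NStart);
    if N1 = 0 then
      Break;
    Inc(N1, Length('href="'));          // first character of the URL
    N2 := PosEx('"', Lower, N1);        // closing quote of the URL
    if N2 = 0 then
      Break;
    N3 := PosEx('>', Lower, N2 + 1);    // end of the opening tag
    if N3 = 0 then
      Break;
    Inc(N3);                            // first character of the title
    N4 := PosEx('</a>', Lower, N3);     // start of the closing tag
    if N4 = 0 then
      Break;
    NStart := N4 + Length('</a>');
    Urls.Add(Copy(Html, N1, N2 - N1));
    Titles.Add(Copy(Html, N3, N4 - N3));
  end;
end;

const
  SampleHtml = '<a href="/Top/H/H2O/">H2O</a><A HREF="/x/">X</A>';
var
  Titles, Urls: TStringList;
  I: Integer;
begin
  Titles := TStringList.Create;
  Urls := TStringList.Create;
  try
    ExtractLinks(SampleHtml, Titles, Urls);
    for I := 0 to Titles.Count - 1 do
      WriteLn(Titles[I], ' : ', Urls[I]);
  finally
    Urls.Free;
    Titles.Free;
  end;
end.
```

With the results in two lists, filling a two-column StringGrid or a ListView becomes a plain loop over the indexes.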