Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
Solved

test html links from Delphi

Posted on 1998-05-28
23
278 Views
Last Modified: 2010-04-06
This question is about reading a HTML file of the net or of the network, scan it for
links to other HTML or HTM pages and test whether they exist.

In the program I will have a string TOTEST
The value will be a local or Intenet HTML pages e.g. TOTEST:='F:\TESTME.HTML';
or TOTEST:='HTTP://WWW.TESTSIDE.ORD/TESTME.HTML';
It must then read all hyperlinks to HTML or HTM pages in an ARRAY
TOTEST[1..4000] and
then for each element in TOTEST try to test if the page exists :
that is loading it from the Internet, and giving a errormessage when
the page could not be retrieved.
I have a lan-connection to the Internet.

Please provide a functional program which I can test, not just some hints.

With regards,
 Miauw
0
Comment
Question by:miauw
  • 13
  • 4
  • 2
  • +3
23 Comments
 
LVL 5

Expert Comment

by:inter
ID: 1348374
Hi there,
Your old question is deleted before I can be able to read your comment on my comment. Please tell me what did you write in closing the thread. We'll easily deal this one as well afterwards.
Regards, Igor
0
 
LVL 4

Expert Comment

by:d003303
ID: 1348375
miauw,
have you to be fixed on an array ? I would recommend a string list.

Inter,
how far were you developing the code for the prevoius thread ? I have a HTML parser that can also collect all hyperlinks from a web page (IMG SRC, A HREF, ...).

Slash/d003303
0
 
LVL 5

Expert Comment

by:inter
ID: 1348376
Hi,
I just develop a working application to extract all the necessary data that is requested in the previous thread and doit for a given directory, so as I said before only access stuff is missing. Actually it is a parser; you just give it to the opening and closing cases such as <A HREF *> and </A> and it gives you the strings in between. If you can make use of it( I think I have no chance in those questions) I may post it to you. I am now witdrawing.
Regards, Igor
0
Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

 

Author Comment

by:miauw
ID: 1348377
To d003303:
I do prefer arrays. It isn't a strict requirement, but any answer
done in this way I will prefer. Why not use a array? We talking
about only 4000 links here, this will not lead to any memory problems.

Regarding the other question: I am not only finished but way beyond that.
Writing the code to of the other question took me about 10% of the total time I spent
on the program after that.

To Igor:

I'm sorry I deleted the other question before you got to read my comment.
I waited something like 10 hours before deleting it and wrote that I would do so soon too.
My comment was as follows: I did not get a working answer, or
anything useful at all (for 12 days I think), so I developed the application myself.
Remember I put a comment up when I started doing this with the warning '' please
be quick in answering cos' I am nearly done"
The thing I wrote works great, I thorougly tested it and it parses
the HTML-structure from the question and several variations into a Paradox DB
(no Access did not work). This was easy.
After reading the file it does a lot of pattern matching and replacing. This was more
difficult.

So, Igor I do not need your source. Although I think it is very nice of you
offering it to me for free. Thank you.
 
0
 
LVL 4

Expert Comment

by:d003303
ID: 1348378
Hi,
about arrays, they are easy to handle, but fixed size at compile time. A string list would be more extendable. You never know what's gonna happen tomorrow. Anyway, you only want the first level of links checked, not all other links in the linked pages (like a web spider) ? Or do you want to recursively check all links in all pages ?

Slash/d003303
0
 

Author Comment

by:miauw
ID: 1348379
Hi,
1 level deep is enough. Just all the links on the page have to be
tested.
About the array: I don't care about tomorrow, since I will adjust the code to my taste
anyway.

0
 

Author Comment

by:miauw
ID: 1348380
Is anyone working on this?
0
 
LVL 4

Expert Comment

by:BoRiS
ID: 1348381
miauw

Yip I'm working on this...

Later
BoRiS
0
 
LVL 2

Expert Comment

by:hrizal
ID: 1348382
First  ... you need delphi 3 or above
Second ... use 'ClientSocket' VCL under internet
           set property HOST to your ISP address
           set property PORT with 80
           set event ONREAD with PAGEVALID=True
           (PAGEVALID is global variable with boolean type)

i believe, you can collect the page links on HTML file
and separate in into array or TStinglist,

Before i forget, you need 'Timer' VCL for ... (see continue ..)
on timer event in Ontimer, fill with

   Timer1.Enabled := False;

and then Open ClientSocket and check the link ...

   ClientSocket1.Open;
   For i:= 1 to TStringList.Count Do
   Begin
     ClientSocket1.SendText(TStringList.Value[i]);
     timer1.enabled := true; // try with 5 ~ 10 s
     VALIDPAGE := False;
     Repeat
       Application.ProcessMessages;
     Until (Not Timer1.Enabled) or (VALIDPAGE);
     If Not VALIDPAGE then ... (give me 1500 point ! ... :)
   End;

Any Question, please feel free.
Any Comment, welcome.
Any girls ? show me ... :) just kid'n
         
0
 

Author Comment

by:miauw
ID: 1348383
Thanks for your answer Hrizal, it sounds very promissing, but I wrote
in my question

- Please provide a functional program which I can test, not just some hints

so if you can give something more 'cut and paste'-ble, I will reconsider
rejecting your answer.  

With kind regards

0
 
LVL 6

Expert Comment

by:Holger101497
ID: 1348384
How crucial is speed?
Does the program have to be multi-threaded or can it test the links one by one? (how many links do you expect? 4000 seems a lot for just one page!)
0
 

Author Comment

by:miauw
ID: 1348385
Speed is not very crucial: The program does not have to be multi-threaded.
On a average page is expect to be 200 links. On the largest page is expect
to be 2000 links, so I doubled this amount to keep a margin.
0
 

Author Comment

by:miauw
ID: 1348386
Adjusted points to 1600
0
 

Author Comment

by:miauw
ID: 1348387
Is anyone working on this question still?
0
 
LVL 2

Expert Comment

by:hrizal
ID: 1348388
try to answer the question ...
with promised ! and functional maybe :)

"reading a HTML file of the net or of the network, scan it for
links to other HTML or HTM pages and test whether they exist."

get the file from me, in

http://www.nettaxi.com/citizens/greencom/software/urlcheck.zip

best regard,

0
 

Author Comment

by:miauw
ID: 1348389
I've downloaded your EXE file. Looks nice.
Please submit or email the source code.
After receiving this I will evaluate your answer,
can't see what will go wrong though, since your EXE
seems OK.
0
 

Author Comment

by:miauw
ID: 1348390
Hrizal: please submit your source code or mail it to me.
0
 

Author Comment

by:miauw
ID: 1348391
Hrizel you did submit your source code, therefore I reject your answer.
0
 

Author Comment

by:miauw
ID: 1348392
I ment: Hrizel did not submit his code offcouse.
Is anyone working on this, otherwise I will delete this question, and will
ask a much simpler one.
0
 
LVL 2

Accepted Solution

by:
hrizal earned 1600 total points
ID: 1348393
Please notify me, direct via email for fast response.
i received if miauw has respond today (5 days later ... :)

Here the source,
but you need cutter and some glue to make this sources
working ... :)

Regard,



Cut & Paste following source,
save as "URL.DFM" and "URL.PAS" and compile with D3

=============================== Starting CUT of URL.DFM
object Form1: TForm1
  Left = 185
  Top = 34
  BorderStyle = bsDialog
  Caption = 'URL Checker'
  ClientHeight = 397
  ClientWidth = 431
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OnClose = FormClose
  PixelsPerInch = 96
  TextHeight = 13
  object Label1: TLabel
    Left = 3
    Top = 5
    Width = 224
    Height = 13
    Caption = 'Cut && Paste URL to be download here !'
    Font.Charset = DEFAULT_CHARSET
    Font.Color = clWindowText
    Font.Height = -11
    Font.Name = 'MS Sans Serif'
    Font.Style = [fsBold]
    ParentFont = False
  end
  object cTime: TLabel
    Left = 328
    Top = 5
    Width = 97
    Height = 13
    Alignment = taRightJustify
    AutoSize = False
  end
  object LED: TSpeedButton
    Left = 406
    Top = 149
    Width = 21
    Height = 25
    Enabled = False
    Glyph.Data = {
      76010000424D7601000000000000760000002800000020000000100000000100
      04000000000000010000120B0000120B00001000000000000000000000000000
      80000080000000808000800000008000800080800000C0C0C000808080000000
      FF0000FF000000FFFF00FF000000FF00FF00FFFF0000FFFFFF00000002333320
      0000000000000000000000023AAAAAA300000000022220000000002AAAAAAAAA
      3000000222222220000002AAAAAAAAAAA300002222222222000003AAAAAAAAAA
      AA2002222222222220000AAAAAAAAAAAAA2002222222222220002AAAAAAAAAAA
      AA3022222222222222002AAAAAAAAAAAAA3022222222222222002AAAAAAAAAAA
      AA3022222222222222002A87AAAAAAAAAA202222222222222200087F7AAAAAAA
      AA202232222222222000027FF7AAAAAAA3000888222222222000008FFF7AAAAA
      300003778222222200000008777AAAA3000000878322222200000000033A3220
      0000000232222200000000000000000000000000000000000000}
    NumGlyphs = 2
  end
  object StartBtn: TButton
    Left = 336
    Top = 149
    Width = 70
    Height = 25
    Caption = 'Check URL'
    TabOrder = 1
    OnClick = StartBtnClick
  end
  object CancelBtn: TButton
    Left = 267
    Top = 149
    Width = 69
    Height = 25
    Caption = 'Cancel'
    TabOrder = 3
    OnClick = CancelBtnClick
  end
  object URLSource: TMemo
    Left = 1
    Top = 25
    Width = 427
    Height = 120
    Lines.Strings = (
      'http://sunsite.icm.edu.pl/delphi/ftp/d30free/hotkeys.zip'
      'http://www.wasantara.net.id/index.htm'
      'http://www.indopubs.com/index.html'
      'http://www.netscape.com/index.html'
      'http://www.microsoft.com/index.htm'
      'http://www.inprise.com/index.html'
      'http://www.inprise.com/delphi/index.html'
      'http://www.xoom.com/xiim/index.htm')
    TabOrder = 0
  end
  object PB1: TProgressBar
    Left = 3
    Top = 150
    Width = 262
    Height = 22
    Min = 0
    Max = 100
    ParentShowHint = False
    ShowHint = True
    TabOrder = 2
  end
  object PageControl1: TPageControl
    Left = 3
    Top = 181
    Width = 424
    Height = 212
    ActivePage = TabSheet1
    TabOrder = 4
    object TabSheet1: TTabSheet
      Caption = 'Good URL'
      object Memo1: TMemo
        Left = 0
        Top = 0
        Width = 416
        Height = 184
        Align = alClient
        Font.Charset = DEFAULT_CHARSET
        Font.Color = clGreen
        Font.Height = -11
        Font.Name = 'MS Sans Serif'
        Font.Style = []
        ParentFont = False
        ReadOnly = True
        TabOrder = 0
      end
    end
    object TabSheet2: TTabSheet
      Caption = 'Bad URL'
      object Memo2: TMemo
        Left = 0
        Top = 0
        Width = 416
        Height = 184
        Align = alClient
        Font.Charset = DEFAULT_CHARSET
        Font.Color = clMaroon
        Font.Height = -11
        Font.Name = 'MS Sans Serif'
        Font.Style = []
        ParentFont = False
        ReadOnly = True
        TabOrder = 0
      end
    end
  end
  object CS: TClientSocket
    Active = False
    ClientType = ctNonBlocking
    Port = 80
    OnRead = CSRead
    OnError = CSError
    Left = 384
    Top = 32
  end
end
=============================== End of URL.DFM



=============================== Starting CUT of URL.PAS
unit URL;
interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  ScktComp, StdCtrls, Buttons, ComCtrls, ExtCtrls;

type
  TForm1 = class(TForm)
    CS: TClientSocket;
    URLSource: TMemo;
    StartBtn: TButton;
    PB1: TProgressBar;
    Label1: TLabel;
    cTime: TLabel;
    CancelBtn: TButton;
    PageControl1: TPageControl;
    TabSheet1: TTabSheet;
    TabSheet2: TTabSheet;
    Memo1: TMemo;
    Memo2: TMemo;
    LED: TSpeedButton;
    procedure CSRead(Sender: TObject; Socket: TCustomWinSocket);
    procedure StartBtnClick(Sender: TObject);
    Procedure CheckURL(s:String);
    procedure CSError(Sender: TObject; Socket: TCustomWinSocket;
      ErrorEvent: TErrorEvent; var ErrorCode: Integer);
    procedure CancelBtnClick(Sender: TObject);
    procedure FormClose(Sender: TObject; var Action: TCloseAction);
  private
    { Private declarations }
  public
    AbortChecking,
    URLValid,
    IsHeader,
    HasAnswer : boolean;
    Host,
    Document  : String;
    FileSize  : Longint;
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation
{$R *.DFM}

procedure TForm1.CSRead(Sender: TObject; Socket: TCustomWinSocket);
var s: string;
begin
    S  := CS.Socket.ReceiveText;
    if (pos('200 OK',s)>0) or
       (pos('200 OK',s)>0) then URLValid := True;
    HasAnswer :=True;
end;

Procedure TForm1.CheckURL(s:String);
Var i,j, TimeOut : integer;

Begin
  cs.close;
  pb1.Hint := s;
  for i:=1 to Length(s) do if s[i]='\' then s[i]:='/';
  j := (pos('://',s));
  Host   := Copy(s,j+3,length(s)-j);
  Host   := Copy(Host,1,pos('/',Host)-1);
  Document  := Copy(s,Pos(Host,s)+length(Host),Length(s)-j-length(Host));
  cs.Host := Host;
  cs.Open;
  TimeOut := 0;
  repeat
    application.ProcessMessages;
    If AbortChecking then Exit;
    inc(TimeOut);
  until (cs.Active) or (TimeOut>1000000);

  if cs.Active=True then
  Begin
    IsHeader := true;
    //see RFC 2000 for details.
    cs.Socket.SendText('HEAD '+Document+' HTTP/1.0'+#13#10+
    'Accept: text/html'+#13#10+
    'User-Agent:  Super Downloader'+#13#10+
    'From:  hrizal@royal.net'+#13#10+#13#10);
    HasAnswer := false;
    URLValid  := False;
    repeat
      application.ProcessMessages;
      If AbortChecking then Exit;
    until HasAnswer;
  End;
  if Not URLValid then Memo2.Lines.Add(s) else Memo1.Lines.add(s);
End;

procedure TForm1.StartBtnClick(Sender: TObject);
var i : word;
begin
  LED.Enabled := true;
  AbortChecking := false;
  StartBtn.Enabled  := false;
  CancelBtn.Visible := true;
  PB1.Position := 0;
  PB1.Max := URLSource.lines.Count;
  PB1.Step :=1;
  if URLSource.lines.Count>0 then
  begin
    for i:= 0 to URLSource.lines.Count-1 do
    begin
      if AbortChecking then
      Begin
        PB1.Position := 0;
        CancelBtn.Visible := False;
        StartBtn.Enabled  := True;
        Exit;
      End;
      PB1.StepIt;
      if trim(URLSource.Lines.Strings[i])<>'' then CheckURL(URLSource.Lines.Strings[i]);
    end;
  end;
  PB1.Position := 0;
  StartBtn.Enabled := True;
  LED.Enabled := false;
  ShowMessage(IntToStr(Memo1.lines.count)+', Good URL'#13#10+IntToStr(Memo2.lines.count)+', Bad URL');
end;


procedure TForm1.CSError(Sender: TObject; Socket: TCustomWinSocket;
  ErrorEvent: TErrorEvent; var ErrorCode: Integer);
begin
  ErrorCode := 0;
  HasAnswer := True;
  Socket.Close;
end;

procedure TForm1.CancelBtnClick(Sender: TObject);
begin
  AbortChecking := true;
end;

procedure TForm1.FormClose(Sender: TObject; var Action: TCloseAction);
begin
  ShowMessage('Hallo, my name is Rizal'+#13#10+'contact for more : hrizal@royal.net');
end;

end.
=============================== End of URL.DFM

0
 

Author Comment

by:miauw
ID: 1348394
Thank you HRIZAL, I will evaluatie this ASAP. Can do this right now,
because by saved the chunks as DFM and PAS I got a ERROR creating
form invalid stream format.
0
 
LVL 2

Expert Comment

by:hrizal
ID: 1348395
More info ... to saving chunk as DFM
open your D3 or windows near you if you getting hot.
and then create new application.
on FORM1, click right button
select 'VIEW AS TEXT'
and then overwrite the text with URL.DFM, you have cut before.
and then click right button again
select 'VIEW AS FORM (ALT+F12)'
overwrite again the text with URL.PAS you have.

OK ?
0
 

Author Comment

by:miauw
ID: 1348396
Some notes:

The algoritme Hrizal send me did only part on the job.
The part it does is OK.
However, you need parts of Delphi C/S to run the stuff.

Thanks Hrizal!
0

Featured Post

Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Delphi cmd execution 6 67
how to manage invalidate between two tvirtualstringtree in same form? 1 129
add combobox item based on numbers 9 154
enhance the following code 3 37
A lot of questions regard threads in Delphi.   One of the more specific questions is how to show progress of the thread.   Updating a progressbar from inside a thread is a mistake. A solution to this would be to send a synchronized message to the…
Introduction The parallel port is a very commonly known port, it was widely used to connect a printer to the PC, if you look at the back of your computer, for those who don't have newer computers, there will be a port with 25 pins and a small print…
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…

860 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question