Solved

test html links from Delphi

Posted on 1998-05-28
23
269 Views
Last Modified: 2010-04-06
This question is about reading a HTML file of the net or of the network, scan it for
links to other HTML or HTM pages and test whether they exist.

In the program I will have a string TOTEST
The value will be a local or Intenet HTML pages e.g. TOTEST:='F:\TESTME.HTML';
or TOTEST:='HTTP://WWW.TESTSIDE.ORD/TESTME.HTML';
It must then read all hyperlinks to HTML or HTM pages in an ARRAY
TOTEST[1..4000] and
then for each element in TOTEST try to test if the page exists :
that is loading it from the Internet, and giving a errormessage when
the page could not be retrieved.
I have a lan-connection to the Internet.

Please provide a functional program which I can test, not just some hints.

With regards,
 Miauw
0
Comment
Question by:miauw
  • 13
  • 4
  • 2
  • +3
23 Comments
 
LVL 5

Expert Comment

by:inter
Comment Utility
Hi there,
Your old question is deleted before I can be able to read your comment on my comment. Please tell me what did you write in closing the thread. We'll easily deal this one as well afterwards.
Regards, Igor
0
 
LVL 4

Expert Comment

by:d003303
Comment Utility
miauw,
have you to be fixed on an array ? I would recommend a string list.

Inter,
how far were you developing the code for the prevoius thread ? I have a HTML parser that can also collect all hyperlinks from a web page (IMG SRC, A HREF, ...).

Slash/d003303
0
 
LVL 5

Expert Comment

by:inter
Comment Utility
Hi,
I just develop a working application to extract all the necessary data that is requested in the previous thread and doit for a given directory, so as I said before only access stuff is missing. Actually it is a parser; you just give it to the opening and closing cases such as <A HREF *> and </A> and it gives you the strings in between. If you can make use of it( I think I have no chance in those questions) I may post it to you. I am now witdrawing.
Regards, Igor
0
 

Author Comment

by:miauw
Comment Utility
To d003303:
I do prefer arrays. It isn't a strict requirement, but any answer
done in this way I will prefer. Why not use a array? We talking
about only 4000 links here, this will not lead to any memory problems.

Regarding the other question: I am not only finished but way beyond that.
Writing the code to of the other question took me about 10% of the total time I spent
on the program after that.

To Igor:

I'm sorry I deleted the other question before you got to read my comment.
I waited something like 10 hours before deleting it and wrote that I would do so soon too.
My comment was as follows: I did not get a working answer, or
anything useful at all (for 12 days I think), so I developed the application myself.
Remember I put a comment up when I started doing this with the warning '' please
be quick in answering cos' I am nearly done"
The thing I wrote works great, I thorougly tested it and it parses
the HTML-structure from the question and several variations into a Paradox DB
(no Access did not work). This was easy.
After reading the file it does a lot of pattern matching and replacing. This was more
difficult.

So, Igor I do not need your source. Although I think it is very nice of you
offering it to me for free. Thank you.
 
0
 
LVL 4

Expert Comment

by:d003303
Comment Utility
Hi,
about arrays, they are easy to handle, but fixed size at compile time. A string list would be more extendable. You never know what's gonna happen tomorrow. Anyway, you only want the first level of links checked, not all other links in the linked pages (like a web spider) ? Or do you want to recursively check all links in all pages ?

Slash/d003303
0
 

Author Comment

by:miauw
Comment Utility
Hi,
1 level deep is enough. Just all the links on the page have to be
tested.
About the array: I don't care about tomorrow, since I will adjust the code to my taste
anyway.

0
 

Author Comment

by:miauw
Comment Utility
Is anyone working on this?
0
 
LVL 4

Expert Comment

by:BoRiS
Comment Utility
miauw

Yip I'm working on this...

Later
BoRiS
0
 
LVL 2

Expert Comment

by:hrizal
Comment Utility
First  ... you need delphi 3 or above
Second ... use 'ClientSocket' VCL under internet
           set property HOST to your ISP address
           set property PORT with 80
           set event ONREAD with PAGEVALID=True
           (PAGEVALID is global variable with boolean type)

i believe, you can collect the page links on HTML file
and separate in into array or TStinglist,

Before i forget, you need 'Timer' VCL for ... (see continue ..)
on timer event in Ontimer, fill with

   Timer1.Enabled := False;

and then Open ClientSocket and check the link ...

   ClientSocket1.Open;
   For i:= 1 to TStringList.Count Do
   Begin
     ClientSocket1.SendText(TStringList.Value[i]);
     timer1.enabled := true; // try with 5 ~ 10 s
     VALIDPAGE := False;
     Repeat
       Application.ProcessMessages;
     Until (Not Timer1.Enabled) or (VALIDPAGE);
     If Not VALIDPAGE then ... (give me 1500 point ! ... :)
   End;

Any Question, please feel free.
Any Comment, welcome.
Any girls ? show me ... :) just kid'n
         
0
 

Author Comment

by:miauw
Comment Utility
Thanks for your answer Hrizal, it sounds very promissing, but I wrote
in my question

- Please provide a functional program which I can test, not just some hints

so if you can give something more 'cut and paste'-ble, I will reconsider
rejecting your answer.  

With kind regards

0
 
LVL 6

Expert Comment

by:Holger101497
Comment Utility
How crucial is speed?
Does the program have to be multi-threaded or can it test the links one by one? (how many links do you expect? 4000 seems a lot for just one page!)
0
What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

 

Author Comment

by:miauw
Comment Utility
Speed is not very crucial: The program does not have to be multi-threaded.
On a average page is expect to be 200 links. On the largest page is expect
to be 2000 links, so I doubled this amount to keep a margin.
0
 

Author Comment

by:miauw
Comment Utility
Adjusted points to 1600
0
 

Author Comment

by:miauw
Comment Utility
Is anyone working on this question still?
0
 
LVL 2

Expert Comment

by:hrizal
Comment Utility
try to answer the question ...
with promised ! and functional maybe :)

"reading a HTML file of the net or of the network, scan it for
links to other HTML or HTM pages and test whether they exist."

get the file from me, in

http://www.nettaxi.com/citizens/greencom/software/urlcheck.zip

best regard,

0
 

Author Comment

by:miauw
Comment Utility
I've downloaded your EXE file. Looks nice.
Please submit or email the source code.
After receiving this I will evaluate your answer,
can't see what will go wrong though, since your EXE
seems OK.
0
 

Author Comment

by:miauw
Comment Utility
Hrizal: please submit your source code or mail it to me.
0
 

Author Comment

by:miauw
Comment Utility
Hrizel you did submit your source code, therefore I reject your answer.
0
 

Author Comment

by:miauw
Comment Utility
I ment: Hrizel did not submit his code offcouse.
Is anyone working on this, otherwise I will delete this question, and will
ask a much simpler one.
0
 
LVL 2

Accepted Solution

by:
hrizal earned 1600 total points
Comment Utility
Please notify me, direct via email for fast response.
i received if miauw has respond today (5 days later ... :)

Here the source,
but you need cutter and some glue to make this sources
working ... :)

Regard,



Cut & Paste following source,
save as "URL.DFM" and "URL.PAS" and compile with D3

=============================== Starting CUT of URL.DFM
object Form1: TForm1
  Left = 185
  Top = 34
  BorderStyle = bsDialog
  Caption = 'URL Checker'
  ClientHeight = 397
  ClientWidth = 431
  Font.Charset = DEFAULT_CHARSET
  Font.Color = clWindowText
  Font.Height = -11
  Font.Name = 'MS Sans Serif'
  Font.Style = []
  OnClose = FormClose
  PixelsPerInch = 96
  TextHeight = 13
  object Label1: TLabel
    Left = 3
    Top = 5
    Width = 224
    Height = 13
    Caption = 'Cut && Paste URL to be download here !'
    Font.Charset = DEFAULT_CHARSET
    Font.Color = clWindowText
    Font.Height = -11
    Font.Name = 'MS Sans Serif'
    Font.Style = [fsBold]
    ParentFont = False
  end
  object cTime: TLabel
    Left = 328
    Top = 5
    Width = 97
    Height = 13
    Alignment = taRightJustify
    AutoSize = False
  end
  object LED: TSpeedButton
    Left = 406
    Top = 149
    Width = 21
    Height = 25
    Enabled = False
    Glyph.Data = {
      76010000424D7601000000000000760000002800000020000000100000000100
      04000000000000010000120B0000120B00001000000000000000000000000000
      80000080000000808000800000008000800080800000C0C0C000808080000000
      FF0000FF000000FFFF00FF000000FF00FF00FFFF0000FFFFFF00000002333320
      0000000000000000000000023AAAAAA300000000022220000000002AAAAAAAAA
      3000000222222220000002AAAAAAAAAAA300002222222222000003AAAAAAAAAA
      AA2002222222222220000AAAAAAAAAAAAA2002222222222220002AAAAAAAAAAA
      AA3022222222222222002AAAAAAAAAAAAA3022222222222222002AAAAAAAAAAA
      AA3022222222222222002A87AAAAAAAAAA202222222222222200087F7AAAAAAA
      AA202232222222222000027FF7AAAAAAA3000888222222222000008FFF7AAAAA
      300003778222222200000008777AAAA3000000878322222200000000033A3220
      0000000232222200000000000000000000000000000000000000}
    NumGlyphs = 2
  end
  object StartBtn: TButton
    Left = 336
    Top = 149
    Width = 70
    Height = 25
    Caption = 'Check URL'
    TabOrder = 1
    OnClick = StartBtnClick
  end
  object CancelBtn: TButton
    Left = 267
    Top = 149
    Width = 69
    Height = 25
    Caption = 'Cancel'
    TabOrder = 3
    OnClick = CancelBtnClick
  end
  object URLSource: TMemo
    Left = 1
    Top = 25
    Width = 427
    Height = 120
    Lines.Strings = (
      'http://sunsite.icm.edu.pl/delphi/ftp/d30free/hotkeys.zip'
      'http://www.wasantara.net.id/index.htm'
      'http://www.indopubs.com/index.html'
      'http://www.netscape.com/index.html'
      'http://www.microsoft.com/index.htm'
      'http://www.inprise.com/index.html'
      'http://www.inprise.com/delphi/index.html'
      'http://www.xoom.com/xiim/index.htm')
    TabOrder = 0
  end
  object PB1: TProgressBar
    Left = 3
    Top = 150
    Width = 262
    Height = 22
    Min = 0
    Max = 100
    ParentShowHint = False
    ShowHint = True
    TabOrder = 2
  end
  object PageControl1: TPageControl
    Left = 3
    Top = 181
    Width = 424
    Height = 212
    ActivePage = TabSheet1
    TabOrder = 4
    object TabSheet1: TTabSheet
      Caption = 'Good URL'
      object Memo1: TMemo
        Left = 0
        Top = 0
        Width = 416
        Height = 184
        Align = alClient
        Font.Charset = DEFAULT_CHARSET
        Font.Color = clGreen
        Font.Height = -11
        Font.Name = 'MS Sans Serif'
        Font.Style = []
        ParentFont = False
        ReadOnly = True
        TabOrder = 0
      end
    end
    object TabSheet2: TTabSheet
      Caption = 'Bad URL'
      object Memo2: TMemo
        Left = 0
        Top = 0
        Width = 416
        Height = 184
        Align = alClient
        Font.Charset = DEFAULT_CHARSET
        Font.Color = clMaroon
        Font.Height = -11
        Font.Name = 'MS Sans Serif'
        Font.Style = []
        ParentFont = False
        ReadOnly = True
        TabOrder = 0
      end
    end
  end
  object CS: TClientSocket
    Active = False
    ClientType = ctNonBlocking
    Port = 80
    OnRead = CSRead
    OnError = CSError
    Left = 384
    Top = 32
  end
end
=============================== End of URL.DFM



=============================== Starting CUT of URL.PAS
unit URL;
interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
  ScktComp, StdCtrls, Buttons, ComCtrls, ExtCtrls;

type
  TForm1 = class(TForm)
    CS: TClientSocket;
    URLSource: TMemo;
    StartBtn: TButton;
    PB1: TProgressBar;
    Label1: TLabel;
    cTime: TLabel;
    CancelBtn: TButton;
    PageControl1: TPageControl;
    TabSheet1: TTabSheet;
    TabSheet2: TTabSheet;
    Memo1: TMemo;
    Memo2: TMemo;
    LED: TSpeedButton;
    procedure CSRead(Sender: TObject; Socket: TCustomWinSocket);
    procedure StartBtnClick(Sender: TObject);
    Procedure CheckURL(s:String);
    procedure CSError(Sender: TObject; Socket: TCustomWinSocket;
      ErrorEvent: TErrorEvent; var ErrorCode: Integer);
    procedure CancelBtnClick(Sender: TObject);
    procedure FormClose(Sender: TObject; var Action: TCloseAction);
  private
    { Private declarations }
  public
    AbortChecking,
    URLValid,
    IsHeader,
    HasAnswer : boolean;
    Host,
    Document  : String;
    FileSize  : Longint;
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation
{$R *.DFM}

procedure TForm1.CSRead(Sender: TObject; Socket: TCustomWinSocket);
var s: string;
begin
    S  := CS.Socket.ReceiveText;
    if (pos('200 OK',s)>0) or
       (pos('200 OK',s)>0) then URLValid := True;
    HasAnswer :=True;
end;

Procedure TForm1.CheckURL(s:String);
Var i,j, TimeOut : integer;

Begin
  cs.close;
  pb1.Hint := s;
  for i:=1 to Length(s) do if s[i]='\' then s[i]:='/';
  j := (pos('://',s));
  Host   := Copy(s,j+3,length(s)-j);
  Host   := Copy(Host,1,pos('/',Host)-1);
  Document  := Copy(s,Pos(Host,s)+length(Host),Length(s)-j-length(Host));
  cs.Host := Host;
  cs.Open;
  TimeOut := 0;
  repeat
    application.ProcessMessages;
    If AbortChecking then Exit;
    inc(TimeOut);
  until (cs.Active) or (TimeOut>1000000);

  if cs.Active=True then
  Begin
    IsHeader := true;
    //see RFC 2000 for details.
    cs.Socket.SendText('HEAD '+Document+' HTTP/1.0'+#13#10+
    'Accept: text/html'+#13#10+
    'User-Agent:  Super Downloader'+#13#10+
    'From:  hrizal@royal.net'+#13#10+#13#10);
    HasAnswer := false;
    URLValid  := False;
    repeat
      application.ProcessMessages;
      If AbortChecking then Exit;
    until HasAnswer;
  End;
  if Not URLValid then Memo2.Lines.Add(s) else Memo1.Lines.add(s);
End;

procedure TForm1.StartBtnClick(Sender: TObject);
var i : word;
begin
  LED.Enabled := true;
  AbortChecking := false;
  StartBtn.Enabled  := false;
  CancelBtn.Visible := true;
  PB1.Position := 0;
  PB1.Max := URLSource.lines.Count;
  PB1.Step :=1;
  if URLSource.lines.Count>0 then
  begin
    for i:= 0 to URLSource.lines.Count-1 do
    begin
      if AbortChecking then
      Begin
        PB1.Position := 0;
        CancelBtn.Visible := False;
        StartBtn.Enabled  := True;
        Exit;
      End;
      PB1.StepIt;
      if trim(URLSource.Lines.Strings[i])<>'' then CheckURL(URLSource.Lines.Strings[i]);
    end;
  end;
  PB1.Position := 0;
  StartBtn.Enabled := True;
  LED.Enabled := false;
  ShowMessage(IntToStr(Memo1.lines.count)+', Good URL'#13#10+IntToStr(Memo2.lines.count)+', Bad URL');
end;


procedure TForm1.CSError(Sender: TObject; Socket: TCustomWinSocket;
  ErrorEvent: TErrorEvent; var ErrorCode: Integer);
begin
  ErrorCode := 0;
  HasAnswer := True;
  Socket.Close;
end;

procedure TForm1.CancelBtnClick(Sender: TObject);
begin
  AbortChecking := true;
end;

procedure TForm1.FormClose(Sender: TObject; var Action: TCloseAction);
begin
  ShowMessage('Hallo, my name is Rizal'+#13#10+'contact for more : hrizal@royal.net');
end;

end.
=============================== End of URL.DFM

0
 

Author Comment

by:miauw
Comment Utility
Thank you HRIZAL, I will evaluatie this ASAP. Can do this right now,
because by saved the chunks as DFM and PAS I got a ERROR creating
form invalid stream format.
0
 
LVL 2

Expert Comment

by:hrizal
Comment Utility
More info ... to saving chunk as DFM
open your D3 or windows near you if you getting hot.
and then create new application.
on FORM1, click right button
select 'VIEW AS TEXT'
and then overwrite the text with URL.DFM, you have cut before.
and then click right button again
select 'VIEW AS FORM (ALT+F12)'
overwrite again the text with URL.PAS you have.

OK ?
0
 

Author Comment

by:miauw
Comment Utility
Some notes:

The algoritme Hrizal send me did only part on the job.
The part it does is OK.
However, you need parts of Delphi C/S to run the stuff.

Thanks Hrizal!
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

The uses clause is one of those things that just tends to grow and grow. Most of the time this is in the main form, as it's from this form that all others are called. If you have a big application (including many forms), the uses clause in the in…
Hello everybody This Article will show you how to validate number with TEdit control, What's the TEdit control? TEdit is a standard Windows edit control on a form, it allows to user to write, read and copy/paste single line of text. Usua…
Here's a very brief overview of the methods PRTG Network Monitor (https://www.paessler.com/prtg) offers for monitoring bandwidth, to help you decide which methods you´d like to investigate in more detail.  The methods are covered in more detail in o…
This video explains how to create simple products associated to Magento configurable product and offers fast way of their generation with Store Manager for Magento tool.

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

6 Experts available now in Live!

Get 1:1 Help Now