miauw (Netherlands) asked:

test html links from Delphi

This question is about reading an HTML file from the Internet or from the local network,
scanning it for links to other HTML or HTM pages, and testing whether they exist.

In the program I will have a string TOTEST.
Its value will be a local or Internet HTML page, e.g. TOTEST:='F:\TESTME.HTML';
or TOTEST:='HTTP://WWW.TESTSIDE.ORD/TESTME.HTML';
It must then read all hyperlinks to HTML or HTM pages into an ARRAY
TOTEST[1..4000] and
then, for each element in TOTEST, test whether the page exists:
that is, load it from the Internet and give an error message when
the page could not be retrieved.
I have a LAN connection to the Internet.

Please provide a functional program which I can test, not just some hints.

With regards,
 Miauw
inter

Hi there,
Your old question was deleted before I could read your comment on my comment. Please tell me what you wrote when closing the thread. We'll easily deal with this one as well afterwards.
Regards, Igor
d003303

miauw,
do you have to stick with an array? I would recommend a string list.

Inter,
how far did you get with the code for the previous thread? I have an HTML parser that can also collect all hyperlinks from a web page (IMG SRC, A HREF, ...).

Slash/d003303
Hi,
I just developed a working application that extracts all the data requested in the previous thread and does so for a given directory, so, as I said before, only the Access part is missing. Actually it is a parser: you just give it the opening and closing tags, such as <A HREF *> and </A>, and it gives you the strings in between. If you can make use of it (I think I have no chance in those questions), I may post it to you. I am now withdrawing.
Regards, Igor
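
For readers following the thread: the kind of parser described above (hand it the start of a tag attribute and collect what follows) could be sketched roughly as below. This is not Igor's code, which was never posted here. `ExtractLinks` is a hypothetical helper, and the sketch assumes that HREF values are always enclosed in double quotes, which real-world HTML does not guarantee.

```pascal
program ExtractLinksDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: adds the value of every HREF="..." attribute
// found in Html to Links. Only double-quoted values are recognised.
procedure ExtractLinks(const Html: string; Links: TStrings);
var
  Upper, Rest: string;
  Offset, P, Q: Integer;
begin
  Upper := UpperCase(Html);      // same length as Html, so positions match
  Offset := 0;                   // number of characters already scanned
  repeat
    Rest := Copy(Upper, Offset + 1, MaxInt);
    P := Pos('HREF="', Rest);    // case-insensitive search via Upper
    if P = 0 then Break;
    Offset := Offset + P + 5;    // now the index of the opening quote
    Rest := Copy(Html, Offset + 1, MaxInt);
    Q := Pos('"', Rest);         // closing quote
    if Q = 0 then Break;
    Links.Add(Copy(Rest, 1, Q - 1));
    Offset := Offset + Q;        // continue after the closing quote
  until False;
end;

var
  Links: TStringList;
  I: Integer;
begin
  Links := TStringList.Create;
  try
    ExtractLinks('<A HREF="a.html">A</A> <a href="http://x/b.htm">B</a>', Links);
    for I := 0 to Links.Count - 1 do
      WriteLn(Links[I]);
  finally
    Links.Free;
  end;
end.
```

Relative links such as `a.html` would still have to be resolved against the address of the page they came from before they can be tested.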
miauw

ASKER

To d003303:
I do prefer arrays. It isn't a strict requirement, but I will prefer any answer
done this way. Why not use an array? We are talking
about only 4000 links here; this will not lead to any memory problems.

Regarding the other question: I am not only finished but way beyond that.
Writing the code for the other question took about 10% of the total time I spent
on the program after that.

To Igor:

I'm sorry I deleted the other question before you got to read my comment.
I waited something like 10 hours before deleting it, and I wrote that I would do so soon, too.
My comment was as follows: I did not get a working answer, or
anything useful at all (for 12 days, I think), so I developed the application myself.
Remember, I put up a comment when I started doing this, with the warning "please
be quick in answering 'cos I am nearly done".
The thing I wrote works great. I thoroughly tested it, and it parses
the HTML structure from the question and several variations into a Paradox DB
(no, Access did not work). This was easy.
After reading the file it does a lot of pattern matching and replacing. This was more
difficult.

So, Igor, I do not need your source, although I think it is very nice of you
to offer it to me for free. Thank you.
 
Hi,
about arrays: they are easy to handle, but their size is fixed at compile time. A string list would be more extensible. You never know what's gonna happen tomorrow. Anyway, do you only want the first level of links checked, and not the links in the linked pages (like a web spider)? Or do you want to recursively check all links in all pages?

Slash/d003303
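
To make the array versus string list trade-off concrete, here is a minimal sketch, assuming the links are first collected in a TStringList (which grows as needed) and then copied into a fixed array of 4000 entries as the asker prefers. The names `MaxLinks`, `TLinkArray`, `IsHtmlLink` and `ListToArray` are made up for this illustration. The extension test is deliberately simple and would miss links that carry a query string or an anchor.

```pascal
program LinkArrayDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  MaxLinks = 4000;               // the upper bound named in the question

type
  TLinkArray = array[1..MaxLinks] of string;

// True when the link ends in .htm or .html (case-insensitive).
function IsHtmlLink(const Link: string): Boolean;
var
  Ext: string;
begin
  Ext := LowerCase(ExtractFileExt(Link));
  Result := (Ext = '.htm') or (Ext = '.html');
end;

// Copies at most MaxLinks HTML links from List into Arr and
// returns the number of entries that were filled.
function ListToArray(List: TStrings; var Arr: TLinkArray): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to List.Count - 1 do
    if IsHtmlLink(List[I]) and (Result < MaxLinks) then
    begin
      Inc(Result);
      Arr[Result] := List[I];
    end;
end;

var
  List: TStringList;
  Arr: TLinkArray;
  N, I: Integer;
begin
  List := TStringList.Create;
  try
    List.Add('http://www.example.com/index.html');
    List.Add('logo.gif');
    List.Add('F:\TESTME.HTM');
    N := ListToArray(List, Arr);
    for I := 1 to N do
      WriteLn(Arr[I]);
  finally
    List.Free;
  end;
end.
```

The bound check in `ListToArray` is the part a fixed array forces on you: links beyond the 4000th are silently dropped here, whereas the string list alone would simply keep growing.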
miauw

ASKER

Hi,
1 level deep is enough. Just all the links on the page have to be
tested.
About the array: I don't care about tomorrow, since I will adjust the code to my taste
anyway.

miauw

ASKER

Is anyone working on this?
miauw

Yip I'm working on this...

Later
BoRiS
First  ... you need Delphi 3 or above
Second ... use the 'ClientSocket' VCL component from the Internet palette page
           set property HOST to your ISP address
           set property PORT to 80
           set event ONREAD to do PAGEVALID := True
           (PAGEVALID is a global variable of type Boolean)

I believe you can collect the page links from the HTML file
and put them into an array or a TStringList.

Before I forget, you need a 'Timer' VCL component for the timeout (see below ..).
In its OnTimer event, put

   Timer1.Enabled := False;

and then Open ClientSocket and check the link ...

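   ClientSocket1.Open;
   // Links is the TStringList filled with the collected URLs (0-based)
   For i := 0 to Links.Count - 1 Do
   Begin
     PAGEVALID := False;
     Timer1.Enabled := True; // timeout, try an Interval of 5 ~ 10 s
     // SendText is a method of the Socket property; send the link as an
     // HTTP request (assumption: HOST accepts full URLs, as a proxy does)
     ClientSocket1.Socket.SendText('HEAD ' + Links[i] + ' HTTP/1.0'#13#10#13#10);
     Repeat
       Application.ProcessMessages;
     Until (Not Timer1.Enabled) or PAGEVALID;
     If Not PAGEVALID then ... (give me 1500 point ! ... :)
   End;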

Any Question, please feel free.
Any Comment, welcome.
Any girls ? show me ... :) just kid'n
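
One caveat with the outline above: OnRead fires for any data the server sends back, including a "404 Not Found" reply, so setting PAGEVALID in OnRead only proves that something answered. A minimal sketch of how the reply could be judged instead is shown below. The functions `BuildHeadRequest`, `ParseStatusCode` and `PageExists` are hypothetical helpers, not part of the VCL, and treating redirects (3xx) as "page exists" is an assumption you may want to change.

```pascal
program HttpStatusDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Builds a minimal HTTP/1.0 HEAD request for Path on Host.
function BuildHeadRequest(const Host, Path: string): string;
begin
  Result := 'HEAD ' + Path + ' HTTP/1.0'#13#10 +
            'Host: ' + Host + #13#10#13#10;
end;

// Returns the numeric status code from the first line of a response,
// e.g. 404 for 'HTTP/1.0 404 Not Found', or 0 when it cannot be parsed.
function ParseStatusCode(const Response: string): Integer;
var
  P: Integer;
begin
  Result := 0;
  if Copy(Response, 1, 5) <> 'HTTP/' then Exit;
  P := Pos(' ', Response);       // the code follows the first space
  if P = 0 then Exit;
  Result := StrToIntDef(Copy(Response, P + 1, 3), 0);
end;

// Assumption: success (2xx) and redirect (3xx) codes count as "exists".
function PageExists(const Response: string): Boolean;
var
  Code: Integer;
begin
  Code := ParseStatusCode(Response);
  Result := (Code >= 200) and (Code < 400);
end;

begin
  Write(BuildHeadRequest('www.example.com', '/index.html'));
  WriteLn(PageExists('HTTP/1.0 200 OK'));
  WriteLn(PageExists('HTTP/1.0 404 Not Found'));
end.
```

In the OnRead handler, the text received from the socket would be passed to `PageExists`, and PAGEVALID set from its result rather than unconditionally to True.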
         
miauw

ASKER

Thanks for your answer, Hrizal. It sounds very promising, but I wrote
in my question:

- Please provide a functional program which I can test, not just some hints

So if you can give me something more 'cut and paste'-able, I will reconsider
rejecting your answer.

With kind regards

How crucial is speed?
Does the program have to be multi-threaded or can it test the links one by one? (how many links do you expect? 4000 seems a lot for just one page!)
miauw

ASKER

Speed is not very crucial: the program does not have to be multi-threaded.
On an average page I expect about 200 links. On the largest page I expect
about 2000 links, so I doubled this amount to keep a margin.
miauw

ASKER

Adjusted points to 1600
miauw

ASKER

Is anyone working on this question still?
Trying to answer the question ...
as promised! And functional, maybe :)

"reading a HTML file of the net or of the network, scan it for
links to other HTML or HTM pages and test whether they exist."

Get the file from me at

http://www.nettaxi.com/citizens/greencom/software/urlcheck.zip

Best regards,

miauw

ASKER

I've downloaded your EXE file. Looks nice.
Please submit or email the source code.
After receiving this I will evaluate your answer,
can't see what will go wrong though, since your EXE
seems OK.
miauw

ASKER

Hrizal: please submit your source code or mail it to me.
miauw

ASKER

Hrizal, you did submit your source code, therefore I reject your answer.
miauw

ASKER

I meant: Hrizal did not submit his code, of course.
Is anyone working on this? Otherwise I will delete this question and
ask a much simpler one.
ASKER CERTIFIED SOLUTION
hrizal

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
miauw

ASKER

Thank you HRIZAL, I will evaluate this ASAP. I can't do this right now,
because when I saved the chunks as DFM and PAS I got an "Error creating
form: invalid stream format".
More info ... on saving the chunk as a DFM:
Open your D3 (or the window nearest you, if you are getting hot)
and then create a new application.
On FORM1, click the right mouse button,
select 'VIEW AS TEXT',
and then overwrite the text with the URL.DFM you cut out before.
Then click the right button again,
select 'VIEW AS FORM (ALT+F12)',
and overwrite the unit text with the URL.PAS you have.

OK ?
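
Some background on why the steps above help: Delphi 3 stores a form on disk in a binary resource format, so a text listing saved directly under the name URL.DFM is rejected with "invalid stream format". As an alternative to pasting through 'VIEW AS TEXT', the text form can probably also be converted in code with `ObjectTextToResource` from the Classes unit. The sketch below assumes the pasted listing was saved as URL.TXT; that file name and the helper `ConvertTextDfm` are only placeholders for illustration.

```pascal
program TextDfmToBinary;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: reads a form in text form from TextFile and
// writes it to BinFile in the binary resource format.
procedure ConvertTextDfm(const TextFile, BinFile: string);
var
  Input, Output: TFileStream;
begin
  Input := TFileStream.Create(TextFile, fmOpenRead);
  try
    Output := TFileStream.Create(BinFile, fmCreate);
    try
      ObjectTextToResource(Input, Output);
    finally
      Output.Free;
    end;
  finally
    Input.Free;
  end;
end;

begin
  // URL.TXT is an assumed name for the saved text listing.
  if FileExists('URL.TXT') then
    ConvertTextDfm('URL.TXT', 'URL.DFM')
  else
    WriteLn('URL.TXT not found');
end.
```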
miauw

ASKER

Some notes:

The algorithm Hrizal sent me did only part of the job.
The part it does is OK.
However, you need parts of Delphi C/S to run it.

Thanks Hrizal!