test html links from Delphi

miauw asked
Last Modified: 2010-04-06
This question is about reading an HTML file from the Internet or from the local network,
scanning it for links to other HTML or HTM pages, and testing whether they exist.

In the program I will have a string TOTEST.
Its value will be a local or Internet HTML page, e.g. TOTEST:='F:\TESTME.HTML';
or TOTEST:='HTTP://WWW.TESTSIDE.ORD/TESTME.HTML';
It must then read all hyperlinks to HTML or HTM pages into an ARRAY
TOTEST[1..4000] and
then, for each element in TOTEST, test whether the page exists:
that is, load it from the Internet and give an error message when
the page could not be retrieved.
I have a LAN connection to the Internet.

Please provide a functional program which I can test, not just some hints.

With regards,
 Miauw

Commented:
Hi there,
Your old question was deleted before I could read your reply to my comment. Please tell me what you wrote when closing the thread. We'll easily deal with this one as well afterwards.
Regards, Igor

Commented:
miauw,
do you have to stick with an array? I would recommend a string list.

Inter,
how far did you get developing the code for the previous thread? I have an HTML parser that can also collect all hyperlinks from a web page (IMG SRC, A HREF, ...).

Slash/d003303

Commented:
Hi,
I have just developed a working application that extracts all the data requested in the previous thread and does it for a given directory, so, as I said before, only the Access part is missing. Actually it is a parser: you just give it the opening and closing tags, such as <A HREF *> and </A>, and it gives you the strings in between. If you can make use of it (I think I have no chance in those questions), I can post it to you. I am now withdrawing.
Regards, Igor

Author

Commented:
To d003303:
I do prefer arrays. It isn't a strict requirement, but I will prefer any answer
done this way. Why not use an array? We are talking
about only 4000 links here, so this will not lead to any memory problems.

Regarding the other question: I am not only finished but way beyond that.
Writing the code for the other question took me about 10% of the total time I have spent
on the program.

To Igor:

I'm sorry I deleted the other question before you got to read my comment.
I waited something like 10 hours before deleting it, and I wrote that I would do so soon, too.
My comment was as follows: I did not get a working answer, or
anything useful at all (for 12 days, I think), so I developed the application myself.
Remember, I put up a comment when I started doing this, with the warning "please
be quick in answering 'cos I am nearly done".
The thing I wrote works great. I thoroughly tested it, and it parses
the HTML structure from the question and several variations into a Paradox DB
(no, Access did not work). This was easy.
After reading the file it does a lot of pattern matching and replacing. This was more
difficult.

So, Igor, I do not need your source, although I think it is very nice of you
to offer it to me for free. Thank you.
 

Commented:
Hi,
about arrays: they are easy to handle, but their size is fixed at compile time. A string list would be more extendable. You never know what's gonna happen tomorrow. Anyway, do you only want the first level of links checked, not the links in the linked pages (like a web spider)? Or do you want to recursively check all links in all pages?

Slash/d003303
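
To make the string-list idea above a bit more concrete, here is a minimal sketch of collecting the links from a page that is already loaded into a string. It is not the parser mentioned earlier in the thread; the routine name ExtractHtmlLinks and the sample HTML are made up for illustration. It only looks at HREF= attributes, keeps targets whose extension is .htm or .html, and does not handle query strings, anchors (#...) or spaces around the equals sign.

```pascal
program ExtractLinksSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: adds every HREF target in Html that ends in
// .htm or .html to Links. A rough scan, not a full HTML parser.
procedure ExtractHtmlLinks(const Html: string; Links: TStrings);
var
  Lower, Url, Ext: string;
  I, Start, P: Integer;
  Delim: Char;
begin
  Lower := LowerCase(Html);   // same length as Html, so indexes line up
  I := 1;
  while I <= Length(Lower) - 5 do
  begin
    if Copy(Lower, I, 5) = 'href=' then
    begin
      Start := I + 5;
      // The value may be quoted with " or ', or not quoted at all.
      if (Html[Start] = '"') or (Html[Start] = '''') then
      begin
        Delim := Html[Start];
        Inc(Start);
      end
      else
        Delim := ' ';
      // Read up to the closing quote, a space (unquoted), or the end of the tag.
      P := Start;
      while (P <= Length(Html)) and (Html[P] <> Delim) and (Html[P] <> '>') do
        Inc(P);
      Url := Copy(Html, Start, P - Start);
      Ext := LowerCase(ExtractFileExt(Url));
      if (Ext = '.htm') or (Ext = '.html') then
        Links.Add(Url);
      I := P;
    end
    else
      Inc(I);
  end;
end;

var
  Links: TStringList;
  I: Integer;
begin
  Links := TStringList.Create;
  try
    // Sample input only; a real program would load the file or page first.
    ExtractHtmlLinks(
      '<a href="http://www.example.com/a.html">A</a>' +
      '<A HREF=b.htm>B</A><a href="pic.gif">C</a>', Links);
    for I := 0 to Links.Count - 1 do
      WriteLn(Links[I]);
  finally
    Links.Free;
  end;
end.
```

If an array is preferred, the same loop could copy Links[I] into TOTEST[I + 1] afterwards; the string list just avoids having to guess the upper bound in advance.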

Author

Commented:
Hi,
1 level deep is enough. Just all the links on the page have to be
tested.
About the array: I don't care about tomorrow, since I will adjust the code to my taste
anyway.

Author

Commented:
Is anyone working on this?

Commented:
miauw

Yip I'm working on this...

Later
BoRiS

Commented:
First  ... you need Delphi 3 or above
Second ... use the 'ClientSocket' component on the Internet tab
           set property HOST to your ISP address
           set property PORT to 80
           set event ONREAD to do PAGEVALID := True
           (PAGEVALID is a global variable of type Boolean)

I believe you can collect the page links from the HTML file
and put them into an array or a TStringList (called Links below).

Before I forget, you also need a 'Timer' component for the timeout.
In its OnTimer event, put

   Timer1.Enabled := False;

and then open the ClientSocket and check the links ...

   ClientSocket1.Open;
   for i := 0 to Links.Count - 1 do   // TStringList is zero-based
   begin
     PAGEVALID := False;              // reset before each request
     ClientSocket1.Socket.SendText(Links[i]);
     Timer1.Enabled := True;          // try an interval of 5 ~ 10 s
     repeat
       Application.ProcessMessages;
     until (not Timer1.Enabled) or PAGEVALID;
     if not PAGEVALID then ... (give me 1500 points! ... :)
   end;

Any Question, please feel free.
Any Comment, welcome.
Any girls ? show me ... :) just kid'n
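
One detail the outline above leaves open is what exactly gets sent over the socket: a web server normally expects an HTTP request line rather than a bare URL, and the first line of its reply carries a status code. The sketch below shows that part only, as two plain string helpers. The names BuildHeadRequest and StatusCodeFromResponse, the host and the path are made up for illustration, and using a HEAD request is an assumption on my part, not something taken from the answer above.

```pascal
program HeadRequestSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: builds a minimal HTTP/1.0 HEAD request
// for Path on Host. The blank line at the end terminates the request.
function BuildHeadRequest(const Host, Path: string): string;
begin
  Result := 'HEAD ' + Path + ' HTTP/1.0'#13#10 +
            'Host: ' + Host + #13#10 +
            #13#10;
end;

// Hypothetical helper: returns the numeric status code from a reply that
// starts with a status line such as 'HTTP/1.0 404 Not Found',
// or 0 if the text does not start with 'HTTP/'.
function StatusCodeFromResponse(const Response: string): Integer;
var
  P: Integer;
begin
  Result := 0;
  if Copy(Response, 1, 5) <> 'HTTP/' then
    Exit;
  P := Pos(' ', Response);
  if P > 0 then
    Result := StrToIntDef(Copy(Response, P + 1, 3), 0);
end;

begin
  // Example values only.
  Write(BuildHeadRequest('www.example.com', '/testme.html'));
  WriteLn(StatusCodeFromResponse('HTTP/1.0 200 OK'));
  WriteLn(StatusCodeFromResponse('HTTP/1.0 404 Not Found'));
end.
```

In the ClientSocket approach, the request string would be what goes into SendText, and the OnRead handler could pass the received text to StatusCodeFromResponse and set PAGEVALID only for a code such as 200, instead of treating any reply at all as a valid page.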
         

Author

Commented:
Thanks for your answer, Hrizal. It sounds very promising, but I wrote
in my question

- Please provide a functional program which I can test, not just some hints

so if you can give me something more 'cut and paste'-able, I will reconsider
rejecting your answer.

With kind regards

How crucial is speed?
Does the program have to be multi-threaded or can it test the links one by one? (how many links do you expect? 4000 seems a lot for just one page!)

Author

Commented:
Speed is not very crucial: the program does not have to be multi-threaded.
On an average page I expect about 200 links. On the largest page I expect
about 2000 links, so I doubled this amount to keep a margin.

Author

Commented:
Adjusted points to 1600

Author

Commented:
Is anyone working on this question still?

Commented:
Trying to answer the question ...
as promised! And functional, maybe :)

"reading a HTML file of the net or of the network, scan it for
links to other HTML or HTM pages and test whether they exist."

Get the file from me at

http://www.nettaxi.com/citizens/greencom/software/urlcheck.zip

Best regards,

Author

Commented:
I've downloaded your EXE file. Looks nice.
Please submit or email the source code.
After receiving this I will evaluate your answer,
can't see what will go wrong though, since your EXE
seems OK.

Author

Commented:
Hrizal: please submit your source code or mail it to me.

Author

Commented:
Hrizal, you did submit your source code, therefore I reject your answer.

Author

Commented:
I meant: Hrizal did not submit his code, of course.
Is anyone working on this? Otherwise I will delete this question and
ask a much simpler one.

Author

Commented:
Thank you HRIZAL, I will evaluate this ASAP. I can't do this right now,
because when I saved the chunks as DFM and PAS I got an
"Error creating form: Invalid stream format".

Commented:
More info ... on saving the chunk as DFM:
open your D3 (or the windows near you if you are getting hot),
and then create a new application.
On FORM1, click the right mouse button,
select 'VIEW AS TEXT',
and then overwrite the text with the URL.DFM you cut before.
Then click the right button again,
select 'VIEW AS FORM (ALT+F12)',
and overwrite the unit's text with the URL.PAS you have.

OK ?

Author

Commented:
Some notes:

The algorithm Hrizal sent me did only part of the job.
The part it does is OK.
However, you need parts of Delphi C/S to run it.

Thanks Hrizal!