How to get the file name from a URL and find out whether it is an HTML page or a file


I am developing an online PDF converter for our company: the user enters a URL in an HTML form and gets a PDF after submitting it.
The problem is that I can't find out whether the submitted URL is a web page or a file. I have tried to write a URL string parser, but
parsing the URL string is not the best method: it is impossible to determine the type from a parameterized
URL. With a URL of that kind, it is not possible to tell from the URL itself what content I would get by clicking on it, a web page or a file.

I need a function that works the way IE does when a URL is entered in its address box: it detects whether the URL content is a web page, in which case it loads it, or a file, in which case it shows the file download dialog with the file name filled in.

Any suggestions?

2266180 Commented:
What you are looking for is the HTTP HEAD method (not GET), which gets you the info about the file (whatever headers the server sends) without the body. So instead of doing a GET, do a HEAD.
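As a rough illustration of the HEAD idea, here is a minimal sketch in Python (standard library only, chosen just to keep the example short; it is not the Indy code discussed in this thread). The function names `fetch_headers` and `classify` are hypothetical, and the rule "anything that is not HTML, or is sent as an attachment, counts as a file" is an assumption that may need tuning for a real converter.

```python
from email.message import Message
from urllib.request import Request, urlopen


def fetch_headers(url: str, timeout: float = 10) -> Message:
    """Send a HEAD request and return only the response headers."""
    req = Request(url, method="HEAD")
    with urlopen(req, timeout=timeout) as resp:
        return resp.headers


def classify(headers: Message):
    """Return ("page" or "file", suggested file name or None).

    Assumption: HTML content that is not marked as an attachment is a
    web page; everything else is treated as a downloadable file.
    """
    disposition = headers.get_content_disposition()  # "attachment", "inline" or None
    filename = headers.get_filename()                # taken from Content-Disposition if present
    is_html = headers.get_content_type() in ("text/html", "application/xhtml+xml")
    if disposition == "attachment" or not is_html:
        return "file", filename
    return "page", filename
```

A call such as `classify(fetch_headers(url))` would then give the kind of content and, when the server sends a `Content-Disposition` header, the file name a browser would put in its download dialog.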
Tomazaz (Author) Commented:
I am trying to get the header values using IdHTTP (Indy) and the code below.

The strange thing is that it works fine on some pages, but on other
pages it raises a timeout error. For example on or
I can access these pages with the GET method but not with HEAD.
That is because Google, at least, redirects the .com address to the .<country> address, and HEAD might not follow redirects, though I don't see anything in the code that would suggest that. I tried Google with the country TLD and it worked OK. I'll see why HEAD doesn't follow the redirect.
BTW, I'm on Delphi 7, Indy 9.
BigRat Commented:
I have changed my IDE to open/save files exclusively via URLs. I used the WinINet function InternetCrackUrl() to determine what type of URL I have and create a "Loader" object accordingly. The file loader uses the standard functions from SysUtils, including ExtractFileExt(), to determine the "type" of the file (used for syntax highlighting). I do similar things for FTP. But in the HTTP and HTTPS loaders I get the file using InternetOpenUrl and InternetReadFile and then determine the type using HttpQueryInfo with HTTP_QUERY_CONTENT_TYPE, which also gives me the character set in case it needs conversion.
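To show what "the content type also gives the character set" means in practice, here is a small Python sketch (not WinINet code) that splits a Content-Type header value into its media type and charset. The helper name `split_content_type` is hypothetical.

```python
from email.message import Message


def split_content_type(value: str):
    """Split a Content-Type header value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = value
    # Both parts come back lower-cased; charset is None when the
    # header carries no charset parameter.
    return msg.get_content_type(), msg.get_content_charset()
```

For example, a value such as `text/html; charset=ISO-8859-1` yields the type used to decide how to handle the document and the charset needed to convert its text.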
Tomazaz (Author) Commented:
As I said in my previous post, URL parsing is not the best method to find out the document type. Try to parse the link, and what will you get?

As ciuly suggested, the HEAD method is needed to get the URL type, but even HEAD does not always work on some servers. I have found a solution that works in about 97% of cases: a GET request that is aborted once the headers have been received. My solution can be used with Indy or IPWorks components.
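The "GET, then abort after the headers" approach can be sketched roughly as follows. This is Python with `http.client`, not the Indy or IPWorks code the author used (which is not shown in the thread); the function name `headers_via_aborted_get` is hypothetical, and the sketch assumes a plain http or https URL and does not follow redirects.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urlsplit


def headers_via_aborted_get(url: str, timeout: float = 10):
    """Issue a GET, read only the headers, then close the connection.

    Returns (status code, media type, file name from Content-Disposition or None).
    """
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        resp = conn.getresponse()  # returns once the status line and headers are parsed
        headers = resp.headers
        return resp.status, headers.get_content_type(), headers.get_filename()
    finally:
        conn.close()  # abort: the body is never read by this code
```

Because the server sees an ordinary GET, this should also work on servers that reject or mishandle HEAD, at the cost of the server possibly starting to send the body before the connection is closed.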
TheRealLoki (Senior Developer) Commented:
With Indy, you do:
IdHTTP1.HandleRedirects := True;
s := IdHTTP1.URL.Document; // s contains the filename
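For comparison, taking the document part of a (final, post-redirect) URL can be sketched in Python like this. The helper name `document_name` is hypothetical, and this is only an approximation of what a URL's document part means, not a description of Indy's internals.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def document_name(url: str) -> str:
    """Return the last path segment of a URL, percent-decoded.

    The query string and fragment are ignored; a URL ending in "/"
    gives an empty string.
    """
    path = urlsplit(url).path
    return unquote(posixpath.basename(path))
```

Note that for a parameterized URL this gives the script name (for instance `download.php`), not the name of the file the server will actually send, which is why the response headers are still needed.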
I know I didn't give code (I considered it trivial), but I did answer the question correctly (and partially :) ) so wouldn't a split be more appropriate?
Hmmm, after rereading I would agree with a split, but also put BigRat in the split because of the HTTP_QUERY_CONTENT_TYPE.

Any objections to that?
None. That is a good idea as well. (I only read my post and Loki's; otherwise I think I would have mentioned that too :D)
TheRealLoki (Senior Developer) Commented:
Split it. Since the author did not reply to let anyone with a similar problem know how he got on, they should consider all answers.
lol? :D

cwwkie, I guess computer101 did not see the last administrative comment :)
Question has a verified solution.