
download PDF files from website

Last Modified: 2013-11-07
I am trying to automate the download of several PDF documents from a secure website.  The website uses HTTPS and requires a login (which of course I have).

When I make WebRequest.GetResponse calls, they fail with a 401 error (Requires Authentication), even though I can navigate to the PDF form using the WebBrowser1.Navigate("[URLOfPdfFileHere]") method.  "Stuff" must be happening on the server side, which means there is no direct URL for each of the PDF documents.  Typically, when clicking manually on a document link, the URL shown in the status bar of the browser is:
https://www.theirsite.com/subscription/cheatsheets/NXT/gateway.dll?f=id$id=cheatsheets_form_1_pdf$t=document-frame.htm$3.0$p=

I have tried the following to get this working:
Dim oDownload As New System.Net.WebClient
URLReq.Credentials = New NetworkCredential("MYUSERNAME", "MYPASSWORD", wb1.Document.Domain.ToString)
oDownload.DownloadFile("https://www.theirsite.com/subscription/cheatsheets/NXT/gateway.dll?f=id$id=cheatsheets_form_1_pdf$t=document-frame.htm$3.0$p=", "C:\test\cheatsheet1.pdf")  ' ------FAILS HERE WITH A "401 ERROR - REQUIRES AUTHENTICATION"

When setting the Credentials, I used the "wb1.Document.Domain.ToString" value because I figured that whatever network credential WebBrowser1 was using must be OK - since I was able to navigate to the forms with WebBrowser1.  I have also used:
oDownload.Credentials = System.Net.CredentialCache.DefaultNetworkCredentials
...but this too did not work.  Same "401" error.
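
Since both credential attempts give the same 401, one way to narrow this down is to look at what the server actually asks for. The sketch below is only a diagnostic, not a fix: it catches the WebException and returns the WWW-Authenticate header of the 401 response. The module and function names (AuthProbe, GetAuthChallenge) are made up for illustration.

```vbnet
Imports System
Imports System.Net

Module AuthProbe
    ' Hypothetical helper: requests the URL and, if the server answers 401,
    ' returns the WWW-Authenticate challenge header (Nothing if the header
    ' is absent or the request succeeds).
    Function GetAuthChallenge(url As String) As String
        Dim request = CType(WebRequest.Create(url), HttpWebRequest)
        Try
            Using response = CType(request.GetResponse(), HttpWebResponse)
                Return Nothing ' No 401, so there is no challenge to report.
            End Using
        Catch ex As WebException
            Dim failed = TryCast(ex.Response, HttpWebResponse)
            If failed IsNot Nothing AndAlso failed.StatusCode = HttpStatusCode.Unauthorized Then
                Return failed.Headers("WWW-Authenticate")
            End If
            Throw ' Some other failure (DNS, timeout, 404, ...).
        End Try
    End Function
End Module
```

A value such as Basic, Digest, NTLM or Negotiate would suggest that NetworkCredential is the right mechanism and that the user name, password or domain is being rejected. Note that the third NetworkCredential argument is the domain or computer name that verifies the credentials, which is not necessarily the same thing as the web page's Document.Domain. If the header is missing, the site may be relying on a login form and session cookies instead, in which case Credentials would have no effect.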

One idea I tried was to navigate to the link for each document one by one and then, when the browser loads the form (within an Adobe Reader frame), programmatically do a "Save As" to save the form to disk.  However, I cannot get a handle on the Adobe Reader frame within the WebBrowser control to do the Save As.  If I execute the following:
WebBrowser1.Navigate("https://www.theirsite.com/subscription/cheatsheets/NXT/gateway.dll?f=id$id=cheatsheets_form_1_pdf$t=document-frame.htm$3.0$p=")
...then the PDF file opens up within WebBrowser1 - but then I cannot figure out how to get the file out of the WebBrowser (manually, I would simply hit the save button in the WebBrowser's Adobe Reader toolbar, but I am trying to automate this to not require user input).
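
One possibility worth testing, given that the WebBrowser can reach the PDF while WebClient cannot, is that the login lives in session cookies held by the browser control. The sketch below assumes exactly that (cookie-based login, which is not confirmed for this site) and copies the browser's cookies onto the WebClient request. The module and method names are hypothetical.

```vbnet
Imports System.Net
Imports System.Windows.Forms

Module CookieDownload
    ' Hypothetical helper: downloads url to path, sending the cookies that the
    ' WebBrowser control exposes for the page it currently has loaded.
    ' Assumption: the site tracks the login with cookies visible to script.
    Sub DownloadWithBrowserCookies(browser As WebBrowser, url As String, path As String)
        Using client As New WebClient()
            ' Document.Cookie is a "name=value; name2=value2" string,
            ' which is the same shape as the HTTP Cookie request header.
            client.Headers.Add(HttpRequestHeader.Cookie, browser.Document.Cookie)
            client.DownloadFile(url, path)
        End Using
    End Sub
End Module
```

This would be called after logging in through WebBrowser1 and while an HTML page from the site (not the PDF itself) is loaded, for example from the DocumentCompleted handler. A known limitation: cookies marked HttpOnly do not appear in Document.Cookie, so if the session cookie is HttpOnly this sketch will still return 401.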

Commented:
Hi,

Maybe you have already tried this, but did you realize that you are giving the credentials to a different object rather than oDownload?

Try:
Dim oDownload As New System.Net.WebClient
oDownload.Credentials = New NetworkCredential("MYUSERNAME", "MYPASSWORD", wb1.Document.Domain.ToString)
oDownload.DownloadFile("https://blabla", "C:\test\cheatsheet1.pdf")


Author

Commented:
Ah, sorry, I checked and my code was OK.  I made a mistake copying and pasting as my solution was in such a mess after all the trial and error!  Sorry to confuse -- I was desperately hoping that the solution was this simple, but alas no.  I am still getting the 401 Unauthorised error.