PDF download from Web via VBA - going through a security page

I am trying to download a page from the web and have to go through two security pages before getting to the PDF file.  The attached code gets me through the security pages, but the last part apparently opens a new browser window before attempting to download the PDF.  I need it to use the window that is already open and download the PDF from there.

The function (URLDownloadToFile) called in the code is declared as:
Public Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

I have set references to InetTransferLib and Microsoft Internet Controls.

Any assistance would be greatly appreciated.
Dim ieApp As InternetExplorer
    Dim iePage As HTMLDocument
    Dim x As Single
    Dim timestart As String
    Const cTIME = 10000 'in MilliSeconds
    
    Set ieApp = New InternetExplorer
    
    ieApp.Visible = True
    
    'This changes based upon the document needed, hardcoded here for the example
    ieApp.Navigate "https://ecf.ganb.uscourts.gov/cgi-bin/show_case_doc?20,729812,,14604849,"
    
    'wait for page to load (DoEvents yields to the host application while polling)
    Do Until ieApp.ReadyState = READYSTATE_COMPLETE
        DoEvents
    Loop
    
    Set iePage = ieApp.Document
    
    'Enter the User information and password in the first web page
    iePage.Forms(0).Item("login").Value = "cw0133"
    iePage.Forms(0).Item("key").Value = "4scirvy2"
    iePage.Forms(0).Item("clcode").Value = "system"
    iePage.Forms(0).Item("button1").Click
    
    'Wait for the second page to load and hit the submit button
    Call sSleep(cTIME)
    ieApp.Document.Forms(0).submit
    
    'Wait for the PDF file to load
    Call sSleep(cTIME)
        
    '***********************************************************************************************
    'This is my problem area it opens a new browser instead of using the one that has gone through the security check
    'Download the PDF file  - hardcoded url and file name for this example
    If URLDownloadToFile(0&, "https://ecf.ganb.uscourts.gov/cgi-bin/show_case_doc?20,729812,,14604849,", "c:\TEMP\TEST110908.PDF", 0&, 0&) = 0 Then
        MsgBox "Downloaded"
    Else
        MsgBox "Failed"
    End If
    '***********************************************************************************************
 
 
 
    ' Kill the browser (the variable declared above is ieApp, not ieBrowser)
    ieApp.Quit
    Set ieApp = Nothing


marshalldavisAsked:
 
marshalldavisAuthor Commented:
rockiroads,

Thank you so much for taking the time to look at the problem.  I have to be able to do this, so I am going to continue to search for a solution and leave the question open a bit longer in case anyone else has another solution.
 
DanRollinsCommented:
You might consider using the

    XMLHttpRequest Object
    http://msdn.microsoft.com/en-us/library/ms535874(VS.85).aspx
of the existing browser. Presumably the session variables would still be in place. The name is deceptive: it has nothing to do with XML. The oReq.responseBody is the raw bytes of the requested data (in this case, the PDF file). All you would need to do is write the data to disk.
 
marshalldavisAuthor Commented:
I am unfamiliar with the object and will research and try to implement a solution and report back.  Thank you so much for giving me a direction to try.
 
marshalldavisAuthor Commented:
Dan,

I spent a good deal of time yesterday pursuing the answer and trying to become familiar with the XMLHttpRequest object.

It seems to require that I provide it a URL, and it does not use the existing window I have open under the InternetExplorer object.

However, unless I missed it, the XMLHttpRequest does not seem to have the same ability to 'navigate' the pages like the IE object.  My system has to be able to get through the user name and password page as in the code I posted above.

Can the XMLHttpRequest navigate pages or be tied into the open IE object window?  If so, can you assist me with what method to use?  I did get it to download a page when it got to a page - so I know that part should work.

Thank you so much for responding.  This problem is causing a huge backload of work for me and your efforts are greatly appreciated.

 
DanRollinsCommented:
In your IE instance, if you can access the DOM, then you should be able to access the window object. With IE 7, the XMLHttpRequest object is a member of the window object (as shown in the second example code in the link I provided).
As to "navigating" ... that is mainly a concept associated with interactive browsing; that is, a person looking at a page, clicking a link, looking at a different page, etc. In most cases where automation is the goal, there is not really any reason to display the page to a person: one just gets the response and processes it (without ever needing to display it). In this case, it sounded like your goal is to download a file and save it to disk, so I assumed that the download (not the display) is the issue.
 
marshalldavisAuthor Commented:
Dan,

You are correct.  I really don't care about the display at all.  I simply need it to get through the first two security pages before it can "see" the pdf to download.

The code I have does accomplish this but both methods of download I have tried (URLDownloadToFile and XMLHttpRequest) seem to rely on going directly to the PDF page and are blocked by the security pages.

I will look into the DOM/Window object as you suggest and see if I can find that last magical piece I need to grab the pdf file.

Thanks for the continuing advice.
 
DanRollinsCommented:
The key issue is that the server needs to think that the same user (same browser instance) is requesting both items (the login page and the download page).  It does so by setting up a session ID in a local cookie, and that session ID gets passed back to the server with each subsequent request.  You need to find a way that uses the same instance for both requests.  The URLDownloadToFile API must not be passing that cookie/session ID.  I believe that the XMLHttpRequest object of the window of the browser control will do the same thing vis-à-vis cookie handling as a browser would.
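
One way to act on the cookie idea, offered as a rough sketch rather than a confirmed fix: copy the cookies from the logged-in IE document and send them with a separate HTTP request, then write the response bytes to disk. This swaps in WinHttp.WinHttpRequest.5.1 and ADODB.Stream in place of the browser's own XMLHttpRequest. The function name DownloadWithBrowserCookies is made up for illustration, and the sketch assumes the session cookie is visible through document.cookie (HttpOnly cookies are not) and that the server accepts it from a different client.

```vba
' Sketch only: assumes the session cookie can be read from document.cookie
' and that the server honors it when another HTTP client sends it back.
Public Function DownloadWithBrowserCookies(ByVal ieApp As Object, _
                                           ByVal sUrl As String, _
                                           ByVal sPath As String) As Boolean
    Const adTypeBinary As Long = 1
    Const adSaveCreateOverWrite As Long = 2

    Dim sCookies As String
    Dim oReq As Object
    Dim oStream As Object

    ' Read the cookies while an HTML page (not the PDF viewer) is loaded
    sCookies = ieApp.Document.cookie

    Set oReq = CreateObject("WinHttp.WinHttpRequest.5.1")
    oReq.Open "GET", sUrl, False                 ' False = synchronous request
    If Len(sCookies) > 0 Then oReq.SetRequestHeader "Cookie", sCookies
    oReq.Send

    If oReq.Status = 200 Then
        ' ResponseBody is a byte array; ADODB.Stream writes it unchanged
        Set oStream = CreateObject("ADODB.Stream")
        oStream.Type = adTypeBinary
        oStream.Open
        oStream.Write oReq.ResponseBody
        oStream.SaveToFile sPath, adSaveCreateOverWrite
        oStream.Close
        DownloadWithBrowserCookies = True
    End If
End Function
```

It would be called after the login form has been submitted and the next HTML page has loaded. A True result only means the server answered with status 200; the saved file may still be an intermediate HTML page rather than the PDF.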
 
marshalldavisAuthor Commented:
Dan,

Thanks.  I spent much of the weekend and today on this and am continuing to search for the answer.
 
rockiroadsCommented:
Well possibly your code is working but failing maybe on the login. I noticed that the 2nd time you try it, the login prompts are not there. Perhaps that login info was saved in a cookie or something, not sure.

As a test, I ignored errors during the login step and switched error handling back on afterwards, e.g.
    On Error Resume Next
    iePage.Forms(0).Item("login").Value = "cw0133"
    iePage.Forms(0).Item("key").Value = "4scirvy2"
    iePage.Forms(0).Item("clcode").Value = "system"
    iePage.Forms(0).Item("button1").Click
   
    'Wait for the second page to load and hit the submit button
    On Error Goto 0
    Call sSleep(cTIME)
    ieApp.Document.Forms(0).submit


Ideally the best way forward is to add validation. Check to see if those form items exist when you expect them to.

And since you defined iePage, you could just do iePage.Forms(0).submit, but it makes no difference really.
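
The validation idea could look something like the following. It is only a sketch: FormHasField is a hypothetical helper name, and it assumes the login fields live in the first form on the page, as in the posted code.

```vba
' Returns True when the first form in the document contains the named field.
' Sketch only: FormHasField is an illustrative name, not part of any library.
Public Function FormHasField(ByVal oDoc As Object, ByVal sName As String) As Boolean
    Dim oItem As Object

    If oDoc Is Nothing Then Exit Function

    ' Any failure below (no forms on the page, field not found) leaves
    ' oItem as Nothing, so the function returns False
    On Error Resume Next
    Set oItem = oDoc.forms(0).Item(sName)
    On Error GoTo 0

    FormHasField = Not (oItem Is Nothing)
End Function
```

With this in place, the login step can be wrapped in If FormHasField(iePage, "login") Then ... End If, so the credentials are entered only when the login page is actually shown and the blanket On Error Resume Next is no longer needed around that block.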
 
marshalldavisAuthor Commented:
rockiroads,

Thank you for the response.  I have encountered the failure on subsequent attempts (I agree it is probably a cookie).  It is not something I have tackled yet, since I can't get the download to work even when it does display the PDF on the first try.

The frustrating part is that the browser is displaying the pdf that I need but everything I have tried to download it has failed.  The piece I seem to be missing is whether the InternetExplorer object has essentially a "save" command/method once the pdf is displayed.

I am willing to try anything and am not at all tied to the code above if you have another solution.

Thanks for the help.

 
rockiroadsCommented:
I've had a look at using execCommand but can't get SaveAs to work. I tried Print and that came up with the dialog, so I know execCommand works, but SaveAs is probably limited.
    iePage.execCommand "SaveAs", True, "c:\temp\zz.pdf"
    iePage.execCommand "Print", True

I dumped the location, and it is the same URL you pass to the URLDownloadToFile API
    Debug.Print iePage.Location

I checked the HTML as well and that didn't help either
    Debug.Print iePage.Body.innerHTML

I placed a breakpoint then ran the save code after the IE window loaded and showed the pdf but to no avail :(

If you look at the URL after pdf has been shown, it still points to the original url.

    If URLDownloadToFile(0&, iePage.Location, "C:\temp\zz2.pdf", 0&, 0&) = 0 Then

creates zz2.pdf, but the file is not in PDF format. It is text. I renamed it to .html and it shows the first login page, titled "CM/ECF Filer or PACER Login".

I'm not sure how that page is created. It looks like bytes are read and streamed, so there is no real PDF file as such, nor a web page; it is more likely created dynamically from scanned images. Probably originally saved in some MODCA format and converted on the server.
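
Since a return value of 0 from URLDownloadToFile evidently does not guarantee that a PDF was saved, a quick check of the file header can catch the case where the login page came back instead. A minimal sketch, relying on the fact that PDF files begin with the characters "%PDF"; LooksLikePdf is a hypothetical helper name.

```vba
' Returns True when the file exists and starts with the PDF signature "%PDF".
' Sketch only: LooksLikePdf is an illustrative name.
Public Function LooksLikePdf(ByVal sPath As String) As Boolean
    Dim iFile As Integer
    Dim sHead As String

    If Len(Dir$(sPath)) = 0 Then Exit Function      ' file does not exist
    If FileLen(sPath) < 4 Then Exit Function        ' too short to hold a header

    iFile = FreeFile
    Open sPath For Binary Access Read As #iFile
    sHead = String$(4, vbNullChar)                  ' buffer size sets bytes read
    Get #iFile, 1, sHead
    Close #iFile

    ' Binary comparison so Option Compare settings do not affect the result
    LooksLikePdf = (StrComp(sHead, "%PDF", vbBinaryCompare) = 0)
End Function
```

Calling LooksLikePdf("C:\temp\zz2.pdf") right after the download would show "Failed" for the saved login page instead of a misleading "Downloaded".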
Question has a verified solution.
