PDF download from Web via VBA - going through a security page

Posted on 2008-11-09
Last Modified: 2013-11-27
I am trying to download a PDF file from the web and have to go through two security pages before getting to it.  The attached code gets me through the security pages, but the last part apparently opens a new browser window before attempting to download the PDF.  I need it to use the window that is already open and download the PDF from there.

The function (URLDownloadToFile) called in the code is declared as:
Public Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" ( _
    ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

I have set references to InetTransferLib and Microsoft Internet Controls.

Any assistance would be greatly appreciated.
Dim ieApp As InternetExplorer
    Dim iePage As HTMLDocument
    Dim x As Single
    Dim timestart As String
    Const cTIME = 10000 'in MilliSeconds
    
    Set ieApp = New InternetExplorer
    
    ieApp.Visible = True
    
    'This changes based upon the document needed, hardcoded here for the example
    ieApp.Navigate "https://ecf.ganb.uscourts.gov/cgi-bin/show_case_doc?20,729812,,14604849,"
    
    'wait for page to load
    Do While ieApp.Busy Or ieApp.ReadyState <> READYSTATE_COMPLETE
        DoEvents 'yield so the wait loop does not lock up the host application
    Loop
    
    Set iePage = ieApp.Document
    
    'Enter the User information and password in the first web page
    iePage.Forms(0).Item("login").Value = "cw0133"
    iePage.Forms(0).Item("key").Value = "4scirvy2"
    iePage.Forms(0).Item("clcode").Value = "system"
    iePage.Forms(0).Item("button1").Click
    
    'Wait for the second page to load and hit the submit button
    Call sSleep(cTIME)
    ieApp.Document.Forms(0).submit
    
    'Wait for the PDF file to load
    Call sSleep(cTIME)
        
    '***********************************************************************************************
    'This is my problem area it opens a new browser instead of using the one that has gone through the security check
    'Download the PDF file  - hardcoded url and file name for this example
    If URLDownloadToFile(0&, "https://ecf.ganb.uscourts.gov/cgi-bin/show_case_doc?20,729812,,14604849,", "c:\TEMP\TEST110908.PDF", 0&, 0&) = 0 Then
        MsgBox "Downloaded"
    Else
        MsgBox "Failed"
    End If
    '***********************************************************************************************
 
 
 
    ' Close the browser and release the reference
    ieApp.Quit
    Set ieApp = Nothing

Question by:marshalldavis
11 Comments
 
LVL 49

Assisted Solution

by:DanRollins
DanRollins earned 1000 total points
ID: 22982452
You might consider using the XMLHttpRequest object of the existing browser:

    XMLHttpRequest Object
    http://msdn.microsoft.com/en-us/library/ms535874(VS.85).aspx

Presumably the session state would still be in place. The name is deceptive: this has nothing to do with XML. The oReq.responseBody property holds the raw bytes of the requested data (in this case, the PDF file). All you would need to do is write the data to disk.
 

Author Comment

by:marshalldavis
ID: 22983908
I am unfamiliar with the object and will research and try to implement a solution and report back.  Thank you so much for giving me a direction to try.
 

Author Comment

by:marshalldavis
ID: 22996470
Dan,

I spent a good deal of time yesterday pursuing the answer and trying to become familiar with the XMLHttpRequest object.

It seems to require that I provide it a URL, and it does not use the existing window I have open under the InternetExplorer object.

However, unless I missed it, the XMLHttpRequest does not seem to have the same ability to 'navigate' the pages like the IE object.  My system has to be able to get through the user name and password page as in the code I posted above.

Can the XMLHttpRequest navigate pages or be tied into the open IE object window?  If so, can you assist me with what method to use?  I did get it to download a page when it got to a page - so I know that part should work.

Thank you so much for responding.  This problem is causing a huge backlog of work for me and your efforts are greatly appreciated.

 
LVL 49

Expert Comment

by:DanRollins
ID: 22999247
In your IE instance, if you can access the DOM, then you should be able to access the window object. With IE 7, the XMLHttpRequest object is a member of the window object (as shown in the second example code in the link I provided).
As to "navigating": that is mainly a concept associated with interactive browsing; that is, a person looking at a page, clicking a link, looking at a different page, etc. In most cases where automation is the goal, there is not really any reason to display the page to a person. One just gets the response and processes it (without ever needing to display it). In this case, it sounded like your goal is to download a file and save it to disk, so I assumed that the download (not the display) is the issue.
 

Author Comment

by:marshalldavis
ID: 23001202
Dan,

You are correct.  I really don't care about the display at all.  I simply need it to get through the first two security pages before it can "see" the pdf to download.

The code I have does accomplish this but both methods of download I have tried (URLDownloadToFile and XMLHttpRequest) seem to rely on going directly to the PDF page and are blocked by the security pages.

I will look into the DOM/Window object as you suggest and see if I can find that last magical piece I need to grab the pdf file.

Thanks for the continuing advice.
 
LVL 49

Expert Comment

by:DanRollins
ID: 23009645
The key issue is that the server needs to think that the same user (same browser instance) is requesting both items (the login page and the download page).  It does so by storing a session ID in a local cookie, and that session ID gets passed back to the server with each subsequent request.  You need to find a way to use the same instance for both requests.  The URLDownloadToFile API is evidently not passing that cookie/session ID.  I believe that the XMLHttpRequest object of the browser control's window will handle cookies the same way a browser would.
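
A different way to act on the same idea, sketched here as an untested assumption rather than the approach described above, is to copy the cookies out of the logged-in browser and send them with a separate request. The sketch swaps in the WinHttp.WinHttpRequest.5.1 and ADODB.Stream COM objects (late bound) in place of the browser's own XMLHttpRequest. DownloadWithCookies is a hypothetical helper name. The cookie string would have to be read from ieApp.Document.cookie while an HTML page is still loaded, and cookies marked HttpOnly are not visible that way, so this may not carry the full session on every site.

```vba
' Sketch only: DownloadWithCookies is a hypothetical helper, not part of any library.
' Late binding is used, so no extra project references are assumed.
Public Function DownloadWithCookies(ByVal sUrl As String, _
                                    ByVal sCookies As String, _
                                    ByVal sFile As String) As Boolean
    Dim oReq As Object      'WinHttp.WinHttpRequest.5.1
    Dim oStream As Object   'ADODB.Stream

    Set oReq = CreateObject("WinHttp.WinHttpRequest.5.1")
    oReq.Open "GET", sUrl, False                 'False = synchronous request
    'Pass the browser's session cookies so the server sees the same session
    If Len(sCookies) > 0 Then oReq.SetRequestHeader "Cookie", sCookies
    oReq.Send

    If oReq.Status = 200 Then
        Set oStream = CreateObject("ADODB.Stream")
        oStream.Type = 1                         '1 = adTypeBinary
        oStream.Open
        oStream.Write oReq.ResponseBody          'raw bytes of the response
        oStream.SaveToFile sFile, 2              '2 = adSaveCreateOverWrite
        oStream.Close
        DownloadWithCookies = True
    End If
End Function

' Possible usage, after the login form has been submitted and the next HTML page has loaded:
'     sCookies = ieApp.Document.cookie
'     If DownloadWithCookies(sPdfUrl, sCookies, "c:\TEMP\TEST110908.PDF") Then MsgBox "Downloaded"
```

A status of 200 only means the server answered; it may still have returned a login page instead of the PDF, so the saved file is worth inspecting.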
 

Author Comment

by:marshalldavis
ID: 23029390
Dan,

Thanks.  I spent much of the weekend and today and am continuing to search for the answer.
 
LVL 65

Expert Comment

by:rockiroads
ID: 23039992
Possibly your code is working but failing on the login. I noticed that the second time you try it, the login prompts are not there. Perhaps the login info was saved in a cookie or something similar; I'm not sure.

As a test, I ignored errors during the login step and switched error handling back on afterwards, e.g.:
    On Error Resume Next
    iePage.Forms(0).Item("login").Value = "cw0133"
    iePage.Forms(0).Item("key").Value = "4scirvy2"
    iePage.Forms(0).Item("clcode").Value = "system"
    iePage.Forms(0).Item("button1").Click
   
    'Wait for the second page to load and hit the submit button
    On Error Goto 0
    Call sSleep(cTIME)
    ieApp.Document.Forms(0).submit


Ideally the best way forward is to add validation. Check to see if those form items exist when you expect them to.

And since you defined iePage, you could just call iePage.Forms(0).submit, but it makes no real difference.
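
The validation mentioned above could look something like the following minimal sketch. FormFieldExists is a hypothetical helper name, and the sketch assumes the login form is the first form on the page, as in the posted code.

```vba
' Sketch only: FormFieldExists is a hypothetical helper, not part of the original code.
Public Function FormFieldExists(ByVal oDoc As Object, ByVal sName As String) As Boolean
    Dim oField As Object

    On Error Resume Next                    'a missing form or field must not halt the macro
    Set oField = oDoc.forms(0).Item(sName)
    On Error GoTo 0

    'oField is still Nothing if the lookup failed or found no such element
    FormFieldExists = Not oField Is Nothing
End Function

' Possible usage: only fill in the login form when it is actually on the page
'     If FormFieldExists(iePage, "login") Then
'         iePage.Forms(0).Item("login").Value = sUser   'sUser is a placeholder
'     End If
```

That way the login step is skipped cleanly when the prompts are not shown, instead of masking every error in that section.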
 

Author Comment

by:marshalldavis
ID: 23043715
rockiroads,

Thank you for the response.  I have encountered the failure on subsequent attempts (I agree it is probably a cookie).  It is not something I have attacked yet, since I can't get the download to work even when it does display the PDF on the first try.

The frustrating part is that the browser is displaying the pdf that I need but everything I have tried to download it has failed.  The piece I seem to be missing is whether the InternetExplorer object has essentially a "save" command/method once the pdf is displayed.

I am willing to try anything and am not at all tied to the code above if you have another solution.

Thanks for the help.

 
LVL 65

Assisted Solution

by:rockiroads
rockiroads earned 1000 total points
ID: 23045739
I've had a look at using execCommand but can't get SaveAs to work. I tried Print and that came up with the dialog, so I know execCommand works, but SaveAs is probably limited.
    iePage.execCommand "SaveAs", True, "c:\temp\zz.pdf"
    iePage.execCommand "Print", True

I dumped the location, and it is the same page you pass to the URLDownloadToFile API:
    Debug.Print iePage.Location

I checked the HTML as well and that didn't help either:
    Debug.Print iePage.Body.innerHTML

I placed a breakpoint then ran the save code after the IE window loaded and showed the pdf but to no avail :(

If you look at the URL after pdf has been shown, it still points to the original url.

    If URLDownloadToFile(0&, iePage.Location, "C:\temp\zz2.pdf", 0&, 0&) = 0 Then

This creates zz2.pdf, but the file is not in PDF format; it is text. I renamed it to .html, and it shows the first login page, titled "CM/ECF Filer or PACER Login".

I'm not sure how that page is created. It looks like bytes are read and streamed, so there is no real PDF as such, nor a web page; it is more likely created dynamically from scanned images, probably originally saved in some MODCA format and run on the server.
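
Since the zz2.pdf test above produced a login page saved under a .pdf name, a quick check of the file signature can tell a real PDF from an HTML response. This is a minimal sketch; LooksLikePdf is a hypothetical helper name, and it relies only on the fact that PDF files begin with the characters "%PDF".

```vba
' Sketch only: LooksLikePdf is a hypothetical helper name.
Public Function LooksLikePdf(ByVal sFile As String) As Boolean
    Dim iFile As Integer
    Dim sHeader As String

    If Len(Dir$(sFile)) = 0 Then Exit Function     'file does not exist
    If FileLen(sFile) < 4 Then Exit Function       'too short to hold the signature

    iFile = FreeFile
    Open sFile For Binary Access Read As #iFile
    sHeader = Space$(4)                            'buffer length sets how many bytes Get reads
    Get #iFile, 1, sHeader                         'read the first four bytes
    Close #iFile

    LooksLikePdf = (sHeader = "%PDF")
End Function

' Possible usage after any download attempt:
'     If Not LooksLikePdf("C:\temp\zz2.pdf") Then MsgBox "Server returned something other than a PDF"
```

Checking the content rather than the return code of URLDownloadToFile avoids reporting "Downloaded" when the server has actually sent the login page.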
 

Accepted Solution

by:marshalldavis
marshalldavis earned 0 total points
ID: 23068577
rockiroads,

Thank you so much for taking the time to look at the problem.  I have to be able to do this, so I am going to continue to search for a solution and leave the question open for a bit longer in case anyone else has another solution.