Link to home
Start Free TrialLog in
Avatar of pcalabria
pcalabriaFlag for United States of America

asked on

Need to download the contents of a URL into a string with MS Access

I need a new way to read an URL into a variable.

This would be on a Win10 computer running MS Access 2016
Does anyone have any code to share?
Avatar of Scott McDaniel (EE MVE )
Scott McDaniel (EE MVE )
Flag of United States of America image

Do you mean read the actual URL into a variable? Or retrieve something from that URL into a variable?
ASKER CERTIFIED SOLUTION
Avatar of Jim Dettman (EE MVE)
Jim Dettman (EE MVE)
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of pcalabria

ASKER

Yes.. the webpage that results.

@Jim... all I need to do is load the html code from the resulting webpage into a text variable that I can txtURLpage... how would I do that?

Also... do you think this code will work when compiled with Access 2K as mde files and run on XP machines which have Access 2K loaded?

I also have a few Wn10 machines on the network now... but we still compile the source with Access 2K then copy the mde file to the Win10 machine.

If necessary.. I can compile with access 2016, but I don't  hunk that code would run on the computer machines.
@Jim... all I need to do is load the html code from the resulting webpage into a text variable that I can txtURLpage... how would I do that?

 With the above code, when the web page was done loading, you would do:

  txtURLpage = IE.Document.DocumentElement.innerHTML

  to get the HTML for the page (think I got that right).

Also... do you think this code will work when compiled with Access 2K as mde files and run on XP machines which have Access 2K loaded?

  As far as Access, yes but this code relies on using IE (Internet Explorer).   I would use late binding with IE to avoid version problems.

Jim.
What is it exactly that your doing?   If going after one site, it's possible that it has an API that you could work with directly and avoid IE.

Jim.
@Jim Thanks... but how can I tell... or otherwise kill time if I have to wait... and then how do I know when the webpage is loaded?
and then how do I know when the webpage is loaded?

 That's already in that code sample I posted:

    ' Wait while IE loading...
    'IE ReadyState = 4 signifies the webpage has loaded (the first loop is set to avoid inadvertently skipping over the second loop)
    Do While IE.ReadyState = 4: DoEvents: Loop   'Do While
    Do Until IE.ReadyState = 4: DoEvents: Loop   'Do Until

Open in new window


 Please read through the code I posted.

 and BTW that's not my code, but an example that I swiped off the net.   Was faster doing that then me digging out what I had done in the past.

Jim.
The simplest and fastest method is to use a function like DownloadFile found here:

Show pictures directly from URLs in Access forms and reports
@Gustav I'm happy to hear from you.  Although I suspect you may attached the wrong link in this message, you provided me with instructions to use DownloadFile many years ago. The code you provided still works well on our XP machines, however, XP does not allow us to install any version of Internet Explorer higher than 8.0, and the real problem is that IE8 no longer works with the website we are scraping.

When we execute our Access code (that includes the DownloadFile code) as an mde file on a Win10 machine, the DownloadFile routine does not work.  I don't know why.  Every other line of code in the 300MB source code module works on the Win10 machine... except DownloadFile.

So that's the real problem.  We need a version of DownloadFile, or equivalent program, that we can run on the Win10 machine, or a completely new way to duplicate the function of the program, such as what Jim has suggested.

I believe you may have attached the wrong link.  Would this me new code, or the same code you sent me ten years ago?


@Jim  We are visiting a website, and then downloading the resulting page to a file on the disk.  Then we read the file into a string variable (text) and then call a very complicated subroutine that extracts the info we need an stores it in appropriate tables.

Basically:

Call ProcessOurData(txtURLPageText)

ProcessOurData is about 100 pages of code and ten different suproutines.  All I need to do is sent in the HTML code that we download from the URL and everything is happy.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
<< Then we read the file into a string variable (text) and then call a very complicated subroutine that extracts the info we need an stores it in appropriate tables.>>

 With the example I posted, all your doing is skipping the disk file part.   You can get the page into a variable.   However, the solution also depends on having IE installed.  

<<ProcessOurData is about 100 pages of code and ten different suproutines.  All I need to do is sent in the HTML code that we download from the URL and everything is happy.>>

 That would still be doable.   I think though if you can avoid using IE, I would and since you are already using gustav's approach, I would look at getting that to work rather than going to a new approach.

Jim.
I plan to write new code today to see if I can get it to work on the Win10 Machine. Yes 32-bit A2016.

The real problem here are the XP machines as the latest IE I can install is IE8 and IE8 no longer works with many of our websites.
What I really need is an solution that is browser independent... perhaps something that works with Chrome?
Gustav's approach requires adding a reference to Microsoft Internet Controls... I'm not sure buy I suspect this is the core of IE8 on these machines.


So both solutions would only work on our two Win10 mahcines... Whereas we have an office full of XP machines.
Any other thoughts before I dive into this?  If I get it working on Win10 we will need to only run this code from specific mahines...or upgrade machines.
Thanks
My function doesn't require anything special.
You can fetch and run the flag demo right away.
@Gustav Good morning.  I just finished crawling through your code and I understand the new revision, which should not effect me.

As I recall, the problem we are having on the new machine is that this is not a public website.  We pay a membership fee and must log in.  With this said, once we log in using IE8 the first time, we do not need to log in again.  Apparently it uses a cookie.

On our xo machines we log in using IE8, and then your code works perfectly.

On the Win10 machine, if we log in with the version if ie installed by win10, then try to use the code, the code does not recognize that we are already logged in.  That’s the problem that has prevented us from using the code on win10 machines.

Perhaps the code is using a different IE engine, and all we need is to log in with a different browser?
Oh, that explains.

Windows 10 is much more strict regarding security than XP, so this may be the single reason.

Another could be, that the browser will run in its 64-bit version, while Access is running 32-bit.
If you have Access 64-bit running on the Windows 10 machines, you could check out the revised code (the attached PictureUrl.zip file) at the link above to the article. I have updated it a few minutes ago.
Gustav, keep in mind we are running an MDE file that was compiled using Access2K on an XP machine, and then copied to the Win10 machine.

I"m wondering what's going on with the references.  I can open the MDB file with Access 2016 and take a look... the XP version points Microsoft Internet Controls to ieframe
@Gustav  Are you saying that the code you provided requires using the 64-bit version of Access with win10 because the browser used with Win10 is 64 bit?  It seems like if that was the case you would have already run into it, am I missing something?
No. It will work in both 32- and 64-bit Access.
Windows 64-bit will run both, while Windows 32-bit only will run Access 32-bit.
I'm running 64-bit Win 10 Pro with 32-bit access and the browsers installed by windows.. ie and edge... your routine should work... correct?
Yes, but it will only download "what is" ... it will not force an update or something else of the web page.
@Gustav Okay.. I'll try it today on my Win10 machine.  Thanks
@Gustav  I have your code working win the Win10 machine... but I still have a problem.

One of our sites requires that we first log in, however, we only have to log in once.  Even if we close the browser, we do not have to log in a second time.

If I go to this site with IE8 on my XP machine and log in, I can then use Access and the code provided (by Gustave) to automatically download as if the site did not require password access.  Everything works fine in XP.  If I view the Access references, the Microsoft Internet Controls points to ieframe.dll.

On the Win10 machine, when I log into the site, IE 11 is used, because this is the latest version of IE on my machine.  However, when I use the code provided (Gustav's code) the system does not let me get past the log in screen.  If I view the references, the Microsoft Internet Controls points to the ieframe.dll in the sysWOW64 folder.

I suspect the reference used is a different version of IE... perhaps IE 10???  I'm pretty sure the code needs to be using the same IE as we use in the browser.
I'm using the 64 bit version of Windows but the 32 bit version of Access 2016.  What would you try?  Its very important that we be able to access this website programmatically.
I don't think you have control over the bit-version of the browser in Windows 10 64-bit. It will use, whatever it finds best.
I suspect my problem is that I invoke a different Internet Explorer when I start IE manually from Win10 Pro and when I start IE from Access 2016 running in 32-bit mode.

Access 2016 is set with the reference  to Microsoft Internet Control set to ieFrame.dll in the sysWow64 folder.

Can anyone help?  All I want to do at this point is to be able to start the same browser manually as I start programmatically when I use Access 2016 on Win10.
<<On the Win10 machine, if we log in with the version if ie installed by win10, then try to use the code, the code does not recognize that we are already logged in. >>

<<I suspect the reference used is a different version of IE... perhaps IE 10???  I'm pretty sure the code needs to be using the same IE as we use in the browser.>>

  I would think it would have more to do with settings in IE. i.e. not being able to run Active X controls, not a trusted site, not accepting cookies, etc.  Some setting related to security  more than anything.

 Reading back through, I'm a little lost on where things are at.    Gustav's solution is using an API call and not relying on IE; is that not a good approach?

 I'm not sure why were talking about versions of IE.

Jim.
@Jim  Gustavs solution works great on XP as well as Win10 with the First website we need to access... which is one that does not require sign in... so it gets me half way there.  The second website is the problem.  The second website requires that we first log in.   If we log into the website with Chrome of Safari using XP, and then use Gustav's routine, we can not get past the Please Sign In page.  However, on XP if when we log into the Second website with IE 8, everything works perfectly, which leads me to believe Gustav's routine is also in some way using IE 8???... I noticed a reference for Microsoft Internet Controls is set to ieframe.dll so I assumed Gustav's code was using this reference???

Using Win10 Gustaves  routine captures the please log in page regardless of whether we are logged in with IE 11 or not.  My guess is that a different IE 11 is launched manual compared to whatever Gustav's routine is doing...

Since the Microsoft Internet Controls reference is set for ieFrame in the sysWow64 folder, my theory is that Gustav's rouine is using the 32 bit version of IE, and clicking the link launches the 64 bit version, and that's why the website does nor recognize the sign in?

Do you think I'm barking up the wrong tree?

,
<<Do you think I'm barking up the wrong tree?>>

 Yes.   I think it has more to do with security settings in place for each of the browsers.

<<Since the Microsoft Internet Controls reference is set for ieFrame in the sysWow64 folder, my theory is that Gustav's rouine is using the 32 bit version of IE, and clicking the link launches the 64 bit version, and that's why the website does nor recognize the sign in?>>

  You don't need this reference.   Gustav's call is a direct API call.

Jim.
@Jim   <<Yes.   I think it has more to do with security settings in place for each of the browsers.>>

Thanks for getting back to me so quickly, I'm actively working on this now.  I'm not sure which security settings I can change???
If I open Internet Explorer manually, and then log into the website, it DOES remember me the next time I open the browser.
I can close the browser, even reboot the machine, and when I return manually, it remembers my creditilals...  Apparently the security settins for the browser are fine... when started manually...

As far as Gustav's code... is there any way to change security settings?  I'm not aware of any way to make any changes.
<<Thanks for getting back to me so quickly, I'm actively working on this now.  I'm not sure which security settings I can change???>>

  I'm not sure either.  I would suspect it would deal with the saving of cookies, or in running scripts.   For a clue on that, I would check the web site your logging into and see if they list any requirements.

  The other issue that your bumping into is executing things outside of a browser.   Many of the settings you find in the browser for "Internet settings" affects how Windows behaves with certain other functions.   But in this case, it sounds like URLDownloadToFile () doesn't look at those settings at all.   So if this web site relies on cookies for the login, that may or may not work with URLDownloadToFile ()   (and it seems like not).

   I would suspect that to get this working the way you want, you're going to need to resort to using the Web Browser control on a form.   I would do this rather than trying to control IE through automation, as you won't know what's installed.

   One problem with the Web Browser control though is that default mode is for IE8 emulation.  You'll need to add some registry entries to get IE11 emulation, and that setting will apply to all applications, not just your individual DB.   there are docs/blogs all over the place covering this.  i.e.:

https://weblog.west-wind.com/posts/2011/May/21/Web-Browser-Control-Specifying-the-IE-Version

Jim.
Thanks for the help.  I never did get Gustav’s code working with the site that required a password.  The best guess is that the api call was not working correctly with cookies.  The code provided by Jim however did work with the website that requires sign in.  I first meet to sign in with IE but the code then allows me to grab the correct pages.  I split the points because Gustav code will be used in the routine that does not require web sign in.