zxw

How can I get the content of the web browser (Internet Explorer)?

Hi,

 I want to add a button to IE's toolbar so that, when the button is clicked, it saves all the content of the current page in Internet Explorer. I know I should make a COM DLL and register it, but how? Fully functional source code wanted.

Thanx.
rondi

listening intently...
I haven't looked into it yet, but this site (http://www.euromind.com/iedelphi/) is a _great_ resource for this kind of stuff. This page: http://www.euromind.com/iedelphi/ie5tools.htm has a unit for adding things like toolbar buttons; it's probably a good place to start.

Gl
Mike
hi,
what do you need exactly?
Save the page with all the images, not only the page's source code?
This can't be done in a single operation. Getting the page source is easy, but saving all the linked resources has to be done by hand, with your own routines.
I know IE has a Save As feature, but unfortunately it can't be called from a program without showing the Save As dialog...
Could you be more specific before I go hunting for some code to show you?
thanks...
Er, by the way, the Mozilla engine has the routines you need, but running a Mozilla engine behind an IE web browser is kind of overkill :)

John.
Sorry for the spam, but I'm all ears too :)
zxw

ASKER

Yes, the save function should work just like IE's Save As feature: it should save everything on the page, such as the source code, images, Flash movies, WAV files, etc.

Thanks for your time.
Would it be OK for you to call the dialog and then send keypresses to it?
That would give you a somewhat flimsy solution, but a solution...
zxw

ASKER

No, it should be silent; no dialog, please.
Then there is only one way:
+get the source of the page (easy)
+get all the external links in the page (easy)
+create a directory to put the resources in (:)
+retrieve all the external resources one after another, and put them in the directory
+update the source file with the local links (you have to use absolute paths, for links and for scripts)
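
As a rough illustration of the "collect the links" and "update the source" steps above, here is a minimal sketch in plain JScript-compatible JavaScript. The names localName and localizeSources are hypothetical helpers made up for this example; they are not part of IE or of the IE5Tools unit. The sketch assumes a JScript version that accepts a function as the second argument to replace (5.5 or later, as far as I know), and it only looks at quoted src="..." attributes, so href links, stylesheet references and unquoted attributes would need extra handling. It does no downloading; it just returns the rewritten source and the list of files to fetch.

```javascript
// Sketch only: localName and localizeSources are hypothetical helpers.

// Build a local file name from a URL: running index + last path segment,
// with any query string or fragment removed.
function localName(url, index) {
    var clean = url.split('#')[0].split('?')[0];
    var segment = clean.substring(clean.lastIndexOf('/') + 1);
    if (segment === '') {
        segment = 'resource';   // URL ended in a slash or was empty
    }
    return index + '_' + segment;
}

// Find quoted src attributes in the page source, give each distinct URL a
// local path inside dirName, and rewrite the source to use those paths.
// Returns the rewritten source plus the list of {url, file} pairs to fetch.
function localizeSources(html, dirName) {
    var pattern = /(\ssrc\s*=\s*)(["'])(.*?)\2/gi;
    var known = {};       // 'url:' + URL -> local path, so repeats share a file
    var downloads = [];   // one entry per distinct URL, in order of appearance
    var rewritten = html.replace(pattern, function (whole, prefix, quote, url) {
        var key = 'url:' + url;
        if (!known.hasOwnProperty(key)) {
            known[key] = dirName + '/' + localName(url, downloads.length);
            downloads.push({ url: url, file: known[key] });
        }
        return prefix + quote + known[key] + quote;
    });
    return { html: rewritten, downloads: downloads };
}
```

Calling localizeSources(pageSource, 'page_files') gives back the source to write to disk and a downloads list to work through. Relative URLs in the source would still have to be resolved against the page's address before they can be retrieved.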

It's not that it's very complicated...
It's just boring to do, I think...
But I'm interested in having the solution :)
So what do you need to start developing this?

By the way, this is something I've been looking for for a while, and I don't think it can be done the easy way (just calling a Save As)...

John.
Well, so far I've got this. You can create toolbar buttons that call ActiveX objects, .exe files, scripts, new HTML files, etc. The most flexible seems to be the ActiveX control, but I don't have access just now to a copy of Delphi that can compile new ones, so I've chosen to try to solve this with a script instead. You could try .exe files, but then it's difficult to get access to the DOM. Even if you assume that the current window is still the one you clicked in, you still have to enumerate all the IE instances, check each against the foreground window handle, and pull a neat trick to get the IHTMLDocument2 from the HWND. Hence the scripted approach.

If you get the IE5Tools unit here: http://www.euromind.com you can call a function like this:

addToolbarBtn(true,SCRIPT,some_str_for_the_label,'','',some_file_path);

some_file_path should be a full file path to a html file (*.htm*) with _just_ script in it. So if you call this:

addToolbarBtn(true,SCRIPT,'foo','','','c:\windows\foo.html');

and c:\windows\foo.html looks like this:

<script>
var win = external.menuArguments;
var doc = win.document;
doc.body.innerText += 'foo';
</script>


Then clicking the foo button (you'll have to add it manually if your IE toolbar is customized) will add the word 'foo' to the bottom of the current web page. So you get the button and you get access to the DOM, but no way of silently saving. The good part (depending upon your point of view) is that this seems to bypass the usual security behavior: you can put something like the following into the c:\windows\foo.html file and there are no warnings:

<script>
// CreateTextFile creates (or, with true, overwrites) the file and returns an open TextStream
var fso = new ActiveXObject('Scripting.FileSystemObject');
var output = fso.CreateTextFile('c:\\windows\\temp\\output.txt', true);
output.Write('I can\'t believe this didn\'t trip the "Dangerous ActiveX" warning!!! You\'d think I put it in a .hta instead of a .html !!!');
output.Close();
</script>

Hit the button and sure enough you've got a text file c:\windows\temp\output.txt with my exclamation in it. So it would be easy enough to dump the text or HTML of the web page like this:

output.write(doc.all.tags('HTML')[0].outerHTML);

but the linked stuff (i.e., images) would be harder. It's easy enough to enumerate doc.body.all.tags('IMG'), but there's no native way (in JScript) to save them. You could probably find (or write) an ActiveX control that takes a URL and dumps the image to a local file, but it would be preferable to save the copy IE has already downloaded, or to get it from the cache.


GL
Mike
zxw

ASKER

get the linked stuff from the cache? good idea! but how?
zxw

ASKER

Can we use the technology from "Microsoft Internet Controls" (Shdocvw.dll) and "Microsoft HTML Object Library" (Mshtml.dll)?
> Can we use the technology from "Microsoft Internet Controls" (Shdocvw.dll)
No, you cannot... If you are able to use it, let me know...
That component has not worked for us since Windows 2000. I tried, and there is a long thread here about that, with no result.
Anyway, it would only allow you to save a screenshot of the site, not the real page... Would a screenshot of the page be enough for you? Because that is something I can do... with no dialog...
zxw

ASKER

To jeurk,

No, you know what I need is not the screen shot of the page!

-->No, you cannot... If you are able to use it let me know...
Please visit http://support.microsoft.com/support/kb/articles/Q292/4/85.ASP .
That article does exactly what I've proposed: it saves the HTML and ignores all embedded objects (like images). I've also looked at the IE cache APIs, but either I've missed something or it doesn't look like you can get parts of a page from the cache. Of course, there's also no guarantee that the page would be in the cache, so you'd still have to be able to get it either from IE or from the remote URL. I've suggested how one might get another copy of the remote image, but I'm still not sure how one might get it from the running copy of IE. I do have a glimmer of an idea, though. It's easy enough to enumerate the images and get a reference to one of them. Now I wonder if there is a way to get the script to pass an OLEPicture to something that can save it?

GL
Mike
I was talking about Shdocvw.dll...
As far as I know, you will not be able to use it in your programs. At least I don't know how to... and I know it's not the point here.
Yay! A breakthrough.
Reading this usenet thread: http://groups.google.com/groups?hl=en&threadm=eLK3%24a5HAHA.88%40cppssbbsa02.microsoft.com&rnum=16&prev=/groups%3Fq%3Dwsh%2Bsave%2Bimage%26hl%3Den%26start%3D10%26sa%3DN

I was able to modify my script to:

<script>

var win = external.menuArguments;
var doc = win.document;
var oHTTP = new ActiveXObject('Microsoft.XMLHTTP');
var oStream = new ActiveXObject('adodb.stream');

// Fetch the first image on the page with a synchronous request
oHTTP.open('Get',doc.all.tags('IMG')[0].href,false);
oHTTP.send();

// ADO stream constants
var adTypeBinary = 1;
var adSaveCreateOverWrite = 2;

// Write the raw response bytes to a local file
oStream.type = adTypeBinary;
oStream.open();
oStream.write(oHTTP.responseBody);
oStream.savetofile('h:\\foo.gif',adSaveCreateOverWrite);
oStream.close();

</script>
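
One thing the script above does not do is check whether the request actually succeeded, so an error page could end up saved under an image's name. A minimal sketch of a guard follows. saveIfOk is a hypothetical helper name; the http and stream arguments are assumed to be objects with the same members as the Microsoft.XMLHTTP and ADODB.Stream objects used above (open, send, status, responseBody; Type, Open, Write, SaveToFile, Close). Treating only status 200 as success is a simplification made for this example.

```javascript
// Sketch only: saveIfOk is a hypothetical helper, not an IE or ADO function.
// http   - an XMLHTTP-style object (open, send, status, responseBody)
// stream - an ADODB.Stream-style object (Type, Open, Write, SaveToFile, Close)
function saveIfOk(http, stream, url, path) {
    http.open('GET', url, false);    // synchronous request, as in the script above
    http.send();
    if (http.status !== 200) {
        return false;                // request failed; nothing is written
    }
    stream.Type = 1;                 // adTypeBinary
    stream.Open();
    stream.Write(http.responseBody);
    stream.SaveToFile(path, 2);      // adSaveCreateOverWrite
    stream.Close();
    return true;
}
```

In the toolbar script this would be called along the lines of saveIfOk(oHTTP, oStream, someUrl, 'h:\\foo.gif'), with oHTTP and oStream created as above and someUrl standing in for whatever address is being saved.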

I then loaded up www.hotmail.com, hit save, and it _silently_ saved the upper-left logo (the first image on the page) to h:\foo.gif. I don't know if it's forcing a reload of the picture or if the XMLHTTP object makes use of the local cache. I really don't know much about either of the two controls used in the example, except that the example did work and both objects should be available on any recent IE install. Given this, it was a relatively simple matter to enumerate all the images, write them to local files, and change their src attributes to the local file names:

<script>

var win = external.menuArguments;
var doc = win.document;
var oHTTP = new ActiveXObject('Microsoft.XMLHTTP');
var oStream = new ActiveXObject('adodb.stream');
var images = new Enumerator(doc.all.tags('IMG'));

// ADO stream constants
var adTypeBinary = 1;
var adSaveCreateOverWrite = 2;
var imageNumber = 0;

for (; !images.atEnd(); images.moveNext())
{
     var image = images.item();
     var url = image.src;

     // Fetch the image with a synchronous request
     oHTTP.open('Get', url, false);
     oHTTP.send();

     // Local name: running number plus up to four characters from the last dot, e.g. '0.gif'
     var dot = url.lastIndexOf('.');
     var fileName = imageNumber + url.slice(dot, dot + 4);

     // Write the raw response bytes to the local file
     oStream.type = adTypeBinary;
     oStream.open();
     oStream.write(oHTTP.responseBody);
     oStream.savetofile('h:\\' + fileName, adSaveCreateOverWrite);
     oStream.close();

     // Point the image in the live document at the local copy
     image.src = fileName;
     // alert(fileName);   // debugging aid; leave disabled for a silent save
     imageNumber++;
}

// Write the modified page source next to the images
var fso = new ActiveXObject('Scripting.FileSystemObject');
var html = fso.CreateTextFile('h:\\foo.html', true);
html.write(doc.all.tags('HTML')[0].outerHTML);
html.close();

</script>
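
The file-naming step in the script above takes up to four characters starting at the last dot in the URL, so an address such as http://www.example.com/image would be saved with a ".com" extension, and a ".jpeg" would be cut to ".jpe". A slightly more defensive version might look like the sketch below. extensionFromUrl is a hypothetical helper name, the example.com addresses are placeholders, and the limit of five letters or digits for an extension is an arbitrary choice made for this example.

```javascript
// Sketch only: extensionFromUrl is a hypothetical helper.
// Returns the lowercase extension (including the dot) of the last path
// segment of url, or fallback when no plausible extension is found.
function extensionFromUrl(url, fallback) {
    // Drop any fragment and query string first
    var clean = url.split('#')[0].split('?')[0];

    // For absolute URLs, skip the scheme and host so their dots are ignored
    var path = clean;
    var schemeEnd = clean.indexOf('://');
    if (schemeEnd !== -1) {
        var pathStart = clean.indexOf('/', schemeEnd + 3);
        if (pathStart === -1) {
            return fallback;         // host only, no path at all
        }
        path = clean.substring(pathStart);
    }

    // Look for a dot in the last path segment only
    var segment = path.substring(path.lastIndexOf('/') + 1);
    var dot = segment.lastIndexOf('.');
    if (dot === -1) {
        return fallback;
    }
    var ext = segment.substring(dot).toLowerCase();
    return /^\.[a-z0-9]{1,5}$/.test(ext) ? ext : fallback;
}
```

Inside the loop this could replace the slice call, for example imageNumber + extensionFromUrl(url, '.bin'). Addresses that serve an image from a script (say, a path ending in .asp) would still get the script's extension, so checking the response's Content-Type header would be a more reliable, if longer, route.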

GL
Mike
zxw

ASKER

Mike,
 
   You are leading me to the script solution step by step, but as I originally asked, I want the solution to be in the form of a COM DLL.
ASKER CERTIFIED SOLUTION
dyancer

Hi zxw, sorry about not getting back to you - not getting notifications is no fun :(.

Anyway, I realize that you were probably looking for a COM solution, but to be honest, while the idea intrigued me, I had no interest in doing a COM DLL. That, and the question

"I know I should make a COM dll, and regist it, but how?"

seemed to leave the matter open, a little.
At any rate, I hope that the information I've presented has been useful - I know I've learned a lot. I'm afraid, though, that I'm not going to pursue the COM solution.

Anyway, thanks very much for the fascinating question - I've had a lot of fun :)

GL
Mike
Asta Cu
Somewhat off-topic, but important.

****************************** ALERT********************************
WindowsUpdate - Critical Update alert March 28, 2002 from Microsoft
http://www.microsoft.com/technet/treeview/default.asp?url=/technet/security/bulletin/ms02-015.asp
Synopsis:
Microsoft Security Bulletin MS02-015  
28 March 2002 Cumulative Patch for Internet Explorer
Originally posted: March 28, 2002
Summary
Who should read this bulletin: Customers using Microsoft® Internet Explorer
Impact of vulnerability: Two vulnerabilities, the most serious of which would allow script to run in the Local Computer Zone.
Maximum Severity Rating: Critical
Recommendation: Consumers using the affected version of IE should install the patch immediately.
Affected Software:
Microsoft Internet Explorer 5.01
Microsoft Internet Explorer 5.5
Microsoft Internet Explorer 6.0

Thought you'd appreciate knowing this.
":0)
Asta
Force-accepted by
Netminder
CS Moderator
Why did you accept my comment as an answer?
I did not give any source code!!!

Anyway, the previous link is now www.entopia.com