matelindonesia
asked on
Save a complete HTML page programmatically
Hi experts,
I want to write a program that uses the EmbeddedWB component to save a web page completely. How should I write the code to save the whole page, including its images? I have only found the SaveToFile procedure.
Thanks
Sirro
Hello Sir,
Kindly go through the following site for information and a free download of an ActiveX control that takes a web screen snapshot of a given URL.
http://www.ziplib.com/_software/Development--Active_X/download_4.html
with regards,
padmaja.
ASKER
I've tried saving to an MHT file, but how can I convert it to HTML and images? If I save as an MHT file, it can't be opened on another computer because, as far as I know, an MHT file links into the IE cache.
I assume you never looked into the EmbeddedWB sources.
Have a look there.
Cheers,
Andrew
matelindonesia,
So, you want to be able to open this "saved" website on another computer.
You have a couple of options. There is MHT or Mozilla Archive Format (MAF).
These two allow you to save the entire site in a single file. While I don't know
that much about the MAF, I do know about creating an MHT.
Basically, what is happening when an MHT is being created is that you are
downloading the source, parsing it for links, images, urls, etc, and downloading
each of them. Now when an image is downloaded, it is MIME encoded and the
image data is essentially a part of the MHT file. I have an example at home I can
post later to show you how it is done. I don't have Mozilla installed so I don't know
how it would handle MHT opened as HTML.
I am finishing up a Delphi conversion of an MHT Builder that I found on CodeProject.com.
http://www.codeproject.com/vb/net/MhtBuilder.asp
It works exactly like IE in saving a website as a single file. I haven't worked on it in a couple
of months but am ready to get started again. I am about 60% finished.
I'm actually surprised that Mozilla hasn't embraced the MHT format vs. building their own.
MHT is actually based on a standard: an MHT file is an RFC 2557 compliant multipart MIME
message (MHTML web archive). http://www.ietf.org/rfc/rfc2557.txt
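The process described above (take the HTML, fetch each image, MIME encode it, and append it as another part of the same file) can be sketched roughly as follows. This is only an illustration of the multipart/related layout, not code from the MhtBuilder project. The names Base64Encode and BuildMht and the boundary string are made up for the example, and the image bytes are passed in as a string instead of being downloaded.

```pascal
program MhtSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  CRLF = #13#10;
  Boundary = '----=_MhtSketch_Boundary'; // arbitrary; must not occur in any part

// Plain base64 encoder (hypothetical helper, written out so the sketch is
// self-contained). Real code should also wrap the output at 76 characters.
function Base64Encode(const Data: AnsiString): AnsiString;
const
  Alphabet: AnsiString =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
var
  i, n: Integer;
  b0, b1, b2: Byte;
begin
  Result := '';
  n := Length(Data);
  i := 1;
  while i <= n do
  begin
    // take up to three input bytes, padding missing ones with zero
    b0 := Ord(Data[i]);
    if i + 1 <= n then b1 := Ord(Data[i + 1]) else b1 := 0;
    if i + 2 <= n then b2 := Ord(Data[i + 2]) else b2 := 0;
    Result := Result + Alphabet[(b0 shr 2) + 1]
                     + Alphabet[(((b0 and 3) shl 4) or (b1 shr 4)) + 1];
    if i + 1 <= n then
      Result := Result + Alphabet[(((b1 and 15) shl 2) or (b2 shr 6)) + 1]
    else
      Result := Result + '=';
    if i + 2 <= n then
      Result := Result + Alphabet[(b2 and 63) + 1]
    else
      Result := Result + '=';
    Inc(i, 3);
  end;
end;

// Builds a one-image MHT document: a multipart/related message whose first
// part is the HTML and whose second part is the base64-encoded image.
function BuildMht(const PageUrl, Html, ImageUrl, ImageType,
  ImageBytes: AnsiString): AnsiString;
begin
  Result :=
    'MIME-Version: 1.0' + CRLF +
    'Content-Type: multipart/related; type="text/html"; boundary="' +
      Boundary + '"' + CRLF +
    CRLF +
    '--' + Boundary + CRLF +
    'Content-Type: text/html' + CRLF +
    'Content-Transfer-Encoding: 8bit' + CRLF +
    'Content-Location: ' + PageUrl + CRLF +
    CRLF +
    Html + CRLF +
    '--' + Boundary + CRLF +
    'Content-Type: ' + ImageType + CRLF +
    'Content-Transfer-Encoding: base64' + CRLF +
    'Content-Location: ' + ImageUrl + CRLF +
    CRLF +
    Base64Encode(ImageBytes) + CRLF +
    '--' + Boundary + '--' + CRLF;
end;

begin
  WriteLn(BuildMht('http://example.com/',
    '<html><body><img src="logo.gif"></body></html>',
    'http://example.com/logo.gif', 'image/gif', 'GIF89a'));
end.
```

The browser matches each image reference in the HTML part against the Content-Location of the other parts, which is why the image data travels with the file and no second download is needed when it is opened.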
ASKER
Hi eddie,
I would like to explain the problem. First, I need a function to save a web page including all of its images, but I didn't find any code that does that without showing the IE Save As dialog, and I want to make my own Save As dialog. So the solution was to save the page as an MHT file, because according to the article I read it also saves the images, and fortunately I found the code for this:
procedure WB_SaveAs_MHT(WB: TEmbeddedWB; const FileName: string);
var
  Msg: IMessage;
  Conf: IConfiguration;
  Stream: _Stream;
  URL: WideString;
begin
  if not Assigned(WB.Document) then Exit;
  URL := WB.LocationURL;
  Msg := CoMessage.Create;
  Conf := CoConfiguration.Create;
  try
    Msg.Configuration := Conf;
    Msg.CreateMHTMLBody(URL, cdoSuppressAll, '', '');
    Stream := Msg.GetStream;
    Stream.SaveToFile(FileName, adSaveCreateOverWrite);
  finally
    Msg := nil;
    Conf := nil;
    Stream := nil;
  end;
end; (* WB_SaveAs_MHT *)
But the problem is that when I open the MHT file on another computer (using IE, of course), it shows only the HTML page.
Eddie, can you tell me why this happens? Is there any code to convert an MHT file back into a full HTML page? If my second question matches the solution you have, I would be pleased to hear about the progress of your project :)
best regards
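One thing that may be worth checking in the routine above: in the CDO type library, cdoSuppressAll tells CreateMHTMLBody not to download any of the resources the page refers to, which would match an MHT file that contains only the HTML. Below is a sketch of the same routine with cdoSuppressNone instead. It assumes the CDO and ADO type libraries were imported under the unit names CDO_TLB and ADODB_TLB (the names may differ on your machine), and the procedure name is made up for the example.

```pascal
unit SaveMhtSketch;

interface

uses
  EmbeddedWB;

procedure WB_SaveAs_MHT_WithResources(WB: TEmbeddedWB; const FileName: string);

implementation

uses
  CDO_TLB, ADODB_TLB; // assumed names of the imported type library units

procedure WB_SaveAs_MHT_WithResources(WB: TEmbeddedWB; const FileName: string);
var
  Msg: IMessage;
  Stream: _Stream;
  URL: WideString;
begin
  if not Assigned(WB.Document) then Exit;
  URL := WB.LocationURL;
  Msg := CoMessage.Create;
  Msg.Configuration := CoConfiguration.Create;
  // cdoSuppressNone: let CDO fetch images, style sheets and other
  // resources and store them as parts of the archive
  Msg.CreateMHTMLBody(URL, cdoSuppressNone, '', '');
  Stream := Msg.GetStream;
  Stream.SaveToFile(FileName, adSaveCreateOverWrite);
end;

end.
```

Note that CDO downloads the page again from the URL, so this does not reuse what the browser already has loaded.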
ASKER CERTIFIED SOLUTION
You can also use this code from this delphi3000.com article. I have not tested it, however.
Save a webpage with images
URL:http://www.delphi3000.com/article.asp?ID=3464
Category:Internet / Web
Uploader:Ken Wilcox
Question:Ever wanted to duplicate the functionality of your favorite
browser and save a web page with images to disk? Here is a simple
example that does just that. I've created two functions; the second
one just lets you pass in a progress bar to show the status of
the operation.
Please note: It requires Indy to run.
unit URLGet;
interface
uses
Classes, SysUtils, Forms, IdHTTP, ComCtrls;
procedure UrlDownloadToFile(URL, FileName: String); overload;
procedure UrlDownloadToFile(URL, FileName: String; PB: TProgressBar); overload;
implementation
procedure GetImages(html: String; Images: TStringList);
var
  i, j: Integer;
  tag, link: String;
  quote: Char;
begin
  html := StringReplace(html, #13#10, ' ', [rfReplaceAll]);
  i := 1;
  while i <= Length(html) do
  begin
    // we have a begin tag
    if html[i] = '<' then
    begin
      // collect the tag text up to (not including) the closing '>'
      tag := '';
      while (i <= Length(html)) and (html[i] <> '>') do
      begin
        tag := tag + html[i];
        Inc(i);
      end;
      // we have the tag, see if it has a src attribute
      j := Pos('SRC=', UpperCase(tag));
      if j <> 0 then
      begin
        Inc(j, 4); // first character after 'SRC='
        link := '';
        if (j <= Length(tag)) and ((tag[j] = '"') or (tag[j] = '''')) then
        begin
          // quoted value: read up to the matching quote
          quote := tag[j];
          Inc(j);
          while (j <= Length(tag)) and (tag[j] <> quote) do
          begin
            link := link + tag[j];
            Inc(j);
          end;
        end
        else
          // unquoted value: read up to the next space
          while (j <= Length(tag)) and (tag[j] <> ' ') do
          begin
            link := link + tag[j];
            Inc(j);
          end;
        if link <> '' then
          Images.Add(link);
      end;
    end;
    Inc(i);
  end;
end;
procedure UrlDownloadToFile(URL, FileName: String);
var
  s, dir, path: String;
  i: Integer;
  ms: TMemoryStream;
  imgs, sFile: TStringList;
  HTTP: TIdHTTP;
begin
  imgs := TStringList.Create;
  HTTP := TIdHTTP.Create(Application);
  sFile := TStringList.Create;
  try
    s := HTTP.Get(URL);
    if s <> '' then
    begin
      if FileName <> '' then
      begin
        // folder that receives the images, next to the saved HTML file
        path := ChangeFileExt(FileName, '') + '_files';
        CreateDir(path);
        // the same folder, relative to the saved HTML file
        dir := ExtractFileName(ChangeFileExt(FileName, '')) + '_files\';
        GetImages(s, imgs);
        ms := TMemoryStream.Create;
        try
          for i := 0 to pred(imgs.Count) do
          begin
            ms.Clear;
            HTTP.Get(URL + imgs[i], ms);
            ms.Position := 0;
            // write into the folder created above, not the current directory
            if ms.Size <> 0 then
              ms.SaveToFile(path + '\' + imgs[i]);
            s := StringReplace(s, imgs[i], dir + imgs[i], [rfReplaceAll]);
          end;
        finally
          FreeAndNil(ms);
        end;
        sFile.Text := s;
        sFile.SaveToFile(FileName);
      end;
    end;
  finally
    FreeAndNil(sFile);
    FreeAndNil(HTTP);
    FreeAndNil(imgs);
  end;
end;
procedure UrlDownloadToFile(URL, FileName: String; PB: TProgressBar);
var
  s, dir, path: String;
  i: Integer;
  ms: TMemoryStream;
  imgs, sFile: TStringList;
  HTTP: TIdHTTP;
begin
  if Assigned(PB) then
  begin
    imgs := TStringList.Create;
    HTTP := TIdHTTP.Create(Application);
    sFile := TStringList.Create;
    try
      s := HTTP.Get(URL);
      if s <> '' then
      begin
        if FileName <> '' then
        begin
          // folder that receives the images, next to the saved HTML file
          path := ChangeFileExt(FileName, '') + '_files';
          CreateDir(path);
          // the same folder, relative to the saved HTML file
          dir := ExtractFileName(ChangeFileExt(FileName, '')) + '_files\';
          GetImages(s, imgs);
          ms := TMemoryStream.Create;
          try
            PB.Max := pred(imgs.Count);
            for i := 0 to pred(imgs.Count) do
            begin
              ms.Clear;
              HTTP.Get(URL + imgs[i], ms);
              ms.Position := 0;
              // write into the folder created above, not the current directory
              if ms.Size <> 0 then
                ms.SaveToFile(path + '\' + imgs[i]);
              s := StringReplace(s, imgs[i], dir + imgs[i], [rfReplaceAll]);
              PB.Position := i;
              Application.ProcessMessages;
            end;
          finally
            FreeAndNil(ms);
          end;
          sFile.Text := s;
          sFile.SaveToFile(FileName);
        end;
      end;
    finally
      FreeAndNil(sFile);
      FreeAndNil(HTTP);
      FreeAndNil(imgs);
    end;
  end
  else
    UrlDownloadToFile(URL, FileName);
end;
end.
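A note on the HTTP.Get(URL + imgs[i], ms) call in the unit above: plain concatenation only gives a valid address when the page URL ends in a slash and the image link is relative. A rough sketch of resolving the link first is shown below. ResolveLink is a made-up helper, not part of the article or of Indy, and it is deliberately simplified: it assumes the page URL is absolute and ignores "../", protocol-relative links and query strings.

```pascal
program ResolveSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Resolves an image link found in a page against the page URL.
// Handles absolute links, root-relative links ("/img/a.gif") and
// plain relative links ("img/a.gif") only.
function ResolveLink(const PageUrl, Link: string): string;
var
  SchemeEnd, HostEnd, LastSlash, i: Integer;
begin
  // already absolute: nothing to do
  if Pos('://', Link) > 0 then
  begin
    Result := Link;
    Exit;
  end;
  SchemeEnd := Pos('://', PageUrl) + 2; // index of the second '/' of '://'
  // HostEnd: first '/' after the host name; LastSlash: last '/' in the path
  HostEnd := 0;
  LastSlash := 0;
  for i := SchemeEnd + 1 to Length(PageUrl) do
    if PageUrl[i] = '/' then
    begin
      if HostEnd = 0 then
        HostEnd := i;
      LastSlash := i;
    end;
  if HostEnd = 0 then
  begin
    // page URL is just 'http://host'
    if (Link <> '') and (Link[1] = '/') then
      Result := PageUrl + Link
    else
      Result := PageUrl + '/' + Link;
  end
  else if (Link <> '') and (Link[1] = '/') then
    Result := Copy(PageUrl, 1, HostEnd - 1) + Link   // relative to the site root
  else
    Result := Copy(PageUrl, 1, LastSlash) + Link;    // relative to the page's folder
end;

begin
  WriteLn(ResolveLink('http://example.com/news/index.html', 'img/a.gif'));
  WriteLn(ResolveLink('http://example.com/news/index.html', '/img/a.gif'));
end.
```

In the download loop this would be used as HTTP.Get(ResolveLink(URL, imgs[i]), ms). The local file name would then also need to be reduced to the last path segment of the link, which the article's code does not do.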
ASKER
Hi eddie, thanks for your comment, I really appreciate it.
"..That may be a pretty cool little utility": yes, I think so too. I need to save a single web page, so first I will try your preferred solution (http://www.delphi3000.com/article.asp?ID=3464). If it goes well, I will use it, but I still hope there is a procedure or function to convert an MHT file into HTML with all of its images, because I don't want to fetch the files (HTML + images) twice (first when I browse, second when I save the page).
Thanks
Well, you can, instead of using the URLDownloadToFile, get the info from the cache but
it is difficult determining what is what in there.
You can take out the idHTTP stuff and assign the source of the TEmbeddedWB to a string
and just use the same string. You would, however, be required to retrieve all the images from
the web again.
I'll see if I can find anything on getting the correct images from the cache.
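Getting the source from the browser into a string, as suggested above, might look like the sketch below. GetDocumentSource is a made-up name, and the result is MSHTML's own serialization of the loaded document, so it can differ slightly from the bytes the server sent.

```pascal
unit BrowserSource;

interface

uses
  MSHTML;

// Returns the HTML of the loaded document, or '' if there is none.
function GetDocumentSource(const Doc: IHTMLDocument2): string;

implementation

uses
  SysUtils;

function GetDocumentSource(const Doc: IHTMLDocument2): string;
var
  Doc3: IHTMLDocument3;
begin
  Result := '';
  // documentElement is the <html> element; outerHTML includes its children
  if Supports(Doc, IHTMLDocument3, Doc3) and (Doc3.documentElement <> nil) then
    Result := Doc3.documentElement.outerHTML;
end;

end.
```

With the component from this thread it would be called as s := GetDocumentSource(WB.Document as IHTMLDocument2) once the page has finished loading, and s can then be passed to GetImages in place of the HTTP.Get result.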
OK, you can use the IECache utilities from http://www.euromind.com/iedelphi/iecache.htm
to get the info for each entry in the cache. This way, you can iterate through them and
find the image you want by checking the URL against the URL from the HTML. Then
just copy the file to another directory and modify the URL in the HTML to show the new
location in the img tag's SRC attribute.
Look specifically at the GetEntryInfo on the left side of the page
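For a single known URL, the lookup described above can also be done by asking WinInet directly instead of iterating over the whole cache. The sketch below is not the IECache code; CachedFileForUrl is a made-up name, and it assumes the image URL taken from the HTML is exactly the URL under which IE cached the file.

```pascal
unit CacheLookup;

interface

// Returns the local cache file for URL, or '' if the URL is not cached.
function CachedFileForUrl(const URL: string): string;

implementation

uses
  Windows, WinInet;

function CachedFileForUrl(const URL: string): string;
var
  Info: PInternetCacheEntryInfo;
  Size: DWORD;
begin
  Result := '';
  Size := 0;
  // the first call passes no buffer and only asks for the required size
  GetUrlCacheEntryInfo(PChar(URL), TInternetCacheEntryInfo(nil^), Size);
  if (GetLastError <> ERROR_INSUFFICIENT_BUFFER) or (Size = 0) then
    Exit;
  GetMem(Info, Size);
  try
    // the second call fills the buffer, including the local file name
    if GetUrlCacheEntryInfo(PChar(URL), Info^, Size) then
      Result := Info^.lpszLocalFileName;
  finally
    FreeMem(Info);
  end;
end;

end.
```

If the result is not empty, the file can be copied into the target folder with CopyFile and the img tag's SRC attribute changed to point at the copy, as described above.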
ASKER
OK Eddie, meanwhile, while waiting for your additional help, I'll try to fix some bugs I found in URLDownloadToFile.
regards
ASKER
Phew... there are too many tag checks to be done; it's more difficult than I thought.
No, just use the DOM instead of the way it is doing it. Take a look at these posts:
http://www.delphipages.com/threads/thread.cfm?ID=134726&G=134653
http://www.delphipages.com/threads/thread.cfm?ID=134761&G=134653
ASKER
OK Eddie, I've used the DOM to get the IMG tags, but what about the background attribute, which can also be an image?
Get the background attribute of the body tag (IHTMLBodyElement).
If you get it working, I'd like to see the finished results, please...
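Putting the two DOM suggestions together, collecting the img elements and the body background could look roughly like this. CollectImageUrls is a made-up name, and the sketch does not look at background images set through CSS or at documents inside frames.

```pascal
unit DomImages;

interface

uses
  Classes, MSHTML;

// Adds the URL of every <img> and of the body background to Urls.
procedure CollectImageUrls(const Doc: IHTMLDocument2; Urls: TStrings);

implementation

uses
  SysUtils;

procedure CollectImageUrls(const Doc: IHTMLDocument2; Urls: TStrings);
var
  i: Integer;
  Item: IDispatch;
  Img: IHTMLImgElement;
  Body: IHTMLBodyElement;
begin
  if Doc = nil then Exit;
  // every <img> element in the document
  for i := 0 to Doc.images.length - 1 do
  begin
    Item := Doc.images.item(i, 0);
    if Supports(Item, IHTMLImgElement, Img) and (Img.src <> '') then
      Urls.Add(Img.src);
  end;
  // <body background="...">, if the page has a body and sets one
  if Supports(Doc.body, IHTMLBodyElement, Body) and (Body.background <> '') then
    Urls.Add(Body.background);
end;

end.
```

It would be called as CollectImageUrls(WB.Document as IHTMLDocument2, imgs) after the page has loaded, replacing the hand-written tag scanning.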
https://www.experts-exchange.com/questions/20881307/Saving-TWebBrowser-as-Web-Page-Complete-Error.html
https://www.experts-exchange.com/questions/20165328/Saving-HTML-form-from-download.html
https://www.experts-exchange.com/questions/10416481/Save-HTML-File-With-Delphi-Browser.html
https://www.experts-exchange.com/questions/10084850/Save-html-as-text.html
I'm sure you will find useful information for your problem there.
Cheers