naleo96
Using PowerShell to open an MHTML document and save embedded images as separate files
Using a PowerShell script, how do I iterate through the images in the file and save each image to its own file?
My scenario is as follows:
- An MHTML file is produced daily by SSRS; the file comprises a number of charts (embedded as images) and text.
- The images need to be extracted as individual files and uploaded to an external FTP site.
I would like to use PowerShell to extract the images from the MHTML file (once the image files are created, moving them to the appropriate location is not an issue), but as a PowerShell newbie I don't really know where to start with getting the images. I have opened the MHTML file as follows, but don't know what to do next:
# Open the MHTML report in Internet Explorer through COM automation
$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Visible = $true
$ie.Navigate("file://c:/projects/reports/20150831/Report.mhtml")
# Wait until IE has finished loading the document
while ($ie.Busy) { Start-Sleep -Milliseconds 50 }
I hope someone can help. Thanks.
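One possible next step, sketched under assumptions: once navigation has finished, the loaded page should be reachable through $ie.Document, and its DOM images collection holds one element per IMG tag, so a script can loop over it and read each src. The helper name Get-ImageSource is hypothetical, and on some machines the IE document properties may need to be accessed differently. This only lists where each image comes from; for SSRS output those src values are report-server URLs or cid: references, which still have to be downloaded or decoded separately.

```powershell
# Hypothetical helper: return the SRC of every <img> element in a loaded document.
# Intended to be called with $ie.Document once $ie.Busy is false.
function Get-ImageSource {
    param(
        [Parameter(Mandatory = $true)] $Document
    )
    # The DOM 'images' collection contains one entry per <img> tag on the page
    foreach ($img in $Document.images) {
        $img.src
    }
}

# Example usage, continuing from the IE script:
# Get-ImageSource -Document $ie.Document | ForEach-Object { Write-Host $_ }
```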
ASKER
Hi All.
Thanks for your replies. I've been away for a couple of days and will be able to review your suggestions later today.
Thanks for your help so far.
ASKER
Thanks for your help.
The theory is all great, but I've hit a bit of a brick wall. The BITS transfer method works great on a sample site. In my case, however, the path to the images is actually a parameterised query (I think that's the right term; see the example below), and as a result Start-BitsTransfer fails with a 'FileNotFoundException'.
IMG element:
<IMG BORDER="0" style="top:0px;left:0px;position:relative;" SRC="http://hansendbserver/ReportServer?%2FReservoir%20Level%20Reporting%2FReservoir%20Storage%20Level%20Report&rs%3ASessionID=zvai5c550yxoi1454paeu2yd&rs%3AFormat=HTML4.0&rs%3AImageID=IMGCON_1_0"/>
Error Message:
Start-BitsTransfer : The filename, directory name, or volume label syntax is incorrect. (Exception from HRESULT: 0x8007007B)
At C:\projects\Reservoir Level Reporting\Reports\20150831\Untitled5.ps1:18 char:19
+ Start-BitsTransfer <<<< $sources $destinations -Prio Foreground # -Display $displayname
+ CategoryInfo : NotSpecified: (:) [Start-BitsTransfer], FileNotFoundException
+ FullyQualifiedErrorId : System.IO.FileNotFoundException,Microsoft.BackgroundIntelligentTransfer.Management.NewBitsTransferCommand
One alternative I have is to load an MHTML file instead. However, this is also unsuccessful: again, the images don't actually exist as files; they're embedded in the MHTML file. The IMG source looks like this:
<IMG onerror="this.errored=true ;" BORDER="0" class="a25" SRC="cid:C_7iT0R0x0S0T0_1" />
Once loaded into IE, the properties of the image look like this:
mhtml:file://C:\projects\Reservoir Level Reporting\Reports\20150907\Reservoir Storage Level Report.mhtml!cid:C_17iT1_1
Once again, Start-BitsTransfer cannot be used to copy the file, since no physical file exists.
My next strategy will be to get the report output as a Word file, and try to extract the images from that.
Any other suggestions would be most welcome.
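A different route for the MHTML case, offered as a sketch rather than a solution tested against SSRS output: an MHTML file is a MIME multipart/related message, and each embedded image is normally its own part whose Content-ID header matches the value after cid: in the IMG tag. Splitting the file on the MIME boundary and base64-decoding the image parts should therefore recover the images without IE or Word. The function name Export-MhtmlImage is hypothetical, and the sketch assumes the image parts are base64-encoded; any other part is skipped.

```powershell
# Minimal sketch: extract base64-encoded image parts from an MHTML (MIME multipart/related) file.
# Assumption: image parts use "Content-Transfer-Encoding: base64"; other parts are skipped.
function Export-MhtmlImage {
    param(
        [Parameter(Mandatory = $true)][string] $Path,
        [Parameter(Mandatory = $true)][string] $OutFolder
    )
    $text = [System.IO.File]::ReadAllText((Resolve-Path -LiteralPath $Path).ProviderPath)

    # The MIME boundary is declared in the top-level Content-Type header
    if ($text -match 'boundary="?([^";\r\n]+)"?') { $boundary = $Matches[1] }
    else { throw "No MIME boundary found in $Path" }

    $OutFolder = (New-Item -ItemType Directory -Path $OutFolder -Force).FullName
    $index = 0

    # Each body part is introduced by a line consisting of "--" plus the boundary
    foreach ($part in ($text -split [regex]::Escape("--$boundary"))) {
        # A part is: header lines, one blank line, then the encoded body
        $pieces = $part -split '\r?\n\r?\n', 2
        if ($pieces.Count -lt 2) { continue }
        $headers = $pieces[0]
        $body = $pieces[1]

        if ($headers -match '(?im)^Content-Type:[ \t]*image/(\w+)') { $ext = $Matches[1] }
        else { continue }
        if ($headers -notmatch '(?im)^Content-Transfer-Encoding:[ \t]*base64') { continue }

        # Name the file after its Content-ID (the value used in cid: references) when present
        $index++
        $name = "image$index"
        if ($headers -match '(?im)^Content-ID:[ \t]*<?([^>\r\n]+)>?') { $name = $Matches[1] }
        $name = $name -replace '[\\/:*?"<>|]', '_'   # replace characters not allowed in file names

        $bytes = [Convert]::FromBase64String(($body -replace '\s', ''))
        $target = Join-Path $OutFolder "$name.$ext"
        [System.IO.File]::WriteAllBytes($target, $bytes)
        $target
    }
}
```

Example usage (both paths are hypothetical): Export-MhtmlImage -Path 'C:\temp\Report.mhtml' -OutFolder 'C:\temp\images' returns the full path of each image file written, which can then be handed to whatever upload step is used.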
Correct: BITS needs static content, and the image sources here are generated dynamically by the report server.
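Because BITS will not fetch these dynamic report URLs, one alternative worth trying is System.Net.WebClient, which issues an ordinary HTTP GET and lets the script choose the local file name explicitly. This is a sketch under assumptions: the report server is assumed to accept the current user's Windows credentials, the rs:SessionID in the URL must still be valid when it is fetched, and the helper names and the .png default extension are hypothetical.

```powershell
# Hypothetical helper: build a local file name from the rs:ImageID query parameter
# of an SSRS image URL (the ".png" default extension is an assumption).
function Get-SsrsImageFileName {
    param(
        [Parameter(Mandatory = $true)][string] $Url,
        [string] $Extension = '.png'
    )
    $query = ([uri]$Url).Query.TrimStart('?')
    foreach ($pair in ($query -split '&')) {
        $name, $value = $pair -split '=', 2
        # Query keys are URL-encoded (rs%3AImageID), so decode before comparing
        if ($value -and [uri]::UnescapeDataString($name) -eq 'rs:ImageID') {
            return [uri]::UnescapeDataString($value) + $Extension
        }
    }
    return $null
}

# Hypothetical helper: download one image over plain HTTP instead of BITS.
function Save-SsrsImage {
    param(
        [Parameter(Mandatory = $true)][string] $Url,
        [Parameter(Mandatory = $true)][string] $Folder
    )
    $leaf = Get-SsrsImageFileName -Url $Url
    if (-not $leaf) { throw "No rs:ImageID parameter in $Url" }
    $target = Join-Path (Resolve-Path -LiteralPath $Folder).ProviderPath $leaf

    $client = New-Object System.Net.WebClient
    # Assumption: the report server accepts the current user's Windows credentials
    $client.UseDefaultCredentials = $true
    try { $client.DownloadFile($Url, $target) }
    finally { $client.Dispose() }
    $target
}
```

Deriving the name from rs:ImageID keeps the query string out of the local file name, so each chart gets a short, stable name such as IMGCON_1_0.png.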
I assume you are not able to post an example MHTML?
ASKER
Qlemo, thanks for your help.
Attached is a sample MHTML file (the one used in the previous example, renamed).
Don't spend too much time on this, as I've moved on to a method that uses PowerShell to open the file in MS Word and perform a SaveAs to HTML format (which forces the images to be saved as individual files). From there, the script copies the individual files to the relevant external website via FTP. This is working OK, but it has the drawback that it must run on a PC with Word installed; I can live with that for now.
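The Word-based approach described above might look roughly like the following. This is a sketch under assumptions, not the asker's actual script: it drives Word through COM, 8 is Word's wdFormatHTML save format, and Word is assumed to write the images into a companion folder named after the output file with a _files suffix (the suffix can differ with Word's language). Both function names are hypothetical, and only the folder-listing helper can run without Word installed.

```powershell
# Hypothetical helper: open a report in Word and save it as HTML so the
# embedded images are written out as separate files. Requires Word installed.
function Convert-ReportToHtml {
    param(
        [Parameter(Mandatory = $true)][string] $InputPath,
        [Parameter(Mandatory = $true)][string] $OutputPath   # full path, e.g. ending in .htm
    )
    $wdFormatHTML = 8
    $word = New-Object -ComObject Word.Application
    $word.Visible = $false
    try {
        $doc = $word.Documents.Open($InputPath)
        # Some PowerShell/Word combinations require [ref] for these arguments
        # and others reject it; adjust this call if SaveAs throws.
        $doc.SaveAs([ref]$OutputPath, [ref]$wdFormatHTML)
        $doc.Close()
    }
    finally {
        $word.Quit()
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($word)
    }
}

# Hypothetical helper: list the image files Word wrote next to the HTML output,
# assuming the default "<name>_files" companion folder.
function Get-SavedImage {
    param([Parameter(Mandatory = $true)][string] $HtmlPath)
    $folder = Join-Path (Split-Path $HtmlPath -Parent) `
        ([System.IO.Path]::GetFileNameWithoutExtension($HtmlPath) + '_files')
    if (-not (Test-Path -LiteralPath $folder)) { return }
    Get-ChildItem -LiteralPath $folder |
        Where-Object { $_.Extension -match '^\.(png|gif|jpe?g)$' }
}
```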
Thanks again
sample.zip
Also, the approach to using PowerShell here is exactly what I would have used.
naleo96, if that is not sufficient to get the images, we will need an example MHTML file to try ourselves. Sadly, parsing HTML pages for particular information can get tricky.
The linked script downloads every image into the same folder, without keeping any folder hierarchy. That is usually fine, but not if you need to retain information about where each image came from.
What is still missing is the FTP upload, but you said that is not an issue ;-). Do you only upload the images? If the report itself needs to be transferred too, you'll have to adapt the paths, of course.
Unfortunately, BITS does not support FTP, so you would have to use a System.Net.WebClient object if you want to do the upload in PowerShell (or use a command-line FTP tool).
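A minimal sketch of the WebClient upload mentioned above, assuming an FTP folder URL and a System.Net.NetworkCredential supplied by the caller; ftp.example.com and both function names are hypothetical. For ftp:// addresses WebClient.UploadFile sends the file with an FTP STOR command.

```powershell
# Hypothetical helper: upload local files to an FTP folder with System.Net.WebClient.
function Send-FtpFile {
    param(
        [Parameter(Mandatory = $true)][string[]] $Path,
        [Parameter(Mandatory = $true)][string] $FtpFolder,   # e.g. ftp://ftp.example.com/charts/
        [Parameter(Mandatory = $true)][System.Net.NetworkCredential] $Credential
    )
    $client = New-Object System.Net.WebClient
    $client.Credentials = $Credential
    try {
        foreach ($file in $Path) {
            # WebClient resolves relative paths against the process directory, so pass a full path
            $full = (Resolve-Path -LiteralPath $file).ProviderPath
            $target = Get-FtpTargetUri -FtpFolder $FtpFolder -FileName (Split-Path $full -Leaf)
            # For ftp:// addresses UploadFile uses the FTP STOR command
            [void]$client.UploadFile($target, $full)
            $target
        }
    }
    finally { $client.Dispose() }
}

# Hypothetical helper: join the FTP folder URL and an escaped file name with exactly one slash.
function Get-FtpTargetUri {
    param([string] $FtpFolder, [string] $FileName)
    $FtpFolder.TrimEnd('/') + '/' + [uri]::EscapeDataString($FileName)
}
```

Example usage (server and folder are hypothetical): Send-FtpFile -Path (Get-ChildItem 'C:\temp\images\*.png' | ForEach-Object { $_.FullName }) -FtpFolder 'ftp://ftp.example.com/charts/' -Credential (Get-Credential).GetNetworkCredential()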