BeyondBGCM
asked on
PDF Parser Itextsharp
The color depth 1 is not supported.
Stack Trace
at iTextSharp.text.pdf.parser.PdfImageObject.DecodeImageBytes()
at iTextSharp.text.pdf.parser.PdfImageObject..ctor(PdfDictionary dictionary, Byte[] samples, PdfDictionary colorSpaceDic)
at iTextSharp.text.pdf.parser.PdfImageObject..ctor(PRStream stream, PdfDictionary colorSpaceDic)
at iTextSharp.text.pdf.parser.ImageRenderInfo.PrepareImageObject()
at iTextSharp.text.pdf.parser.ImageRenderInfo.GetImage()
at PdfUtils.ImageRenderListener.RenderImage(ImageRenderInfo renderInfo) in D:\Inspro\Inspro Health\Code\InsproServices2\CombinePDConsole\RenderImage.cs:line 135
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ImageXObjectDoHandler.HandleXObject(PdfContentStreamProcessor processor, PdfStream xobjectStream, PdfIndirectReference refi)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.DisplayXObject(PdfName xobjectName)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.Do.Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.InvokeOperator(PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
at iTextSharp.text.pdf.parser.PdfReaderContentParser.ProcessContent[E](Int32 pageNumber, E renderListener)
at PdfUtils.PdfImageExtractor.ExtractImages(String filename) in D:\Inspro\Inspro Health\Code\InsproServices2\CombinePDConsole\RenderImage.cs:line 57
at CombinePDConsole.Program2.Main(String[] args) in D:\Inspro\Inspro Health\Code\InsproServices2\CombinePDConsole\Program2.cs:line 24
----------------------------------------
Can anybody suggest why this error occurs and how to fix it?
ASKER
This is the code I am using to extract images from PDF files. It fails with the error mentioned earlier.
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System;
using System.Collections.Generic;
using System.IO;

namespace PdfUtils
{
    /// <summary>Helper class to extract images from a PDF file. Works with the most
    /// common image types embedded in PDF files, as far as I can tell.</summary>
    /// <example>
    /// Usage example:
    /// <code>
    /// foreach (var filename in Directory.GetFiles(searchPath, "*.pdf", SearchOption.TopDirectoryOnly))
    /// {
    ///     var images = ImageExtractor.ExtractImages(filename);
    ///     var directory = Path.GetDirectoryName(filename);
    ///
    ///     foreach (var name in images.Keys)
    ///     {
    ///         images[name].Save(Path.Combine(directory, name));
    ///     }
    /// }
    /// </code></example>
    public static class PdfImageExtractor
    {
        #region Methods
        #region Public Methods
        /// <summary>Checks whether a specified page of a PDF file contains images.</summary>
        /// <returns>True if the page contains at least one image; false otherwise.</returns>
        public static bool PageContainsImages(string filename, int pageNumber)
        {
            using (var reader = new PdfReader(filename))
            {
                var parser = new PdfReaderContentParser(reader);
                ImageRenderListener listener = null;
                parser.ProcessContent(pageNumber, (listener = new ImageRenderListener()));
                return listener.Images.Count > 0;
            }
        }

        /// <summary>Extracts all images (of types that iTextSharp knows how to decode) from a PDF file.</summary>
        public static Dictionary<string, System.Drawing.Image> ExtractImages(string filename)
        {
            var images = new Dictionary<string, System.Drawing.Image>();
            using (var reader = new PdfReader(filename))
            {
                var parser = new PdfReaderContentParser(reader);
                ImageRenderListener listener = null;
                for (var i = 1; i <= reader.NumberOfPages; i++)
                {
                    parser.ProcessContent(i, (listener = new ImageRenderListener()));
                    var index = 1;
                    if (listener.Images.Count > 0)
                    {
                        Console.WriteLine("Found {0} images on page {1}.", listener.Images.Count, i);
                        foreach (var pair in listener.Images)
                        {
                            images.Add(string.Format("{0}_Page_{1}_Image_{2}{3}",
                                Path.GetFileNameWithoutExtension(filename), i.ToString("D4"), index.ToString("D4"), pair.Value), pair.Key);
                            index++;
                        }
                    }
                }
                return images;
            }
        }

        /// <summary>Extracts all images (of types that iTextSharp knows how to decode)
        /// from a specified page of a PDF file.</summary>
        /// <returns>Returns a generic <see cref="Dictionary<string, System.Drawing.Image>"/>,
        /// where the key is a suggested file name, in the format: PDF filename without extension,
        /// page number and image index in the page.</returns>
        public static Dictionary<string, System.Drawing.Image> ExtractImages(string filename, int pageNumber)
        {
            Dictionary<string, System.Drawing.Image> images = new Dictionary<string, System.Drawing.Image>();
            PdfReader reader = new PdfReader(filename);
            PdfReaderContentParser parser = new PdfReaderContentParser(reader);
            ImageRenderListener listener = null;
            parser.ProcessContent(pageNumber, (listener = new ImageRenderListener()));
            int index = 1;
            if (listener.Images.Count > 0)
            {
                Console.WriteLine("Found {0} images on page {1}.", listener.Images.Count, pageNumber);
                foreach (KeyValuePair<System.Drawing.Image, string> pair in listener.Images)
                {
                    images.Add(string.Format("{0}_Page_{1}_Image_{2}{3}",
                        Path.GetFileNameWithoutExtension(filename), pageNumber.ToString("D4"), index.ToString("D4"), pair.Value), pair.Key);
                    index++;
                }
            }
            return images;
        }
        #endregion Public Methods
        #endregion Methods
    }

    internal class ImageRenderListener : IRenderListener
    {
        #region Fields
        Dictionary<System.Drawing.Image, string> images = new Dictionary<System.Drawing.Image, string>();
        #endregion Fields

        #region Properties
        public Dictionary<System.Drawing.Image, string> Images
        {
            get { return images; }
        }
        #endregion Properties

        #region Methods
        #region Public Methods
        public void BeginTextBlock() { }
        public void EndTextBlock() { }

        public void RenderImage(ImageRenderInfo renderInfo)
        {
            PdfImageObject image = renderInfo.GetImage();
            PdfName filter = (PdfName)image.Get(PdfName.FILTER);
            //int width = Convert.ToInt32(image.Get(PdfName.WIDTH).ToString());
            //int bitsPerComponent = Convert.ToInt32(image.Get(PdfName.BITSPERCOMPONENT).ToString());
            //string subtype = image.Get(PdfName.SUBTYPE).ToString();
            //int height = Convert.ToInt32(image.Get(PdfName.HEIGHT).ToString());
            //int length = Convert.ToInt32(image.Get(PdfName.LENGTH).ToString());
            //string colorSpace = image.Get(PdfName.COLORSPACE).ToString();

            /* It appears to be safe to assume that when filter == null, PdfImageObject
             * does not know how to decode the image to a System.Drawing.Image.
             *
             * Uncomment the code above to verify, but when I've seen this happen,
             * width, height and bits per component all equal zero as well. */
            if (filter != null)
            {
                System.Drawing.Image drawingImage = image.GetDrawingImage();
                string extension = ".";
                if (filter == PdfName.DCTDECODE)
                {
                    extension += PdfImageObject.ImageBytesType.JPG.FileExtension;
                }
                else if (filter == PdfName.JPXDECODE)
                {
                    extension += PdfImageObject.ImageBytesType.JP2.FileExtension;
                }
                else if (filter == PdfName.FLATEDECODE)
                {
                    extension += PdfImageObject.ImageBytesType.PNG.FileExtension;
                }
                else if (filter == PdfName.LZWDECODE)
                {
                    extension += PdfImageObject.ImageBytesType.CCITT.FileExtension;
                }

                /* Rather than struggle with the image stream and try to figure out how to handle
                 * BitMapData scan lines in various formats (like virtually every sample I've found
                 * online), use the PdfImageObject.GetDrawingImage() method, which does the work for us. */
                this.Images.Add(drawingImage, extension);
            }
        }

        public void RenderText(TextRenderInfo renderInfo) { }
        #endregion Public Methods
        #endregion Methods
    }
}
It's not working because iTextSharp does not support images of bit depth 1, and the PDF file contains images of bit depth 1.
else if (filter == PdfName.LZWDECODE)
{
extension += PdfImageObject.ImageBytesType.CCITT.FileExtension;
}
This part of the code doesn't look right to me. LZW is used in GIF files and possibly TIFF files. CCITT compression is used mostly for faxes (which are 1 bit deep). TIFF is the only common format that supports CCITT compression.
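To make the point about LZW and CCITT concrete, here is a minimal sketch of a filter-to-extension lookup that keeps the two apart. It is plain C# with no iTextSharp dependency. The class name FilterExtensions, the method ExtensionFor, the ".bin" fallback and the choice of ".tif" for both LZW and CCITT are my own assumptions for illustration, not anything taken from the library. The keys are the standard PDF filter names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps a PDF stream filter name to a suggested file extension.
public static class FilterExtensions
{
    // Keys are PDF filter names without the leading slash.
    private static readonly Dictionary<string, string> ExtensionByFilter =
        new Dictionary<string, string>(StringComparer.Ordinal)
        {
            { "DCTDecode", ".jpg" },      // JPEG data
            { "JPXDecode", ".jp2" },      // JPEG 2000 data
            { "FlateDecode", ".png" },    // same choice as the code in the question
            { "LZWDecode", ".tif" },      // assumption: TIFF is one format that can carry LZW
            { "CCITTFaxDecode", ".tif" }  // fax-style 1-bit data, normally stored as TIFF
        };

    // Returns the suggested extension, or ".bin" (an arbitrary fallback) for unknown filters.
    public static string ExtensionFor(string filterName)
    {
        string extension;
        if (filterName != null && ExtensionByFilter.TryGetValue(filterName, out extension))
        {
            return extension;
        }
        return ".bin";
    }
}
```

With this sketch, FilterExtensions.ExtensionFor("LZWDecode") no longer returns a CCITT extension, and CCITT data gets its own entry. The extension is only a naming hint; it does not change how the image bytes are decoded.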
ASKER
OK, so what is the solution?
I can't tell you exactly because I don't know the details of iTextSharp. I can still offer some suggestions for how to find the problem.
Firstly, check whether the PDF contains a 1 bit depth image (only black or white, no shades of grey). Secondly, check the documentation to see whether PdfName.LZWDECODE should use a different extension. Purely by guesswork, I would try PdfImageObject.ImageBytesType.GIF.FileExtension or PdfImageObject.ImageBytesType.TIF.FileExtension (but I doubt it's the second option).
There may also be another value for PdfName, possibly CCITTDECODE, which should use PdfImageObject.ImageBytesType.CCITT.FileExtension.
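The first check can be done in code rather than by eye. The stack trace shows the exception is raised inside renderInfo.GetImage(), so the bit depth has to be read before that call. Below is a rough sketch, assuming iTextSharp 5.x, where ImageRenderInfo.GetRef(), PdfReader.GetPdfObject(), PdfDictionary.GetAsNumber() and PdfName.BITSPERCOMPONENT are available. The ImageProbe class and its method name are hypothetical, and I have not run this against the PDF in question.

```csharp
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper: reads an image's bit depth without decoding the image.
internal static class ImageProbe
{
    // Returns the /BitsPerComponent entry of the image stream,
    // or -1 when it is missing or the image cannot be looked up.
    public static int BitsPerComponent(ImageRenderInfo renderInfo)
    {
        // GetRef() is null for inline images, which have no indirect reference.
        PdfIndirectReference reference = renderInfo.GetRef();
        if (reference == null)
        {
            return -1;
        }

        // The image XObject is a stream, and a stream is also a dictionary.
        PdfDictionary stream = PdfReader.GetPdfObject(reference) as PdfDictionary;
        if (stream == null)
        {
            return -1;
        }

        // Image masks may omit this entry, in which case -1 is returned.
        PdfNumber bits = stream.GetAsNumber(PdfName.BITSPERCOMPONENT);
        return bits == null ? -1 : bits.IntValue;
    }
}
```

Calling ImageProbe.BitsPerComponent(renderInfo) at the top of RenderImage, before renderInfo.GetImage(), and logging the result would confirm whether the failing image really is 1 bit deep. Returning early when the value is 1 would skip that image instead of aborting the whole extraction.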
ASKER CERTIFIED SOLUTION
This solution is only available to members.
ASKER
This is an excellent solution, which I found on Google. It completes a long-pending piece of PDFSharp code.
It may be that the PDF file has an image which is colour depth 1 and iTextSharp does not support it. It's hard to say because you've given very little information about what the program is doing at this point or what is in the PDF file.