PDF Parser iTextSharp

The color depth 1 is not supported.

Stack Trace

at iTextSharp.text.pdf.parser.PdfImageObject.DecodeImageBytes()
at iTextSharp.text.pdf.parser.PdfImageObject..ctor(PdfDictionary dictionary, Byte[] samples, PdfDictionary colorSpaceDic)
at iTextSharp.text.pdf.parser.PdfImageObject..ctor(PRStream stream, PdfDictionary colorSpaceDic)
at iTextSharp.text.pdf.parser.ImageRenderInfo.PrepareImageObject()
at iTextSharp.text.pdf.parser.ImageRenderInfo.GetImage()
at PdfUtils.ImageRenderListener.RenderImage(ImageRenderInfo renderInfo) in D:\Inspro\Inspro Health\Code\InsproServices2\CombinePDConsole\RenderImage.cs:line 135
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ImageXObjectDoHandler.HandleXObject(PdfContentStreamProcessor processor, PdfStream xobjectStream, PdfIndirectReference refi)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.DisplayXObject(PdfName xobjectName)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.Do.Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.InvokeOperator(PdfLiteral oper, List`1 operands)
at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
at iTextSharp.text.pdf.parser.PdfReaderContentParser.ProcessContent[E](Int32 pageNumber, E renderListener)
at PdfUtils.PdfImageExtractor.ExtractImages(String filename) in D:\Inspro\Inspro Health\Code\InsproServices2\CombinePDConsole\RenderImage.cs:line 57
at CombinePDConsole.Program2.Main(String[] args) in D:\Inspro\Inspro Health\Code\InsproServices2\CombinePDConsole\Program2.cs:line 24
----------------------------------------------------------------------------------------------

Can anybody suggest why this error occurs, and what the solution is for this kind of error?

Member_2_5069294

I guess colour depth 1 isn't supported by the parser, is that a problem? Can you find out what color depth the image uses? Or load it into a higher colour depth (perhaps 32) and then convert it to monochrome.

It may be that the PDF file has an image which is colour depth 1 and iTextSharp does not support it. It's hard to say because you've given very little information about what the program is doing at this point or what is in the PDF file.

BeyondBGCM

ASKER

This is the code I am using to get images from a PDF file. It fails with the error mentioned earlier.

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System;
using System.Collections.Generic;
using System.IO;

namespace PdfUtils
{
/// <summary>Helper class to extract images from a PDF file. Works with the most
/// common image types embedded in PDF files, as far as I can tell.</summary>
/// <example>
/// Usage example:
/// <code>
/// foreach (var filename in Directory.GetFiles(searchPath, "*.pdf", SearchOption.TopDirectoryOnly))
/// {
/// var images = PdfImageExtractor.ExtractImages(filename);
/// var directory = Path.GetDirectoryName(filename);
///
/// foreach (var name in images.Keys)
/// {
/// images[name].Save(Path.Combine(directory, name));
/// }
/// }
/// </code></example>
public static class PdfImageExtractor
{
#region Methods
#region Public Methods

/// <summary>Checks whether a specified page of a PDF file contains images.</summary>
/// <returns>True if the page contains at least one image; false otherwise.</returns>
public static bool PageContainsImages(string filename, int pageNumber)
{
using (var reader = new PdfReader(filename))
{
var parser = new PdfReaderContentParser(reader);
ImageRenderListener listener = null;
parser.ProcessContent(pageNumber, (listener = new ImageRenderListener()));
return listener.Images.Count > 0;
}
}

/// <summary>Extracts all images (of types that iTextSharp knows how to decode) from a PDF file.</summary>
public static Dictionary<string, System.Drawing.Image> ExtractImages(string filename)
{
var images = new Dictionary<string, System.Drawing.Image>();

using (var reader = new PdfReader(filename))
{
var parser = new PdfReaderContentParser(reader);
ImageRenderListener listener = null;

for (var i = 1; i <= reader.NumberOfPages; i++)
{
parser.ProcessContent(i, (listener = new ImageRenderListener()));
var index = 1;

if (listener.Images.Count > 0)
{
Console.WriteLine("Found {0} images on page {1}.", listener.Images.Count, i);

foreach (var pair in listener.Images)
{
images.Add(string.Format("{0}_Page_{1}_Image_{2}{3}",
Path.GetFileNameWithoutExtension(filename), i.ToString("D4"), index.ToString("D4"), pair.Value), pair.Key);
index++;
}
}
}
return images;
}
}

/// <summary>Extracts all images (of types that iTextSharp knows how to decode)
/// from a specified page of a PDF file.</summary>
/// <returns>Returns a generic <see cref="Dictionary{TKey, TValue}"/> of string to System.Drawing.Image,
/// where the key is a suggested file name, in the format: PDF filename without extension,
/// page number and image index in the page.</returns>
public static Dictionary<string, System.Drawing.Image> ExtractImages(string filename, int pageNumber)
{
Dictionary<string, System.Drawing.Image> images = new Dictionary<string, System.Drawing.Image>();
// Dispose the reader when done, as the other methods in this class do.
using (PdfReader reader = new PdfReader(filename))
{
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
ImageRenderListener listener = null;

parser.ProcessContent(pageNumber, (listener = new ImageRenderListener()));
int index = 1;

if (listener.Images.Count > 0)
{
Console.WriteLine("Found {0} images on page {1}.", listener.Images.Count, pageNumber);

foreach (KeyValuePair<System.Drawing.Image, string> pair in listener.Images)
{
images.Add(string.Format("{0}_Page_{1}_Image_{2}{3}",
Path.GetFileNameWithoutExtension(filename), pageNumber.ToString("D4"), index.ToString("D4"), pair.Value), pair.Key);
index++;
}
}
}
return images;
}

#endregion Public Methods

#endregion Methods
}

internal class ImageRenderListener : IRenderListener
{
#region Fields

Dictionary<System.Drawing.Image, string> images = new Dictionary<System.Drawing.Image, string>();
#endregion Fields

#region Properties

public Dictionary<System.Drawing.Image, string> Images
{
get { return images; }
}
#endregion Properties

#region Methods

#region Public Methods

public void BeginTextBlock() { }

public void EndTextBlock() { }

public void RenderImage(ImageRenderInfo renderInfo)
{
PdfImageObject image = renderInfo.GetImage();
PdfName filter = (PdfName)image.Get(PdfName.FILTER);

//int width = Convert.ToInt32(image.Get(PdfName.WIDTH).ToString());
//int bitsPerComponent = Convert.ToInt32(image.Get(PdfName.BITSPERCOMPONENT).ToString());
//string subtype = image.Get(PdfName.SUBTYPE).ToString();
//int height = Convert.ToInt32(image.Get(PdfName.HEIGHT).ToString());
//int length = Convert.ToInt32(image.Get(PdfName.LENGTH).ToString());
//string colorSpace = image.Get(PdfName.COLORSPACE).ToString();

/* It appears to be safe to assume that when filter == null, PdfImageObject
* does not know how to decode the image to a System.Drawing.Image.
*
* Uncomment the code above to verify, but when I’ve seen this happen,
* width, height and bits per component all equal zero as well. */
if (filter != null)
{
System.Drawing.Image drawingImage = image.GetDrawingImage();

string extension = ".";

if (filter == PdfName.DCTDECODE)
{
extension += PdfImageObject.ImageBytesType.JPG.FileExtension;
}
else if (filter == PdfName.JPXDECODE)
{
extension += PdfImageObject.ImageBytesType.JP2.FileExtension;
}
else if (filter == PdfName.FLATEDECODE)
{
extension += PdfImageObject.ImageBytesType.PNG.FileExtension;
}
else if (filter == PdfName.LZWDECODE)
{
extension += PdfImageObject.ImageBytesType.CCITT.FileExtension;
}

/* Rather than struggle with the image stream and try to figure out how to handle
* BitMapData scan lines in various formats (like virtually every sample I’ve found
* online), use the PdfImageObject.GetDrawingImage() method, which does the work for us. */
this.Images.Add(drawingImage, extension);
}
}
public void RenderText(TextRenderInfo renderInfo) { }

#endregion Public Methods

#endregion Methods
}
}

Member_2_5069294

It's not working because it does not support images of bit depth 1 and the PDF file contains images of bit depth 1.

 else if (filter == PdfName.LZWDECODE)
{
    extension += PdfImageObject.ImageBytesType.CCITT.FileExtension;
}

This part of the code doesn't look right to me. LZW is used in GIF files and possibly TIF files. CCITT compression is used mostly for faxes (which are 1 bit deep). TIF is the only common format which supports CCITT compression.
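To make the point concrete, here is a rough sketch of how the PDF stream filter names could be mapped to file extensions. This is plain C# with no iTextSharp calls; FilterExtensions and ForFilter are made-up names for illustration, and the extensions are only suggestions. The filter names themselves are the ones defined by the PDF format. Note that FlateDecode and LZWDecode are general-purpose compression of raw pixel data, not image file formats, so the decoded image has to be re-encoded (PNG is assumed here).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of iTextSharp.
public static class FilterExtensions
{
    // PDF stream filter name -> suggested file extension (an assumption, adjust as needed).
    private static readonly Dictionary<string, string> Map = new Dictionary<string, string>
    {
        { "DCTDecode", ".jpg" },      // JPEG data
        { "JPXDecode", ".jp2" },      // JPEG 2000 data
        { "CCITTFaxDecode", ".tif" }, // fax compression, 1 bit per pixel
        { "JBIG2Decode", ".jbig2" },  // bi-level (1 bit) compression
        { "FlateDecode", ".png" },    // raw samples, re-encoded as PNG
        { "LZWDecode", ".png" },      // raw samples, re-encoded as PNG
    };

    // Accepts the name with or without the leading slash, e.g. "/LZWDecode".
    public static string ForFilter(string filterName)
    {
        if (filterName == null) return ".bin";
        string key = filterName.TrimStart('/');
        string ext;
        return Map.TryGetValue(key, out ext) ? ext : ".bin";
    }
}
```

With a mapping like this, LZWDecode would no longer be given the CCITT extension, and an unknown filter falls back to ".bin" instead of a bare ".".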

BeyondBGCM

ASKER

OK, so what is the solution?

Member_2_5069294

I can't tell you exactly because I don't know the details of iTextSharp. I can still offer some suggestions for how to find the problem.

Firstly, check if the PDF contains a 1 bit depth image (only black or white, no shades of grey). Secondly, check the documentation to see if PdfName.LZWDECODE should use a different extension. Purely by guesswork, I would try PdfImageObject.ImageBytesType.GIF.FileExtension or PdfImageObject.ImageBytesType.TIF.FileExtension (but I doubt it's the second option).

There may also be another value for PdfName, possibly CCITTDECODE (the filter itself is called CCITTFaxDecode in the PDF format), which should use PdfImageObject.ImageBytesType.CCITT.FileExtension.
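To check the bit depth without decoding the image (decoding is what throws the exception), one option might be to read the image's dictionary directly. The following is only a sketch, assuming iTextSharp 5.x; ImageInfoListener is a made-up name, and I have not run it against your PDF. It skips inline images, which have no indirect reference.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical diagnostic listener: prints image properties, decodes nothing.
internal class ImageInfoListener : IRenderListener
{
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        // Inline images have no indirect reference; this sketch ignores them.
        PdfIndirectReference reference = renderInfo.GetRef();
        if (reference == null) return;

        // Resolve the reference to the image XObject (a stream, which is also a dictionary).
        PdfDictionary dict = PdfReader.GetPdfObject(reference) as PdfDictionary;
        if (dict == null) return;

        PdfNumber bits = dict.GetAsNumber(PdfName.BITSPERCOMPONENT);
        PdfObject filter = dict.GetDirectObject(PdfName.FILTER);         // a name or an array of names
        PdfObject colorSpace = dict.GetDirectObject(PdfName.COLORSPACE);

        Console.WriteLine("BitsPerComponent={0}, Filter={1}, ColorSpace={2}",
            bits == null ? "(none)" : bits.IntValue.ToString(),
            filter == null ? "(none)" : filter.ToString(),
            colorSpace == null ? "(none)" : colorSpace.ToString());
    }
}
```

It would be used in place of ImageRenderListener, for example parser.ProcessContent(pageNumber, new ImageInfoListener()). If it reports BitsPerComponent=1, that confirms the PDF contains a 1 bit image.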

ASKER CERTIFIED SOLUTION

BeyondBGCM

BeyondBGCM

ASKER

This is an excellent solution that I found on Google. It completes some PDFSharp code that had been pending for a long time.