Solved

PDF Parser iTextSharp

Posted on 2014-03-07
Last Modified: 2014-03-20
The color depth 1 is not supported.

Stack Trace


   at iTextSharp.text.pdf.parser.PdfImageObject.DecodeImageBytes()
   at iTextSharp.text.pdf.parser.PdfImageObject..ctor(PdfDictionary dictionary, Byte[] samples, PdfDictionary colorSpaceDic)
   at iTextSharp.text.pdf.parser.PdfImageObject..ctor(PRStream stream, PdfDictionary colorSpaceDic)
   at iTextSharp.text.pdf.parser.ImageRenderInfo.PrepareImageObject()
   at iTextSharp.text.pdf.parser.ImageRenderInfo.GetImage()
   at PdfUtils.ImageRenderListener.RenderImage(ImageRenderInfo renderInfo) in D:\Inspro\Inspro Health\Code\InsproServices2\CombinePDConsole\RenderImage.cs:line 135
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ImageXObjectDoHandler.HandleXObject(PdfContentStreamProcessor processor, PdfStream xobjectStream, PdfIndirectReference refi)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.DisplayXObject(PdfName xobjectName)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.Do.Invoke(PdfContentStreamProcessor processor, PdfLiteral oper, List`1 operands)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.InvokeOperator(PdfLiteral oper, List`1 operands)
   at iTextSharp.text.pdf.parser.PdfContentStreamProcessor.ProcessContent(Byte[] contentBytes, PdfDictionary resources)
   at iTextSharp.text.pdf.parser.PdfReaderContentParser.ProcessContent[E](Int32 pageNumber, E renderListener)
   at PdfUtils.PdfImageExtractor.ExtractImages(String filename) in D:\Inspro\Inspro Health\Code\InsproServices2\CombinePDConsole\RenderImage.cs:line 57
   at CombinePDConsole.Program2.Main(String[] args) in D:\Inspro\Inspro Health\Code\InsproServices2\CombinePDConsole\Program2.cs:line 24
----------------------------------------------------------------------------------------------

Can anybody suggest why this error occurs and how to fix it?
Question by:BeyondBGCM

Expert Comment

by:satsumo
ID: 39914653
I guess colour depth 1 isn't supported by the parser, is that a problem? Can you find out what color depth the image uses? Or load it into a higher colour depth (perhaps 32) and then convert it to monochrome.

It may be that the PDF file has an image which is colour depth 1 and iTextSharp does not support it. It's hard to say because you've given very little information about what the program is doing at this point or what is in the PDF file.
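One way to find out what colour depth the image uses: the stack trace shows the exception is thrown inside renderInfo.GetImage(), so a listener can read the image's dictionary directly and never trigger the decoder. This is only a sketch. The class name ImageInfoListener is made up for illustration, and it assumes an iTextSharp 5.x build where ImageRenderInfo.GetRef() is available.

```csharp
using System;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical diagnostic listener: prints image properties without decoding anything.
internal class ImageInfoListener : IRenderListener
{
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        // GetRef() returns null for inline images; this sketch simply skips those.
        PdfIndirectReference reference = renderInfo.GetRef();
        if (reference == null) return;

        // Read the image XObject's dictionary instead of calling GetImage(),
        // which is the call that throws in the stack trace.
        PdfDictionary dict = PdfReader.GetPdfObject(reference) as PdfDictionary;
        if (dict == null) return;

        PdfNumber bpc = dict.GetAsNumber(PdfName.BITSPERCOMPONENT);
        PdfObject colorSpace = dict.GetDirectObject(PdfName.COLORSPACE);
        PdfObject filter = dict.GetDirectObject(PdfName.FILTER);

        Console.WriteLine("Image object {0}: bits/component={1}, color space={2}, filter={3}",
            reference.Number,
            bpc == null ? "n/a" : bpc.IntValue.ToString(),
            colorSpace,
            filter);
    }
}
```

Passing an instance to parser.ProcessContent(pageNumber, new ImageInfoListener()) should list each image on the page, which shows whether the failing image really is 1 bit per component and which colour space and filter it uses.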

Author Comment

by:BeyondBGCM
ID: 39916833
This is the code I am using to extract images from PDF files. It fails with the error mentioned earlier.

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System;
using System.Collections.Generic;
using System.IO;
 
namespace PdfUtils
{
   /// <summary>Helper class to extract images from a PDF file. Works with the most
   /// common image types embedded in PDF files, as far as I can tell.</summary>
   /// <example>
   /// Usage example:
   /// <code>
   /// foreach (var filename in Directory.GetFiles(searchPath, "*.pdf", SearchOption.TopDirectoryOnly))
   /// {
   ///    var images = PdfImageExtractor.ExtractImages(filename);
   ///    var directory = Path.GetDirectoryName(filename);
   ///
   ///    foreach (var name in images.Keys)
   ///    {
   ///       images[name].Save(Path.Combine(directory, name));
   ///    }
   /// }
   /// </code></example>
   public static class PdfImageExtractor
   {
       #region Methods
       #region Public Methods

       /// <summary>Checks whether a specified page of a PDF file contains images.</summary>
       /// <returns>True if the page contains at least one image; false otherwise.</returns>
       public static bool PageContainsImages(string filename, int pageNumber)
       {
           using (var reader = new PdfReader(filename))
           {
               var parser = new PdfReaderContentParser(reader);
               ImageRenderListener listener = null;
               parser.ProcessContent(pageNumber, (listener = new ImageRenderListener()));
               return listener.Images.Count > 0;
           }
       }

       /// <summary>Extracts all images (of types that iTextSharp knows how to decode) from a PDF file.</summary>
       public static Dictionary<string, System.Drawing.Image> ExtractImages(string filename)
       {
           var images = new Dictionary<string, System.Drawing.Image>();

           using (var reader = new PdfReader(filename))
           {
               var parser = new PdfReaderContentParser(reader);
               ImageRenderListener listener = null;

               for (var i = 1; i <= reader.NumberOfPages; i++)
               {
                   parser.ProcessContent(i, (listener = new ImageRenderListener()));
                   var index = 1;

                   if (listener.Images.Count > 0)
                   {
                       Console.WriteLine("Found {0} images on page {1}.", listener.Images.Count, i);

                       foreach (var pair in listener.Images)
                       {
                           images.Add(string.Format("{0}_Page_{1}_Image_{2}{3}",
                               Path.GetFileNameWithoutExtension(filename), i.ToString("D4"), index.ToString("D4"), pair.Value), pair.Key);
                           index++;
                       }
                   }
               }
               return images;
           }
       }

       /// <summary>Extracts all images (of types that iTextSharp knows how to decode)
       /// from a specified page of a PDF file.</summary>
       /// <returns>Returns a generic <see cref="Dictionary&lt;string, System.Drawing.Image&gt;"/>,
       /// where the key is a suggested file name, in the format: PDF filename without extension,
       /// page number and image index in the page.</returns>
       public static Dictionary<string, System.Drawing.Image> ExtractImages(string filename, int pageNumber)
       {
           Dictionary<string, System.Drawing.Image> images = new Dictionary<string, System.Drawing.Image>();

           // Dispose the reader when done, as the other methods in this class do.
           using (PdfReader reader = new PdfReader(filename))
           {
               PdfReaderContentParser parser = new PdfReaderContentParser(reader);
               ImageRenderListener listener = new ImageRenderListener();

               parser.ProcessContent(pageNumber, listener);
               int index = 1;

               if (listener.Images.Count > 0)
               {
                   Console.WriteLine("Found {0} images on page {1}.", listener.Images.Count, pageNumber);

                   foreach (KeyValuePair<System.Drawing.Image, string> pair in listener.Images)
                   {
                       images.Add(string.Format("{0}_Page_{1}_Image_{2}{3}",
                           Path.GetFileNameWithoutExtension(filename), pageNumber.ToString("D4"), index.ToString("D4"), pair.Value), pair.Key);
                       index++;
                   }
               }
           }
           return images;
       }

        #endregion Public Methods

        #endregion Methods
    }

    internal class ImageRenderListener : IRenderListener
    {
        #region Fields

        Dictionary<System.Drawing.Image, string> images = new Dictionary<System.Drawing.Image, string>();
        #endregion Fields

        #region Properties

        public Dictionary<System.Drawing.Image, string> Images
        {
            get { return images; }
        }
        #endregion Properties

        #region Methods

        #region Public Methods

        public void BeginTextBlock() { }

        public void EndTextBlock() { }

        public void RenderImage(ImageRenderInfo renderInfo)
        {
            PdfImageObject image = renderInfo.GetImage();
            PdfName filter = (PdfName)image.Get(PdfName.FILTER);
 
            //int width = Convert.ToInt32(image.Get(PdfName.WIDTH).ToString());
            //int bitsPerComponent = Convert.ToInt32(image.Get(PdfName.BITSPERCOMPONENT).ToString());
            //string subtype = image.Get(PdfName.SUBTYPE).ToString();
            //int height = Convert.ToInt32(image.Get(PdfName.HEIGHT).ToString());
            //int length = Convert.ToInt32(image.Get(PdfName.LENGTH).ToString());
            //string colorSpace = image.Get(PdfName.COLORSPACE).ToString();

            /* It appears to be safe to assume that when filter == null, PdfImageObject
             * does not know how to decode the image to a System.Drawing.Image.
             *
             * Uncomment the code above to verify, but when I’ve seen this happen,
             * width, height and bits per component all equal zero as well. */
            if (filter != null)
            {
                System.Drawing.Image drawingImage = image.GetDrawingImage();

                string extension = ".";

                if (filter == PdfName.DCTDECODE)
                {
                    extension += PdfImageObject.ImageBytesType.JPG.FileExtension;
                }
                else if (filter == PdfName.JPXDECODE)
                {
                    extension += PdfImageObject.ImageBytesType.JP2.FileExtension;
                }
                else if (filter == PdfName.FLATEDECODE)
                {
                    extension += PdfImageObject.ImageBytesType.PNG.FileExtension;
                }
                else if (filter == PdfName.LZWDECODE)
                {
                    extension += PdfImageObject.ImageBytesType.CCITT.FileExtension;
                }

                /* Rather than struggle with the image stream and try to figure out how to handle
                 * BitMapData scan lines in various formats (like virtually every sample I’ve found
                 * online), use the PdfImageObject.GetDrawingImage() method, which does the work for us. */
                this.Images.Add(drawingImage, extension);
            }
        }
        public void RenderText(TextRenderInfo renderInfo) { }

        #endregion Public Methods

        #endregion Methods
    }
}

Expert Comment

by:satsumo
ID: 39920164
It's not working because iTextSharp does not support images of bit depth 1 and the PDF file contains such images.

else if (filter == PdfName.LZWDECODE)
{
    extension += PdfImageObject.ImageBytesType.CCITT.FileExtension;
}

This part of the code doesn't look right to me. LZW is used in GIF files and can also be used in TIF files. CCITT compression is used mostly for faxes (which are 1 bit deep). TIF is the only common format which supports CCITT compression.
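To make the distinction concrete, here is a small lookup from PDF filter name to a file extension. It is a sketch, not iTextSharp's own logic: the class and method names are invented, and the choice of ".png" for Flate and LZW is an assumption. In PDF those two filters are general-purpose compression applied to raw samples, so they do not imply any particular file format, and PNG is just one lossless container the decoded pixels could be saved in. The keys are the filter names as they appear in the PDF itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps a PDF stream filter name to a suggested file extension.
public static class FilterExtensions
{
    private static readonly Dictionary<string, string> Map = new Dictionary<string, string>
    {
        { "/DCTDecode", ".jpg" },      // JPEG data, can be written out as-is
        { "/JPXDecode", ".jp2" },      // JPEG 2000 data
        { "/CCITTFaxDecode", ".tif" }, // fax-style 1-bit data, usually wrapped in TIFF
        { "/FlateDecode", ".png" },    // generic compression; PNG chosen by assumption
        { "/LZWDecode", ".png" }       // generic compression; PNG chosen by assumption
    };

    // Returns the suggested extension, or null when the filter is not in the table.
    public static string ForFilter(string filterName)
    {
        string extension;
        if (filterName != null && Map.TryGetValue(filterName, out extension))
        {
            return extension;
        }
        return null;
    }
}
```

With iTextSharp, filter.ToString() on a PdfName should give the "/Name" form used as the key here. Returning null for unknown filters leaves it to the caller to decide whether to skip the image or fall back to a default.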

Author Comment

by:BeyondBGCM
ID: 39921569
OK, so what is the solution?

Expert Comment

by:satsumo
ID: 39921692
I can't tell you exactly because I don't know the details of iTextSharp. I can still offer some suggestions for how to find the problem.

Firstly, check whether the PDF contains a 1 bit depth image (only black or white, no shades of grey). Secondly, check the documentation to see whether PdfName.LZWDECODE should use a different extension. Purely as guesswork, I would try PdfImageObject.ImageBytesType.GIF.FileExtension or PdfImageObject.ImageBytesType.TIF.FileExtension (but I doubt it's the second option).

There may also be another value for PdfName, possibly CCITTDECODE, which should use PdfImageObject.ImageBytesType.CCITT.FileExtension.

Accepted Solution

by:BeyondBGCM
ID: 39927333
Below is the solution to the above problem. I found it via Google; you can close this thread.

// Note: 'image', 'width', 'height', 'bitsPerComponent' and 'count' come from the
// surrounding code, which was not posted. Needs System.Drawing, System.Drawing.Imaging,
// System.IO and System.Runtime.InteropServices.
PdfSharp.Pdf.Filters.FlateDecode flate = new PdfSharp.Pdf.Filters.FlateDecode();
byte[] decodedBytes = flate.Decode(image.Stream.Value);

PixelFormat pixelFormat;

// Pick a GDI+ pixel format; the value is treated as bits per pixel here.
switch (bitsPerComponent)
{
    case 1:
        pixelFormat = PixelFormat.Format1bppIndexed;
        break;
    case 8:
        pixelFormat = PixelFormat.Format8bppIndexed;
        break;
    case 24:
        pixelFormat = PixelFormat.Format24bppRgb;
        break;
    default:
        throw new Exception("Unknown pixel format " + bitsPerComponent);
}

Bitmap bmp = new Bitmap(width, height, pixelFormat);
var bmpData = bmp.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.WriteOnly, pixelFormat);

// Bytes of packed sample data per row, rounded up to a whole byte.
int length = (int)Math.Ceiling(width * bitsPerComponent / 8.0);
for (int i = 0; i < height; i++)
{
    int offset = i * length;
    int scanOffset = i * bmpData.Stride;
    // IntPtr.Add avoids the overflow that Scan0.ToInt32() can cause in a 64-bit process.
    Marshal.Copy(decodedBytes, offset, IntPtr.Add(bmpData.Scan0, scanOffset), length);
}
bmp.UnlockBits(bmpData);
using (FileStream fs = new FileStream(@"D:\BeyondBGCM\BeyondBGCM\PDFToWord\" + String.Format("Image{0}.png", count++), FileMode.Create, FileAccess.Write))
{
    bmp.Save(fs, System.Drawing.Imaging.ImageFormat.Png);
}
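The row-by-row copy is needed because the decoded PDF samples are tightly packed, while each bitmap row starts Stride bytes after the previous one. The sketch below isolates that step as plain byte-array code so it can be checked without a PDF or a Bitmap. The names RowPacking, PackedRowLength and PadRows are made up, and it assumes a positive stride (a top-down bitmap). Unlike the snippet above, it takes the number of colour components explicitly rather than treating bitsPerComponent as bits per pixel.

```csharp
using System;

// Hypothetical helper illustrating packed rows versus padded (stride-aligned) rows.
public static class RowPacking
{
    // Bytes needed for one row of packed samples, rounded up to a whole byte.
    public static int PackedRowLength(int width, int bitsPerComponent, int components)
    {
        return (width * bitsPerComponent * components + 7) / 8;
    }

    // Copies tightly packed rows into a new buffer whose rows start 'stride' bytes apart.
    // The padding bytes at the end of each row are left as zero.
    public static byte[] PadRows(byte[] packed, int height, int rowLength, int stride)
    {
        if (packed == null) throw new ArgumentNullException("packed");
        if (stride < rowLength) throw new ArgumentException("stride must be >= rowLength");
        if (packed.Length < height * rowLength) throw new ArgumentException("not enough sample data");

        byte[] padded = new byte[height * stride];
        for (int row = 0; row < height; row++)
        {
            Buffer.BlockCopy(packed, row * rowLength, padded, row * stride, rowLength);
        }
        return padded;
    }
}
```

For example, a 10 pixel wide, 1 bit, single-component row packs into 2 bytes, whereas a bitmap row for it would typically occupy 4. With the rows padded this way, the whole buffer could be written in one call such as Marshal.Copy(padded, 0, bmpData.Scan0, padded.Length) instead of one call per row.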

Author Closing Comment

by:BeyondBGCM
ID: 39941777
This is an excellent solution that I found via Google. It completes some long-pending PDFSharp code.