fritzdsa
 asked on

CCITTFaxDecode filter for Acrobat

I am writing an app to extract text from a PDF file. I can extract the text when the PDF streams are compressed with FlateDecode, but extraction fails when the filter is CCITTFaxDecode. Can anyone help me find a decoder for the CCITTFaxDecode filter? Thanks in advance.
Adobe Acrobat

Last Comment
Karl Heinz Kremer

8/22/2022 - Mon
Karl Heinz Kremer

Text should never be encoded with CCITTFaxDecode - that is a filter that is only useful for monochrome images (black dots on white paper or vice versa). Are you sure that you need that filter for text extraction?

What environment are you using? Are you using any PDF library (that would be the ideal situation, because you really do not want to write the complete PDF handling from scratch)?
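To illustrate the point about filters, here is a minimal Python sketch (not the CodeProject code, and not a real PDF parser) that inflates FlateDecode streams and simply skips CCITTFaxDecode streams, since those hold image data. The function name decode_text_candidates is made up for this example, and the regex-based object scanning is an assumption that only holds for simple, unencrypted files; a proper library would use the /Length entry and the cross-reference table instead.

```python
import re
import zlib

# Rough patterns for "N G obj ... endobj" and "stream ... endstream".
# Assumption: binary stream data never happens to contain these keywords.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def decode_text_candidates(pdf_bytes):
    """Return decoded bytes of streams that might contain text operators."""
    results = []
    for obj in OBJ_RE.finditer(pdf_bytes):
        body = obj.group(1)
        stream = STREAM_RE.search(body)
        if stream is None:
            continue  # a plain object without a stream
        dictionary = body[:stream.start()]
        data = stream.group(1)
        if b"/CCITTFaxDecode" in dictionary:
            # Fax-encoded monochrome image: no text to extract here.
            continue
        if b"/FlateDecode" in dictionary:
            try:
                # decompressobj tolerates a stray end-of-line byte.
                results.append(zlib.decompressobj().decompress(data))
            except zlib.error:
                pass  # not valid zlib data (or another filter is chained)
        elif b"/Filter" not in dictionary:
            results.append(data)  # stored without compression
    return results
```

The decoded bytes are still content-stream operators (BT ... Tj ... ET and so on), so the text extraction step has to run on top of this.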
fritzdsa

ASKER
I'm using the code found at http://www.codeproject.com/KB/cpp/ExtractPDFText.aspx
This code assumes that the PDF file has text objects compressed with the FlateDecode filter, and it uses zlib to decompress the streams.
But my PDF files have text behind an image, and the filter used is CCITTFaxDecode. I want to know how to decode streams that use this filter. Thanks in advance.
ASKER CERTIFIED SOLUTION
Karl Heinz Kremer

fritzdsa

ASKER
Yes, your clue to ignore the image helped: the PDF we were working with stored its text data a little differently, at the end of the file.
Karl Heinz Kremer

The order of information in a PDF file has no meaning: objects are located through the XRef (cross-reference) table, not by their position in the file. You need to read up on how the XRef table is used in PDF.
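As a rough illustration of the XRef lookup, the Python sketch below finds the last startxref keyword, jumps to the offset it names, and reads a classic cross-reference table into a dictionary of object number to byte offset. The function name read_xref_offsets is made up for this example. It assumes a single classic table; files with incremental updates (a /Prev chain) or cross-reference streams (PDF 1.5 and later) need more work, which is one more reason to use a PDF library.

```python
import re


def read_xref_offsets(pdf_bytes):
    """Return {object_number: byte_offset} for in-use objects."""
    # The last "startxref" is followed by the byte offset of the table.
    pos = pdf_bytes.rfind(b"startxref")
    if pos < 0:
        raise ValueError("no startxref keyword found")
    match = re.match(rb"startxref\s+(\d+)", pdf_bytes[pos:])
    if match is None:
        raise ValueError("startxref is not followed by an offset")
    table_start = int(match.group(1))
    if not pdf_bytes.startswith(b"xref", table_start):
        raise ValueError("no classic xref table here (maybe an xref stream)")

    # Skip the "xref" line itself, then read the subsections.
    lines = pdf_bytes[table_start:].splitlines()[1:]
    offsets = {}
    i = 0
    while i < len(lines):
        header = lines[i].split()
        if len(header) != 2 or not all(part.isdigit() for part in header):
            break  # reached "trailer" or something unexpected
        first, count = int(header[0]), int(header[1])
        for k, entry in enumerate(lines[i + 1:i + 1 + count]):
            fields = entry.split()  # offset, generation, "n" or "f"
            if len(fields) == 3 and fields[2] == b"n":
                offsets[first + k] = int(fields[0])
        i += 1 + count
    return offsets
```

Each offset points at the "N G obj" line of that object, wherever it sits in the file, so text stored near the end of the file is found the same way as anything else.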