Compare text in TXT and PDF files

I have some text in a text file and in a PDF file. I want to compare the two files to see whether the text is identical. What's the best way to do this? In reality there will be lots of file pairs like this, so the question is an attempt to automate a manual process (e.g. manually copying both files into TextPad and comparing them).
Asked by: AlHal2
1 Solution
 
hjgode commented:
This may or may not work. As you know, the text in a PDF is not always stored contiguously. PostScript, on which PDF is based, has no real concept of text, only individual letters. In particular, if you have a multi-column layout, or pages with hanging indents or images, the 'text' may not be laid out contiguously in the PDF.

But if you are lucky, you can extract the text from the PDF (possibly removing all spaces, paragraph breaks, and line endings) and compare it to the contents of the text file (likewise with all line endings and spaces removed).
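
The comparison step described above can be sketched in a few lines. This is only a minimal illustration in Python, not the iTextSharp route itself: it assumes the PDF text has already been extracted into a string by whatever tool you use, and the function names (`strip_whitespace`, `texts_match`, `first_mismatch`) are made up for this example. The Unicode NFKC normalization is an extra assumption beyond what is described above; it is there because extracted PDF text often contains ligatures such as "ﬁ" that would otherwise never match a plain "fi".

```python
import unicodedata


def strip_whitespace(text: str) -> str:
    """Return text with all spaces, tabs, and line endings removed."""
    # Assumption: NFKC folds ligatures and non-breaking spaces into
    # their plain equivalents before the whitespace is dropped.
    normalized = unicodedata.normalize("NFKC", text)
    # str.split() with no arguments splits on any run of whitespace.
    return "".join(normalized.split())


def texts_match(pdf_text: str, txt_text: str) -> bool:
    """True if both texts are identical once whitespace is ignored."""
    return strip_whitespace(pdf_text) == strip_whitespace(txt_text)


def first_mismatch(pdf_text: str, txt_text: str):
    """Index (in the stripped text) of the first difference, or None."""
    a = strip_whitespace(pdf_text)
    b = strip_whitespace(txt_text)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One text may simply be a prefix of the other.
    return None if len(a) == len(b) else min(len(a), len(b))


if __name__ == "__main__":
    # Hypothetical sample strings standing in for real file contents.
    pdf_text = "Hello,\nworld.  This is\na test."
    txt_text = "Hello, world. This is a test."
    print(texts_match(pdf_text, txt_text))
    print(first_mismatch(pdf_text, txt_text))
```

Because every whitespace character is dropped, "to get her" and "together" compare as equal. If that is too lenient for your files, collapsing runs of whitespace to a single space instead may be a better trade-off.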

I would use iTextSharp. Here is an example of how to extract text from a PDF: http://www.kunalpriyadarshi.com/2012/12/using-itextsharp-to-extract-text-from.html
 
AlHal2 (author) commented:
Thanks.
Question has a verified solution.
