Compare text in txt and PDF files

I have some text in a text file and a PDF file. I want to compare the files to see if the text is identical. What's the best way to do this? In reality there will be lots of files like this, so the question is an attempt to automate a manual process (e.g. manually copying both files into TextPad and comparing them).
AlHal2 Asked:
hjgode Commented:
This may or may not work. As you know, the text in a PDF is sometimes not stored contiguously. PostScript (the basis of PDF) has no real notion of text, only individual letters. In particular, if you have a multi-column layout, or pages with hanging indents or images, the 'text' may not be laid out contiguously in the PDF.

But if you are lucky, you can extract the text (possibly removing all spaces, paragraph breaks, and line endings) and compare it to the contents of the text file (also with all line endings and spaces removed).
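
The comparison step described above can be sketched as follows. This is only a minimal illustration in Python, assuming the PDF text has already been extracted into a string by whatever library you use; the function names normalize, same_text, and first_difference are made up for this example.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    """Remove every whitespace character (spaces, tabs, line endings)."""
    return re.sub(r"\s+", "", text)


def same_text(pdf_text: str, txt_text: str) -> bool:
    """True if both strings contain the same characters, ignoring whitespace."""
    return normalize(pdf_text) == normalize(txt_text)


def first_difference(pdf_text: str, txt_text: str) -> Optional[int]:
    """Index (in the whitespace-free text) of the first mismatch, or None."""
    a, b = normalize(pdf_text), normalize(txt_text)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # No mismatch in the common prefix: the texts differ only if one is longer.
    return None if len(a) == len(b) else min(len(a), len(b))
```

Removing all whitespace is deliberately crude: it hides layout differences such as line wraps, but it will also treat "a bc" and "ab c" as equal, so it only checks that the letters match in order.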

I would use iTextSharp. Here is an example of how to extract text from a PDF: http://www.kunalpriyadarshi.com/2012/12/using-itextsharp-to-extract-text-from.html
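
Since there are many files to check, the extraction can be wrapped in a loop. The following is a rough Python sketch, not iTextSharp code: it assumes each text file sits in the same folder as a PDF with the same base name (report.txt and report.pdf), and that you supply your own extract_pdf_text function built on whichever PDF library you choose. Both compare_folder and extract_pdf_text are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Dict


def compare_folder(
    folder: str,
    extract_pdf_text: Callable[[Path], str],  # assumed: returns the PDF's text
) -> Dict[str, bool]:
    """Compare every name.txt with name.pdf in folder, ignoring whitespace."""
    results: Dict[str, bool] = {}
    for txt_path in sorted(Path(folder).glob("*.txt")):
        pdf_path = txt_path.with_suffix(".pdf")
        if not pdf_path.exists():
            continue  # no matching PDF, nothing to compare
        # str.split() with no arguments splits on any run of whitespace,
        # so joining the pieces removes spaces and line endings.
        txt = "".join(txt_path.read_text(encoding="utf-8").split())
        pdf = "".join(extract_pdf_text(pdf_path).split())
        results[txt_path.stem] = txt == pdf
    return results
```

The result maps each base name to True or False, so the files that need a manual look are the ones marked False.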
AlHal2 (Author) Commented:
Thanks.
Question has a verified solution.
