  • Status: Solved

Compare text in txt and pdf files

I have some text in a text file and a PDF file. I want to compare the files to see if the text is identical. What's the best way to do this? In reality there will be lots of files like this, so the question is an attempt to automate a manual process (e.g. manually copying both files into TextPad and comparing them).
Asked by: AlHal2
1 Solution
 
hjgode Commented:
This may or may not work. As you know, text in a PDF is sometimes not stored contiguously. In PostScript (the basis of PDF) there is no real text, only individual letters. In particular, if you have a multi-column layout, pages with hanging indents, or images, the 'text' may not be laid out contiguously in the PDF.

But if you are lucky, you can extract the text (possibly removing all spaces, paragraph breaks, and line endings) and compare it to the text file (likewise with line endings and spaces removed).

I would use iTextSharp; here is an example of how to extract text from a PDF: http://www.kunalpriyadarshi.com/2012/12/using-itextsharp-to-extract-text-from.html
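
For the comparison step itself, something like the following might serve as a starting point. It is only a sketch in Python (not iTextSharp/.NET), and it assumes the PDF text has already been extracted into a string by whatever library you choose. The function names normalize and same_text are made up for illustration. It strips all whitespace from both sides and also folds ligatures such as "ﬁ", which PDF extraction sometimes produces.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Reduce text to a layout-independent form (hypothetical helper)."""
    # NFKC folds compatibility characters, e.g. the "ﬁ" ligature becomes
    # "fi" and a non-breaking space becomes an ordinary space.
    text = unicodedata.normalize("NFKC", text)
    # Drop every run of whitespace: spaces, tabs, line and paragraph breaks.
    return re.sub(r"\s+", "", text)


def same_text(txt_content: str, pdf_content: str) -> bool:
    """Return True if both strings match once whitespace is ignored."""
    return normalize(txt_content) == normalize(pdf_content)
```

To run it over many files, you could read each .txt file, extract the matching PDF's text, and call same_text on the pair. Note that this will still report a mismatch if the PDF stores its text out of reading order, as described above.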
AlHal2 (Author) Commented:
Thanks.