Required: an ASP.NET control to OCR, highlight text in, and compare the text of PDF documents


We need an ASP.NET control that provides the following three features for PDF documents:

1. It can OCR non-searchable PDFs.
2. It can highlight searched text within a PDF (the text can be a single word or a phrase).
3. It can compare the text of two PDFs and show the result in a newly generated PDF.

Because our product is redistributable, we are not looking for an expensive control.

Obadiah Christopher commented:
Not sure, but you could take a look at the PDFBox library for accessing PDF files.

To the best of my knowledge, there is no control that offers the functionality you are requesting. You would have to build it out of the pieces that are available.

The three features you have listed are not typically built into a single control.

The first feature, OCR of non-searchable PDFs, can be implemented with any of the OCR libraries. That said, practically all OCR libraries take a PDF page, convert the entire page to an image (remember that a single page can be a mix of images and text, all text, or all images), and then OCR that image.
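
A minimal Python sketch of that page-by-page flow, assuming you already have a PDF library that can extract a page's existing text layer and rasterize a page, plus an OCR engine such as Tesseract. The three callables (`extract_text`, `render_page`, `ocr_image`) and the `PageResult` type are hypothetical hooks, not any library's API. Skipping pages that already carry enough text (the arbitrary `min_chars` threshold) is likewise an added assumption:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PageResult:
    page_number: int
    text: str
    from_ocr: bool  # True if the text came from OCR rather than the PDF itself


def make_searchable(
    page_count: int,
    extract_text: Callable[[int], Optional[str]],  # hypothetical: page's text layer
    render_page: Callable[[int], bytes],           # hypothetical: page -> image bytes
    ocr_image: Callable[[bytes], str],             # hypothetical: image -> recognized text
    min_chars: int = 20,                           # assumed threshold, tune as needed
) -> List[PageResult]:
    """Walk the document page by page, OCR-ing only pages without usable text."""
    results: List[PageResult] = []
    for n in range(page_count):
        existing = (extract_text(n) or "").strip()
        if len(existing) >= min_chars:
            # The page already has a text layer, so it is searchable as-is.
            results.append(PageResult(n, existing, False))
        else:
            # Treat the whole page as one image, whatever mix of text and
            # graphics it contains, and OCR the rendered bitmap.
            image = render_page(n)
            results.append(PageResult(n, ocr_image(image).strip(), True))
    return results
```

In a real application, the OCR text would then be written back into the PDF together with the page image so that viewers can search it.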

You could use a free library like Tesseract (download it here: tesseractdotnet_v301_r590), or a .NET wrapper library (like

Many of these libraries can save the image and the searchable text directly to a PDF file. That allows the Adobe control to do the hit highlighting for you.

Second feature, a control that does hit highlighting: Adobe can do this for you, but there are big differences between kinds of hit highlighting: full word, partial word, phrase, etc. You will probably have to evaluate several different controls.
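
To make the difference between those match modes concrete, here is a small Python sketch that returns hit offsets in text already extracted from a page (by a PDF library or by OCR). The function `find_hits` and its `mode` values are hypothetical names for illustration; it assumes queries start and end with word characters (so `\b` boundaries behave), and mapping character offsets to on-page rectangles, which a viewer would need for actual highlighting, is not shown:

```python
import re
from typing import List, Tuple


def find_hits(text: str, query: str, mode: str = "word",
              ignore_case: bool = True) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets of each hit of `query` in `text`.

    mode="partial": query may match inside a longer word ("port" hits "report").
    mode="word":    query must be bounded by word boundaries on both sides.
    mode="phrase":  like "word", but any run of whitespace in the query matches
                    any run of whitespace in the text (extra spaces, line breaks).
    """
    flags = re.IGNORECASE if ignore_case else 0
    if mode == "partial":
        pattern = re.escape(query)
    elif mode == "word":
        pattern = r"\b" + re.escape(query) + r"\b"
    elif mode == "phrase":
        words = query.split()
        pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    else:
        raise ValueError("unknown mode: %r" % mode)
    return [(m.start(), m.end()) for m in re.finditer(pattern, text, flags)]
```

For example, on the text "The report was ported.", a partial search for "port" hits twice (inside "report" and "ported"), while a whole-word search for "port" finds nothing.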

Third feature, comparing the text of two PDFs: I am not aware of any control that compares two PDF files and highlights the differences. Unlike, say, a Word or text file, two PDF files with completely different internal content can render to almost identical pictures on the screen. If you have to OCR the files, it is even worse. I don't think you are going to find this pre-built!
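
If you do build it yourself, the comparison step can at least be sketched once the text of both documents has been extracted. A minimal Python example using the standard library's `difflib`, assuming each document's text arrives as one plain string with one line per text line. Whitespace is normalized first because extracted spacing often varies between files that look the same. `compare_text` is a hypothetical helper, and rendering the result into a new PDF is not shown:

```python
import difflib
import re
from typing import List, Tuple


def normalize(line: str) -> str:
    # Collapse whitespace runs; extraction spacing often differs between
    # PDFs whose rendered pages look identical.
    return re.sub(r"\s+", " ", line).strip()


def compare_text(old: str, new: str) -> List[Tuple[str, str]]:
    """Line-level diff: (' ', line) unchanged, ('-', line) only in old,
    ('+', line) only in new. Blank lines are ignored."""
    a = [s for s in map(normalize, old.splitlines()) if s]
    b = [s for s in map(normalize, new.splitlines()) if s]
    result: List[Tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            result.extend((" ", line) for line in a[i1:i2])
        else:  # 'replace', 'delete' or 'insert'; empty slices add nothing
            result.extend(("-", line) for line in a[i1:i2])
            result.extend(("+", line) for line in b[j1:j2])
    return result
```

The tagged lines are the raw material for a comparison report; this only compares text and ignores layout entirely, so visually identical pages with differently ordered text can still show up as changed.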

-- Michael.

This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
Question has a verified solution.