Need an ASP.NET control for OCR, text highlighting, and text comparison of PDF documents

Posted on 2012-09-20
Medium Priority
Last Modified: 2013-01-22

We need an ASP.NET control that provides the following three features for PDF documents:

1. It can OCR non-searchable PDFs.
2. It can highlight the searched text within a PDF (the text can be a single word or a phrase).
3. It can compare the text of two PDFs and show the result in a newly generated PDF.

Since the control will ship as part of our redistributable product, we are not looking for a costly one.
Question by: amazursky

Expert Comment

ID: 38424321
Not sure, but maybe you could take a look at the PDFBox library to access PDF files.


Accepted Solution

mjdeale earned 2000 total points
ID: 38446169

To the best of my knowledge, there is no single control that offers the functionality you are requesting. You would have to build it out of the pieces that are available.

The three features that you have listed are not typically something that is built into a control.

The first feature: OCR of a non-searchable PDF can be implemented with any of the OCR libraries. That said, pretty much all of the OCR libraries take a PDF page, convert the entire page to an image (remember that a single page can be a mix of images and text, all text, or all image), and then OCR it.

You could use a free library like Tesseract (download it here: tesseractdotnet_v301_r590), or a .NET wrapper library (like www.crossfieldsoftware.com).

Many of these libraries can save the image and the searchable text directly to a PDF file. That allows the Adobe control to do hit highlighting for you.

Second feature: a control that does hit highlighting. Adobe can do this for you, but there are big differences between the kinds of hit highlighting: full word, partial word, phrase, etc. You are probably going to have to look at a bunch of different controls.
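To make the difference between those kinds of hits concrete, here is a minimal sketch in Python (not .NET, and not how any particular control works). It assumes the page text has already been extracted into a plain string; the function name `find_hits` and its `mode` parameter are made up for this illustration. It only finds character spans, so mapping those spans back to rectangles on the PDF page is still up to whatever viewer or library you choose.

```python
import re


def find_hits(text, query, mode="whole_word"):
    """Return (start, end) character spans where `query` matches in `text`.

    `mode` is a hypothetical switch for this sketch:
      "whole_word" - the query must stand alone as a word
      "partial"    - the query may sit inside a longer word
      "phrase"     - multi-word query; any run of whitespace
                     (spaces, line breaks) may separate the words
    Matching is case-insensitive in all modes.
    """
    if mode == "whole_word":
        pattern = r"\b" + re.escape(query) + r"\b"
    elif mode == "partial":
        pattern = re.escape(query)
    elif mode == "phrase":
        words = [re.escape(w) for w in query.split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
    else:
        raise ValueError("unknown mode: " + mode)
    return [m.span() for m in re.finditer(pattern, text, re.IGNORECASE)]


if __name__ == "__main__":
    page_text = "Scan the scanned page"
    print(find_hits(page_text, "scan", "whole_word"))
    print(find_hits(page_text, "scan", "partial"))
```

The phrase mode matters for PDFs in particular, because extracted text often has line breaks or extra spaces in the middle of what looks like a single phrase on screen.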

Third feature: text comparison of two PDFs. I am not aware of any controls that compare two PDF files and highlight the differences. Unlike, say, a Word or text file, two completely different PDF files could render to an almost identical visual picture on the screen. If you have to OCR the files, it would be even worse. I don't think you are going to find this pre-built!
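If you end up building the comparison yourself, one possible starting point is to diff the extracted text rather than the PDF files themselves. The sketch below is in Python using only the standard `difflib` module, and it assumes both documents have already been reduced to plain strings (by text extraction or OCR). The function name `diff_words` is made up for this example. Splitting on whitespace first means two files that lay out the same words differently compare as equal, which sidesteps part of the rendering problem described above, though OCR errors would still show up as differences.

```python
import difflib


def diff_words(old_text, new_text):
    """Compare two extracted texts word by word.

    Returns a list of (tag, old_words, new_words) tuples, where tag is
    "replace", "delete" or "insert" as reported by difflib. Unchanged
    runs are skipped. Whitespace and line-break differences are ignored
    because both texts are split into words first.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


if __name__ == "__main__":
    for change in diff_words("The quick brown fox", "The quick red fox"):
        print(change)
```

Writing the result into a newly generated PDF, with the changes highlighted, would still need a PDF library on top of this.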

-- Michael.

Expert Comment

ID: 38804858
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
