Need an ASP.NET control to OCR PDFs, highlight text, and compare text in PDF documents

Posted on 2012-09-20
Last Modified: 2013-01-22

We need an ASP.NET control that provides the following three features for PDF documents:

1. It can OCR non-searchable PDFs.
2. It can highlight searched text within a PDF (the text can be a single word or a phrase).
3. It can compare the text of two PDFs and show the result in a newly generated PDF.

Because our product is redistributable, we are not looking for an expensive control.
Question by:amazursky
    LVL 20

    Expert Comment

    Not sure but maybe you could take a look at the PDFBox library to access PDF files.
    LVL 1

    Accepted Solution


    To the best of my knowledge there is not a control that offers the functionality you are requesting.  You would have to build it out of the pieces that are available.

    The three features that you have listed are not typically something that is built into a control.

    The first feature, OCR of a non-searchable PDF, can be implemented with any of the OCR libraries. That said, pretty much all of them take a PDF page, convert the entire page to an image (remember, a single page can be a mix of images and text, all text, or all image), and then OCR it.

    You could use a free library like Tesseract (Download it here: tesseractdotnet_v301_r590), or a .NET wrapper library (Like

    Many of these libraries have the ability to save the image and the searchable text directly to a Pdf file.  That allows the Adobe control to do hit highlighting for you.
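
    As a rough sketch of the render-then-OCR step described above, the pipeline could be driven from a script like the one below. It is written in Python for brevity rather than .NET, and it assumes two command-line tools that are not mentioned in this thread: poppler's pdftoppm for rasterizing, and a Tesseract build newer than the 3.01 one linked above, since the "pdf" output config arrived in later releases as far as I know. The function names and file names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_commands(pdf_path, work_dir, dpi=300):
    """Return the two command lines for a render-then-OCR pass on page 1.

    Assumption: poppler's `pdftoppm` and a Tesseract build that supports
    the `pdf` output config are installed. Neither comes from the thread.
    """
    work = Path(work_dir)
    prefix = work / "page"
    # Step 1: rasterize only the first page; -singlefile writes <prefix>.png.
    render = ["pdftoppm", "-r", str(dpi), "-png", "-f", "1", "-l", "1",
              "-singlefile", str(pdf_path), str(prefix)]
    # Step 2: OCR the image. The trailing "pdf" asks Tesseract to write
    # <outputbase>.pdf containing the page image plus a hidden text layer.
    ocr = ["tesseract", str(prefix) + ".png", str(work / "page_ocr"), "pdf"]
    return render, ocr


def ocr_first_page(pdf_path, work_dir):
    """Run both steps and return the path of the searchable PDF."""
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(tool + " was not found on PATH")
    for cmd in build_ocr_commands(pdf_path, work_dir):
        subprocess.run(cmd, check=True)
    return Path(work_dir) / "page_ocr.pdf"
```

    A multi-page document would need a loop over pages and a merge step at the end, which is left out here.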

    Second feature: a control that does hit highlighting. Adobe can do this for you, but there is a big difference between the various kinds of hit highlighting: full word, partial word, phrase, etc. You will probably have to look at a bunch of different controls.
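
    To make the difference between those hit types concrete, here is a minimal sketch in Python that works on text already extracted from a page. It only finds character offsets; mapping them back to rectangles on the PDF page is the part a control would have to do. The function name and the mode names are hypothetical.

```python
import re


def find_hits(text, query, mode="word"):
    """Return (start, end) character offsets of every hit for `query`.

    mode="word"    : whole-word matches only
    mode="partial" : matches anywhere, including inside longer words
    mode="phrase"  : whole-phrase matches, tolerating any run of
                     whitespace (spaces, line breaks) between the words
    """
    if mode == "partial":
        pattern = re.escape(query)
    elif mode == "word":
        pattern = r"\b" + re.escape(query) + r"\b"
    elif mode == "phrase":
        words = [re.escape(w) for w in query.split()]
        pattern = r"\b" + r"\s+".join(words) + r"\b"
    else:
        raise ValueError("unknown mode: " + mode)
    # Case-insensitive search, since a search box usually ignores case.
    return [m.span() for m in re.finditer(pattern, text, flags=re.IGNORECASE)]
```

    Searching for "cat" in word mode would skip "catalog", while partial mode would flag the first three letters of it, which is exactly the kind of behavior that varies from control to control.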

    Third feature: text comparison of two PDFs. I am not aware of any controls that compare two PDF files and highlight the differences. Unlike, say, a Word or text file, two completely different PDF files could render to an almost identical visual picture on the screen. If you have to OCR the files, it would be even worse. I don't think you are going to find this pre-built!
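
    If you do end up building the comparison yourself, one possible starting point is to diff the extracted (or OCR'd) text rather than the PDF files themselves. The sketch below uses Python's standard difflib module and assumes you already have each document's text as a string; the helper names and the "old.pdf" / "new.pdf" labels are placeholders. Whitespace is collapsed first, because layout differences between two PDFs would otherwise show up as changes.

```python
import difflib


def normalize(text):
    """Collapse runs of whitespace and drop blank lines."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def compare_texts(old_text, new_text):
    """Return unified-diff lines between two extracted document texts."""
    return list(difflib.unified_diff(
        normalize(old_text),
        normalize(new_text),
        fromfile="old.pdf",   # labels only; no files are read
        tofile="new.pdf",
        lineterm="",
    ))
```

    Writing the result into a newly generated PDF, and coping with OCR noise that makes identical pages look different, would still be separate pieces of work.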

    -- Michael.
    LVL 53

    Expert Comment

    This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
