Solved

OCR assisted solution!

Posted on 2011-03-25
Last Modified: 2012-05-11
I have an unusual requirement. I am trying to read a PDF that has Unicode content. Unfortunately, when I copy the text, some of the characters are not copied properly. Instead, the character codes of a few characters are changed to 8-bit values.

Therefore, I want to develop an OCR-assisted solution in VB.NET.

Is it possible to convert a PDF file into two arrays: a character array and an image array? Every character would go into the character array, and the image rectangle of each character (as it appears in the PDF) would go into the image array at the corresponding position. This would help me develop an OCR-assisted solution to extract text from a PDF when the usual text extraction method fails.
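The two-array layout described above could be sketched as follows. This is only an illustration, written in Python and deliberately free of any PDF library: `PageGlyphs`, `suspect_indices`, `ocr_fallback`, and the `recognise` callback are hypothetical names, the rectangle is assumed to be a `(left, top, right, bottom)` tuple that a PDF library would supply, and the "code point between 0x80 and 0xFF" test is an assumed heuristic for the broken characters, not a rule taken from any tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumed convention: (left, top, right, bottom) in page coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class PageGlyphs:
    """Parallel arrays: chars[i] is the extracted character, boxes[i] its rectangle."""
    chars: List[str] = field(default_factory=list)
    boxes: List[Box] = field(default_factory=list)

    def add(self, ch: str, box: Box) -> None:
        # Append to both arrays together so the indices stay aligned.
        self.chars.append(ch)
        self.boxes.append(box)

    def suspect_indices(self) -> List[int]:
        # Assumed heuristic: a code point in 0x80-0xFF marks a character that
        # was probably mangled into an 8-bit value during extraction.
        return [i for i, ch in enumerate(self.chars) if 0x80 <= ord(ch) <= 0xFF]


def ocr_fallback(glyphs: PageGlyphs, recognise: Callable[[Box], str]) -> str:
    """Return the text with each suspect character replaced by recognise(box).

    `recognise` is a hypothetical callback that would crop the rendered page
    to the given rectangle and run OCR on it.
    """
    fixed = list(glyphs.chars)
    for i in glyphs.suspect_indices():
        fixed[i] = recognise(glyphs.boxes[i])
    return "".join(fixed)
```

Only the characters flagged as suspect are sent to OCR, so correctly extracted text is left untouched. Note that the heuristic would also flag legitimate Latin-1 characters such as "é", so a real document might need a narrower test.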

Also, please advise which library would be suitable for this development.

Thanks.
Question by:mrnags
4 Comments

Accepted Solution

by: JoseParrot (earned 250 total points)
ID: 35233200
Take a look at
http://forums.gdpicture.com/gdpicture-imaging-sdk/
It supports PDF files, OCR, and image processing. It can extract both text and images, and it can also render text to an image, among many other features.

I hope it has the functions you need.

Jose

Expert Comment

by:mlmcc
ID: 36275171
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.