OCR assisted solution!

Posted on 2011-03-25
Last Modified: 2012-05-11
I have an unusual need. I am trying to read a PDF that has Unicode content. Unfortunately, when I copy the text, some of the characters are not copied properly: the character codes of a few characters come out as 8-bit values instead.

Therefore, I want to develop an OCR-assisted solution in VB.NET.

Is it possible to convert a PDF file into two arrays: a character array and an image array? Every character would go into the character array, and the corresponding image rectangle of that character (as it appears in the PDF) would go into the image array. This would help me develop an OCR-assisted solution to extract text from a PDF in cases where the usual text extraction method fails.
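
The two-array idea can be sketched roughly as follows. This is only an illustration under stated assumptions, written in Python for brevity rather than VB.NET. The names `Glyph`, `is_suspect`, and `repair` are hypothetical and do not come from any library. The sketch assumes that some PDF library has already supplied each extracted character together with its rectangle on the page, and that an OCR engine is available as a function that takes a rectangle and returns the recognized character. The "suspect" test (8-bit codes and private-use code points) is a guess at what a badly mapped character looks like; it would also flag legitimate Latin-1 characters, so it would need tuning for real documents.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (x, y, width, height) of a character on the page; the units are an assumption.
Rect = Tuple[float, float, float, float]


@dataclass
class Glyph:
    char: str   # character returned by the normal text extraction
    rect: Rect  # where that character is drawn in the PDF


def is_suspect(ch: str) -> bool:
    """Heuristic (assumption): 8-bit codes and private-use code points
    are treated as signs of a broken font-to-Unicode mapping."""
    code = ord(ch)
    return 0x80 <= code <= 0xFF or 0xE000 <= code <= 0xF8FF


def repair(glyphs: List[Glyph], ocr_region: Callable[[Rect], str]) -> str:
    """Keep characters that look fine; re-read suspect ones through OCR.

    ocr_region is a placeholder for whatever OCR call is available: it
    receives the rectangle of one character and returns recognized text.
    """
    result = []
    for g in glyphs:
        if is_suspect(g.char):
            result.append(ocr_region(g.rect))
        else:
            result.append(g.char)
    return "".join(result)
```

For example, if the middle character of three comes back from text extraction as `"\xe2"`, only that character's rectangle is handed to the OCR function, and the other two are kept as extracted.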

Also, please advise which library would be suitable for this development.

Thanks.
Question by:mrnags
Accepted Solution

by: JoseParrot (earned 250 total points)
ID: 35233200
Take a look at the GdPicture imaging SDK:
http://forums.gdpicture.com/gdpicture-imaging-sdk/
It supports PDF files, OCR, and image processing. It can extract both text and images, and can also render text to an image. It has a lot of features.

I hope it has the functions you need.

Jose
Expert Comment

by:mlmcc
ID: 36275171
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.