OCR assisted solution!

I have an unusual requirement. I am trying to read a PDF that has Unicode content. Unfortunately, when I try to copy the text, some of the characters are not copied properly. Instead, the character codes (ASCII values) of a few characters get changed to 8-bit values.

Therefore, I want to develop an OCR-assisted solution in VB.NET.

Is it possible to convert a PDF file into two arrays: a character array and an image array? Every character would go into the character array, and the corresponding image rectangle of each character (as it looks in the PDF) would go into the image array. This would help me develop an OCR-assisted solution to extract text from a PDF in cases where the usual text extraction method fails.

Also, please advise which library would be suitable for this development.

Thanks.
mrnags Asked:
Jose Parrot (Graphics Expert) Commented:
Take a look at
http://forums.gdpicture.com/gdpicture-imaging-sdk/
It supports PDF files, OCR, and image processing. It can extract both text and images, and it can also render text to an image. It has a lot of features.
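The two-array idea from the question can be sketched independently of any particular SDK. The sketch below is in Python rather than VB.NET only to keep it short; the logic is plain arithmetic and should port directly. It assumes that whichever PDF library is used can report each extracted character together with its bounding box in PDF points (72 per inch, origin at the bottom-left of the page), and that the page is rendered separately to a bitmap at a known DPI. The names `CharBox`, `to_pixel_rect`, `looks_miscopied`, and `build_arrays` are hypothetical, and the "miscopied" test is only a heuristic for documents whose real text is not expected to fall in the Latin-1 range.

```python
import math
from dataclasses import dataclass

POINTS_PER_INCH = 72.0  # default PDF user-space unit


@dataclass
class CharBox:
    """One extracted character and its box in PDF points (origin bottom-left)."""
    char: str
    x0: float  # left
    y0: float  # bottom
    x1: float  # right
    y1: float  # top


def to_pixel_rect(box, page_height_pt, dpi):
    """Map a PDF-space box to (left, top, right, bottom) pixels on a page
    bitmap rendered at `dpi`, whose origin is the top-left corner."""
    scale = dpi / POINTS_PER_INCH
    left = math.floor(box.x0 * scale)
    right = math.ceil(box.x1 * scale)
    # Flip the y axis: PDF y grows upward, image y grows downward.
    top = math.floor((page_height_pt - box.y1) * scale)
    bottom = math.ceil((page_height_pt - box.y0) * scale)
    return (left, top, right, bottom)


def looks_miscopied(ch):
    """Heuristic (an assumption, not a rule): flag 8-bit code points,
    private-use code points, and the Unicode replacement character."""
    cp = ord(ch)
    return 0x80 <= cp <= 0xFF or 0xE000 <= cp <= 0xF8FF or cp == 0xFFFD


def build_arrays(boxes, page_height_pt, dpi=300):
    """Return parallel arrays (chars, pixel rects) plus the indices of
    characters that should be re-read by OCR from their image rectangle."""
    chars = [b.char for b in boxes]
    rects = [to_pixel_rect(b, page_height_pt, dpi) for b in boxes]
    ocr_indices = [i for i, c in enumerate(chars) if looks_miscopied(c)]
    return chars, rects, ocr_indices
```

Each rectangle in `rects` can then be cropped from the rendered page image and passed to the OCR engine, but only for the indices in `ocr_indices`, so correctly extracted characters are left untouched.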

I hope it has the functions you need.

Jose
mlmcc Commented:
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
Question has a verified solution.
