OCR-assisted solution for PDF text extraction
Posted on 2011-03-25
I have a unique need. I am trying to read a PDF that has Unicode content. Unfortunately, when I try to copy the text, some of the characters are not copied properly. Instead, the character codes of a few characters get changed to 8-bit values.
Therefore, I want to develop an OCR-assisted solution in VB.NET.
Is it possible to convert a PDF file into two arrays: a character array and an image array? Every character would be placed in the character array, and the image rectangle of that character (as it appears in the PDF) would be placed at the same index in the image array. This would help me develop an OCR-assisted solution to extract text from a PDF when the usual text extraction method fails.
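To make the idea concrete, here is a rough sketch of the structure I have in mind. It is written in Python rather than VB.NET only to keep it short and library-free. The `Glyph` type, the function names, and the "code point between 128 and 255" check are my own assumptions for illustration and are not part of any PDF library; the sample glyphs are hand-written, not read from a real PDF.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (left, top, right, bottom) of a character on the page.
# The coordinate unit and origin are assumptions for this sketch.
Rect = Tuple[float, float, float, float]


@dataclass
class Glyph:
    """One character as reported by a (hypothetical) PDF text extractor."""
    char: str   # character returned by normal text extraction
    page: int   # page number the character was found on
    rect: Rect  # where the character is drawn on that page


def to_parallel_arrays(glyphs: List[Glyph]) -> Tuple[List[str], List[Rect]]:
    """Split glyphs into a character array and a rectangle array.

    Index i in both arrays refers to the same character.
    """
    chars = [g.char for g in glyphs]
    rects = [g.rect for g in glyphs]
    return chars, rects


def needs_ocr(char: str) -> bool:
    """Assumed heuristic: a code point in 128..255 is treated as a
    wrongly mapped character that should be re-read by OCR."""
    return 0x80 <= ord(char) <= 0xFF


def ocr_candidates(chars: List[str]) -> List[int]:
    """Return the indexes of characters that should fall back to OCR."""
    return [i for i, c in enumerate(chars) if needs_ocr(c)]


# Hand-written sample data standing in for real extraction output.
sample = [
    Glyph("A", 1, (10.0, 10.0, 18.0, 22.0)),
    Glyph("\u00e2", 1, (18.0, 10.0, 26.0, 22.0)),  # suspicious 8-bit value
    Glyph("\u0915", 1, (26.0, 10.0, 36.0, 22.0)),  # Devanagari letter KA
]
sample_chars, sample_rects = to_parallel_arrays(sample)
suspect_indexes = ocr_candidates(sample_chars)
```

For each index in `suspect_indexes`, the matching rectangle in `sample_rects` is the region I would crop from the rendered page and pass to an OCR engine. Rendering the page and cropping the image are not shown, since they depend on the library chosen.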
Also, could you please advise which library would be suitable for this development?