Solved

OCR assisted solution!

Posted on 2011-03-25
Last Modified: 2012-05-11
I have a unique need. I am trying to read a PDF that has Unicode content. Unfortunately, when I try to copy the text, some of the characters are not copied properly. Instead, the character codes of a few characters are changed to 8-bit values.

Therefore, I want to develop an OCR-assisted solution in VB.NET.

Is it possible to convert a PDF file into two arrays: a character array and an image array? Every character would be placed in the character array, and the corresponding image rectangle of each character (as it appears in the PDF) would be placed in the image array. This would help me develop an OCR-assisted solution to extract text from a PDF when the usual text extraction method fails.

Also, please advise which library would be suitable for this development.

Thanks.
Question by:mrnags
LVL 18

Accepted Solution

by:
JoseParrot earned 250 total points
ID: 35233200
Take a look at
http://forums.gdpicture.com/gdpicture-imaging-sdk/
It supports PDF files, OCR, and image processing. It can extract both text and images, and it can also render text to an image. It has a lot of features.

I hope it has the functions you need.
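
Independent of any particular library, the two-array idea from the question can be sketched roughly as follows. This is only an illustration in Python (the same structure should carry over to VB.NET), not GdPicture code. The names `is_suspect`, `repair_text`, and `ocr_box` are hypothetical, and `ocr_box` stands in for whatever call your chosen SDK offers to render a page rectangle and run OCR on it. The "suspect" test (code points 0x80 to 0xFF) is an assumption based on the symptom described in the question; it would also flag legitimate Latin-1 letters, so adjust it to your documents.

```python
from typing import Callable, List, Tuple

# Rectangle of one character on the page: (left, top, right, bottom).
Box = Tuple[float, float, float, float]


def is_suspect(ch: str) -> bool:
    """Assumed heuristic: a character in the 8-bit range 0x80-0xFF
    is treated as a possibly mis-mapped glyph."""
    return 0x80 <= ord(ch) <= 0xFF


def repair_text(chars: List[str], boxes: List[Box],
                ocr_box: Callable[[Box], str]) -> str:
    """Walk the two parallel arrays and OCR only the suspect characters.

    chars   - characters as returned by normal text extraction
    boxes   - rectangle of each character, same order as chars
    ocr_box - hypothetical callback: renders a rectangle and returns
              the character that OCR recognizes there
    """
    if len(chars) != len(boxes):
        raise ValueError("chars and boxes must have the same length")
    repaired = []
    for ch, box in zip(chars, boxes):
        repaired.append(ocr_box(box) if is_suspect(ch) else ch)
    return "".join(repaired)


if __name__ == "__main__":
    # Stand-in data: the middle character was extracted as an 8-bit value.
    chars = ["a", "\u00e0", "b"]
    boxes = [(0, 0, 10, 12), (10, 0, 20, 12), (20, 0, 30, 12)]
    # Fake OCR results keyed by rectangle, in place of a real OCR engine.
    fake_ocr = {(10, 0, 20, 12): "\u0915"}
    print(repair_text(chars, boxes, fake_ocr.__getitem__))
```

Running OCR only on the suspect rectangles keeps the correctly extracted text untouched and limits OCR errors to the characters that were already wrong.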

Jose
 
LVL 100

Expert Comment

by:mlmcc
ID: 36275171
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.