Create Searchable PDF from tif Image and OCR Text with VB.net

We are running an OCR package that can provide me with a TIF image and an OCR text file which includes all of the data from the document and the relative coordinates of that data.  What I want to do is create a PDF from the image file with this data in the background, which makes the PDF searchable.

I obviously want to do this in an automated fashion, preferably written in VB with some controls, etc.  I have Adobe Professional... are there any controls in that package I can leverage in VB.net to accomplish this?

Tks,
J
jimtxas Asked:
Karl Heinz Kremer Commented:
Acrobat Professional (or any Adobe package) does not allow you to create these files - at least not with the VB API. You probably can do it within a plug-in.

If you want to do this, you need a pretty good understanding of the PDF spec, because you will create a PDF file from scratch. For this, you need a PDF toolkit or a PDF library that allows you to do this. This will be a major project.

It would be much easier to just use an OCR package that already creates PDF files in the correct format (e.g. Abbyy's FineReader or ScanSoft's OmniPage).
jimtxas Author Commented:
Can you suggest any toolkits that would accomplish this?  We cannot use a different OCR package.  The system we are using is an extremely powerful data processing/extraction engine with an investment in excess of $1M.  The software outputs 3 components: the data requested for extraction, a TIF image, and a full OCR 'map' of all the data OCR'd in the document...
Karl Heinz Kremer Commented:
I don't have any experience with tools for VB. I would do this either from scratch, without a PDF library (the PDF format is relatively simple to write), or with an Acrobat plug-in (this has to be C or C++), or with either the Adobe PDF Library (expensive - http://partners.adobe.com/public/developer/pdf/library/index.html) or the Appligent sPDF library (which is API-compatible with Adobe's library and the plug-in API - http://www.appligent.com/developers/developers.html).

If you want to create your own PDF creator, you need to read and understand the PDF Reference: http://partners.adobe.com/public/developer/pdf/index_reference.html
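
To give a feel for what "from scratch" involves, here is a minimal sketch in Python (chosen only for brevity; the same bytes could be written from VB.net). It assembles a one-page PDF with a cross-reference table whose entries are the byte offsets of each object. The function name `build_pdf`, the choice of Helvetica, and the object numbering are assumptions made for this illustration, and the scanned image itself is left out: a real searchable PDF would also need the TIF embedded as an image XObject and drawn on the page.

```python
def build_pdf(page_width, page_height, content):
    """Assemble a one-page PDF; `content` is the page content stream as bytes.

    Hypothetical helper for illustration only. The scanned image is omitted.
    """
    page = (
        "<< /Type /Page /Parent 2 0 R "
        f"/MediaBox [0 0 {page_width} {page_height}] "
        "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>"
    ).encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",                      # object 1
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",              # object 2
        page,                                                      # object 3
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",  # object 4
        b"<< /Length %d >>\nstream\n" % len(content)
        + content + b"\nendstream",                                # object 5
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Calling `build_pdf(612, 792, b"BT /F1 12 Tf 3 Tr 72 700 Td (Invoice) Tj ET")` and writing the result to a file should give a Letter-size page with one invisible but searchable word. The fiddly part is keeping the offsets exact, which is why the output is built as bytes rather than text.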


michaelbuddy Commented:
I know you have this searching database technology you want to take advantage of, but I want to offer an alternative suggestion if I may.

This may work for you.

If your images have unique names and other data, you could create a text box in your source page document (PageMaker, InDesign, Quark or whatever) containing the data you need for the image.

Drop that text box behind the image.  Then when you want to do a search based on that data, the text find will take you to the image, because the text box is hidden behind it.

That may be more work than you want to do, but it would let more people search it without special plug-ins.

Or you could just caption the image underneath it.

Karl Heinz Kremer Commented:
michaelbuddy, this would allow you to search and find the _page_ the search string is on, but not the exact position. You would see Acrobat highlight one or more areas on your page that have nothing to do with your search string. It's better than not having any search capabilities, but nowhere near what you get with the normal "hidden text" mode that Acrobat supports.
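
For reference, "hidden text" in a PDF content stream is normally text drawn with text rendering mode 3 (`3 Tr`, neither filled nor stroked), positioned where each word sits on the scanned image. The Python sketch below shows one way the OCR coordinates might be turned into such a stream. The input layout (a list of `(text, left_px, top_px, height_px)` tuples with a top-left pixel origin) is purely an assumption, since the thread does not describe the format of the OCR 'map'; the function names and the `/F1` font resource name are likewise hypothetical.

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def hidden_text_stream(words, image_height_px, dpi):
    """Build a content stream placing each word invisibly at its OCR position.

    `words`: list of (text, left_px, top_px, height_px) tuples, measured in
    image pixels from the top-left corner (an assumed format, not the real
    OCR map). `dpi` is the scan resolution of the image.
    """
    scale = 72.0 / dpi          # PDF user space defaults to 72 units per inch
    parts = ["BT", "3 Tr"]      # rendering mode 3: invisible text
    for text, left_px, top_px, height_px in words:
        x = left_px * scale
        # Image rows count down from the top; PDF y counts up from the bottom.
        # The bottom edge of the word box is used as an approximate baseline.
        y = (image_height_px - (top_px + height_px)) * scale
        font_size = height_px * scale
        parts.append(f"/F1 {font_size:.2f} Tf")
        parts.append(f"1 0 0 1 {x:.2f} {y:.2f} Tm")   # absolute position
        parts.append(f"({escape_pdf_string(text)}) Tj")
    parts.append("ET")
    # Characters outside Latin-1 are replaced with "?" in this simple sketch.
    return "\n".join(parts).encode("latin-1", "replace")
```

This only matches the position and height of each word; making the highlight match the word's width as well would need horizontal scaling based on font metrics, which is not attempted here.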
michaelbuddy Commented:
I see.

Have you looked at any of the products from Enfocus?  They might have something that works for you.  I know you can do a lot of PDF diagnostics with them.

Check out http://www.enfocus.com

We use that company for checking our PDFs for print, but I couldn't tell you all the products they have; it's quite a few.
Karl Heinz Kremer Commented:
The only SDK that Enfocus distributes is for preflighting PDFs; there is nothing to create PDFs.
jimtxas Author Commented:
khkremer, will you email me directly at jimtx@arn.net?

Tks,
J
Karl Heinz Kremer Commented:
The EE membership agreement does not allow any discussion outside of the EE forum, and it also does not allow any email addresses in EE comments. Please continue the discussion in this forum.
jimtxas Author Commented:
Sorry, I just wanted to make a proposition for some contracted assistance...
Karl Heinz Kremer Commented:
I'm not available for any contract work. Sorry.