curiouswebster asked:

Baby steps with PDFtoText for OCR

What are the first steps for me to take as I create a proof of concept that:

- is a C# WinForms program
- uses the PDFtoText library for OCR

Are there any demo programs I can review? Should I just dive in?

Thanks
SOLUTION
Joe Winograd
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
curiouswebster

ASKER

I guess I was unclear about the PDF containing the text. Where did this text come from in the case of a fax downloaded as a PDF from, for example, MetroFax?

Is that an OCR service MetroFax offers, to "pre-process" the PDF and send as complete a PDF as possible?

How do I determine if a PDF has text in it or not?

Is it the PDF Normal file that has some or all of the fields already converted to text?
ASKER CERTIFIED SOLUTION
> Each page will have a form-feed character, so a 1,000-page doc with no text would be 1,000 bytes.

You're a genius, Joe.

I love "poor man's" solutions like that.

I will read the rest of your post today and ask a few more questions.

Thanks!
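
The form-feed check quoted above could be sketched roughly as follows. This is only a minimal Python sketch (the same logic should carry over to C#), assuming the `pdftotext` command-line tool is installed and on the PATH. The helper names `extract_text`, `page_count`, and `has_text_layer` are made up for illustration and are not part of any library.

```python
import subprocess


def extract_text(pdf_path: str) -> str:
    """Run pdftotext and return what it writes.

    Assumption: pdftotext is installed; passing "-" as the output
    file name makes it write the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def page_count(extracted: str) -> int:
    """Count form-feed characters, one per page as described above."""
    return extracted.count("\f")


def has_text_layer(extracted: str) -> bool:
    """True if the output holds anything besides whitespace.

    Form feeds count as whitespace, so an image-only PDF whose
    output is just one form feed per page yields False.
    """
    return any(not ch.isspace() for ch in extracted)
```

A call such as `has_text_layer(extract_text("fax.pdf"))` (the file name is hypothetical) would then report whether the PDF already carries text. Comparing the output size against `page_count` gives the same answer as the byte-count trick.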
I have another question, already.

If we are running a superior OCR tool, then do we even care if there is text in it? Or, is there a smarter way to:

- leave the PDF untouched
- generate a new text file
- supplement the text file with the text already in the PDF?

That superset may be closer to the 100% we seek.

Does this make any sense?
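
One way to picture the "superset" idea above is a line-level merge of the two text sources. The following is only a sketch under assumptions: both the text already in the PDF and the OCR output are available as plain strings, and the function name `merge_text` is hypothetical. It uses Python's standard `difflib` to keep lines the two sources share once and to keep lines that appear in only one of them.

```python
import difflib


def merge_text(embedded: str, ocr: str) -> str:
    """Merge two extractions of the same document line by line.

    Lines that match in both inputs are kept once; lines found in
    only one input are kept too, embedded text first.
    """
    a = embedded.splitlines()
    b = ocr.splitlines()
    merged = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same lines in both sources: keep a single copy.
            merged.extend(a[i1:i2])
        else:
            # Lines differ or exist on one side only: keep both sides.
            merged.extend(a[i1:i2])
            merged.extend(b[j1:j2])
    return "\n".join(merged)
```

Because differing lines from both sources are kept, the result can contain near-duplicates where the OCR misread a character, so this leans toward completeness rather than cleanliness. The original PDF is never modified; only a new text file would be written from the merged string.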
SOLUTION
thanks
You're welcome. If you wouldn't mind re-opening and re-closing the previous question as I suggested in my last post there, I'll appreciate it. Thanks, Joe
Joe, there are these fancy ways to reach you with EE, but none involve a simple email. I want to talk to you about a project. How can I get in touch with you?
EE doesn't allow sharing email addresses in the public forum...I'll send it to you in a PM.