curiouswebster
asked on
Baby steps with PDFtoText for OCR
What are the first steps I should take to build a proof of concept that:
- is a C# WinForms program
- uses the PDFtoText library for OCR
Are there any demo programs I can review, or should I just dive in?
Thanks
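A possible first baby step for the proof of concept is to run the pdftotext command-line tool (from Xpdf/Poppler) on one PDF and read the text file it writes. This is a minimal sketch in Python for brevity; in C# the same call would go through System.Diagnostics.Process. It assumes pdftotext is installed and on the PATH, and the helper names build_pdftotext_command and extract_text are hypothetical. Note that pdftotext extracts text already embedded in the PDF; it does not perform OCR on scanned images.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path: str, out_path: str) -> list[str]:
    """Build the argument list for the pdftotext CLI (hypothetical helper).

    "-enc UTF-8" asks for UTF-8 output; the two positional arguments are the
    input PDF and the output text file.
    """
    return ["pdftotext", "-enc", "UTF-8", pdf_path, out_path]


def extract_text(pdf_path: str) -> str:
    """Run pdftotext on one PDF and return the text it wrote (hypothetical helper).

    Assumes pdftotext is on the PATH. The .txt file is written next to the PDF.
    """
    out_path = str(Path(pdf_path).with_suffix(".txt"))
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_pdftotext_command(pdf_path, out_path), check=True)
    return Path(out_path).read_text(encoding="utf-8", errors="replace")
```

Calling extract_text("sample.pdf") on a known-good file and printing the result is enough to confirm the tool works before wiring it into a WinForms button handler.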
SOLUTION
ASKER CERTIFIED SOLUTION
ASKER
> Each page will have a form-feed character, so a 1,000-page doc with no text would be 1,000 bytes.
You're a genius, Joe.
I love "poor man's" solutions like that.
I will read the rest of your post today and ask a few more questions.
Thanks!
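The form-feed trick quoted above can be sketched as a check on pdftotext's output: each page ends with a form-feed character ("\f"), so an image-only PDF yields output made of nothing but form feeds (and possibly newlines). This sketch assumes the output has already been read into a string; has_text_layer and pages_without_text are hypothetical names.

```python
def has_text_layer(pdftotext_output: str) -> bool:
    # "\f" counts as whitespace for str.isspace(), so output that is only
    # page-ending form feeds (and newlines) has no non-whitespace character.
    return any(not ch.isspace() for ch in pdftotext_output)


def pages_without_text(pdftotext_output: str) -> list[int]:
    """Return 1-based page numbers whose extracted text is blank."""
    pages = pdftotext_output.split("\f")
    if pages and pages[-1] == "":
        pages.pop()  # drop the empty piece after the final form feed
    return [i + 1 for i, page in enumerate(pages) if not page.strip()]
```

Checking per page rather than per file also catches mixed documents, where only some pages are scans and need OCR.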
ASKER
I already have another question.
If we are running a superior OCR tool, do we even care whether the PDF already contains text? Or is there a smarter way to:
- leave the PDF untouched
- generate a new text file
- supplement that text file with the text already in the PDF?
That superset may get us closer to the 100% we're after.
Does this make sense?
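One way to build the "superset" described above, sketched under these assumptions: both the embedded text (from pdftotext) and the OCR output are plain text with one form feed per page, and pages are matched by position. For each page it keeps the embedded lines and appends any OCR line not already present. merge_page_text and split_pages are hypothetical names, and exact line matching is only one simple choice; lines that differ by an OCR misread will both appear.

```python
def split_pages(text: str) -> list[str]:
    """Split pdftotext-style output into pages on the form-feed character."""
    pages = text.split("\f")
    if pages and pages[-1] == "":
        pages.pop()  # drop the empty piece after the final form feed
    return pages


def merge_page_text(embedded: str, ocr: str) -> str:
    """Combine embedded and OCR text page by page into a superset.

    Blank lines are dropped. Embedded lines come first; an OCR line is added
    only if its whitespace-trimmed form is not already on that page.
    """
    emb_pages = split_pages(embedded)
    ocr_pages = split_pages(ocr)
    merged = []
    for i in range(max(len(emb_pages), len(ocr_pages))):
        emb = emb_pages[i] if i < len(emb_pages) else ""
        ocr_page = ocr_pages[i] if i < len(ocr_pages) else ""
        lines = [ln for ln in emb.splitlines() if ln.strip()]
        seen = {ln.strip() for ln in lines}
        for ln in ocr_page.splitlines():
            if ln.strip() and ln.strip() not in seen:
                lines.append(ln)
                seen.add(ln.strip())
        merged.append("\n".join(lines))
    # Re-join so the result still ends each page with a form feed.
    return "".join(page + "\n\f" for page in merged)
```

Because the PDF itself is only read, never modified, this matches the "leave the PDF untouched" requirement; the merged string can be written to a new .txt file.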
SOLUTION
ASKER
Thanks
You're welcome. If you wouldn't mind re-opening and re-closing the previous question as I suggested in my last post there, I'll appreciate it. Thanks, Joe
ASKER
Joe, EE offers several fancy ways to reach you, but none of them is a simple email. I'd like to talk to you about a project. How can I get in touch with you?
EE doesn't allow sharing email addresses in the public forum, so I'll send it to you in a PM.
ASKER
Is that an OCR service MetroFax offers, one that "pre-processes" the PDF and sends as complete a PDF as possible?
How do I determine if a PDF has text in it or not?
Is it the "PDF Normal" file that has some or all of the fields already converted to text?