oaktrees


ABBYY Fine Reader

Hi,

I'm working with some very old PDF books that have yellowed backgrounds, e.g. https://ia600701.us.archive.org/18/items/plato00coll/plato00coll.pdf  The pages have faded, and the background color makes the text harder to work with.

I got ABBYY FineReader.  Wow!  Cool suite.  But I can't figure it out.  I know FineReader can clean up these files so that they have basically white backgrounds with black text, all in one fell swoop.  But I keep getting lost in the process.

Help! :D

Sincerely,

OT
David Favor

This will be more difficult than you might imagine, because you'll have an entire range of yellowish shades to convert.

If I had to do this, I'd likely split the PDF into separate images, then work out my "whitening algorithm" using ImageMagick on one image, refining it on that image until I got a good run, then apply it to several images, continuing to refine it until I got a good result.
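As a rough illustration of what such a "whitening algorithm" might be, here is a minimal Python sketch of a levels stretch, assuming the page has already been rendered to a list of (r, g, b) pixel tuples. The function names and the `black_point`/`white_point` defaults are hypothetical starting values to tune on a sample page; they are not taken from ImageMagick, although its `-level` option performs a similar linear stretch.

```python
def whiten_pixel(rgb, black_point=60, white_point=200):
    """Map one (r, g, b) pixel to a 0-255 gray value with a levels stretch.

    Hypothetical parameters: gray levels at or below black_point become pure
    black, levels at or above white_point become pure white.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    r, g, b = rgb
    # Rec. 601 luma weights; yellowed paper is bright in R and G,
    # so it lands near the top of the gray scale.
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    if gray <= black_point:
        return 0
    if gray >= white_point:
        return 255
    # Linearly stretch the remaining mid-tones between the two points.
    return round((gray - black_point) * 255 / (white_point - black_point))


def whiten_page(pixels, black_point=60, white_point=200):
    """Apply whiten_pixel to every pixel of a (flattened) page."""
    return [whiten_pixel(p, black_point, white_point) for p in pixels]
```

Darker patches of yellow (say (200, 185, 140)) still come out light gray rather than pure white with these defaults, which is exactly the "range of yellowish shades" problem: the tuning loop amounts to nudging `white_point` down and `black_point` up until one sample page looks right.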

Then apply the algorithm to all images.

Then reassemble the PDF from new "whitened" images.
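The split/whiten/reassemble loop could be scripted roughly as follows. This sketch only builds the command lines; it assumes poppler's `pdftoppm`, ImageMagick 7's `magick` (use `convert` on ImageMagick 6), and the `img2pdf` tool are installed, and that you already know the page count (e.g. from `pdfinfo`). The `build_commands` helper, the `20%,80%` level value, and the file-naming scheme are all hypothetical.

```python
from pathlib import Path


def build_commands(pdf, num_pages, level="20%,80%", dpi=300):
    """Return argument lists for the split -> whiten -> reassemble loop.

    Hypothetical helper: `level` is a starting guess for ImageMagick's
    -level black_point,white_point option, to be tuned on one page first.
    """
    stem = Path(pdf).stem
    commands, whitened = [], []
    for n in range(1, num_pages + 1):
        raw = f"{stem}-{n:04d}"  # pdftoppm appends ".png" itself
        clean = f"{raw}-white.png"
        # 1. Render page n alone; -singlefile suppresses pdftoppm's numbering.
        commands.append(["pdftoppm", "-r", str(dpi), "-png", "-f", str(n),
                         "-l", str(n), "-singlefile", pdf, raw])
        # 2. Convert to grayscale and stretch levels so the yellow goes white.
        commands.append(["magick", f"{raw}.png", "-colorspace", "Gray",
                         "-level", level, clean])
        whitened.append(clean)
    # 3. Reassemble all whitened pages, in order, into a single PDF.
    commands.append(["img2pdf", *whitened, "-o", f"{stem}-white.pdf"])
    return commands
```

To run it, loop over the list with `subprocess.run(cmd, check=True)`. Calling it first with `num_pages=1` matches the advice to refine the algorithm on a single image before processing the whole book.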

Be prepared for a large amount of time invested to get this working.
Hi OT,

It seems that you already have the answer to this from your question at the ABBYY support site:

https://support.abbyy.com/hc/en-us/community/posts/360009858959-Remove-Cloudy-Background-Image-Behind-All-Pages-of-a-PDF-in-One-Fell-Swoop

While I haven't tested the solution posted there, your own comment was:
Ah! GOT IT! In the Tools Tab > Options.


Conversion went through and the resulting file is just what I hoped for! :D

All the text is there AND it is much more responsive without those big images. SUPER

Are you looking for something beyond the answer there?

And then there's this EE thread from last year:

Remove Cloudy Background Image Behind All Pages of a PDF in One Fell Swoop

Regards, Joe
Hi OT,

Two other comments for you: (1) There's another EE thread where I made a video showing you how to do it in PDF-XChange Editor:
https://www.experts-exchange.com//questions/29193078/Mouse-Recorder-Macro-Maker-That-Works-Well.html

Video attached again here. If you find out that ABBYY FineReader can't do it (I haven't tried with AFR), then you can feed the PDF made in PDF-XChange Editor to AFR.

(2) PDF-XChange Editor Version 8 uses the LeadTools OCR engine, which is good OCR (certainly better than the freebie Tesseract), but by Tracker's own admission not as good as ABBYY. That's why Tracker switched to ABBYY in PDF-XChange Editor Version 9; see item #2 here:
New and Updated Enhanced Optical Character Recognition (OCR) Plugin

Regards, Joe
oaktrees-demo.mp4
oaktrees

Hi Joe,

You're right.  I did get it once, but since then I haven't been able to get it to work.  

The program is complex. It consists of both the (main?) PDF FineReader, which is sometimes referred to as the editor, and the OCR Editor.  Documents opened in FineReader seem to go straight into the OCR Editor, unbidden.  That is where my understanding of the program begins to fall apart: when converting a document, am I reading, editing, or OCR editing?  Oftentimes it seems that clicking an action in FineReader means you'll be running it in the OCR Editor.

I know there's a way to turn the dials on the two main programs to get them to work.  But, there's a symbiosis there that's hard to follow.  

An analogy for where I am: years ago I had a Fiat Spider, and to put the car into reverse you had to push the gear shift DOWN first, and then back to gear position 6.  COOL as heck, but the down always seemed counterintuitive.  (LOVED that car - CONVERTIBLE and I was in FLORIDA! :D)

I was hoping for a step-by-step walkthrough of how to get from FineReader to the OCR Editor for this cloudy-background task, one that would reveal the defining metaphors for how the software works.  So far, the interplay between them confuses me.

For what it's worth, I find myself in the ignominious position right now that...all the files I open go black on white! :)))  I think I have a solution to that forthcoming.

Headed to the office now.

Sincerely,

OT
Hi Joe,

Your answer in (1) above was spot on, Joe.  That worked, but I needed to find a program that could do it all in one.

I SAW that about Tracker's new OCR!  You're right - ABBYY renders documents sooooooo WELL!  Spooky good - right down to headers, footers, even microscopic italicized footnote subscripts!

My only fear about updating is similar to what happens each time Calibre updates.  Things that worked, or settings that I had down PAT, always need to be redone or become inoperable for a while.  These days my watchword is...wait a month for updates! :D  I can imagine there's a huge downside to that for security-related software.  I don't think I'll do that with Norton AV. :D

At the same time, I'm really hoping to crack the how-to on ABBYY.  As I mentioned in my post above, if I can just get the idea of the way the two main parts hand things off to each other, in the context of this cloudy document challenge, I'm sure I'll be able to use the program and understand it MUCH more effectively.

I know it can do it.  When it works it is...AMAZING!  Just need to know when to push the gearshift down. :D

Sincerely,

OT
Hi Joe,

TOTALLY understand about the time sink! :D  When I see how often you help people I feel impressed, and grateful.  Thank you, Joe! : )))))))))))))))))))))))))

Sincerely,

OT
Thanks for understanding! I don't use AFR (or PDFX) often, so I'd have to spend some time to figure it out...and time is what I don't have right now. AFR is one of many PDF products that I have in my bag-o-tricks because I do a lot of business in the document imaging/PDF space, so it's more of a test/experiment/sandbox product for me (PDFX, too). Regards, Joe