Data redaction on scanned images

Hello Experts,

I'm hoping someone can either name third-party software that handles the use case below or suggest a creative fix for an issue we are having.

We have an HP Digital Sender Flow 8500 fn1 scanner.

A clerk takes a stack of letters (some handwritten, some typed, some on forms) and scans them. The PDF image is stored in a temp folder until a ticket is assigned, and then it is moved to the ticketing system.

The issue is some of the letters have information we didn't request or want - like credit card numbers - and we don't want them in the ticketing system.

I am looking for add-on software that will find strings of numbers matching the credit card pattern and either redact them or reject the document so it is not scanned in at all. I'd prefer that the image be redacted or kicked out before it hits the temp folder, but the redaction could occur there.

Any suggestions?

Thank you,
Steph
Steph_M Asked:

gheist Commented:
Do you OCR your scans?
Steph_M (Author) Commented:
No we do not.
Joe Winograd, Fellow & MVE, Developer Commented:
Hi Steph,

> some are handwritten, some typed, some on forms

This implies that the private data (like credit card numbers) are not in a fixed location, which means that it's not possible to redact based on fixed X-Y locations. That approach would work on a particular form, where you could specify the X-Y location(s) to redact, but with free-form handwritten or typed letters it is not possible.

The HP Digital Sender Flow 8500 fn1 has an embedded IRIS OCR engine, so you may be able to use that to create a recognizable pattern with text (not a bitmap/graphic/raster image), such as the xxxx-xxxx-xxxx-xxxx of a Visa/MasterCard, the xxxx-xxxxxx-xxxxx of an Amex card, the xxx-xx-xxxx of an SSN, the xx-xxxxxxx of a TIN, etc. But I think that will be highly prone to error, especially for handwritten letters, where you really need ICR, not OCR, and even ICR is extremely unlikely to give you the accuracy you need.
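As a rough illustration of the pattern matching described above, here is a minimal Python sketch. It assumes the OCR output is already available as a plain text string; the names PATTERNS and find_sensitive are hypothetical, and the patterns only cover the layouts listed in this comment. OCR misreads (very likely with handwriting) will slip past it.

```python
import re

# Hypothetical patterns for the formats named above. For card numbers the
# separator may be a hyphen, a space, or missing; SSN and TIN require hyphens.
PATTERNS = {
    "visa_mc": re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),
    "amex": re.compile(r"\b\d{4}[- ]?\d{6}[- ]?\d{5}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "tin": re.compile(r"\b\d{2}-\d{7}\b"),
}


def find_sensitive(text):
    """Return a list of (label, matched_text) pairs found in OCR'd text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits
```

For example, find_sensitive("Please charge 4111-1111-1111-1111") would flag the 16-digit string as "visa_mc". A document with any hit could then be routed to a person for review instead of being redacted automatically.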

The HP Digital Sending Software works on the HP Digital Sender Flow 8500 fn1. It is available for free download at the HP site. You need to be signed into your HP account and then visit this page:
https://h20392.www2.hp.com/portal/swdepot/try.do?productNumber=DSS-SW&lang=en&cc=us&hpappid=PDAPI_PRO_SWD

It has some additional capabilities beyond the standard software, including workflow, but I'm not aware of any way to do what you're looking for with it. However, I'm not an expert on that software and it may be worth a deep dive and some experimentation.

> Any suggestions?

I don't know what your volume is, but my suggestion is to have the scanning operator manually redact at scanning time (or have someone else redact right after scanning if you don't want the scanning operator to spend time redacting). It's labor-intensive and certainly not what you wanted to hear, but I can't think of a way to do it (with high accuracy) in an automated fashion. Regards, Joe


Steph_M (Author) Commented:
This was exactly the type of information I was looking for. We are having a clerk look at it first - it seems to be the most efficient and accurate method.

Thank you.
Steph
Joe Winograd, Fellow & MVE, Developer Commented:
You're welcome, Steph. Having a clerk look at it first is a good call. Regards, Joe
gheist Commented:
I wanted to suggest calculating Luhn checksums on the OCR'd text and rejecting any document that contains a number passing the check.
Steph_M (Author) Commented:
Thank you. Our issue is identifying the sequence of numbers in unstructured text, not weeding out false positives once a string has been found.
gheist Commented:
Yes, but I don't think your clerk can calculate it: https://en.wikipedia.org/wiki/Luhn_algorithm
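For reference, the Luhn check itself is short. The following is a minimal Python sketch, assuming a candidate number has already been pulled out of OCR'd text as a string; the function name luhn_valid is just illustrative. It confirms only that the check digit is consistent, not that the number is a real card.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Keep digits only, so hyphens and spaces in the input are ignored.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

The well-known test number 4111-1111-1111-1111 passes this check, while changing its last digit to 2 makes it fail.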
ozo Commented:
I would not expect the clerk to often see things that look like a credit card number with the wrong checksum.
On the other hand, OCR software could have difficulty with handwritten numbers that a clerk could still read.
A clerk might also make better use of contextual clues to identify credit card numbers than most OCR engines would.
What things might the letter contain which could resemble credit card numbers but which you want to keep?
gheist Commented:
OCR would make the clerk's job much easier...