Solved

Scanning software recommendation needed

Posted on 2013-05-14
Last Modified: 2013-05-17
Hello,

I'm looking for software that will address the following situation. Any recommendations would be greatly appreciated!

The client has a storeroom full of boxes of old invoices. Each invoice is stored with all of its associated paperwork. So you have an invoice, the paperwork for that invoice (anywhere from zero to 30 more pages), a second invoice, the paperwork for the second invoice, etc.

They want to scan the invoices and the associated paperwork in so that:

1. They can do some basic OCR (invoice number, date, customer code) on the invoice (which will be the top sheet).

2. The scans of all the associated paperwork are associated with the scan of the invoice. So either have the invoice and all of its associated paperwork scanned into one PDF, or have them as separate PDFs but with related names (e.g. the PDF names all start with the invoice number).

3. Scanned files are saved on the local network (rather than in the cloud somewhere).

It would be nice if they could just stuff a stack of invoices & paperwork into the scanner and have the scanner/software recognize an invoice when it gets to it, but that isn't necessary. They're willing to manually do the scans one invoice (and its paperwork) at a time, if necessary.

Also, all invoices are the same format and layout, so we don't have to worry about OCR for different types of invoices.

If we could find software that does the above, that would be excellent.

We have one last desire, but I understand it's unlikely - I only ask in case it's possible. Some of the paperwork associated with each invoice will be work orders. What would be ideal is software that, as it's scanning through a stack of papers, can first identify if a paper is an invoice or a work order, and then do basic OCR depending on what type of page it is. (Most of the pages aren't invoices or work orders, so no OCR will be needed on those.)

Thanks in advance.

James
Question by:jrmcanada2
9 Comments
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
Hi James,
"The client has a storeroom full of boxes of old invoices."
My advice - don't do it! In the imaging business, we call this backfile conversion, and it's hardly ever as easy or as successful as we'd like it to be...and it nearly always takes more time and costs more money than we'd like it to. With "a storeroom full of boxes" you are headed for lots of grief, imho. My suggestion is to let the old invoices die a natural death and image only on a go-forward basis. Or at least limit the time frame - for example, do just one to two years' worth of back invoices.

Btw, I assume you're talking about inbound/AP invoices, not outbound/AR invoices. If that's the case, then it is among the better doc types for this strategy since AP invoices have a relatively short life from an operational perspective (you will, of course, want to retain them for tax purposes long after they are paid). But unless you have Manhattan-level real estate costs, leave the old invoices in the storeroom.

Now, in terms of go-forward, there are many solutions that will do what you want...even the "last desire"...but they can be very expensive. There are many more factors that would need to be understood before recommendations can be made. For starters, workflow is a key issue. Do you want to workflow the invoices around for online review and approval? Or will all of the review/approval take place from the paper and the imaging is just an archival process? To give you a sense of the options, take a look at:
http://www.kofax.com/software/markview-ap-automation/

Conflict of interest statement: I was involved with MarkView for Accounts Payable for more than 15 years before it was acquired by Kofax.

If all you're looking for is archival/retrieval, then you could use almost any imaging software that comes with modern scanners or is available separately. One strategy is simply to make PDF searchable image files (all pages of all invoices...forget about zones) and then use a top-quality indexing/search tool. PDF searchable image files have the raster image (bitmap) from scanning and the text from OCR, so you'd be able to search on, for example, work order numbers.

I mentioned earlier that there are many factors to consider in looking at a solution. Here are just a few:

(1) How many users?

(2) If you don't take my earlier advice, how many thousands of pages of existing paper are to be scanned?

(3) How many pages per day of new paper will be coming in to be scanned?

(4) Are the docs all single-sided? If not, what percentage of the docs are double-sided? In other words, would a duplex scanner be helpful?

(5) Are the docs all (or mostly) black and white? In other words, is color scanning needed?

(6) Are the docs all (or mostly) letter size? Any small/odd-size docs? Or large ones? Any legal size? Any wider than 8.5"?

(7) Does your client already have a scanner (or two or more)? If so, what makes/models? If not, is selecting a scanner part of this project? If so, how many?

(8) How complex is the workflow process? As mentioned earlier, do invoices have to be routed around for online review and approval?

(9) What is the budget for the project? It doesn't have to be exact, but it's important to know approximately how much money the client is willing to spend on hardware, software, and services to implement the solution.

Answers to these questions are necessary for recommendations to make any sense. Regards, Joe
 

Author Comment

by:jrmcanada2
Hi Joe,

Thank you very much for the thoughtful response. And for the warning about archiving.

Here are a few more details:

1. This is entirely for the purpose of archiving historical A/R invoices. Since upgrading their computer system a couple years ago, all current A/R invoices are already being saved digitally. This scanning project is for all of the A/R invoices prior to two years ago. I realize this sounds crazy but after lots of discussion/debate, this is what the client wants.

2. They are planning to hire one student to do all the scanning as a summer job.

3. Duplex scanning is a necessity.

4. Colour scanning is highly desired but not absolutely mandatory. 90% of the documents would be black & white.

5. Most of the documents are letter size. Perhaps 10% of them are smaller. There might be a few that are wider than 8.5" or longer than 11" but they could be handled manually.

6. The client is planning on buying a new scanner for this project - one that would suit the needs of the project.

7. There is no complex workflow. They have a couple storage rooms filled with boxes of old invoices (and associated documentation). Due to the nature of their business, they get regular requests for copies of some of these papers that could go back years. They're weary of the wasted space. They're even more weary of the incredible amount of wasted time and effort in having to pull out box after box, fish out the required papers, scan them, and put them back. 95% of the papers will never be consulted again but they have no way of knowing WHICH 95% so they need to scan them all.

Does that help?

James

P.S. Thanks again for the input!
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
Hi James,
Yes - very helpful! AR invoices represent a different situation from AP invoices. Now for my comments to your comments:

1. This is entirely for the purpose of archiving historical A/R invoices. Since upgrading their computer system a couple years ago, all current A/R invoices are already being saved digitally. This scanning project is for all of the A/R invoices prior to two years ago. I realize this sounds crazy but after lots of discussion/debate, this is what the client wants.

JW: My standard comment on AR invoices is that they should already exist on your computer system so there should be no need to scan. In other words, scanning docs that you printed from your own system is normally unnecessary. So now I understand that this is the current situation with your client, but that this project is for their old AR invoices (from prior to two years ago).

2. They are planning to hire one student to do all the scanning as a summer job.

JW: I've heard this approach many times before. I will simply warn you of the perils. Be prepared for difficulties.

3. Duplex scanning is a necessity.

JW: Very important to know this.

4. Colour scanning is highly desired but not absolutely mandatory. 90% of the documents would be black & white.

JW: Also good to know, although most scanners these days can do color. Speed can be an issue, but that's not going to be a problem with just occasional color scans.

5. Most of the documents are letter size. Perhaps 10% of them are smaller. There might be a few that are wider than 8.5" or longer than 11" but they could be handled manually.

JW: Then a scanner with a letter size ADF will do the trick. Do they have a budget figure in mind?

6. The client is planning on buying a new scanner for this project - one that would suit the needs of the project.

JW: That's good! Approximately how many total pages (not documents) to scan, i.e., total page count in the storeroom?

7. There is no complex workflow. They have a couple storage rooms filled with boxes of old invoices (and associated documentation). Due to the nature of their business, they get regular requests for copies of some of these papers that could go back years. They're weary of the wasted space. They're even more weary of the incredible amount of wasted time and effort in having to pull out box after box, fish out the required papers, scan them, and put them back. 95% of the papers will never be consulted again but they have no way of knowing WHICH 95% so they need to scan them all.

JW: I suggest that you simply create PDF searchable image files (scan and OCR), then file and index the PDFs, allowing for easy retrieval (via both file names and file contents).

I'll make some hardware and software recommendations after you give me the page count and the budget (doesn't have to be exact, but it's important to know approximately how much money your client is willing to spend). Regards, Joe
 

Author Comment

by:jrmcanada2
Hi Joe,

Thanks again for the quick and helpful response!

My VERY rough guess is that they have 750,000 pages. In that regard, I suspect I gave you the wrong idea when I said they would hire a summer student. They might hire a summer student or two to get the project started. But they have enough employees around with not enough to do that they could task a few of those employees to do this work and therefore their actual labour cost is relatively small (since they're paying these people anyway).

Regarding budget, I can't be super helpful. Part of the request is that they want an estimate on cost from us so that they know if it's even feasible. My guess (and it's just a guess) is that they're expecting to pay $15,000-$20,000 for two scanners, software, and any programming or configuration work needed to set it up.

I appreciate your warnings. And I know this set-up is less than ideal. But this client has very stringent ideas about what they want, which leaves me very little latitude to suggest alternatives.

James

 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
One more question - what is their expected time frame for this project? I don't mean hardware and software setup/config - I mean from scanning the first page to scanning the last page. By knowing that time frame and the fact that there are approx 750K pages, I'll be able to make an educated guess on the scanner(s) needed.
 

Author Comment

by:jrmcanada2
I think they would like to complete the project within a year, but that's probably flexible if it makes a radical difference in scanner cost.
 
LVL 51

Accepted Solution

by:
Joe Winograd, EE MVE earned 450 total points
Hi James,

With 250 work days in a year, they need to do around 3,000 pages per day (750,000 / 250) to finish in a year. There's a temptation to say that with a modern scanner at 50ppm, you could handle 3,000 pages in an hour, but that's simply not the case, for a number of reasons, including paper prep (removing staples and paper clips), organizing/filing the PDFs, handling odd-size docs, running OCR, indexing, and more. Also, you said that an invoice may have anywhere from zero to 30 additional pages, which means that 3,000 pages could be as many as 3,000 invoices or as few as roughly 100 invoices, which makes a big difference in the time required to handle them.
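For anyone who wants to rerun these numbers with a different page count or time frame, here is a minimal Python sketch of the arithmetic. The inputs are the rough figures from this thread (750,000 pages, 250 work days, a 50ppm rated speed, up to 30 extra pages per invoice); the variable names are only for illustration, and the "feed time" is pure scanner time with none of the prep or filing overhead.

```python
# Back-of-the-envelope throughput check using the thread's rough figures.
TOTAL_PAGES = 750_000     # very rough page-count guess
WORK_DAYS = 250           # work days in a year
SCANNER_PPM = 50          # rated pages per minute of a "modern scanner"
MAX_EXTRA_PAGES = 30      # an invoice carries 0 to 30 pages of paperwork

pages_per_day = TOTAL_PAGES / WORK_DAYS
# Time the scanner would need if it did nothing but feed paper.
feed_minutes = pages_per_day / SCANNER_PPM
# Invoice count per day at the two extremes of document length.
most_invoices = pages_per_day / 1                        # invoices with no paperwork
fewest_invoices = pages_per_day / (1 + MAX_EXTRA_PAGES)  # invoices with 30 extra pages

print(f"Pages per day:          {pages_per_day:,.0f}")
print(f"Pure feed time (min):   {feed_minutes:.0f}")
print(f"Invoices per day, max:  {most_invoices:,.0f}")
print(f"Invoices per day, min:  {fewest_invoices:,.0f}")
```

The gap between one hour of feed time and a full working day is where the prep, OCR, filing, and exception handling go.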

Another huge issue is the quality of the paper documents. If they're in bad shape (and considering their age and how/where they've been stored, that's a distinct possibility in this case), the task is far more difficult and time-consuming.

All of that said, I think that 3,000 pages per day (that's DAY, not HOUR) is a reasonable goal for a dedicated person with a fast, high-quality scanner connected to a fast, dedicated computer...with numerous caveats and constraints. But I would run a pilot on a few thousand pages before committing to that.

In terms of hardware, I'd consider a Kodak i2800 scanner:
http://graphics.kodak.com/DocImaging/US/en/Products/Document_Scanners/Desktop/i2800_Scanner/index.htm

It is rated 40ppm (80ipm duplex) at 300DPI/B&W, which is what I would scan at, except for unusual docs (the occasional grayscale or color scan). It has a 100-page ADF (no flatbed) and a duty cycle of 6,000 pages per day. The list price is US$1,895 but the street price is around US$1,500 (based on "colour", I'm guessing you're in the UK, but I'm going with US prices in this post...adjust accordingly). Note that they're cheap enough to buy two :) ...and stay within your budget...have one around for backup and perhaps to increase the daily throughput. Connect the i2800 to a dedicated, fast computer...something like a quad-core i7 with 16GB of memory running 64-bit Windows 7...another grand or two.

For software, I'd go with scanning directly to PDF searchable image files and then indexing the entire contents of all files in an automated fashion. For the first task, I'd consider Nuance's new release - OmniPage Ultimate:
http://www.nuance.com/for-business/by-product/omnipage/ultimate/index.htm

The list price is US$500, but since the Kodak i2800 is bundled with a copy of OmniPage (an old release, no doubt), you should be able to get OmniPage Ultimate for the upgrade price of US$200.

For the second task, I'd go with dtSearch:
http://www.dtsearch.com/

It sounds as if just a single person will be retrieving an old invoice when needed, so you can probably go with dtSearch Desktop at US$200:
http://www.dtsearch.com/PLF_desktop_2.html

But if you think multiple users may need access, then you'll want to go with dtSearch Network, which provides for the sharing of indexes among network users:
http://www.dtsearch.com/PLF_network_2.html

This is priced by the seat:
http://www.dtsearch.com/dtStore.html
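As a rough sanity check against the $15,000-$20,000 guess, the prices mentioned in this post can be tallied as below. This is only a sketch under stated assumptions: two scanners at street price, one dedicated computer, one OmniPage Ultimate upgrade, and single-user dtSearch Desktop. It leaves out dtSearch Network seats, labour, taxes, and any programming or configuration work, and the names in the code are just for illustration.

```python
# Rough hardware/software tally in US$, each item as (low, high).
# Quantities are assumptions for illustration, not quotes.
items = {
    "Kodak i2800 x 2 (street price)": (2 * 1500, 2 * 1500),
    "Dedicated computer": (1000, 2000),
    "OmniPage Ultimate (upgrade price)": (200, 200),
    "dtSearch Desktop (single user)": (200, 200),
}

low_total = sum(low for low, _ in items.values())
high_total = sum(high for _, high in items.values())

# The rough guess at what the client expects to pay.
budget_low, budget_high = 15_000, 20_000

print(f"Estimated outlay: ${low_total:,} to ${high_total:,}")
print(f"Headroom in the worst case: ${budget_low - high_total:,}")
print(f"Headroom in the best case:  ${budget_high - low_total:,}")
```

Under those assumptions the hardware and software come to well under half of the low end of the budget, which leaves room for a second computer, network seats, or setup services.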

By creating searchable PDF files and indexing them with dtSearch, you will be able to use dtSearch to search on anything in the invoices, such as customer name, order number, invoice number, etc...whatever is in the invoices and their additional/supporting documents. Btw, I just ran a test with Nuance's PaperPort Professional 14 (which uses OmniPage under the covers to do OCR) and it took about one second per page to create a PDF searchable image file from a PDF non-searchable image file (i.e., from a 300DPI/B&W raster image).

Give some thought to how you want the files organized...by customer? by year?
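To make the question concrete, here is a minimal Python sketch of one possible layout: year, then customer code, then a file named after the invoice number and date. Everything in it is hypothetical (the share name, the `archive_path` helper, and the sample invoice number and customer code); it only shows how a fixed naming rule keeps every invoice's PDF findable by file name as well as by content.

```python
from datetime import date
from pathlib import PureWindowsPath

# Hypothetical network share for the archive; substitute the real location.
ARCHIVE_ROOT = PureWindowsPath("S:/InvoiceArchive")


def archive_path(invoice_no: str, invoice_date: date, customer_code: str) -> PureWindowsPath:
    """Build root / year / customer / invoice_date.pdf for one scanned invoice."""
    file_name = f"{invoice_no}_{invoice_date.isoformat()}.pdf"
    return ARCHIVE_ROOT / str(invoice_date.year) / customer_code / file_name


# Made-up sample values, purely to show the resulting shape.
example = archive_path("INV-004217", date(2009, 3, 12), "ACME01")
print(example)
```

Organizing by customer first and year second is just as easy; the point is to pick one rule before the first box is scanned and stick to it.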

As you're doing this project over the course of many months (or a year), I would set up a scheduled task to run in the wee hours that updates the index (dtSearch utilizes the built-in Windows Task Scheduler to run scheduled indexing jobs). With this approach, the user(s) will be able to search any and all invoices that have been scanned to date...each day, another day's worth will be added to the repository. When it's all over and the entire storeroom of old invoices has been scanned, the dtSearch index will be static, having been run for the last time to create the final index, allowing search on the entire repository of old invoices.

This is just one possible approach to handling this situation. There are many others! Regards, Joe
 

Author Comment

by:jrmcanada2
Thank you, Joe! This is excellent!
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
James,
You're welcome! Happy to help. Even though the question is closed, please post back here occasionally to let me know how it's going. Good luck with the project! Cheers, Joe