Solved

Scanning software recommendation, compression alternatives and how do you estimate hard drive space needed for scanned documents and their file size

Posted on 2006-06-07
Last Modified: 2013-12-27
I intend to scan hundreds of financial type documents, store them on a network hard drive, so network users can download them to their remote computers.  What is a recommended scanning software to do this, is there a recommended compression technology to use, and how can I estimate what hard drive space is going to be needed for the scanned documents?
Question by:SpringLake
 
LVL 93

Expert Comment

by:nobus
ID: 16859265
If file size is an issue, you can scan the documents in black and white.
ABBYY FineReader can help you convert them to text: http://www.abbyy.com/
As for the space needed, I would do a test run with different documents to get an average estimate.
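
A test run like this can be turned into an estimate with a few lines of Python. This is only a minimal sketch: the function name `estimate_from_samples`, the folder layout and the `*.tif` pattern are assumptions for illustration, and it assumes each sample file holds exactly one scanned page.

```python
from pathlib import Path
from statistics import mean


def estimate_from_samples(sample_dir, total_pages, pattern="*.tif"):
    """Average the sizes of sample scans and extrapolate to total_pages.

    Assumes one scanned page per file (hypothetical layout for this sketch).
    """
    sizes = [p.stat().st_size for p in Path(sample_dir).glob(pattern) if p.is_file()]
    if not sizes:
        raise ValueError(f"no files matching {pattern!r} in {sample_dir}")
    avg = mean(sizes)
    return {
        "samples": len(sizes),
        "avg_bytes": avg,
        "min_bytes": min(sizes),
        "max_bytes": max(sizes),
        # Simple linear extrapolation from the sample average.
        "estimated_total_bytes": avg * total_pages,
    }
```

For example, `estimate_from_samples("test_scans", 12000)` would project a year of scanning at 1000 pages a month from whatever sample scans are in a folder named `test_scans`. The more varied the sample documents, the more trustworthy the average.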
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 16916494
I would also suggest Abbyy FineReader. If you want to make sure that it's easy to get to the content of your scanned pages, you should save the files as PDF documents. You can have the software OCR the documents; the OCR text is then stored in the PDF files, which lets you search through them.

It's hard to calculate space for scanned images because it depends on the content and on how clean the scans are. The same page will result in a much smaller file if you clean up the scan before it's compressed and saved. That means, for example, running a despeckle routine and getting rid of scan artifacts (shadows, holes, ...). It also depends on the resolution that you want to scan the documents in. What resolution to use depends on what you want to do with the documents. If you want to print them, you need at least 300 dpi to make the prints look good. If it's just for on-screen viewing, you can go down to maybe 150 dpi. But this also depends on the content of your documents (e.g. font size).

Do you already have a scanner?
 

Author Comment

by:SpringLake
ID: 16916977
No, I do not have a scanner and would be open to any suggestions you have.  The scanned pages will be financial info like tax returns, accounting ledgers, etc. (not photos) and will range in size from 8x10 to irregular ledger-size sheets like 11x17 or so.  Most, however, will be 8x10 and some will have printing on both sides.  I suspect we will be scanning about 500-1000 pages a month.
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 16919230
I don't have any first hand experience with any current document scanners, but I hear good things about Fujitsu and Kodak scanners.
 
LVL 15

Expert Comment

by:mcp_jon
ID: 16991631
Hi,

Definitely Fujitsu document scanners!

As for the space to allocate, the more the better. If you can't get to "crazy" values, here are some reference points: a full-glass scan with no margins at 2400 DPI, which is my highest resolution, takes up to 1.7 GB in colour and about 70 MB as a B&W bitmap.

At 1200 DPI, it should take 420 MB in colour and 17.5 MB as a B&W bitmap.

At 600 DPI, which is one of the most common, it takes up to 105 MB in colour and 4.4 MB as a B&W bitmap.

At 300 DPI, which is the one I often use, it takes 26.2 MB in colour and 1.1 MB as a B&W bitmap.

So, scanning at 300 DPI in true colour, 250 GB would hold about 9,500 images!

Best Regards !
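
The figures in the post above look like uncompressed bitmap sizes, which follow directly from page area, resolution and bit depth. Here is a rough sketch of that arithmetic; the A4-sized scan area (8.27 x 11.69 inches), 24-bit colour and 1 MB = 1,000,000 bytes are assumptions, and the function name `raw_scan_bytes` is made up for illustration.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a scan of the given area."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8


A4 = (8.27, 11.69)  # assumed scan area in inches

for dpi in (300, 600, 1200, 2400):
    colour_mb = raw_scan_bytes(*A4, dpi, 24) / 1e6  # 24-bit true colour
    mono_mb = raw_scan_bytes(*A4, dpi, 1) / 1e6     # 1-bit black and white
    print(f"{dpi:>5} dpi: {colour_mb:8.1f} MB colour, {mono_mb:6.2f} MB 1-bit")
```

Under these assumptions the results come out close to the quoted numbers. Doubling the resolution quadruples the file size, which is why the very high DPI settings get so large so quickly. Compressed formats will be far smaller than these raw figures.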
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 16991985
A word of caution: Even though mcp_jon mentioned a bunch of really high resolutions, don't go overboard when selecting your scan resolution. For most documents 300dpi is more than enough. If you need to print the documents in very good quality again, then go for 600dpi for monochrome documents; for color, 300dpi is still sufficient. More than 600dpi is only necessary if your source material actually has that resolution to start with.
 
LVL 15

Expert Comment

by:mcp_jon
ID: 17057370
Hi,

Any news ?

Best Regards !
 
LVL 2

Accepted Solution

by:trh01 (earned 1000 total points)
ID: 17223094
Some further thoughts on this decision...

Choice of scanner
--------------------

1.  Is someone going to hand feed these 500 - 1000 pages per month - or do you want an autofeed scanner?

2.  Make sure you get a scanner large enough to scan your largest sheets in one go.   If your largest is 11x17, that is roughly equivalent to the European A3 format (A3 is slightly wider and slightly shorter).

3.  Make sure you have or can fit the appropriate interface on whatever PC you are connecting the scanner to.   SCSI, USB and Firewire are common.  Networking is possible if you want to share the scanner amongst a workgroup, rather than have just one person doing the scanning.

My recommendation for the above would be an Epson GT-15000.   It costs about $1,500 in the US according to Epson's website (www.epson.com), though you can get it cheaper by shopping around.   An automatic document feeder is optional, and costs an extra $1,100 approx.   USB 2.0 and SCSI are standard interfaces, and Firewire and networking are optional extras.  This is a professional level scanner which will do everything you want, and is backed by technical support packages.   I use the older GT-10000+ myself, and find it really excellent.

If you look at the Epson website you may note the cheaper GT-2500, which includes an ADF for about a third of the price of the GT-15000.  The problem here is the restricted sheet size (14" max length).  There is a heavy price premium for the larger sheet size and you may want to review whether you really need this.

I have no personal experience of Fujitsu or Kodak scanners; however, they look like excellent machines on the Fujitsu website. The fi-5650C model, for instance, appears to offer not just an ADF and sheet sizes up to 11x17, but also automatic deskew and automatic cropping.  Whether these latter features are important to you will depend on whether you want the final output in image format or OCR'd (see below), and whether you care if an image is not perfectly straight or has black borders around its edges.  If you just want to display the information, then you probably don't. If you OCR, then deskewing etc. will happen in the software (see below). As the price for the Fujitsu fi-5650C appears to be in the $6,000 region, you might find it nice but not economic.


Choice of file format
-----------------------

In terms of small file size, the accepted standard for documents scanned in black and white (i.e. just 1 bit per pixel) is TIFF with Group 4 compression, i.e. a file with a ".tif" extension.  This will give you the smallest file sizes per page at the moment, and is an industry standard.  Actual file sizes will vary depending on the content, but I get a file size of about 50 KB per A4/Letter page using 300dpi resolution.   But note this really is an average - a blank sheet comes in at 1 KB, but a very full page can reach as high as 100 KB.    The way to find out is to run some of your documents through a scanner, save them as TIF+G4 and see for yourself.
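
To put those per-page figures in terms of disk space, here is a back-of-the-envelope sketch using the 500-1000 pages a month mentioned in the question. The five-year horizon, 1 KB = 1000 bytes and the function name `storage_projection` are assumptions chosen purely for illustration.

```python
KB = 1000  # assumed: decimal kilobytes


def storage_projection(pages_per_month, bytes_per_page, months):
    """Total bytes needed for the given scanning volume and period."""
    return pages_per_month * bytes_per_page * months


MONTHS = 12 * 5  # assumed five-year horizon

for label, per_page in (("near-blank", 1 * KB), ("average", 50 * KB), ("very full", 100 * KB)):
    for pages_per_month in (500, 1000):
        total_gb = storage_projection(pages_per_month, per_page, MONTHS) / 1e9
        print(f"{label:>10} pages, {pages_per_month:>4}/month: {total_gb:6.2f} GB over 5 years")
```

Even the worst case here (1000 very full pages a month for five years) stays in the single-digit gigabyte range, so for TIFF Group 4 scans the disk budget is unlikely to be the limiting factor.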

Be wary of cutting scan resolution much below 300dpi for black and white scans.   Once you have taken a scan there is nothing you can do to recover information lost because you did not use sufficient resolution.   Hard disk storage space is cheap these days - a lot cheaper than the hassle involved in having to recover information lost because you skimped on resolution at the scanning stage.  And a document stored - but unreadable - is a real waste of space!

If you have multi-page documents, then you will probably not want one TIF file per page.   Your choices here are to save the pages as a "multi-page TIFF", or as others have suggested, a PDF document.

Either multi-page TIFF or PDF has reader software readily available.   For the former,   "Windows Picture and Fax Viewer" comes as standard with WinXP, and will allow users to view and print TIFF and multi-page TIFF files.   For PDF, of course Adobe Reader is widely available.

You should note that TIFF is strictly an image only format.   PDF is either - depending on whether you have applied OCR to your scans or not.   If you are applying OCR, then that will push you down the PDF route.

To produce either format, I would strongly recommend Abbyy Finereader as others have done.

To OCR or not to OCR?
--------------------------

To understand this question you need to appreciate that what comes out of the scanner is just a digitised photo.  A PC (with the right software) will display it, but any text is not readable or searchable because, as an image, the PC can't tell the difference between a picture of text and one of your granny!  To convert it to computer-readable text you are going to need to OCR it, and this can be a p-i-t-a if the original documents are anything other than pristine, with very clear printing.   Laser printed text on white paper works well.   With anything less, I find you can spend a great deal of time correcting misreads, and for financial figures an uncorrected misread could be a disaster.   A good OCR package is able to tell when it is unsure about reading a character and will flag up its uncertainty so that a human operator can decide what the correct reading is, but all this takes a lot of time and trouble.

If you are using OCR, then the scan resolution will be whatever is needed to produce the minimum of OCR read errors, since it is no longer a factor in the final file size.   Abbyy Finereader warns you if you have not used sufficient resolution when you run its OCR package.   The final file size for an OCR'd page is very similar to that produced by going through word processing and converting to PDF.    Around 33 KB per A4/Letter page is what I get, i.e. slightly less than if left as an image.

If you are going the OCR route - then again I would recommend Abbyy Finereader.

Scan resolution
 

Author Comment

by:SpringLake
ID: 17223438
Great answer.
Query:  if I am scanning say tax returns, and using an OCR software like Finereader, is it possible to import the entire tax return numbers right into a spreadsheet program like Excel or must this be done number by number manually?
Thanks.
 
LVL 15

Expert Comment

by:mcp_jon
ID: 17223493
Yes, it is !

Best Regards !
 
LVL 2

Expert Comment

by:trh01
ID: 17228168
Yes, with Abbyy Finereader you can save direct to a number of different file formats once you have OCR'd a page.  With Finereader version 5, which I am using (now quite old), I have the possibility of saving to MS Word, MS Excel, Corel WordPerfect, Lotus WordPro, and also as RTF, PDF and HTML.   No doubt later versions of FineReader do even better.

I just tested it by saving a table of figures into Excel after OCR, and it's done a good job.

 

Author Comment

by:SpringLake
ID: 17229639
thanks again.  good advice

Question has a verified solution.
