Solved

Best format to scan documents to for long-term use: PDF, JPG, TIF, BMP, or other?

Posted on 2014-01-25
Last Modified: 2014-01-25
I'm scanning old documents that will likely never be looked at again, or perhaps once in the distant future. I'm cleaning out filing cabinets: old tax bills, closing statements from property my uncle owned and sold years ago, old school report cards, and old wills of my uncle and father, who died years ago (the estates are settled).

Likely just things our kids, grandkids, etc. will look at to reminisce, and not much more.

Best way to scan? They are text docs, so I am scanning at 200 DPI. It's the text and overall appearance that's important, not being able to zoom in and see the nuances of the font used, etc.

And then what format? PDF, JPG, something else? A couple of things are 2-page docs, so with PDF it's nice that the 2 pages can be in a single file. There's no 2-page JPG, right? TIF would do that, right?

But who knows in 50+ years what formats will still be readable?

Care to guess?
 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 143 total points
ID: 39809173
Well, I'm willing to bet that in 50 years there's still going to be some way to read a pdf or a jpeg/png. You can still listen to audio tapes and vinyl, right?

PDF is the most comfortable format, both for multi-page documents and for the ability to OCR on the fly.
If you're concerned about quality and scan in b/w or have few colors, I would advise PNG. It uses lossless compression, so you're not losing quality by saving the file. And if you have few colors the file size won't be huge.

HTH,
Dan
 
LVL 52

Assisted Solution

by:Joe Winograd, EE MVE
Joe Winograd, EE MVE earned 215 total points
ID: 39809175
Do not use BMP or JPG. You are correct: they are single-page only. When I started out in the scanning/imaging business 25 years ago, I would have recommended multi-page TIFF, and that's still a fine choice. But there's no denying that PDF has become ubiquitous, so I'd say PDF is fine, too. I have no doubt that multi-page TIFF and PDF will be readable 50+ years from now. The much bigger issue is the media they're stored on. Imagine trying to read an 8" or even a 5 1/4" floppy disk today. Regards, Joe
 
LVL 93

Assisted Solution

by:John Hurst
John Hurst earned 142 total points
ID: 39809181
I do what you are doing and I use PDF format. It works fine. I normally scan at 300 DPI because the disk space saved by dropping to 200 DPI is not that great, and resolution is better at 300 DPI.

Also look at your scanner settings. If I think I want to search the file down the road, I scan in Searchable PDF. I only do a few this way.

PDF is the way to go for the foreseeable future.

.... Thinkpads_User
 
LVL 52

Assisted Solution

by:Joe Winograd, EE MVE
Joe Winograd, EE MVE earned 215 total points
ID: 39809193
A few more comments. I just saw Dan's post and my suggestion is not to use PNG. For the docs you describe, I think TIFF and PDF are better. I'm personally scanning nearly everything to PDF Searchable Image files – that's a PDF file that has been through an OCR process so it has both an image and a layer of (searchable!) text in it. For some docs (rarely), I'll scan to an image-only PDF (if I don't think the OCR process will work well on it). Also, I scan almost everything at 300 DPI, black&white (1-bit). Occasionally, I'll scan at 200 DPI grayscale (8-bit) and on even rarer occasions, 150 or 200 DPI color (24-bit). Regards, Joe
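
To put rough numbers on the DPI and bit-depth choices mentioned in this thread, here is a minimal Python sketch that computes the uncompressed size of one scanned page. The US Letter page size (8.5 x 11 inches) and the function name `raw_scan_bytes` are assumptions for illustration. Real PDF and TIFF files are compressed, so actual files are normally much smaller than these figures.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return the uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed page size: US Letter, 8.5 x 11 inches.
settings = [
    (300, 1, "black & white"),
    (200, 8, "grayscale"),
    (200, 24, "color"),
    (150, 24, "color"),
]

for dpi, bits, label in settings:
    size = raw_scan_bytes(8.5, 11, dpi, bits)
    print(f"{dpi} DPI, {bits:2d}-bit {label:13s}: {size / 1_000_000:6.2f} MB raw")
```

The point of the comparison is that bit depth matters more than DPI: before compression, a 300 DPI 1-bit page is smaller than a 200 DPI grayscale page.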
 

Author Comment

by:BeGentleWithMe-INeedHelp
ID: 39809195
Yes, thanks guys. Media issues - yeah, I was going to joke with you, Dan, about how it's not all that easy to listen to vinyl or audio tape (reel to reel, 8 track). I think the next 50 years will go 'faster' than the last 50 - bigger changes.

I'm keeping these files on my 'data' drive with old and active things. Yeah, the idea of archiving things could be another question. But these and my current QuickBooks file are on the same 1TB drive, get backed up with ShadowProtect, etc. So when drive types change or I get a new machine, these docs, the QuickBooks file, and all pics will move together to the new drive. So media won't be as big an issue in the future? (Things were put on floppy because hard drives were expensive and small. Then you forget about the floppy with that important data until floppy drives are long gone from current machines?) With bigger, relatively cheaper drives, you can keep more data 'live' (but is it more likely to get corrupted? The drive is spinning all the time, 1 bit changes and the file is SOL?).
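
One hedged way to catch the "1 bit changes" worry is to record a checksum for every scanned file and compare the checksums again later, for example after moving to a new drive. The sketch below uses only the Python standard library. The function names `sha256_of`, `build_manifest`, and `changed_files` are made up for illustration, and this only detects changes; it does not repair anything, so backups are still needed.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    folder = Path(folder)
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files whose checksum differs or which have gone missing."""
    return sorted(
        name
        for name, digest in old_manifest.items()
        if new_manifest.get(name) != digest
    )
```

A possible routine: call `build_manifest` on the scan folder once, save the result somewhere safe, then rebuild it after each drive migration and pass both to `changed_files`. An empty list means every original file still has the same contents.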

I'm stuck in the house, snowbound, so a little cabin fever.  I'm up for the conversation if you are.

Dan - HTH means "hope this helps"? Cute. Hadn't seen that before.
 
LVL 93

Assisted Solution

by:John Hurst
John Hurst earned 142 total points
ID: 39809204
"my current QuickBooks file are on the same 1TB drive"

QuickBooks (or any other like financial system) is a VERY different question. There is NO guarantee that QB version 2025 will read your old file. People have difficulty now trying to upgrade a QB V2002 file to QB V2013 or 2014.

So if you wish to archive QB data, open the ledger, and save General Ledgers and Trial Balances.

Alternatively, purchase the new version of QB each year and upgrade the ledgers as you go along. I do this and it works just fine.

.... Thinkpads_User
 
LVL 34

Assisted Solution

by:Dan Craciun
Dan Craciun earned 143 total points
ID: 39809213
I think it's the first time me and Joe openly disagree ... somewhat :)

I did say pdf was the most "comfortable" format. BTW, you can OCR directly from inside Acrobat (Tools->Recognize Text), regardless of when you scanned the original document.

But, coming from a design background, if the appearance of the original document is important then I don't trust lossy compression. Actually, PNGs are ideal for archiving because they keep every pixel as you scanned it. The downside is that they are usually bigger than JPEGs, but not if you're keeping to 8 colors.

It's about the same conversation as what's best for audio archiving: MP3 or FLAC? I would always go for FLAC, simply because I can recreate the source with 100% fidelity.
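
As a small illustration of what "lossless" means here, the sketch below compresses a made-up block of pixel data with Python's `zlib` module, which implements DEFLATE, the same compression method PNG uses, and checks that decompressing gives back exactly the original bytes. The fake "scan" data is an assumption for the demo and is not a real image file.

```python
import zlib

# Fake scan data (an assumption for this demo): mostly white rows,
# with a few rows that contain a dark run, like lines of text.
row_blank = bytes([0xFF]) * 100
row_text = bytes([0xFF] * 20 + [0x00] * 60 + [0xFF] * 20)
pixels = (row_blank * 40 + row_text * 10) * 20

compressed = zlib.compress(pixels, 9)  # 9 = strongest compression level
restored = zlib.decompress(compressed)

print(len(pixels), "bytes raw ->", len(compressed), "bytes compressed")
print("bit-for-bit identical:", restored == pixels)
```

The last line reports that the round trip is exact. A lossy format such as JPEG makes no such promise: what you get back only approximates the original pixels.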

Yup, HTH means hope this helps. I borrowed it about 10 years ago from a message board somewhere :)
 
LVL 52

Accepted Solution

by:Joe Winograd, EE MVE
Joe Winograd, EE MVE earned 215 total points
ID: 39809219
Yes, just make sure you keep moving all of the docs to your new computer and you'll be fine. The danger is if you archive them to external media and then that external media becomes obsolete with no devices around that will read it. But as long as the docs stay on your latest-and-greatest BelchFire 9000, you're good.
 

Author Comment

by:BeGentleWithMe-INeedHelp
ID: 39809221
As I type I am scanning to PDF. I don't need or want to go through the trouble of OCR - there's some proofreading / checking you need to do to see that it got it right? It's hard enough getting the time to scan, let alone look at each doc.

2 other questions I have to post today:

What IS a good program for organizing pics and docs to make them easy to search? (As I say, I don't want to do OCR, and pics can't be OCR'd anyway.) I want to be able to search on keywords that have check boxes. I don't want to have to worry about free-form keywords: I sometimes type WDW, other times Walt Disney World, etc. I'd just set a keyword, 'WDW', once, and then I'd see it in a list of words as I view a doc or picture and could check that box.

http://www.experts-exchange.com/Software/Photos_Graphics/Images_and_Photos/Q_28348236.html
And PDFs - as much as it's a standard, why do some PDF writers make a 20 KB file while other apps make a 100 KB file for the same document / image? Which is best?

http://www.experts-exchange.com/Web_Development/Document_Imaging/Adobe_Acrobat/Q_28348235.html
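
The checkbox-style keyword idea in the first question above can be sketched in a few lines of Python. This is only an illustration of a fixed keyword list and not a recommendation of any particular program; the class name `TagIndex`, the keyword list, and the file names are all hypothetical.

```python
# Hypothetical fixed keyword list: the "check boxes".
ALLOWED_KEYWORDS = {"WDW", "taxes", "school", "estate"}


class TagIndex:
    """Tag files with keywords drawn only from a fixed list."""

    def __init__(self, allowed):
        self.allowed = set(allowed)
        self.tags = {}  # file name -> set of keywords

    def tag(self, filename, keyword):
        # Reject free-form keywords so spellings cannot drift apart.
        if keyword not in self.allowed:
            raise ValueError(f"unknown keyword: {keyword!r}")
        self.tags.setdefault(filename, set()).add(keyword)

    def search(self, keyword):
        """Return the sorted file names tagged with the keyword."""
        return sorted(
            name for name, kws in self.tags.items() if keyword in kws
        )
```

Because `tag` refuses anything outside the list, 'WDW' and 'Walt Disney World' can never end up as two separate keywords, which is the problem described above.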
 

Author Comment

by:BeGentleWithMe-INeedHelp
ID: 39809222
Oh, I still have the Belchfire 8000. Have to upgrade : )
 

Author Closing Comment

by:BeGentleWithMe-INeedHelp
ID: 39809230
thinkpad - YES!  quickbooks would be upgraded every couple years.  I used a bad example  there - just describing actively used data vs. things I scan and forget about.

Yeah, anyone have VisiCalc files they need opened?!
 
LVL 93

Expert Comment

by:John Hurst
ID: 39809233
For organizing files, I use Windows Explorer. I have developed a file categorization scheme that works fine with Explorer. If you do not scan to Searchable PDF, then in most cases you can only search by file name.

Why do you get different sizes? You asked that as another question and I answered there. It depends entirely on the scanner (settings being equal). Different scanners (Xerox and HP, say) produce different PDF sizes.

.... Thinkpads_User