Solved

Best format to scan documents to for long-term use: PDF, JPG, TIF, BMP, or other?

Posted on 2014-01-25
2,814 Views
Last Modified: 2014-01-25
I'm scanning old documents that likely will never be looked at again, or maybe once in the distant future. Cleaning out filing cabinets - old tax bills, closing statements from property my uncle owned and sold years ago, old school report cards, old wills of my uncle & father, who died years ago / estates are settled.

Likely just things our kids / grandkids / etc will look at to reminisce (sp?) and not much more.

Best way to scan?  They are text docs, so I am scanning at 200 DPI - it's the text and overall appearance that's important, not being able to zoom in and see the nuances of the font used, etc.

And then what format?  PDF? JPG, something else?  A couple things are 2 page docs, so as a PDF, it's nice that the 2 pages can be in a single file.  There's no 2 page JPG, right?  TIF would do that, right?

But who knows in 50+ years what formats will still be readable?

care to guess?
12 Comments
 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 143 total points
ID: 39809173
Well, I'm willing to bet that in 50 years there's still going to be some way to read a pdf or a jpeg/png. You can still listen to audio tapes and vinyl, right?

PDF is the most comfortable format, both for multi-page documents and for the ability to OCR on the fly.
If you're concerned about quality and scan in b/w or have few colors, I would advise PNG. It uses lossless compression, so you're not losing quality by saving the file. And if you have few colors the file size won't be huge.

HTH,
Dan
 
LVL 54

Assisted Solution

by:Joe Winograd, EE MVE 2015&2016
Joe Winograd, EE MVE 2015&2016 earned 215 total points
ID: 39809175
Do not use BMP or JPG. You are correct...they are single-page only. When I started out in the scanning/imaging business 25 years ago, I would have recommended multi-page TIFF, and that's still a fine choice. But there's no denying that PDF has become ubiquitous, so I'd say PDF is fine, too. I have no doubt that multi-page TIFF and PDF will be readable 50+ years from now. The much bigger issue is the media they're stored on. Imagine trying to read an 8" or even a 5 1/4" floppy disk today. Regards, Joe
 
LVL 95

Assisted Solution

by:John Hurst
John Hurst earned 142 total points
ID: 39809181
I do what you are doing and I use PDF format. It works fine. I normally scan at 300 DPI because the disk space saved by dropping to 200 DPI is not that great, and resolution is better at 300 DPI.

Also look at your scanner settings. If I think I want to search the file down the road, I scan in Searchable PDF. I only do a few this way.

PDF is the way to go for the foreseeable future.

.... Thinkpads_User
 
LVL 54

Assisted Solution

by:Joe Winograd, EE MVE 2015&2016
Joe Winograd, EE MVE 2015&2016 earned 215 total points
ID: 39809193
A few more comments. I just saw Dan's post and my suggestion is not to use PNG. For the docs you describe, I think TIFF and PDF are better. I'm personally scanning nearly everything to PDF Searchable Image files – that's a PDF file that has been through an OCR process so it has both an image and a layer of (searchable!) text in it. For some docs (rarely), I'll scan to an image-only PDF (if I don't think the OCR process will work well on it). Also, I scan almost everything at 300 DPI, black&white (1-bit). Occasionally, I'll scan at 200 DPI grayscale (8-bit) and on even rarer occasions, 150 or 200 DPI color (24-bit). Regards, Joe
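
To put the DPI and bit-depth choices discussed above in perspective, here is a rough sketch of how much raw pixel data one page produces at each setting. It assumes a US Letter page (8.5 x 11 inches), and the function name `raw_scan_bytes` is made up for this illustration. These are uncompressed figures; the PDF or TIFF a scanner actually writes is normally much smaller because the image is compressed.

```python
# Rough uncompressed size of one scanned page. Assumes US Letter paper;
# real PDF/TIFF files are usually far smaller after compression.

def raw_scan_bytes(dpi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Return the uncompressed size in bytes of one page scanned at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    settings = [
        ("300 DPI black & white (1-bit)", 300, 1),
        ("200 DPI black & white (1-bit)", 200, 1),
        ("200 DPI grayscale (8-bit)", 200, 8),
        ("200 DPI color (24-bit)", 200, 24),
    ]
    for label, dpi, bits in settings:
        size_mb = raw_scan_bytes(dpi, bits) / 1_000_000
        print(f"{label}: about {size_mb:.2f} MB before compression")
```

At the same bit depth, 300 DPI holds 2.25 times the raw pixels of 200 DPI, since pixel count grows with the square of the resolution. Bit depth matters more: a 200 DPI grayscale page carries more raw data than a 300 DPI 1-bit page.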
 

Author Comment

by:BeGentleWithMe-INeedHelp
ID: 39809195
yes, thanks guys - media issues - yeah, I was going to joke with you, Dan, about how it's not all that easy to listen to vinyl or audio tape (reel to reel, 8 track).  I think the next 50 years will go 'faster' than the last 50 - bigger changes.

I'm keeping these files on my 'data' drive with old and active things.  Yeah, the idea of archiving things could be another question.  But these and my current QuickBooks file are on the same 1TB drive, get backed up with ShadowProtect, etc., so when drive types change / I get a new machine, these docs and the QuickBooks file / all pics will move together to the new drive.  So media won't be as big an issue in the future?  (Things were put on floppy because hard drives were expensive / small.  Then you forget about the floppy with that important data till it's long gone from current machines?)  With bigger / relatively cheaper drives, you can keep more data 'live' (though more likely to get corrupted - the drive is spinning all the time, 1 bit changes and the file is SOL?).

I'm stuck in the house, snowbound, so a little cabin fever.  I'm up for the conversation if you are.

Dan - HTH means hopes this helps?  cute.  hadn't seen that before.
 
LVL 95

Assisted Solution

by:John Hurst
John Hurst earned 142 total points
ID: 39809204
"my current QuickBooks file are on the same 1TB drive"

QuickBooks (or any other like financial system) is a VERY different question. There is NO guarantee that QB version 2025 will read your old file. People have difficulty now trying to upgrade a QB V2002 file to QB V2013 or 2014.

So if you wish to archive QB data, open the ledger, and save General Ledgers and Trial Balances.

Alternatively, purchase the new version of QB each year and upgrade the ledgers as you go along. I do this and it works just fine.

.... Thinkpads_User
 
LVL 35

Assisted Solution

by:Dan Craciun
Dan Craciun earned 143 total points
ID: 39809213
I think it's the first time me and Joe openly disagree ... somewhat :)

I did say pdf was the most "comfortable" format. BTW, you can OCR directly from inside Acrobat (Tools->Recognize Text), regardless of when you scanned the original document.

But, coming from a design background, if the appearance of the original document is important then I don't trust lossy compression. Actually, PNGs are ideal for archiving because they keep every pixel as you scanned it. The downside is that they are usually bigger than JPEGs, but not if you're keeping to 8 colors.

It's about the same conversation as what's best for audio archiving: mp3's or flac? I would always go for flac, simply because I can recreate the source with 100% fidelity.
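
The "every pixel as you scanned it" point can be illustrated with a small sketch. PNG compresses its pixel data with DEFLATE, the same algorithm Python's `zlib` module exposes, so the round trip below shows the lossless property on made-up pixel data. This is not a PNG encoder (PNG also filters each row before compressing), and the synthetic "page" is an assumption for illustration only.

```python
import zlib

# Synthetic "scan": 200 rows of 1,700 one-byte pixels, mostly white (255)
# with a short run of black (0) standing in for text. Made-up data.
row = bytes([255] * 800 + [0] * 100 + [255] * 800)
pixels = row * 200

compressed = zlib.compress(pixels, 9)   # 9 = strongest compression level
restored = zlib.decompress(compressed)

# Lossless: every byte comes back exactly as it went in.
assert restored == pixels
print(f"{len(pixels)} bytes raw, {len(compressed)} bytes compressed")
```

A page with few distinct values and long uniform runs, like black text on white, compresses very well this way, which is why a few-color PNG need not be huge. JPEG takes the opposite trade: it discards some detail to shrink the file, so the original pixels cannot be recovered exactly.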

Yup, HTH means hope this helps. I've been borrowing it for about 10 years now, from a message board somewhere :)
 
LVL 54

Accepted Solution

by:Joe Winograd, EE MVE 2015&2016
Joe Winograd, EE MVE 2015&2016 earned 215 total points
ID: 39809219
Yes, just make sure you keep moving all of the docs to your new computer and you'll be fine. The danger is if you archive them to external media and then that external media becomes obsolete with no devices around that will read it. But as long as the docs stay on your latest-and-greatest BelchFire 9000, you're good.
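
One way to check that nothing was lost or silently corrupted each time the docs move to a new drive is to compare checksums before and after the copy. The following is a minimal sketch using only Python's standard library; the function names (`file_sha256`, `build_manifest`, `changed_files`) are hypothetical, chosen for this example.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large scans need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to `folder`) to its SHA-256 digest."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files that are missing or whose contents differ in the new copy."""
    return sorted(
        name
        for name, digest in old_manifest.items()
        if new_manifest.get(name) != digest
    )
```

Building a manifest of the old folder and of the new copy, then passing both to `changed_files`, gives an empty list when every file arrived intact. A single flipped bit in any file changes its digest, so the file would show up in the list. This only detects damage; restoring the file still depends on having a good backup.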
 

Author Comment

by:BeGentleWithMe-INeedHelp
ID: 39809221
as I type I am scanning to PDF.  Don't need / want to go through the trouble of OCR - there's some proofreading / checking you need to do to see that it got it right? It's hard enough getting the time to scan, let alone look at each doc.

2 other questions I have to post today:

What IS a good program for organizing pics and docs to make them easy to search?  (As I say, I don't want to do OCR, and pics aren't going to OCR anyway.)  I want to be able to search on keywords that have check boxes. I don't want to have to worry about free-form keywords: I sometimes type WDW, other times Walt Disney World, etc.  Just set a keyword, 'WDW', and then I'll see that in a list of words as I view a doc / picture and can check that box.

http://www.experts-exchange.com/Software/Photos_Graphics/Images_and_Photos/Q_28348236.html
And PDFs - as much as it's a standard, why do some PDF writers make a 20 KB file while other apps make a 100 KB file for the same document / image?  Which is best?

http://www.experts-exchange.com/Web_Development/Document_Imaging/Adobe_Acrobat/Q_28348235.html
 

Author Comment

by:BeGentleWithMe-INeedHelp
ID: 39809222
Oh, I still have the Belchfire 8000. Have to upgrade : )
 

Author Closing Comment

by:BeGentleWithMe-INeedHelp
ID: 39809230
thinkpad - YES!  quickbooks would be upgraded every couple years.  I used a bad example  there - just describing actively used data vs. things I scan and forget about.

Yeah, anyone have VisiCalc files they need opened?!
 
LVL 95

Expert Comment

by:John Hurst
ID: 39809233
For organizing files, I use Windows Explorer. I have developed a file categorization that works fine with Explorer. If you do not scan to Searchable PDF, then you can only search by file name in most cases.

Why do you get different sizes?  You asked that as another question and I answered there. It depends entirely on the scanner (settings being equal). Different scanners (Xerox and HP say) produce different PDF sizes.

.... Thinkpads_User