Solved

Electronic Documents storage and database in SQL server

Posted on 2013-06-03
9
281 Views
Last Modified: 2013-06-07
We have a big project coming on which is about storing physical documents, papers, could be blueprints, typed documents with hand add notations, etc, in electronic format(they will be scanned) and made available trough a database and then application on top of it, web services, etc. The format will be originally in TIFF but from there we will have to come up with a more suitable format like PDF, PNG or something else. We are a Microsoft shop and we use SQL Server 2008 and 2012 and in the end the files will reside on a network location but they will be available through database tables that will point to each document. We are talking about tens of millions of documents.

The question is, actually kind of more, what would be the best way of doing this? What format to use and if the new file stream feature introduced in SQL2008 is recommendable? What are the benefits of file streaming, if any?

I thought that some of the experts may have already experience with similar situations.

Thank you in advance.

Zberteoc.
0
Comment
Question by:Zberteoc
  • 2
  • 2
  • 2
  • +2
9 Comments
 
LVL 38

Expert Comment

by:Aaron Tomosky
ID: 39217921
I've always used SQL to store the path of the file In a varchar. With tens of millions of records there is no reason to make your db also serve up files.
0
 
LVL 38

Assisted Solution

by:Jim P.
Jim P. earned 150 total points
ID: 39217924
Ok, you have the most important part specced out already: NEVER store binary data in a database.

My last company, a bank, used Hyland's SW called OnBase to do it.

The data files were in the 600GB range and the DB in the 80GB range a few years ago. Access times weren't bad. The VAR we worked with was good about getting us the metatag forms quickly along with the OCR for our custom forms.

I'm just commenting from experience of the one company that I worked with.

And just to reinforce the most important thing: You should NOT store binary data in a database. Our CIO built an Access database to just store the IT images and scans in the DB. It took 2 months of steady work to unload all the crap and rebuild his system after he retired.
0
 
LVL 29

Assisted Solution

by:Rich Weissler
Rich Weissler earned 125 total points
ID: 39219120
I have to concur with the other experts who have commented so far.  Even Sharepoint, which stores the binary files in the database by default runs into some technical roadblocks fairly quickly, and there are third party vendors who provide external BLOB storage.  Folks are strongly discouraged from Filestreams, and I've not heard anyone indicate that Filetables are better.

The benefits of Filestreams or internal BLOB storage (simplification and ease of keeping components in sync) are far outweighed by problems in scalability.
0
 
LVL 28

Expert Comment

by:Ryan McCauley
ID: 39219596
I've not seen the benefits of using File Stream when we attempted to roll it out - yes, the files are stored outside the database, but they're still part of the backups and still appear inline in the table when you query it, so there didn't really seem to be any advantage.

That said, I'd put them on a file share on your network and then store the network path. I've seen folder browsing get problematic when there are more than about 10,000 files in a single folder, so you may want to consider some kind of sub-folder situation. If you're never browsing it manually, then don't worry about it, but even something like "26 sub folders, one for each letter, and then use the first character of the filename to put it in the relevant folder" might help. Though if you're talking about 10's of millions of documents anyways, then I don't know that this extra level matters much at all.

I'd shy away from saying NEVER store binary files in the database - I've seen it done without issue a number of times - but the other experts are right that it can lead to scalability concerns pretty quickly, and when other options are available, it should generally be avoided.
0
Free Trending Threat Insights Every Day

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 
LVL 38

Expert Comment

by:Jim P.
ID: 39219635
I'd shy away from saying NEVER store binary files in the database - I've seen it done without issue a number of times - but the other experts are right that it can lead to scalability concerns pretty quickly, and when other options are available, it should generally be avoided.

Ok, so I may have overstated it a bit. ;-) I can see it for something like company logos and letterheads. The pictures of just the CxOs for the annual report. The standard instruction PDF for forms, maybe. Doing it for declared document imaging system -- No. Really bad idea.
0
 
LVL 26

Author Comment

by:Zberteoc
ID: 39219691
Thank all of you for your responses. I agree that documents should be kept outside the database. I was mainly curios if the new FILESTREAM feature in SQL 2008, which keeps the files in the system files but links them to the table and row,  is worth using. I can't see the immediate benefits in order to switch from folder and file name storing in a varchar column in a table.  

Also a word about the format would be appreciated. Wen it comes to blueprints, such as for buildings, photos plus regular written and/or typed documents, what format would be preferable? We thought of making all PDF after scanned.
0
 
LVL 38

Assisted Solution

by:Aaron Tomosky
Aaron Tomosky earned 50 total points
ID: 39219810
pdf is better than tiff only if it's turned into vector (ocr). If it's just a tiff embedded in a pdf that still helps it display correctly but the filesize stays large.
0
 
LVL 28

Accepted Solution

by:
Ryan McCauley earned 175 total points
ID: 39219881
It's fine on a small scale - even something like a website where images are stored in the database is okay, though I'd probably shy away from even that. Agreed, though, that it's a terrible choice as a large-scale document management system. I'm not clear on the value added by Filestream, actually - even after implementing it for something because it seemed like a good idea at the time that would address the same types of concerns the OP has here, we ran into issues that seemed exactly as if I'd just stored them in tables as an inline VARBINARY(MAX). Maybe there's some special use case that I'm not aware of.

Agreed on the PDF - it might be helpful for consistency's sake, but you won't see any file size savings. Where PDF shines is where it's a container for actual content, like drawings or text - if you're just using it to hold images, you won't see much (if any) savings.
0
 
LVL 26

Author Comment

by:Zberteoc
ID: 39229450
Thank you all for your valuable input. I realize this is a general question with no really wrong answer so I will split the points.
0

Featured Post

Backup Your Microsoft Windows Server®

Backup all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

Join & Write a Comment

Suggested Solutions

Use this article to create a batch file to backup a Microsoft SQL Server database to a Windows folder.  The folder can be on the local hard drive or on a network share.  This batch file will query the SQL server to get the current date & time and wi…
How to update Firmware and Bios in Dell Equalogic PS6000 Arrays and Hard Disks firmware update.
This tutorial will walk an individual through the process of installing the necessary services and then configuring a Windows Server 2012 system as an iSCSI target. To install the necessary roles, go to Server Manager, and select Add Roles and Featu…
This Micro Tutorial will teach you how to reformat your flash drive. Sometimes your flash drive may have issues carrying files so this will completely restore it to manufacturing settings. Make sure to backup all files before reformatting. This w…

759 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now