Solved

Electronic Documents storage and database in SQL server

Posted on 2013-06-03
9
282 Views
Last Modified: 2013-06-07
We have a big project coming on which is about storing physical documents, papers, could be blueprints, typed documents with hand add notations, etc, in electronic format(they will be scanned) and made available trough a database and then application on top of it, web services, etc. The format will be originally in TIFF but from there we will have to come up with a more suitable format like PDF, PNG or something else. We are a Microsoft shop and we use SQL Server 2008 and 2012 and in the end the files will reside on a network location but they will be available through database tables that will point to each document. We are talking about tens of millions of documents.

The question is, actually kind of more, what would be the best way of doing this? What format to use and if the new file stream feature introduced in SQL2008 is recommendable? What are the benefits of file streaming, if any?

I thought that some of the experts may have already experience with similar situations.

Thank you in advance.

Zberteoc.
0
Comment
Question by:Zberteoc
  • 2
  • 2
  • 2
  • +2
9 Comments
 
LVL 38

Expert Comment

by:Aaron Tomosky
ID: 39217921
I've always used SQL to store the path of the file In a varchar. With tens of millions of records there is no reason to make your db also serve up files.
0
 
LVL 38

Assisted Solution

by:Jim P.
Jim P. earned 150 total points
ID: 39217924
Ok, you have the most important part specced out already: NEVER store binary data in a database.

My last company, a bank, used Hyland's SW called OnBase to do it.

The data files were in the 600GB range and the DB in the 80GB range a few years ago. Access times weren't bad. The VAR we worked with was good about getting us the metatag forms quickly along with the OCR for our custom forms.

I'm just commenting from experience of the one company that I worked with.

And just to reinforce the most important thing: You should NOT store binary data in a database. Our CIO built an Access database to just store the IT images and scans in the DB. It took 2 months of steady work to unload all the crap and rebuild his system after he retired.
0
 
LVL 29

Assisted Solution

by:Rich Weissler
Rich Weissler earned 125 total points
ID: 39219120
I have to concur with the other experts who have commented so far.  Even Sharepoint, which stores the binary files in the database by default runs into some technical roadblocks fairly quickly, and there are third party vendors who provide external BLOB storage.  Folks are strongly discouraged from Filestreams, and I've not heard anyone indicate that Filetables are better.

The benefits of Filestreams or internal BLOB storage (simplification and ease of keeping components in sync) are far outweighed by problems in scalability.
0
 
LVL 28

Expert Comment

by:Ryan McCauley
ID: 39219596
I've not seen the benefits of using File Stream when we attempted to roll it out - yes, the files are stored outside the database, but they're still part of the backups and still appear inline in the table when you query it, so there didn't really seem to be any advantage.

That said, I'd put them on a file share on your network and then store the network path. I've seen folder browsing get problematic when there are more than about 10,000 files in a single folder, so you may want to consider some kind of sub-folder situation. If you're never browsing it manually, then don't worry about it, but even something like "26 sub folders, one for each letter, and then use the first character of the filename to put it in the relevant folder" might help. Though if you're talking about 10's of millions of documents anyways, then I don't know that this extra level matters much at all.

I'd shy away from saying NEVER store binary files in the database - I've seen it done without issue a number of times - but the other experts are right that it can lead to scalability concerns pretty quickly, and when other options are available, it should generally be avoided.
0
NAS Cloud Backup Strategies

This article explains backup scenarios when using network storage. We review the so-called “3-2-1 strategy” and summarize the methods you can use to send NAS data to the cloud

 
LVL 38

Expert Comment

by:Jim P.
ID: 39219635
I'd shy away from saying NEVER store binary files in the database - I've seen it done without issue a number of times - but the other experts are right that it can lead to scalability concerns pretty quickly, and when other options are available, it should generally be avoided.

Ok, so I may have overstated it a bit. ;-) I can see it for something like company logos and letterheads. The pictures of just the CxOs for the annual report. The standard instruction PDF for forms, maybe. Doing it for declared document imaging system -- No. Really bad idea.
0
 
LVL 26

Author Comment

by:Zberteoc
ID: 39219691
Thank all of you for your responses. I agree that documents should be kept outside the database. I was mainly curios if the new FILESTREAM feature in SQL 2008, which keeps the files in the system files but links them to the table and row,  is worth using. I can't see the immediate benefits in order to switch from folder and file name storing in a varchar column in a table.  

Also a word about the format would be appreciated. Wen it comes to blueprints, such as for buildings, photos plus regular written and/or typed documents, what format would be preferable? We thought of making all PDF after scanned.
0
 
LVL 38

Assisted Solution

by:Aaron Tomosky
Aaron Tomosky earned 50 total points
ID: 39219810
pdf is better than tiff only if it's turned into vector (ocr). If it's just a tiff embedded in a pdf that still helps it display correctly but the filesize stays large.
0
 
LVL 28

Accepted Solution

by:
Ryan McCauley earned 175 total points
ID: 39219881
It's fine on a small scale - even something like a website where images are stored in the database is okay, though I'd probably shy away from even that. Agreed, though, that it's a terrible choice as a large-scale document management system. I'm not clear on the value added by Filestream, actually - even after implementing it for something because it seemed like a good idea at the time that would address the same types of concerns the OP has here, we ran into issues that seemed exactly as if I'd just stored them in tables as an inline VARBINARY(MAX). Maybe there's some special use case that I'm not aware of.

Agreed on the PDF - it might be helpful for consistency's sake, but you won't see any file size savings. Where PDF shines is where it's a container for actual content, like drawings or text - if you're just using it to hold images, you won't see much (if any) savings.
0
 
LVL 26

Author Comment

by:Zberteoc
ID: 39229450
Thank you all for your valuable input. I realize this is a general question with no really wrong answer so I will split the points.
0

Featured Post

Windows Server 2016: All you need to know

Learn about Hyper-V features that increase functionality and usability of Microsoft Windows Server 2016. Also, throughout this eBook, you’ll find some basic PowerShell examples that will help you leverage the scripts in your environments!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I previously wrote an article addressing the use of UBCD4WIN and SARDU. All are great, but I have always been an advocate of SARDU. Recently it was suggested that I go back and take a look at Easy2Boot in comparison.
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
This video teaches viewers how to encrypt an external drive that requires a password to read and edit the drive. All tasks are done in Disk Utility. Plug in the external drive you wish to encrypt: Make sure all previous data on the drive has been …
This tutorial will walk an individual through the process of installing the necessary services and then configuring a Windows Server 2012 system as an iSCSI target. To install the necessary roles, go to Server Manager, and select Add Roles and Featu…

896 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now