?
Solved

Best Practice SQL2005 table design for document images

Posted on 2009-04-23
13
Medium Priority
?
501 Views
Last Modified: 2012-05-06
We have been developing a number of in-house document management solutions over the last couple of years and are very comfortable with the storing of documents in the SQL2005 image field type.

However we are about to extend our solution to a department that recevies / generates documents at a volume that is at least a magnitude above what we have been accomodating to date.

I am interested to see if someone can recomend some best practice table design guidelines for the storing of large volumes of records with image data.

For each document record we will have a single image field and a number of fields used to categorise and index the document (doc title, refenernce number, source, notes, etc).

My question is; anticipating that we will have high volumes of documents should I look to design my table structures to split the document table into two seperate tables with a one to one relationship, one with the document attributes, and one with just the document image?

Would this help with index maintenance and storage if I were to place the "Document Image" table on a seperate file group?

Is the TEXTIMAGE_ON option on the Create Table statement good enough for achieving the same thing?

Is there perhaps a better design approach that someone is aware of?

We are utilising SQL2005 Standard, there are no immediate plans to move to SQL2008.
Thanks in advance!
0
Comment
Question by:BespokePaul
  • 4
  • 4
  • 2
  • +1
11 Comments
 
LVL 22

Expert Comment

by:8080_Diver
ID: 24220590
SQL Server 2005 has an improvement over SS2000 when it comes to handling embedded images such as you are looking at.  SQL Server can automatically handle the data in a manner that SQL Server had to have application assistance with . . . you actually store the image on a file server and a pathed filename to that location in the database.  This lets you have the best of both worlds, in a sense.  By storing the data in this manner, your table isn't bulked up beyond belief, yet you have access to the data as though the image is actually in the table.
Think about it for a moment, if you have, say, 1,000,000 entries and each entry has a 3 megabyte image in it, you just consumed 3 million megabytes of storage and you don't have an index or any other data yet.
Look into the way SQL Server 2005 handles the VarBinary(MAX) columns and look into the settings for the manner in which it is stored.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24220893
>>SQL Server 2005 has an improvement over SS2000 when it comes to handling embedded images such as you are looking at. <<
I suspect you are talking about the FILESTREAM data type, but that is a SQL Server 2008 feature.  The hybrid version you are referring to did not exist in SQL Server 2005.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24220897
For information on the new SQL Server 2008 data types see here:
SQL Server 2008: The New Data Types
http://www.simple-talk.com/sql/learn-sql-server/sql-server-2008-the-new-data-types/
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 

Author Comment

by:BespokePaul
ID: 24221041
THanks everyone, I am aware of the new filestream data type in MS SQL2008. however it is not an option at this point as we are restricted for the near term to MS SQL 2005.

What I am trying to determine is the best way to struct the image fields/tables so that they can be easily scaled and also to potentially improve the SQL DB engines ability to search the index fields without having the large image files "gumming up" the works.
0
 
LVL 11

Expert Comment

by:Amila Hendahewa
ID: 24221682
I also agree wth 8080_Diver:
what we do is to store the images in a file server and store the image path in the database. We have two tables one with index data and the doc ID (a unique identifier) other with the image path and doc ID. With this method we can even use SQL express to handle high volumes.
0
 
LVL 22

Accepted Solution

by:
8080_Diver earned 2000 total points
ID: 24224477
@acperkins,
After I posted that, I was doing some more reading and I realized that I had been reading about SS2008.  Sorry about that. :-/
@BespokePaul,
As I pointed out originally, I do not recommend embedding the images in the table.  I am currently having to work with that in one table and the filepath linking in another (later) table and I can tell you thatthere is a major performance difference . . . and these are both on SS2005.  Database backups take longer, accessing the data, once you have identified it, takes longer.  There is major database bloating.
However, if you feel absolutely compelled to go this route, then make sure that your indexes as covering indexes (i.e. INCLUDE all of the columns other than the image) and I would use an Identity column as your PK so that your data is physically sorted by that and not by any other fields.  I think, at that point, the manner in which the data that is actually in the table is stored is not going to matter a whole lot.  (If your doing a back up of the database, it all is going to be backed up, whether you have sgregated the image column or not. ;-)
I think the major point to consider is the impact handling the data in this manner is going to have on both the database and the network.  Unless you set up your indexes carefully, there will come a point where you will be accessing the table itself and, when you do, you will be pumping a lot of data out of the database and across the network for every record you touch.  This puts a greater load on the database server, the network, and anything else that is involved.
In the two tables I am dealing with, we ahve one that is set up as you are looking at setting yours up.  The other one simply passes around the pathed filename for the image which is stored on a separate fileserver.  the load on the database server is significantly different for accesses to the two tables.  The load on the network also is different because the transfer of the image is from a different server and, apparently, takes a slightly different route that doesn't clog the pipeline that is used in accessing the database server.  (One of the things our network guys apparently got right. ;-)
0
 

Author Comment

by:BespokePaul
ID: 24363481
Thank you for all the valuable comments.

Backups are not an issue for us, neither is network capacity. My biggest concern is interactive performance. While I recognise the benefits in storing documents outside of the database, I personally do not like that solution for a number of reasons.

I have applied normal best practice for table design, and understand about how to tune indexes. My question was directed towards how it is best to store images within a database table considering options like partitioning and using differing fiels groups etc.

As an interesting note for the group, I did discover a table hint that when used on creation automatically stores image files in a seperate named filegroup. I am going to use this approach for my solution as it will keep the files that store all the indexing data of the image and actual database indexes seperate from the large images.
0
 
LVL 22

Expert Comment

by:8080_Diver
ID: 24364484
I would raise the point that the request for "Best Practices" does not necessarily mean that one's plan has to be included in the "Best Practices".  I believe that, in general, there would be agreement that the "Best Practice" for handling images in SS2005 (and SS2000) would be to not store them in the database.  
The fact that the requested "Best Practices" contradict the OP's desires doesn't mean that they are not "Best Practices".
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24365462
>>I believe that, in general, there would be agreement that the "Best Practice" for handling images in SS2005 (and SS2000) would be to not store them in the database.<<
Absolutely.  Only in very rare cases should the database be used to store images.  It is just a nightmare ready to happen.  Unfortunately you only learn this with experience.  On the bright-side, if you ever implement it in the database you will become a very stong advocate for saving binary data, where it belongs:  In the file system.
0
 
LVL 22

Expert Comment

by:8080_Diver
ID: 24365544
Amen! to that! ;-)
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24402978
3) Award points to 8080_Diver #24224477
0

Featured Post

Configuration Guide and Best Practices

Read the guide to learn how to orchestrate Data ONTAP, create application-consistent backups and enable fast recovery from NetApp storage snapshots. Version 9.5 also contains performance and scalability enhancements to meet the needs of the largest enterprise environments.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

PaperPort is a popular document imaging/management product from Nuance Communications (http://www.nuance.com/). It is in widespread use by both individuals (http://www.nuance.com/for-individuals/by-product/paperport/index.htm) and businesses (http:/…
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
In this video, we show how to convert an image-only PDF file into a PDF Searchable Image file, that is, a file with both the image (typically from scanning) and text, which is created in an automated fashion with Optical Character Recognition (OCR) …
This video Micro Tutorial is the first in a two-part series that shows how to create and use custom scanning profiles in Nuance's PaperPort 14.5 (http://www.experts-exchange.com/articles/17490/). But the ability to create custom scanning profiles al…

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question