Choosing a clustered index fill factor

I have a new table that I am going populate with about 5 yrs of data.  The clustered index (and only index) will be on a datetime column. All data is is batch loaded during wee-hours when nobody is on the system. The data once populated will be very seldom need to be corrected, and after the initial load of historical data, very rarely will data need to be added in-between days.  The table will have up to several thousand rows added for any given day going forward.
Table row length totals 40 bytes.  Only about half of the days in a given year have any data at all, and when they do, it averages about 18000 rows.  If in the rare case of an update, the most likely scenario is that all of a day's data would be deleted and then reloaded.  So we want to optimize for query and for efficiency of storage.  My questions are:
1) what should my fill-factor be?
2) can I add my historical data ( by day, year etc) in any order or should I go oldest first or does it matter?
3) should load all my historical data before creating the index and then add it later?
4) once the historical data is loaded... should I drop the index, load a data and then and re-create it.. or just leave it set?
Any suggestions greatly appreciated!
Rich Schramm
richschrammAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Duane LawrenceCommented:
First I would suggest a partitioned table.

CREATE TABLE [dbo].[sales_fact_1999] (
   [date_key] datetime NOT NULL
CHECK ([date_key] BETWEEN '1999-01-01' AND '1999-12-31'),
   [product_key] [int] NOT NULL ,
   [customer_key] [int] NOT NULL ,
   [promotion_key] [int] NOT NULL ,
   [store_key] [int] NOT NULL ,
   [store_sales] [money] NULL ,
   [store_cost] [money] NULL ,
   [unit_sales] [float] NULL
)
go

-- Create the fact table for 2000
CREATE TABLE [dbo].[sales_fact_2000] (
   [date_key] datetime NOT NULL
CHECK ([date_key] BETWEEN '2000-01-01' AND '2000-12-31'),
   [product_key] [int] NOT NULL ,
   [customer_key] [int] NOT NULL ,
   [promotion_key] [int] NOT NULL ,
   [store_key] [int] NOT NULL ,
   [store_sales] [money] NULL ,
   [store_cost] [money] NULL ,
   [unit_sales] [float] NULL
)
go

--Create the UNION ALL view.
CREATE VIEW [dbo].[sales_fact]
AS
SELECT * FROM [dbo].[sales_fact_1999]
UNION ALL
SELECT * FROM [dbo].[sales_fact_2000]
go

--Now insert a few rows of data, for example:
INSERT INTO [sales_fact]
VALUES ('1999-01-25', 347, 8901, 0, 13, 5.3100, 1.8585, 3.0)

INSERT INTO [sales_fact]
VALUES ('1999-03-24', 576, 7203, 0, 13, 2.1000, 0.9450, 3.0)

INSERT INTO [sales_fact]
VALUES ('1999-06-04', 139, 7203, 0, 13, 5.3700, 2.2017, 3.0)

INSERT INTO [sales_fact]
VALUES ('2000-09-14', 396, 8814, 0, 13, 6.4800, 2.0736, 2.0)

INSERT INTO [sales_fact]
VALUES ('2000-11-13', 260, 8269, 0, 13, 5.5200, 2.4840, 3.0)


1. Let me look at this more
2. It should not matter, here is why "All data is is batch loaded during wee-hours when nobody is on the system"
3. Create the index when you create the table, here is why "All data is is batch loaded during wee-hours when nobody is on the system"
4. Leave the index in place, here is why "All data is is batch loaded during wee-hours when nobody is on the system"

Duane
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Duane LawrenceCommented:
1. I would specify a fill factor of 100 %

http://www.winnetmag.com/SQLServer/Articles/ArticleID/19851/pg/2/2.html

Duane
0
Scott PletcherSenior DBACommented:
1) 99-100
There are so few adds, there is no need for additional space in each data page; it would only be wasted, in fact.

2) definitely add the oldest first if possible.  Otherwise page splits will cause fragmentation and some wasted space and hurt performance.  If possible you should always clustered tables in cluster key order.

3) Only if you can't add the data in order.  That is, if the historical data to be added is sorted in cluster key order before being added, then you can create the clustered key ahead of time and during the load notify SQL that the data is in order.  Otherwise, don't add the clus index until after the load.  Btw, don't add any other indexes, if any, before adding the clus index; always build a clus index first and drop it last.

4) Just leave it in place.  Once again, it's best if you can add values in cluster key order and notify SQL that you are doing so.  This [almost] prevents page splits.
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

richschrammAuthor Commented:
I am curious as to the advantage of taking the partitioned table approach as suggested by Duane above. - What does it buy me and what is the cost of the view/union query across (eventually) 10 or 15 tables as opposed to  a single very large base table?
0
richschrammAuthor Commented:
Implemented the partitioned table solution. Its working perfectly. Thanks to all
0
Scott PletcherSenior DBACommented:
>>  What does [partitioning] buy me and what is the cost of the view/union query across (eventually) 10 or 15 tables as opposed to  a single very large base table? <<

It depends.  If they all are on the same drive, it doesn't buy you nearly as much.  If  you can put each of the partitions in a separate file group, and put each of those filegroups on a separate drive/array, then it buys you a *lot*.

Btw, you need to make sure the partitioning is working correctly.  Run a query that selects a few rows from the view but only for one partition and make sure the query plan references only that partition.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Storage Software

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.