MS SQL 2014 Partitioning Data

I am looking at partitioning some tables within a database based on an identifier within the tables. Looking at one table in isolation that contains Transactions I want to partition it based on blocks of customers. e.g. from ID 0 to 5000, 5001 to 10000, 10001 to 15000, etc. I have created a FileGroup called TransactionData and have partitioned everything into 20 ranges with the last partition containing anything outside the ranges. This appears to have moved data out of the .mdf file into the new .ndf file for the transactions.
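
For reference, the setup looks roughly like this (the partition function and scheme names here are illustrative; only the TransactionData filegroup name is real):

    -- Partition function: 19 boundary values give 20 partitions.
    -- RANGE LEFT means each boundary value belongs to the partition on its left,
    -- so partition 1 holds IDs 0-5000, partition 2 holds 5001-10000, and so on;
    -- the final partition catches any ID above 95000.
    CREATE PARTITION FUNCTION pfCustomerRange (int)
    AS RANGE LEFT FOR VALUES
        ( 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000,
         55000, 60000, 65000, 70000, 75000, 80000, 85000, 90000, 95000);

    -- Partition scheme: every partition maps to the one TransactionData filegroup.
    CREATE PARTITION SCHEME psCustomerRange
    AS PARTITION pfCustomerRange
    ALL TO (TransactionData);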

My question is: Would it be better to have multiple .ndf files for each partition or is it fine to have just one containing all partitions relating to transactions?

At the moment it's configured as:

TransactionData (FileGroup) = Partition1, Partition2, Partition3, Partition4, etc

Is there any advantage to this:

TransactionData = Partition1
TransactionData1 = Partition2
TransactionData2 = Partition3
TransactionData3 = Partition4

Is there any I/O benefit or loss in SQL Server from having more .ndf files containing smaller volumes of partitioned data?

The transaction data in total is around 10GB if that helps.
Asked by Mr Knackered

ste5an (Senior Developer) commented:
What is your use case for this table? What does the typical access look like (query conditions)?

10GB, not bigger? Then I don't see any need for partitioning the table. Why do you want to partition it?
Mr Knackered (Author) commented:
Data is accessed mainly by CustomerID, and transactions are pulled for date ranges, but always for specific CustomerIDs. So fetching transactions between two dates for a given CustomerID would be typical.
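
A typical query looks something like this (the column names are illustrative):

    DECLARE @CustomerID int  = 12345,
            @StartDate  date = '2014-01-01',
            @EndDate    date = '2014-02-01';

    SELECT t.TransactionID, t.TransactionDate, t.Amount
    FROM   dbo.Transactions AS t
    WHERE  t.CustomerID      =  @CustomerID
      AND  t.TransactionDate >= @StartDate
      AND  t.TransactionDate <  @EndDate;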

The current dataset being worked on is around 10GB for transactions, but this will be considerably larger for other datasets. My thinking is that the query engine would be able to fetch the data more quickly if the table was partitioned into, say, 10 blocks based on ranges of CustomerIDs. If the CustomerID was in partition 10, which held 2 million rows, it would be quicker to fetch than from an unpartitioned table containing 20 million records. Also, if the partitioned data was held in a separate FileGroup, it could be placed on a different disk array from the main database, which might give better performance.

Is it worth looking at partitioning the data in this way, or will it only be beneficial once volumes are in the hundreds of millions of records?
ste5an (Senior Developer) commented:
hmm, what do you mean by:

The current dataset being worked on is around 10GB for transactions, but this will be considerably larger for other datasets
?

The only advantage you will gain, as far as I can see, is smaller indexes: because access is based on CustomerID, only one partition would be touched per query.

Your basic thinking is correct: physical reads from different spindles will give you some I/O benefit. But you don't need a partitioned table for that; in this case, simply moving the table to another filegroup would be sufficient.

But it's hard to tell. To come back to your original questions:

Would it be better to have multiple .ndf files for each partition or is it fine to have just one containing all partitions relating to transactions?

Having one file per partition makes sense, because it allows a more flexible backup and recovery strategy, especially when the transaction table gets really big. Write access could also benefit from it.
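
A cut-down sketch of that layout, using four partitions for brevity (the database name, file paths, and object names are assumptions):

    -- One filegroup with one .ndf file per partition.
    ALTER DATABASE MyDb ADD FILEGROUP TransactionData1;
    ALTER DATABASE MyDb ADD FILEGROUP TransactionData2;
    ALTER DATABASE MyDb ADD FILEGROUP TransactionData3;

    ALTER DATABASE MyDb ADD FILE
        (NAME = TransactionData1, FILENAME = 'D:\Data\TransactionData1.ndf')
        TO FILEGROUP TransactionData1;
    ALTER DATABASE MyDb ADD FILE
        (NAME = TransactionData2, FILENAME = 'D:\Data\TransactionData2.ndf')
        TO FILEGROUP TransactionData2;
    ALTER DATABASE MyDb ADD FILE
        (NAME = TransactionData3, FILENAME = 'D:\Data\TransactionData3.ndf')
        TO FILEGROUP TransactionData3;

    -- The scheme then names one filegroup per partition instead of ALL TO (...).
    CREATE PARTITION FUNCTION pfCustomerRangeSmall (int)
    AS RANGE LEFT FOR VALUES (5000, 10000, 15000);

    CREATE PARTITION SCHEME psCustomerPerFile
    AS PARTITION pfCustomerRangeSmall
    TO (TransactionData, TransactionData1, TransactionData2, TransactionData3);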

Is there any I/O benefit or loss in SQL Server from having more .ndf files containing smaller volumes of partitioned data?

I'm not aware of any I/O benefit as long as those partitions are on the same spindle. There are benefits in the internal file-structure handling, but in my opinion they are too small to matter for normal data. E.g. spreading tempdb over at least four physical files, even on the same spindle, avoids some locking situations that arise from object handling (create, delete). But I have not seen this have an effect in normal situations.

And now for my doubts:
A transaction table should be organized by time, because the normal access pattern means older data is read less often. Partitioning that way keeps the hot data in the buffer pool, provided the server RAM is large enough and the partitions are small enough.
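
For illustration, time-based partitioning would look something like this (monthly boundaries; the dates and names are just examples):

    -- RANGE RIGHT puts each boundary date into the partition to its right,
    -- so each partition holds exactly one month of transactions.
    CREATE PARTITION FUNCTION pfTransactionMonth (date)
    AS RANGE RIGHT FOR VALUES
        ('2014-01-01', '2014-02-01', '2014-03-01', '2014-04-01');

    CREATE PARTITION SCHEME psTransactionMonth
    AS PARTITION pfTransactionMonth
    ALL TO (TransactionData);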

Also, accessing transaction data by customer smells to me like a data warehouse scenario. I have had this kind of problem in the past, and a model review is indicated here. Maybe your use cases have a real DW aspect; if so, they should be modelled into an appropriate DW schema, and the load on the transaction table would then be mitigated by querying those tables instead.
Scott Pletcher (Senior DBA) commented:
Overall I believe you're thinking along the right general lines.

However, what is most critical for performance is the best clustered index on the table. Since this table is so large, you will also likely partition on the clustering key, based on customer ID ranges. But the real key to performance is the underlying clustering: you wouldn't even have to partition to get the performance gain, just cluster properly.

In this case, since it looks like you have a huge number of individual customer IDs, you have a key clustering decision to make. For SELECT performance, you want to cluster by ( customer_id, transaction_date ). However, that will cause INSERTs to occur all over the table. I'd still use that option, and reduce the FILLFACTOR somewhat to allow for INSERTs, especially if your insert activity is spread roughly evenly across all customers.
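
As a sketch, that option would look like this (the table, index, and partition scheme names are assumptions):

    -- Cluster by (customer_id, transaction_date), aligned with the
    -- customer-range partition scheme; FILLFACTOR 90 leaves room for
    -- INSERTs that land in the middle of the table.
    CREATE CLUSTERED INDEX CIX_Transactions
    ON dbo.Transactions (customer_id, transaction_date)
    WITH (FILLFACTOR = 90)
    ON psCustomerRange (customer_id);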

Alternatively, you could cluster by ( transaction_date, customer_id ), but then SELECT performance would still be very poor for anything but a very small date range. It would reduce fragmentation on INSERTs, but that might be small consolation since the queries would still be deadly slow.