Solved

Handling large volumes of data in SQL Server

Posted on 2011-02-13
366 Views
Last Modified: 2012-05-11
Hi,

We are trying to implement an archiving system that keeps up to 15 years of data. The data will be imported daily through files, each of which is 2 GB.

-      My main concern is the performance issues that may arise when retrieving this data.
-      The data will be transformed from files into a relational DB in order to facilitate reporting and retrieval. Each file will contain 1 to 3 million records.
-      15 users will connect to this data in parallel to perform various search queries.
-      No summary (statistical) reports are required.

Is SQL Server a suitable choice for handling this? What hardware specifications should be considered? What are the best techniques for structuring this data?

Thanks
Question by:asim252
4 Comments
 
LVL 51

Accepted Solution

by: Ted Bouskill (earned 250 total points)
ID: 34885240
Well, if you can normalize the data it will help tremendously.  I took a metrics database that had over 100 million records and was about 100 GB, and by normalizing it I got it down to less than 5 GB with no data loss.  Of course, at a smaller size there is less disk I/O to retrieve records and it's less expensive to add more indexes, so all the queries were substantially faster.  At that size I could snapshot the database and give it directly to some people to analyze the data on their laptops.
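As a minimal sketch of what that kind of normalization can look like (all table and column names here are hypothetical, not the actual schema), the repeated text values move into a lookup table and the large table keeps only integer keys:

-- Hypothetical lookup table: each distinct product name is stored once
CREATE TABLE dbo.Product
(
    ProductID   INT IDENTITY(1,1) PRIMARY KEY,
    ProductName NVARCHAR(200) NOT NULL UNIQUE
);

-- The detail table stores an integer key instead of repeating the text on every row
CREATE TABLE dbo.Metric
(
    MetricID    BIGINT IDENTITY(1,1) PRIMARY KEY,
    MetricDate  DATE          NOT NULL,
    ProductID   INT           NOT NULL REFERENCES dbo.Product (ProductID),
    MetricValue DECIMAL(18,4) NOT NULL
);

-- Populate the lookup once from a (hypothetical) wide import table
INSERT INTO dbo.Product (ProductName)
SELECT DISTINCT ProductName
FROM dbo.RawImport;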

If you are aggregating daily sums then create a simple data mart with daily totals, but don't denormalize all the detailed data, because then you counteract the benefit of the normalization.

For example, I had a data mart that stored daily metrics for bug counts (I used to work at a company with thousands of developers), and the data mart tables had the day, product ID, number of new bugs, fixed bugs, closed bugs (and two or three others) but no other detail.  Then if a drill-down was needed we could drill into other data mart tables (bugs per team) or even the raw data itself.
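A sketch of what that kind of daily summary table and its nightly load could look like (the names and statuses are hypothetical, not the original data mart):

-- One row per day and product, nothing but the aggregated counts
CREATE TABLE dbo.DailyBugSummary
(
    SummaryDate DATE NOT NULL,
    ProductID   INT  NOT NULL,
    NewBugs     INT  NOT NULL,
    FixedBugs   INT  NOT NULL,
    ClosedBugs  INT  NOT NULL,
    CONSTRAINT PK_DailyBugSummary PRIMARY KEY (SummaryDate, ProductID)
);

-- Nightly load: aggregate yesterday's detail rows into one row per product
DECLARE @LoadDate DATE = DATEADD(DAY, -1, CAST(GETDATE() AS DATE));

INSERT INTO dbo.DailyBugSummary (SummaryDate, ProductID, NewBugs, FixedBugs, ClosedBugs)
SELECT @LoadDate,
       b.ProductID,
       SUM(CASE WHEN b.Status = 'New'    THEN 1 ELSE 0 END),
       SUM(CASE WHEN b.Status = 'Fixed'  THEN 1 ELSE 0 END),
       SUM(CASE WHEN b.Status = 'Closed' THEN 1 ELSE 0 END)
FROM dbo.BugEvent AS b
WHERE b.EventDate >= @LoadDate
  AND b.EventDate <  DATEADD(DAY, 1, @LoadDate)
GROUP BY b.ProductID;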

Without knowing more, we can't predict the type of hardware required.
 
LVL 38

Assisted Solution

by: Jim P. (earned 250 total points)
ID: 34885371
Another question is how much of the data has to be online?

We only needed the last three months online and the rest could be stored offline. So we moved the last year's data to a secondary DB in March, backed it up, and then from there dumped it to CD/DVD. Three copies are stored -- one onsite, another at our DR site offsite, and the third in Iron Mountain.
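A minimal sketch of that kind of sweep into a secondary archive database (database, table, and column names are hypothetical; in practice the DELETE would usually be batched rather than done in one transaction):

DECLARE @Cutoff DATE = DATEADD(MONTH, -3, CAST(GETDATE() AS DATE));

BEGIN TRANSACTION;

-- Copy everything older than the online window into the archive database
INSERT INTO ArchiveDB.dbo.ImportedRecord
SELECT *
FROM   LiveDB.dbo.ImportedRecord
WHERE  RecordDate < @Cutoff;

-- Then remove it from the live database
DELETE FROM LiveDB.dbo.ImportedRecord
WHERE  RecordDate < @Cutoff;

COMMIT TRANSACTION;

-- Back up the archive database before copying it to offline media
BACKUP DATABASE ArchiveDB TO DISK = N'D:\Backups\ArchiveDB.bak' WITH INIT;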

Another thought -- would partitioning help?
 
LVL 22

Expert Comment

by: 8080_Diver
ID: 34889080
I believe that partitioning would be of tremendous help.  You could partition based upon the years (CurrentYear, CurrYrMinus1, CurrYrMinus2, etc.).  There are some scripts online that could be adapted to automatically roll the partitions (removing CurrYrMinus14).
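A minimal sketch of what a year-based partition function, scheme, and table can look like (all names and boundary dates are illustrative only):

-- One partition per year; RANGE RIGHT puts each boundary date into the partition to its right
CREATE PARTITION FUNCTION pfArchiveYear (DATE)
AS RANGE RIGHT FOR VALUES ('2009-01-01', '2010-01-01', '2011-01-01');

-- Map every partition to one filegroup here; one filegroup per year is also a common layout
CREATE PARTITION SCHEME psArchiveYear
AS PARTITION pfArchiveYear ALL TO ([PRIMARY]);

-- The archive table is created directly on the partition scheme, keyed on the date column
CREATE TABLE dbo.ArchiveRecord
(
    RecordDate DATE          NOT NULL,
    RecordID   BIGINT        NOT NULL,
    Payload    NVARCHAR(400) NULL,
    CONSTRAINT PK_ArchiveRecord PRIMARY KEY (RecordDate, RecordID)
) ON psArchiveYear (RecordDate);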

I also agree that, to the extent you can normalize the data, you probably should.  Normalizing the data should provide space and speed benefits, especially if you are going to provide some sort of web (intranet or internet) accessibility.
 
LVL 38

Expert Comment

by: Jim P.
ID: 34889248
"You could partition based upon the years (CurrentYear, CurrYrMinus1, CurrYrMinus2, etc.)"

Depending on the amount of data, I would suggest doing it quarterly or monthly. It shouldn't significantly increase overhead.
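As a sketch of the sliding-window maintenance this implies, assuming a monthly partition function pfArchiveMonth and scheme psArchiveMonth built like the yearly example above (all names and dates are hypothetical):

-- Open a partition for the incoming month (a NEXT USED filegroup must be named first)
ALTER PARTITION SCHEME psArchiveMonth NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfArchiveMonth() SPLIT RANGE ('2011-03-01');

-- Retire the oldest month: switch it into an empty, identically structured staging table
-- on the same filegroup, then merge away its now-empty boundary
ALTER TABLE dbo.MonthlyArchive SWITCH PARTITION 1 TO dbo.MonthlyArchive_Stage;
ALTER PARTITION FUNCTION pfArchiveMonth() MERGE RANGE ('1996-03-01');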
