Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
?
Solved

Handling large portions of data in SQL Server

Posted on 2011-02-13
4
Medium Priority
?
394 Views
Last Modified: 2012-05-11
Hi,

we are trying to implement an archiving system that keeps up to 15 years of data. The data will be imported daily through files, each of them is 2 GB.

-      I am here concerned with the performance issues that may result during retrieving this data
-      This data is going to be transformed from files to relational DB in order to facilitate the reporting and the retrieval functionalities. each file will include 1 million - 3 million of records
-      15 users will connect to this data, in a parallel manner, in order to perform various search queries
-      No summary reports (statistical) is requested

Is it proper to use SQL Server to handle this? What about the Hardware specifications that should be considered? What is the best techniques that should be considered when structuring this data?

Thanks
0
Comment
Question by:asim252
  • 2
4 Comments
 
LVL 51

Accepted Solution

by:
Ted Bouskill earned 1000 total points
ID: 34885240
Well if you can normalize the data it will help tremendously.  I took a metrics database that had over 100 million records and was about 100 GB. and by normalizing I got it down to less then 5 GB with no data loss.  Of course at a smaller size there is less disk I/O to retrieve records and it's less expensive to add more indexes so all the queries were substantially faster.  At that size I could snapshot the database to give directly to some people to analyze the data on their laptop.

If you are aggregating daily sums then create a simply data mart with daily totals but don't denormalize all the detailed data for because then you counter-act the benefit of the normalization.

For example, I had a data mart that stored daily metrics for bug counts (I used to work at a company with thousands of developers) and the datamart tables, had the day, product ID, number of new bugs, fixed bugs, closed bugs (and 2 or three others) but no other detail.  Then if a drill down occurred we could drill into other datamart tables (bugs per team) or even the raw data itself.

Without knowing more we can't predict the type of hardware required.
0
 
LVL 38

Assisted Solution

by:Jim P.
Jim P. earned 1000 total points
ID: 34885371
Another question is how much has to be online?

We only needed the last three months online and the rest could be stored offline. So we moved the last years data to a secondary DB in March, backed it up and then from there dumped it to CD/DVD. Three copies are stored -- one onsite, the other at our DR offsite, the third in Iron Mountain.

Another thought -- would partitioning help?
0
 
LVL 22

Expert Comment

by:8080_Diver
ID: 34889080
I believe that partitioning would be of tremendous help.  You could partition based upon the years (CurrentYear, CurrYrMinus1, CurrYrMinus2, etc.)  there are some scripts on line that could be adapted to automatically riple the partitions (removing the CurrYrMinus14).  

I also agree that, to the extent you can normalize the data, you probably should.  Normallizing the data should provide space and speed benefits, especially if you are going to provide some sort of web (intra- or inter-net) accessibility.
0
 
LVL 38

Expert Comment

by:Jim P.
ID: 34889248
You could partition based upon the years (CurrentYear, CurrYrMinus1, CurrYrMinus2, etc.)

Depending on the amount of data -- I would suggest doing it quarterly or monthly. It shouldn't significantly increase overhead significantly.
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Recently, when I was asked to create a new SQL 2005 cluster, Microsoft released a new service pack for MS SQL 2005 what is Service Pack 3. When I finished the installation of MS SQL 2005 I found myself troubled why the installation of SP3 failed …
by Mark Wills PIVOT is a great facility and solves many an EAV (Entity - Attribute - Value) type transformation where we need the information held as data within a column to become columns in their own right. Now, in some cases that is relatively…
this video summaries big data hadoop online training demo (http://onlineitguru.com/big-data-hadoop-online-training-placement.html) , and covers basics in big data hadoop .
When cloud platforms entered the scene, users and companies jumped on board to take advantage of the many benefits, like the ability to work and connect with company information from various locations. What many didn't foresee was the increased risk…

578 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question