Solved

Handling large portions of data in SQL Server

Posted on 2011-02-13
Last Modified: 2012-05-11
Hi,

We are trying to implement an archiving system that keeps up to 15 years of data. The data will be imported daily from files, each 2 GB in size.

-      I am concerned about the performance problems that may arise when retrieving this data.
-      The data will be loaded from the files into a relational DB to make reporting and retrieval easier. Each file will contain 1 million to 3 million records.
-      15 users will connect to this data in parallel to run various search queries.
-      No summary (statistical) reports are required.

Is SQL Server suitable for this? What hardware specifications should be considered? What are the best techniques for structuring this data?

Thanks
Question by:asim252
4 Comments
 
LVL 51

Accepted Solution

by:Ted Bouskill
Ted Bouskill earned 1000 total points
ID: 34885240
Well, if you can normalize the data it will help tremendously.  I took a metrics database that had over 100 million records and was about 100 GB, and by normalizing it I got it down to less than 5 GB with no data loss.  Of course, at a smaller size there is less disk I/O to retrieve records and it's less expensive to add more indexes, so all the queries were substantially faster.  At that size I could snapshot the database and give it directly to people to analyze the data on their laptops.
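
As a rough illustration of why normalizing shrinks the data, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SQL Server so that it runs anywhere, and the table and column names (raw_metrics, product, metric) are made up for the example. A repeated text value is moved into a lookup table and replaced by a small integer key, and a join rebuilds the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized (hypothetical schema): the product name repeats on every row.
cur.execute("CREATE TABLE raw_metrics (day TEXT, product_name TEXT, value INTEGER)")
rows = [
    ("2011-02-01", "Hypothetical Product Alpha", 10),
    ("2011-02-01", "Hypothetical Product Beta", 7),
    ("2011-02-02", "Hypothetical Product Alpha", 12),
    ("2011-02-02", "Hypothetical Product Beta", 5),
]
cur.executemany("INSERT INTO raw_metrics VALUES (?, ?, ?)", rows)

# Normalized: each distinct name is stored once and referenced by an integer key.
cur.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
cur.execute("""CREATE TABLE metric (
    day TEXT,
    product_id INTEGER REFERENCES product(product_id),
    value INTEGER)""")
cur.execute("INSERT INTO product (name) SELECT DISTINCT product_name FROM raw_metrics")
cur.execute("""INSERT INTO metric (day, product_id, value)
    SELECT r.day, p.product_id, r.value
    FROM raw_metrics r JOIN product p ON p.name = r.product_name""")

# No data loss: joining back reproduces the original rows.
rebuilt = cur.execute("""SELECT m.day, p.name, m.value
    FROM metric m JOIN product p ON p.product_id = m.product_id
    ORDER BY m.day, p.name""").fetchall()
product_count = cur.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(product_count, "distinct names stored once;", len(rebuilt), "detail rows rebuilt")
```

The saving on four rows is trivial, but the same idea applied to wide, repetitive text columns across millions of rows is where the size reduction comes from.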

If you are aggregating daily sums, then create a simple data mart with daily totals, but don't denormalize all the detailed data, because then you counteract the benefit of the normalization.

For example, I had a data mart that stored daily metrics for bug counts (I used to work at a company with thousands of developers). The data mart tables had the day, product ID, number of new bugs, fixed bugs, closed bugs (and two or three others) but no other detail.  Then if a drill-down occurred, we could drill into other data mart tables (bugs per team) or even the raw data itself.
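
A daily-totals table like the one described can be sketched as follows. This is only an illustration under assumptions: it uses Python's sqlite3 module instead of SQL Server, and the names bug_event and daily_bug_totals are hypothetical. In T-SQL the same aggregate would typically be written with SELECT ... INTO or INSERT ... SELECT rather than CREATE TABLE AS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Detail table (hypothetical): one row per bug status event.
cur.execute("CREATE TABLE bug_event (day TEXT, product_id INTEGER, status TEXT)")
cur.executemany("INSERT INTO bug_event VALUES (?, ?, ?)", [
    ("2011-02-01", 1, "new"),
    ("2011-02-01", 1, "new"),
    ("2011-02-01", 1, "fixed"),
    ("2011-02-01", 2, "closed"),
    ("2011-02-02", 1, "closed"),
])

# Data mart table: one row per day and product, holding counts only.
cur.execute("""CREATE TABLE daily_bug_totals AS
    SELECT day,
           product_id,
           SUM(CASE WHEN status = 'new' THEN 1 ELSE 0 END) AS new_bugs,
           SUM(CASE WHEN status = 'fixed' THEN 1 ELSE 0 END) AS fixed_bugs,
           SUM(CASE WHEN status = 'closed' THEN 1 ELSE 0 END) AS closed_bugs
    FROM bug_event
    GROUP BY day, product_id""")

totals = cur.execute(
    "SELECT * FROM daily_bug_totals ORDER BY day, product_id").fetchall()
for row in totals:
    print(row)
```

Reports read the small totals table, and the detail table stays normalized for the occasional drill-down.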

Without knowing more we can't predict the type of hardware required.
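
For a rough sense of scale, the figures in the question can at least be multiplied out. The sketch below assumes exactly one 2 GB file per day, 365 days a year, which the question does not state outright, and it measures the raw files only, not the size after loading, indexing or normalizing.

```python
# Figures from the question; one file per day is an assumption.
DAYS_PER_YEAR = 365
YEARS = 15
FILE_GB = 2
ROWS_LOW, ROWS_HIGH = 1_000_000, 3_000_000

days = DAYS_PER_YEAR * YEARS          # number of daily loads over the retention period
raw_gb = days * FILE_GB               # raw file volume in GB
rows_low = days * ROWS_LOW            # fewest rows retained
rows_high = days * ROWS_HIGH          # most rows retained

print(f"{days} daily loads, about {raw_gb / 1000:.2f} TB of raw files")
print(f"between {rows_low:,} and {rows_high:,} rows")
```

That works out to roughly 11 TB of raw input and somewhere between about 5.5 and 16.4 billion rows over the full 15 years.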
 
LVL 38

Assisted Solution

by:Jim P.
Jim P. earned 1000 total points
ID: 34885371
Another question is how much has to be online?

We only needed the last three months online and the rest could be stored offline. So we moved the last year's data to a secondary DB in March, backed it up, and from there dumped it to CD/DVD. Three copies are stored -- one onsite, another at our offsite DR location, the third at Iron Mountain.

Another thought -- would partitioning help?
 
LVL 22

Expert Comment

by:8080_Diver
ID: 34889080
I believe that partitioning would be of tremendous help.  You could partition based upon the years (CurrentYear, CurrYrMinus1, CurrYrMinus2, etc.).  There are some scripts online that could be adapted to automatically ripple the partitions (removing the CurrYrMinus14).
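
As a sketch of what yearly partitioning could look like, the Python below generates a T-SQL CREATE PARTITION FUNCTION statement with one boundary per year. The function name pf_archive_year, the 15-year window and the use of a date column are assumptions for the example, not something from this thread.

```python
from datetime import date


def yearly_boundaries(current_year, years_kept):
    """Return January 1 of every retained year except the oldest, oldest first."""
    oldest = current_year - years_kept + 1
    return [date(year, 1, 1) for year in range(oldest + 1, current_year + 1)]


def partition_function_sql(name, boundaries):
    """Build a RANGE RIGHT partition function; N boundaries give N + 1 partitions."""
    values = ", ".join(f"'{b.isoformat()}'" for b in boundaries)
    return (f"CREATE PARTITION FUNCTION {name} (date) "
            f"AS RANGE RIGHT FOR VALUES ({values});")


bounds = yearly_boundaries(2011, 15)
sql = partition_function_sql("pf_archive_year", bounds)
print(len(bounds) + 1, "partitions")
print(sql)
```

With RANGE RIGHT each boundary date belongs to the partition on its right, so every partition holds one calendar year and the first one catches everything older than the first boundary. Rippling the window each year would then usually come down to ALTER PARTITION FUNCTION ... SPLIT RANGE for the new year and MERGE RANGE for the oldest one, after switching the old partition out.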

I also agree that, to the extent you can normalize the data, you probably should.  Normalizing the data should provide space and speed benefits, especially if you are going to provide some sort of web (intra- or inter-net) accessibility.
 
LVL 38

Expert Comment

by:Jim P.
ID: 34889248
"You could partition based upon the years (CurrentYear, CurrYrMinus1, CurrYrMinus2, etc.)"

Depending on the amount of data -- I would suggest doing it quarterly or monthly. It shouldn't increase overhead significantly.
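
To put numbers on that, here is a quick count of how many partitions each granularity needs over a 15-year window. It counts one partition per period and ignores any extra catch-all partition at either end.

```python
YEARS_KEPT = 15
PERIODS_PER_YEAR = {"yearly": 1, "quarterly": 4, "monthly": 12}

# One partition per period across the whole retention window.
partition_counts = {
    name: YEARS_KEPT * periods for name, periods in PERIODS_PER_YEAR.items()
}

for name, count in partition_counts.items():
    print(f"{name}: {count} partitions")
```

Even monthly partitions come to 180, which, as far as I know, is well under the default limit of 1,000 partitions per table in SQL Server versions of this era.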