SQL Server 2005 scalability and reporting with extreme normalization of data.

Posted on 2011-03-24
Last Modified: 2012-05-11
Initially this web application was developed for internal use.  Now they want reporting out of it, and they are starting to launch it as a service.

I was hired to re-architect it from classic ASP into .NET.

So I have a SQL Server database with a table holding 50 million rows, covering about 5 years of data.

This table houses the answers for every single question on an advanced web form.

Depending on the answers, a minimum of 183 rows is added to this table PER single form saved.  And this can grow or shrink, because every entered field on the form is created dynamically from what the user entered.

So one form may have 183 fields to fill in, the next 210, the next 190.  All answers must be stored regardless of whether they are filled in.
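For context, what is described above is a classic entity-attribute-value (EAV) layout: one row per answered question per saved form.  A minimal sketch of such a table (the table and column names here are inferred from the query below and are illustrative, not the actual schema):

```sql
-- Hypothetical EAV layout: one row per answered question per saved form.
-- Column names (datapr, datauser) are taken from the query in this question;
-- the rest are illustrative.
CREATE TABLE Answers (
    answerid  BIGINT IDENTITY PRIMARY KEY,
    datapr    INT NOT NULL,        -- FK to the form/patient record (PatRecs.prid)
    dataquest INT NOT NULL,        -- FK to the question definition
    datauser  INT NOT NULL,        -- FK to the user who filled in the form
    answer    NVARCHAR(MAX) NULL   -- stored even when left blank
);
```

With this shape, every saved form multiplies into hundreds of rows, which is exactly why the row count grows so fast.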

A basic query takes 11 seconds on my development box (8 GB RAM, 7200 RPM HD, quad-core processor), so processing power is not the constraint.

SELECT YEAR(b.prStart), COUNT(YEAR(b.prStart))
FROM PatRecs b
INNER JOIN Answers c ON c.datapr = b.prid
INNER JOIN Users d ON d.userid = c.datauser
GROUP BY YEAR(b.prStart), d.userlogo
ORDER BY YEAR(b.prStart)

I have indexes on prid (clustered primary key), datauser (nonclustered index), and datapr (clustered index with nothing else associated with it).
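One thing worth trying before any redesign (this is only a sketch built from the column names in the query above; verify it against the actual execution plan): a covering nonclustered index so the aggregate query can be answered from the index alone, without touching the wide base table.  The INCLUDE clause is available in SQL Server 2005.

```sql
-- Sketch: a covering index for the year/count query above.
-- The join seeks Answers on datapr and then needs datauser for the
-- Users join; INCLUDE puts datauser in the leaf level so the base
-- table is never read.
CREATE NONCLUSTERED INDEX IX_Answers_datapr
    ON Answers (datapr)
    INCLUDE (datauser);
```

Whether this helps depends on the plan; if the optimizer is scanning the clustered index of the 50-million-row table, a narrow covering index can cut the I/O dramatically.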

Our current live server has 4 GB RAM, Windows 2003, SQL Server 2005, a Xeon 2.53 GHz, and a RAID array.

I am debating whether I need to denormalize some of this form.  (I think I am using the right term.)

Instead of storing answers row by row, I would put everything that needs to be static into columns and then normalize only the multiple answers that can occur.
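That hybrid approach could be sketched like this (all names below are invented for illustration; the real static fields would have to be identified from the form):

```sql
-- Hypothetical hybrid layout: static fields become columns on the
-- form record; only the truly dynamic/repeating answers stay
-- row-per-answer.  All names are illustrative.
CREATE TABLE FormCore (
    prid      INT PRIMARY KEY,
    prStart   DATETIME NOT NULL,
    datauser  INT NOT NULL,
    field1    NVARCHAR(255) NULL,  -- formerly one Answers row each
    field2    NVARCHAR(255) NULL   -- ... one column per static question
);

CREATE TABLE DynamicAnswers (
    prid      INT NOT NULL REFERENCES FormCore (prid),
    dataquest INT NOT NULL,
    answer    NVARCHAR(MAX) NULL,
    PRIMARY KEY (prid, dataquest)
);
```

The trade-off: a form with 183 static answers becomes 1 wide row instead of 183 narrow ones, which shrinks the row count by two orders of magnitude, but any change to the form then requires a schema change.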

The main thing I am worried about is performance with respect to normal growth, accessing data records, and producing reports, totals, etc.

50 million rows seems like a lot for 4 companies.  The major company, which is mine, owns about 32 million of those rows over 5 years.

Which means small to mid-sized companies like us might add a minimum of 6 million rows a year.

So, 100 companies could possibly add 600+ million rows to this table a year.

So, am I worrying over nothing?  Can all of this be solved by just throwing a machine (or machines) at it?  If so, what would the spec of that machine be to let us grow to that size?  Something reasonable?  Dual Xeon processors, a RAID 5 array with four 15K drives (expandable to six), 8 GB RAM to start, allowing the database to grow up to 2 TB?

Your help would be greatly appreciated,

Angela Law

PS I have thought about archiving.  But 600 million records in one year is still pretty big for a hundred companies, and they want to take this national.  Also, by law we must keep a certain number of years available, because this is the healthcare industry.
Question by:angelalaw

Expert Comment

ID: 35215357
Are you running SQL Server Enterprise Edition?  If so, I would definitely look at partitioning these tables, possibly by a datetime field (such as question date or question month).  Also consider table/index compression to keep the size down, especially for the older data.
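To make the suggestion concrete, date-based partitioning in SQL Server 2005 Enterprise Edition might look roughly like this (boundary dates, object names, and the single-filegroup mapping are illustrative):

```sql
-- Sketch: yearly range partitions on the form start date.
-- Enterprise Edition only; names and boundaries are illustrative.
CREATE PARTITION FUNCTION pfByYear (DATETIME)
    AS RANGE RIGHT FOR VALUES
    ('20070101', '20080101', '20090101', '20100101', '20110101');

CREATE PARTITION SCHEME psByYear
    AS PARTITION pfByYear ALL TO ([PRIMARY]);

-- The large table would then be created ON psByYear(prStart),
-- so year-bounded queries touch only the relevant partition.
```

In practice each partition would usually map to its own filegroup rather than ALL TO ([PRIMARY]), so old years can sit on cheaper storage.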

Author Comment

ID: 35215442
We are locked into SQL Server 2005 because of a main third-party product we are using.

Compression isn't available in 2005, I don't think.  Or am I wrong?

So, you think this definitely would be a problem if I left the design the way it currently is?

Expert Comment

ID: 35215565
You are correct, native compression isn't available in SQL 2005.  You would have to use a tool like RedGate SQL Storage Compress.

I think the design should work.  However, I would definitely consider implementing partitioning.
Another thought: do you have a method of pushing the archive data out to some sort of data warehouse or archive database structure?
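A batched archive job along those lines could be sketched as follows (the archive table, batch size, and cutoff date are all hypothetical; the column names come from earlier in this thread):

```sql
-- Sketch: move answers older than the retention window into an
-- archive table in small batches, keeping each transaction and the
-- log manageable.  Table/column names are illustrative.
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) a
    OUTPUT deleted.answerid, deleted.datapr, deleted.dataquest, deleted.answer
        INTO dbo.AnswersArchive (answerid, datapr, dataquest, answer)
    FROM Answers a
    INNER JOIN PatRecs p ON p.prid = a.datapr
    WHERE p.prStart < '20060101';   -- hypothetical cutoff

    IF @@ROWCOUNT = 0 BREAK;
END
```

The DELETE ... OUTPUT INTO form is supported in SQL Server 2005 and moves each batch atomically, so the job can be stopped and resumed safely.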

LVL 75

Expert Comment

by:Anthony Perkins
ID: 35217126
>>However, I would definitely consider implementing partitioning.<<
That is, provided they are using the Enterprise Edition, as you alluded to before.  Even if they are using the Enterprise Edition, I would not consider 50 million rows in a well-designed database a big enough deal to warrant partitioning.

Author Comment

ID: 35218842
Acperkins, what is a well-designed database?  Please read below.  There are 3 main tables.

What I am worried about is not the current 50 million rows; it is the growth, and retrieving data for reporting, once we take this application national.

Estimating 6 million rows added per company per year, 100 companies gives us 600 million rows in one year for forms that are filled out.

Say we have to keep 5 years back worth of data available to our users.

Are you saying 600 million for one year or 3 billion for 5 years for 100 companies is not something to worry about as well?

Just to clarify, the current structure is: PatRecs (128,000 rows) --one to many--> AnswerRecs (51 million rows) --many to many--> QuestionRecs (1,000 rows).

So right now, any type of report query that joins PatRecs, QuestionRecs, and AnswerRecs will take down the server, literally.

Even though the columns used in the ON clause of each join are indexed.

I do appreciate your help.  I am not a DBA, but I do know the results I am getting on my own machine.  Either we need to redesign the structure of the tables or we need some serious hardware.
LVL 75

Expert Comment

by:Anthony Perkins
ID: 35220373
So what SQL Server Edition are you using?

Author Comment

ID: 35220576
Edition = Standard Edition
Version =9.00.1406.00
LVL 75

Expert Comment

by:Anthony Perkins
ID: 35223194
Then you have no choice, partitioning is not supported.

Author Comment

ID: 35223706
Sorry, I have no choice?

You mean I need to redo the way the 3 tables are designed and make the web form more static in nature instead of dynamic?

Just for the fun of it, I ran a query for all the answers for 3 months for my company.  It returned 2.25 million records in 6 minutes on my new development box, and it consumed all computing resources to do it.  So if they wanted a three-month total for various answers, it would take that long and go through that many records.

Sorry, if I am dragging on about this.  I am trying to get an idea of how big is big.

And if performance is an issue what is needed to resolve it.

Like with the 2.25 million records for one company's answers over 3 months: it runs slow.  Can that be resolved with hardware?  By upgrading to 2005 Enterprise Edition (we can't go to 2008)?  If more companies were added, and thus more rows in that big table, will the query get even slower?
LVL 75

Accepted Solution

Anthony Perkins earned 500 total points
ID: 35224717
>>sorry have no choice?<<
I am sorry I was not clear.  I meant partitioning was not an option, since the Standard Edition does not support that functionality.

Now what you have to do is focus on optimizing your queries.  This could mean re-designing your tables and indexes, but first of all you have to understand that returning 2.25 million rows is going to take a long time regardless of the hardware or design; you are limited not by the database but by simple network bandwidth.  That is why systems rarely return more than a few hundred rows at a time; it does not make sense to return thousands of rows to the user, let alone millions.
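In other words, aggregate on the server and return only the summary.  The three-month report, for example, does not need to pull 2.25 million raw rows back to the client; a sketch using the table names mentioned in this thread (the key names on QuestionRecs are assumptions):

```sql
-- Sketch: return per-question totals for a date range instead of
-- every raw answer row.  dataquest/questionid are hypothetical
-- key names; adjust to the real schema.
SELECT q.questionid, COUNT(*) AS answer_count
FROM AnswerRecs a
INNER JOIN PatRecs p ON p.prid = a.datapr
INNER JOIN QuestionRecs q ON q.questionid = a.dataquest
WHERE p.prStart >= '20110101'
  AND p.prStart <  '20110401'
GROUP BY q.questionid
ORDER BY q.questionid;
```

The server still scans the rows, but the result set is at most one row per question (about 1,000 here), so the network transfer and client-side work become negligible.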


Question has a verified solution.
