SQL Server 2005 ability and reporting with extreme normalilization of information.
Posted on 2011-03-24
Initially this web application was developed for internal use. Their requests now are to get out reports. As well as they are starting to launch this out a service.
I was hired in to re-architect it into ..NET from classic ASP.
So I have a sql server that has a table with 50 million rows for about 5 years of data.
This table houses the answers for every single question on an advanced web form.
Depending on the answers you can have a minimum of 183 rows added to this table PER single form saved. And this can grow or shrink because every entered field on the form is created dynamically from what was entered by the user.
So one minute it can have 183 fields they fill in and then go to 210 then goto 190. All answers must be stored regardless if they are filled in.
Basic query takes 11 seconds - My Development Box 8 GB RAM 7200 HD and QUAD core processor. So no competition issues for processing power.
select YEAR(b.prStart), count (YEAR(b.prStart))
from PatRecs b
inner join Answers c on c.datapr = b.prid
inner join Users d on d.userid = c.datauser
group by YEAR(b.prStart), d.userlogo
order by YEAR(b.prStart)
Now I have put indexes on prid (clustered primary key), datauser nonclustered index, datapr clustered index with nothing else associated with it.
Our current live sever has 4 GB, windows 2003, sql server 2005, XEON 2.53 , raid array.
I am debating if I need to unnormalize some of this form. ( I think I am using the right term.)
And instead of doing answers by row, put everything that needs to be static into columns and then nromalize the multiple answers that can happen.
The main thing I am worried about is performance in respects to normal growth, accssing data recors and in respects to producing reports, totaling, ect..
50 million to me is alot of rows for 4 companies. That means the the major company which is mine owns about 32 million of those rows over 5 years.
Which means small to mid-sized companies like us might need a minimum 6 million rows a year.
So, 100 companies could possibly add 600+ million rows to this table a year.
So, am I worrying over nothing? Should I be thinking all can be solved through just throwing a machine(s) at it? If so what would the spec of that machine(s) be that will allow us to grow to that type of size. Something reasonable? Dual Xeon Processors, Raid 5 array with 4 15K HD (can grow to 6), 8 GB ram to start allowing it to go upto 2 TB?
Your help would be great be appreciated,
PS I have thought about doing archiving. But still 600 Million records in one year is pretty big. for hundred companies, and they are wanting to make this national. Also by law must keep a certain number of years available because it is the healthcare industry.