angelalaw asked:

SQL Server 2005 scalability and reporting with extreme normalization of data

Initially this web application was developed for internal use.  Now they want reports out of it, and they are starting to launch it as a service.

I was hired to re-architect it from classic ASP into .NET.

So I have a SQL Server table with 50 million rows covering about 5 years of data.

This table houses the answers for every single question on an advanced web form.

Depending on the answers, a minimum of 183 rows are added to this table PER single form saved.  And this can grow or shrink, because every entered field on the form is created dynamically from what the user entered.

So one form may have 183 fields to fill in, the next 210, the next 190.  All answers must be stored regardless of whether they are filled in.
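A minimal sketch of the row-per-answer layout described above.  Only the table names and the columns prid, prStart, datapr, and datauser appear in the question; every other column name and type here is an assumption for illustration:

```sql
-- Hypothetical DDL reconstructing the row-per-answer layout.
-- One saved form = one PatRecs row plus ~183+ Answers rows,
-- one Answers row per field rendered on the dynamic form.
CREATE TABLE PatRecs (
    prid     INT IDENTITY PRIMARY KEY,   -- one row per saved form
    prStart  DATETIME NOT NULL           -- used by the reporting query below
);

CREATE TABLE Answers (
    answerid INT IDENTITY PRIMARY KEY,   -- assumed surrogate key
    datapr   INT NOT NULL REFERENCES PatRecs (prid),  -- owning form
    datauser INT NOT NULL,               -- who answered (FK to Users, assumed)
    question INT NOT NULL,               -- which dynamic question (assumed)
    answer   NVARCHAR(MAX) NULL          -- stored even when left blank
);
```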

A basic query takes 11 seconds on my development box (8 GB RAM, 7200 RPM HD, quad-core processor), so there are no contention issues for processing power.

SELECT YEAR(b.prStart), COUNT(YEAR(b.prStart))
FROM PatRecs b
INNER JOIN Answers c ON c.datapr = b.prid
INNER JOIN Users d ON d.userid = c.datauser
GROUP BY YEAR(b.prStart), d.userlogo  -- note: grouping by d.userlogo splits each year's count per logo
ORDER BY YEAR(b.prStart)

Now I have put indexes on prid (clustered primary key), a nonclustered index on datauser, and a clustered index on datapr with nothing else associated with it.

Our current live server has 4 GB RAM, Windows 2003, SQL Server 2005, a Xeon 2.53 GHz, and a RAID array.

I am debating whether I need to denormalize some of this form.  (I think that is the right term.)

And instead of storing answers by row, put everything that needs to be static into columns, and then normalize only the multiple answers that can occur.
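The denormalization being considered might look like this sketch: questions that appear on every form become fixed columns, so a saved form is one row instead of ~183 rows.  All column names below are invented for illustration; only the prid key comes from the question:

```sql
-- Hypothetical denormalized layout (column names are assumptions).
-- Always-present questions become fixed columns, one row per form;
-- truly dynamic, repeating answers would stay in a normalized
-- answers table alongside this one.
CREATE TABLE FormStatic (
    prid        INT PRIMARY KEY,     -- same key as PatRecs.prid
    patient_dob DATETIME NULL,       -- example static question
    visit_type  VARCHAR(50) NULL,    -- example static question
    pain_score  TINYINT NULL         -- example static question
    -- ...one column per always-present question...
);
```

A yearly count then joins one row per form instead of ~183, which is the performance argument for this design.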

The main thing I am worried about is performance with respect to normal growth, accessing data records, and producing reports, totals, etc.

50 million rows seems like a lot to me for 4 companies.  The major company, which is mine, owns about 32 million of those rows over 5 years.

Which means small to mid-sized companies like us might add a minimum of 6 million rows a year.

So, 100 companies could possibly add 600+ million rows to this table a year.

So, am I worrying over nothing?  Should I be thinking this can all be solved by just throwing a machine (or machines) at it?  If so, what would the spec of that machine be to let us grow to that size?  Something reasonable?  Dual Xeon processors, a RAID 5 array with four 15K HDs (can grow to 6), 8 GB RAM to start, allowing storage to grow to 2 TB?

Your help would be greatly appreciated,

Angela Law

PS: I have thought about archiving.  But even so, 600 million records in one year is pretty big for a hundred companies, and they want to take this national.  Also, by law we must keep a certain number of years available, because this is the healthcare industry.
lundnak replied:

Are you running SQL Server Enterprise Edition?  If so, I would definitely look at partitioning these tables, possibly by a datetime field (such as question date or question month).  Also consider table/index compression to keep the size small, especially for the older data.
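On Enterprise Edition, partitioning the answers table by year might look like the following sketch.  The boundary dates, the filegroup, and the answerDate column are all assumptions; the Answers table here would need a date column (or a computed/copied form date) to partition on:

```sql
-- Enterprise Edition only. Hypothetical yearly partitioning of the
-- Answers table; boundary values and column name are illustrative.
CREATE PARTITION FUNCTION pfAnswersByYear (DATETIME)
AS RANGE RIGHT FOR VALUES
    ('2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01', '2010-01-01');

CREATE PARTITION SCHEME psAnswersByYear
AS PARTITION pfAnswersByYear ALL TO ([PRIMARY]);

-- Rebuilding the clustered index on the scheme physically splits
-- the 50M rows by year, so reports touching one year scan one partition:
-- CREATE CLUSTERED INDEX cix_Answers_answerDate
--     ON Answers (answerDate) ON psAnswersByYear (answerDate);
```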
angelalaw (ASKER):

We are locked into SQL Server 2005 because of a main third-party software package we are using.

Compression isn't available for 2005 I don't think.  Or am I wrong?

So, you think this definitely would be a problem if I left the design the way it currently is?
lundnak:

You are correct, native compression isn't available in SQL 2005.  You would have to use a tool like Red Gate SQL Storage Compress.

I think the design should work.  However, I would definitely consider implementing partitioning.
Another thought: do you have a method of pushing the archived data out to some sort of data warehouse or archive database structure?
Anthony Perkins:
>>However, I would definitely consider implementing partitioning.<<
That is provided they are using Enterprise Edition, as you alluded to before.  Even if they are, I would not consider 50 million rows in a well-designed database a big enough deal to warrant partitioning.
angelalaw (ASKER):

acperkins, what is a well-designed database?  Please read below.  There are 3 main tables.

So what I am worried about is not the current 50 million rows; I am worried about growth and retrieving data for reporting once we take this application national.

Estimating 6 million rows added per company per year, 100 companies gives us 600 million rows in one year for forms that are filled out.

Say we have to keep 5 years back worth of data available to our users.

Are you saying 600 million for one year or 3 billion for 5 years for 100 companies is not something to worry about as well?

Just to clarify, the current structure is: PatRecs (128,000 rows) --one-to-many--> AnswerRecs (51 million) --many-to-many--> QuestionRecs (1,000)

So right now, any report query that joins PatRecs, QuestionRecs, and AnswerRecs will take down the server, literally.

Even though the columns used in the ON clauses of the joins are indexed.
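Indexing the join keys alone may not be enough here: if the index does not also carry the columns the query selects and groups by, SQL Server still does a key lookup per matching row.  A hedged sketch of covering indexes for the year-count query earlier in the thread (index names are invented; adjust to the real workload):

```sql
-- Covering indexes for the YEAR(prStart) count query.
-- The join columns are the index keys; the INCLUDEd columns let
-- SQL Server answer the query from the indexes alone instead of
-- doing tens of millions of lookups into the base tables.
CREATE NONCLUSTERED INDEX ix_Answers_datapr
    ON Answers (datapr)
    INCLUDE (datauser);

CREATE NONCLUSTERED INDEX ix_Users_userid
    ON Users (userid)
    INCLUDE (userlogo);
```

Whether this helps should be confirmed against the actual execution plan (look for Key Lookup or RID Lookup operators disappearing).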

I do appreciate your help.  I am not a DBA, but I do know the results I am getting on my own machine.  Either we need to redesign the structure of the tables or we need some serious hardware.
So what SQL Server Edition are you using?
Edition = Standard Edition
Version =9.00.1406.00
Then you have no choice, partitioning is not supported.
Sorry, I have no choice?

You mean I need to redo the way the 3 tables are designed and make the web form more static in nature instead of dynamic?

Just for fun, I ran a query for all of my company's answers for 3 months.  It returned 2.25 million records in 6 minutes on my new development box and consumed all computing resources to do it.  So if they wanted a three-month total for various answers, it would take that long and have to go through that many records.

Sorry, if I am dragging on about this.  I am trying to get an idea of how big is big.

And if performance is an issue what is needed to resolve it.

Like the 2.25 million records for one company's answers for 3 months: it runs slow.  Can that be resolved with hardware?  Upgrading to SQL Server 2005 Enterprise Edition?  (We can't go to 2008.)  If more companies were added, and thus more rows in that big table, will the query get even slower?
ASKER CERTIFIED SOLUTION
Anthony Perkins