
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 247

SQL Server 2005 scalability and reporting with extreme normalization of information.

This web application was initially developed for internal use.  Their requests now are to get reports out, and they are also starting to launch this as a service.

I was hired to re-architect it from classic ASP into .NET.

So I have a SQL Server database with a table holding 50 million rows, covering about 5 years of data.

This table houses the answers for every single question on an advanced web form.

Depending on the answers, a minimum of 183 rows can be added to this table PER single form saved.  And this count can grow or shrink, because every entered field on the form is created dynamically from what the user entered.

So one minute the form can have 183 fields to fill in, then go to 210, then go to 190.  All answers must be stored regardless of whether they are filled in.

A basic query takes 11 seconds on my development box (8 GB RAM, 7200 RPM HD, quad-core processor), so there is no contention for processing power.

select YEAR(b.prStart), count (YEAR(b.prStart))
from PatRecs b
inner join Answers c on c.datapr = b.prid
inner join Users d on d.userid = c.datauser
group by YEAR(b.prStart), d.userlogo
order by YEAR(b.prStart)

Now, I have put these indexes on: prid (clustered primary key), a nonclustered index on datauser, and a clustered index on datapr with nothing else associated with it.
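Roughly, the existing index DDL looks like this (constraint and index names are from memory and may not match exactly):

```sql
-- Clustered primary key on PatRecs
ALTER TABLE PatRecs ADD CONSTRAINT PK_PatRecs PRIMARY KEY CLUSTERED (prid);

-- Clustered index on the Answers foreign key column
CREATE CLUSTERED INDEX IX_Answers_datapr ON Answers (datapr);

-- Nonclustered index on the user column
CREATE NONCLUSTERED INDEX IX_Answers_datauser ON Answers (datauser);
```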

Our current live server has 4 GB RAM, Windows 2003, SQL Server 2005, a 2.53 GHz Xeon, and a RAID array.

I am debating whether I need to denormalize some of this form (I think that is the right term).

And instead of storing answers by row, put everything that needs to be static into columns, and then keep the multiple answers that can occur normalized.
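To illustrate what I mean (these column names are made up, just to show the idea):

```sql
-- Current: one row per answer, entity-attribute-value style
-- Answers(datapr, dataquestion, datauser, answervalue)

-- Idea: static questions become columns, one row per saved form
CREATE TABLE FormRecs (
    prid       int NOT NULL,        -- links back to PatRecs
    datauser   int NOT NULL,
    answer_q1  varchar(255) NULL,   -- one column per static question
    answer_q2  varchar(255) NULL,
    answer_q3  varchar(255) NULL
    -- ... the dynamic/repeating answers would stay normalized in a child table
);
```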

The main things I am worried about are performance with respect to normal growth, accessing data records, and producing reports, totaling, etc.

50 million is, to me, a lot of rows for 4 companies.  That means the major company, which is mine, owns about 32 million of those rows over 5 years.

Which means small to mid-sized companies like us might each add a minimum of 6 million rows a year.

So, 100 companies could possibly add 600+ million rows to this table a year.

So, am I worrying over nothing?  Should I assume it can all be solved by just throwing a machine (or machines) at it?  If so, what would the spec of that machine be to let us grow to that size?  Something reasonable?  Dual Xeon processors, a RAID 5 array with four 15K HDs (expandable to six), and 8 GB RAM to start, allowing the database to grow up to 2 TB?

Your help would be greatly appreciated,

Angela Law

PS: I have thought about archiving.  But 600 million records in one year is still pretty big for a hundred companies, and they want to take this national.  Also, by law we must keep a certain number of years available, because this is the healthcare industry.
Asked: angelalaw
1 Solution
 
lundnakCommented:
Are you running SQL Server Enterprise Edition?  If so, I would definitely look at partitioning these tables, possibly by a datetime field (such as the question date or question month).  Also consider table/index compression to keep the size down, especially for the older data.
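For example, partitioning by year on SQL 2005 Enterprise would look something like this (the date column and boundary values here are placeholders; you would use whatever datetime column fits your reporting):

```sql
-- Partition function splitting rows by year (boundary dates are examples)
CREATE PARTITION FUNCTION pfByYear (datetime)
AS RANGE RIGHT FOR VALUES ('2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01');

-- Map every partition to a filegroup (all to PRIMARY here for simplicity)
CREATE PARTITION SCHEME psByYear
AS PARTITION pfByYear ALL TO ([PRIMARY]);

-- Rebuild the clustered index on the partition scheme
CREATE CLUSTERED INDEX IX_Answers_date ON Answers (answerdate)
ON psByYear (answerdate);
```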
 
angelalawAuthor Commented:
We are locked into SQL Server 2005 because of a main 3rd-party software package we are using.

Compression isn't available for 2005, I don't think.  Or am I wrong?

So, you think this definitely would be a problem if I left the design the way it currently is?
 
lundnakCommented:
You are correct, native compression isn't available in SQL 2005.  You would have to use a tool like Red Gate SQL Storage Compress.

I think the design should work.  However, I would definitely consider implementing partitioning.
Another thought: do you have a method of pushing the archive data out to some sort of data warehouse or archive database structure?
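A simple archive job could be as basic as the following (the archive table name and the 5-year cutoff are just placeholders; you would wrap this in a transaction and batch it in practice):

```sql
-- Copy answers tied to records older than 5 years into an archive table
INSERT INTO AnswersArchive
SELECT a.*
FROM Answers a
WHERE a.datapr IN (SELECT prid FROM PatRecs
                   WHERE prStart < DATEADD(year, -5, GETDATE()));

-- Then remove them from the live table
DELETE FROM Answers
WHERE datapr IN (SELECT prid FROM PatRecs
                 WHERE prStart < DATEADD(year, -5, GETDATE()));
```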
 
Anthony PerkinsCommented:
>>However, I would definitely consider implementing partitioning.<<
That is provided they are using the Enterprise Edition, as you alluded to before.  Even if they are, I would not consider 50 million rows in a well-designed database a big enough deal to warrant partitioning.
 
angelalawAuthor Commented:
acperkins, what is a well-designed database?  Please read below.  There are 3 main tables.

What I am worried about is not the current 50 million rows; I am worried about the table's expanding nature and retrieving data for reporting, as we take this application national.

Estimating 6 million rows added per company per year, 100 companies gives us 600 million rows in one year for forms that are filled out.

Say we have to keep 5 years back worth of data available to our users.

Are you saying 600 million for one year or 3 billion for 5 years for 100 companies is not something to worry about as well?

Just to clarify, the current relationships are: PatRecs (128,000 rows) --one-to-many--> AnswerRecs (51 million) --many-to-many--> QuestionRecs (1,000).
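Simplified, the three tables look roughly like this (column definitions are from memory and may not match the real schema exactly):

```sql
CREATE TABLE PatRecs (            -- ~128,000 rows
    prid    int IDENTITY PRIMARY KEY,
    prStart datetime NOT NULL
);

CREATE TABLE QuestionRecs (       -- ~1,000 rows
    questionid   int IDENTITY PRIMARY KEY,
    questiontext varchar(500) NOT NULL
);

CREATE TABLE AnswerRecs (         -- ~51 million rows, one row per answer
    datapr       int NOT NULL REFERENCES PatRecs (prid),
    dataquestion int NOT NULL REFERENCES QuestionRecs (questionid),
    datauser     int NOT NULL,
    answervalue  varchar(255) NULL
);
```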

So right now, any type of report query that joins PatRecs, QuestionRecs, and AnswerRecs will take down the server, literally.

Even though the columns used in the ON clause of the joins are indexed.

I do appreciate your help, guys.  I am not a DBA, but I do know the results I am getting on my own machine.  Either we need to redesign the structure of the tables, or we need some serious hardware to help.
 
Anthony PerkinsCommented:
So what SQL Server Edition are you using?
 
angelalawAuthor Commented:
Edition = Standard Edition
Version =9.00.1406.00
 
Anthony PerkinsCommented:
Then you have no choice: partitioning is not supported.
 
angelalawAuthor Commented:
sorry have no choice?

You mean I need to redo the way the 3 tables are designed and make the web form more static in nature instead of dynamic?

Just for the fun of it, I ran a query for all of my company's answers for 3 months.  It returned 2.25 million records in 6 minutes on my new development box, and it consumed all computing resources to do it.  So if they wanted a three-month total for various answers, it would take that long and that many records to go through.

Sorry if I am dragging on about this.  I am trying to get an idea of how big is big.

And if performance is an issue, what is needed to resolve it.

Like with the 2.25 million records for one company's answers over 3 months: it runs slow.  Can that be resolved with hardware?  Upgrading to 2005 Enterprise Edition (we can't go to 2008)?  If more companies are added, and thus more rows in that big table, will the query get even slower?
 
Anthony PerkinsCommented:
>>sorry have no choice?<<
I am sorry I was not clear.  I meant partitioning is not an option, since the Standard Edition does not support that functionality.

Now what you have to do is focus on optimizing your queries.  This could mean redesigning your tables and indexes, but first you have to understand that returning 2.25 million rows is going to take a long time regardless of the hardware or design; you are limited not by the database, but by simple network bandwidth.  That is why systems rarely return more than a few hundred rows at a time; it does not make sense to return thousands of rows to the user, let alone millions.
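For example, on SQL 2005 you can page results with ROW_NUMBER() instead of pulling everything at once (table and column names here are taken from your earlier query; the range is arbitrary):

```sql
-- Return only rows 1-200 of the result set, not all 2.25 million
SELECT *
FROM (
    SELECT a.*, ROW_NUMBER() OVER (ORDER BY a.datapr) AS rn
    FROM Answers a
) t
WHERE t.rn BETWEEN 1 AND 200;
```

For reporting totals, aggregate on the server and return only the summary rows rather than the raw answer rows.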

