Solved

SQL Server 2005 scalability and reporting with extreme normalization of data.

Posted on 2011-03-24
11
231 Views
Last Modified: 2012-05-11
This web application was initially developed for internal use.  The requests now are to get reports out of it, and they are also starting to launch it as a service.

I was hired to re-architect it from classic ASP into .NET.

So I have a SQL Server database with a table of 50 million rows covering about 5 years of data.

This table houses the answers for every single question on an advanced web form.

Depending on the answers, a minimum of 183 rows get added to this table PER single form saved.  And that number can grow or shrink, because every entered field on the form is created dynamically from what the user entered.

So one minute a form can have 183 fields to fill in, then 210, then 190.  All answers must be stored regardless of whether they are filled in.
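To make that concrete, one saved form ends up as a block of rows shaped roughly like this (dataquest and answertext are just my shorthand for the question key and answer value columns):

datapr    dataquest    datauser    answertext
------    ---------    --------    ----------
1001      1            57          'Yes'
1001      2            57          'No'
1001      3            57          NULL        -- stored even when left blank

...and so on, one row per question, 183 or more rows per saved form.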

A basic query takes 11 seconds on my development box (8 GB RAM, 7200 RPM hard drive, quad-core processor), so there are no contention issues for processing power.

SELECT YEAR(b.prStart), COUNT(YEAR(b.prStart))
FROM PatRecs b
INNER JOIN Answers c ON c.datapr = b.prid
INNER JOIN Users d ON d.userid = c.datauser
GROUP BY YEAR(b.prStart), d.userlogo
ORDER BY YEAR(b.prStart)

Now I have put indexes on: prid (the clustered primary key), a nonclustered index on datauser, and a clustered index on datapr with nothing else associated with it.
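I have also wondered whether a composite index covering both join columns would help, something like this (just a sketch, using the column names from the query above):

CREATE NONCLUSTERED INDEX IX_Answers_datapr_datauser
ON Answers (datapr, datauser);

With both columns in one index, the query above could resolve its joins from the index alone instead of reading the full answer rows.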

Our current live server has 4 GB of RAM, Windows 2003, SQL Server 2005, a 2.53 GHz Xeon, and a RAID array.

I am debating whether I need to denormalize some of this form. (I think denormalize is the right term.)

Instead of storing answers by row, I would put everything that needs to be static into columns, and then normalize only the multiple answers that can occur.
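Roughly, I am picturing something like this; the table and column names here are just illustrative, not the real schema:

-- Static, always-present fields become columns on the form record...
CREATE TABLE FormRecs (
    prid     INT NOT NULL PRIMARY KEY,
    prStart  DATETIME NOT NULL,
    datauser INT NOT NULL,
    StaticQ1 VARCHAR(255) NULL,   -- one column per fixed question
    StaticQ2 VARCHAR(255) NULL
);

-- ...and only the truly repeating answers stay normalized in a child table.
CREATE TABLE RepeatingAnswers (
    answerid   INT IDENTITY(1,1) PRIMARY KEY,
    datapr     INT NOT NULL REFERENCES FormRecs (prid),
    dataquest  INT NOT NULL,
    answertext VARCHAR(255) NULL
);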

The main thing I am worried about is performance: with respect to normal growth, accessing data records, and producing reports, totals, etc.

50 million seems like a lot of rows to me for 4 companies.  That means the major company, which is mine, owns about 32 million of those rows over 5 years.

Which means a small to mid-sized company like us might add a minimum of 6 million rows a year.

So, 100 companies could possibly add 600+ million rows to this table a year.

So, am I worrying over nothing? Should I assume this can all be solved by just throwing a machine (or machines) at it?  If so, what would the spec of a machine be that allows us to grow to that size, something reasonable?  Dual Xeon processors, a RAID 5 array with four 15K drives (expandable to six), and 8 GB of RAM to start, allowing the database to grow up to 2 TB?

Your help would be greatly appreciated,

Angela Law

PS: I have thought about archiving.  But 600 million records in one year is still pretty big for a hundred companies, and they want to take this national.  Also, by law we must keep a certain number of years of data available, because this is the healthcare industry.
Question by:angelalaw
11 Comments
 
LVL 7

Expert Comment

by:lundnak
ID: 35215357
Are you running SQL Server Enterprise Edition?  If so, I would definitely look at partitioning these tables, possibly by a datetime field (such as the question date or question month).  Also consider table/index compression to keep the size down, especially for the older data.
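For example, something like this sketch, assuming the big answers table has (or gains) a datetime column to partition on (the function and scheme names are just placeholders):

-- Enterprise Edition only: split the data physically by year.
CREATE PARTITION FUNCTION pfAnswersByYear (DATETIME)
AS RANGE RIGHT FOR VALUES ('20070101', '20080101', '20090101', '20100101', '20110101');

CREATE PARTITION SCHEME psAnswersByYear
AS PARTITION pfAnswersByYear ALL TO ([PRIMARY]);

-- Rebuilding the table's clustered index on psAnswersByYear then places each
-- year in its own partition, so older years can be compressed or switched
-- out without touching current data.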
 

Author Comment

by:angelalaw
ID: 35215442
We are locked into SQL Server 2005 because of a main 3rd-party software package we are using.

Compression isn't available for 2005, I don't think.  Or am I wrong?

So, you think this definitely would be a problem if I left the design the way it currently is?
 
LVL 7

Expert Comment

by:lundnak
ID: 35215565
You are correct, native compression isn't available in SQL 2005.  You would have to use a tool like RedGate SQL Storage Compress.

I think the design should work.  However, I would definitely consider implementing partitioning.
Another thought: do you have a method of pushing the archived data out to some sort of data warehouse or archive database structure?
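If you do, a batched move keeps the transaction log manageable.  A rough sketch; the AnswersArchive table (identical column layout, no triggers or foreign keys) and the 5-year cutoff are assumptions to adapt:

-- Move answers tied to form records older than 5 years, 10,000 rows at a time.
DECLARE @moved INT;
SET @moved = 1;
WHILE @moved > 0
BEGIN
    DELETE TOP (10000) FROM Answers
    OUTPUT deleted.* INTO AnswersArchive
    WHERE datapr IN (SELECT prid
                     FROM PatRecs
                     WHERE prStart < DATEADD(YEAR, -5, GETDATE()));

    SET @moved = @@ROWCOUNT;
END;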
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 35217126
>>However, I would definitely consider implementing partitioning.<<
That is provided they are using the Enterprise Edition, as you alluded to before.  And even on the Enterprise Edition, I would not consider 50 million rows in a well-designed database a big enough deal to warrant partitioning.
 

Author Comment

by:angelalaw
ID: 35218842
acperkins, what is a well-designed database?  Please read below.  There are 3 main tables.

What I am worried about is not the current 50 million rows; it is the growth, and retrieving data for reporting once we take this application national.

Estimating 6 million rows added per company per year, 100 companies gives us 600 million rows in one year for the forms that are filled out.

Say we have to keep the last 5 years' worth of data available to our users.

Are you saying that 600 million rows for one year, or 3 billion rows over 5 years for 100 companies, is not something to worry about either?

Just to clarify, the current structure is: PatRecs (128,000 rows) --one-to-many--> AnswerRecs (51 million rows) --many-to-many--> QuestionRecs (1,000 rows).

So right now, doing any type of report query that joins PatRecs, QuestionRecs, and AnswerRecs will take down the server, literally.

Even though the columns used in the ON part of each join are indexed.

I do appreciate your help, guys.  I am not a DBA, but I do know the results I am getting on my own machine.  Either we need to redesign the structure of the tables, or we need some serious hardware to help.

 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 35220373
So what SQL Server Edition are you using?
 

Author Comment

by:angelalaw
ID: 35220576
Edition = Standard Edition
Version = 9.00.1406.00
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 35223194
Then you have no choice: partitioning is not supported.
 

Author Comment

by:angelalaw
ID: 35223706
Sorry, I have no choice?

You mean I need to redo the way the 3 tables are designed, and make the web form more static in nature instead of dynamic?

Just for the fun of it, I ran a query for all of my company's answers for 3 months.  It returned 2.25 million records in 6 minutes on my new development box and consumed all computing resources to do it.  So if they wanted a three-month total for various answers, that is how long it would take and how many records it would have to go through.

Sorry if I am dragging on about this.  I am trying to get an idea of how big is big.

And, if performance is an issue, what is needed to resolve it.

Like the 2.25 million records for one company's answers for 3 months: it runs slow.  Can that be resolved with hardware?  Upgrading to SQL Server 2005 Enterprise Edition (we can't go to 2008)?  If more companies were added, and thus more rows in that big table, will the query get even slower?
 
LVL 75

Accepted Solution

by:
Anthony Perkins earned 500 total points
ID: 35224717
>>Sorry, I have no choice?<<
I am sorry I was not clear.  I meant that partitioning is not an option, since the Standard Edition does not support that functionality.

What you have to do now is focus on optimizing your queries.  This could mean re-designing your tables and indexes, but first of all you have to understand that returning 2.25 million rows is going to take a long time regardless of the hardware or design; you are limited not by the database but by simple network bandwidth.  That is why systems rarely return more than a few hundred rows at a time: it just does not make sense to return thousands of rows to the user, let alone millions.
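For example, for your three-month total, aggregate inside the database and return only the summary, something along these lines (dataquest is my guess at your question key column):

SELECT c.dataquest, COUNT(*) AS AnswerCount
FROM Answers c
INNER JOIN PatRecs b ON b.prid = c.datapr
WHERE b.prStart >= '20110101'
  AND b.prStart < '20110401'
GROUP BY c.dataquest;

That returns one row per question instead of 2.25 million answer rows, and the heavy work stays on the server.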

