Solved

query in one big table vs many different tables

Posted on 2013-01-11
9
403 Views
Last Modified: 2013-02-05
for reporting purpose , in sql server, which scenario is better?

one server that has many records for different categories, with one column making the difference.. (indicator telling what category of row of record it is)

vs

creating a separate table for each category..

for example, the column that makes the difference could have data like
'car','bus','bike','suv','trailer'.. 15 more such entries.. they are all char type.

the table will result in millions of records regardless..

will one big table with proper index be the way to go? or is partitioning more beneficial for SELECT report queries?
0
Comment
Question by:25112
  • 2
  • 2
  • 2
  • +3
9 Comments
 
LVL 26

Assisted Solution

by:Chris Luttrell
Chris Luttrell earned 100 total points
ID: 38769182
As usual, "It Depends?"
Are you going to be selecting the records together comparing accross categories or what?
If yes, then in general for speed in reporting purposes denormalizing into fewer tables is faster.
If you are always going to only be working with a single category or if there are going to be huge differences in the related columns for each category, then you might should spilt them up.
Also look at DW principles.  A Star Schema design with a Fact table with Denormalized Dimension tables around it is usually the desired design for reporting.
0
 
LVL 21

Expert Comment

by:Alpesh Patel
ID: 38769548
In that case you have predefined value in that column then create a Filtered index.

It will improve performance.


If not the case then better to create diff table for good performance.

But maintainability is the drawback.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 38770240
PatelAlpesh,

I suspect you posted in the wrong thread as the comment you posted makes no sense in the context of the question posted by the author.
0
 
LVL 69

Assisted Solution

by:ScottPletcher
ScottPletcher earned 200 total points
ID: 38772403
If properly indexed, in general you will be able to get good performance from everything in one table.

Twenty+ of separate tables makes the code much harder to maintain.  And SQL must load a lot more table definitions to parse the code.
0
Zoho SalesIQ

Hassle-free live chat software re-imagined for business growth. 2 users, always free.

 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 200 total points
ID: 38773118
It does depend.... Probably how many different categories apply to any one record.

If you simply have a large lookup of categories, then agree with ScottPletcher, check your indexes carefully and no real need to break it up.

You could use partitioning, but really, if the data to be always used is spread across multiple partitions, then you are not gaining the true benefit of partitioning. Think of a partition as "this is all I need to see at this point".

For example, partitioning works great on chronological data where you only want to see "current month" or review "january" etc. The partition can help reduce the overall volume. But if your partition is on something that really cannot be grouped properly, then there is no real advantage and can create other problems (IMO).

If you have an EAV (entity attribute value) model then it can get painful if you are using it to record multiple attributes. Some normalisation will help.

With an EAV model, you often end up with duplicate rows of that entity with the only variation being the attribute and value - the data set becomes vertical. That can make it difficult for reporting purposes (unless you crosstab, tablix, pivot etc).

In that case I would recommend serious investigation about your use of attributes and possibly converting them to individual tables.

Now, if not an EAV, but a row could have multiple categories, then you need to consider the data set in a similar way as the EAV type model as per CGluttrell's post.

Maybe you can elaborate a little more and show some (made up) examples of what you actually have if it is different to the above scenarios.
0
 
LVL 5

Author Comment

by:25112
ID: 38794416
>>Are you going to be selecting the records together comparing accross categories or what?
No. They will all be processed seperately.

>>huge differences in the related columns for each category
only one or rarely a few columns will be different, most of the time.

>>how many different categories apply to any one record.
only one category for each record.

>>Twenty+ of separate tables makes the code much harder to maintain.
If it were only 5 or 10, would you suggest seperating would be better.. would you draw a line in terms of how many tables will change the decision on that.? I realize this is where Mark was referring to partitioning benefits and disadvantages.

can you commend on this:

what if the category column can be coverted into a date field (an appropriate lookup is available with correlating category to a distinct date).. will that help in resolving a firm answer for this?

each category of data is 2 m records.

thank you-
0
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 200 total points
ID: 38794508
OK, let's consider the following query...

SELECT col1,col2,col3,categoryID
FROM tbl_datasource D
INNER JOIN tbl_categories C on D.categoryID = C.ID

If that is the type of query, then it is going to be difficult to stray away from a single table. The datasource has only one column for categoryID

If those categories are somehow reflective of a period where by the following query would implicitly limit the selection, then you might consider a different approach with dates

SELECT col1,col2,col3,categoryID
FROM tbl_datasource D
INNER JOIN tbl_categories C on D.categoryID = C.ID
WHERE D.somedate > '2013-01-01'

If however you can somehow group the categories based on "type" somehow (you mention making it about 5 tables) as in :

SELECT col1,col2,col3,categoryID
FROM tbl_datasource D
LEFT JOIN tbl_categories C1 on D.categoryID = C1.ID and C1.TYPE = D.COL3
LEFT JOIN tbl_categories C2 on D.categoryID = C2.ID and C1.TYPE = D.COL3
LEFT JOIN tbl_categories C3 on D.categoryID = C3.ID and C1.TYPE = D.COL3
WHERE D.somedate > '2013-01-01'

Then you might consider breaking up categories into as many different "types"

If you had your queries broken up something more like :

SELECT col1,col2,col3,categoryID
FROM tbl_datasource D
INNER JOIN tbl_categories C1 on D.categoryID = C1.ID and C1.TYPE = D.COL3
WHERE D.somedate > '2013-01-01'
AND D.COL3 = 'Cars'

Then you might consider either partitioning, or, breaking up into seperate tables so the above would become something more like :

SELECT col1,col2,col3,categoryID
FROM tbl_datasource D
INNER JOIN tbl_car_categories CARS on D.categoryID = CARS.ID
WHERE D.somedate > '2013-01-01'

So, you can see a few different types and styles of queries, and of course they might not match what you really need, but based on the comment only 1 category per record, that is what I am imagining at the moment.

The number of rows shouldnt be a problem as such, so long as the query plan uses indexes rather than table scans, then it should be OK.

Why do you see a need to maybe break up categories ?

Do you have a simplified query (something like the above examples) as to how it is currently being used ?
0
 
LVL 69

Accepted Solution

by:
ScottPletcher earned 200 total points
ID: 38794569
>> If it were only 5 or 10, would you suggest seperating would be better <<

Nah.  Anything beyond a firm two, i.e. 3 or more, and you've got make a final decision, which for me is always a single table if possible.

I wouldn't object to creating separate views for each category, and using only those view names instead of the actual underlying table name(s) in queries.  SQL won't care, and it will give you more flexibility in the underlying table(s).


>> each category of data is 2 m records. <<

Even 40M rows is not that bad for modern servers if they are properly indexed, in particular the best clustering index.

So, make "category" the lead column in the index, followed by whatever best suits the data, and you should be fine.
0
 
LVL 5

Author Comment

by:25112
ID: 38858180
really appreciate your input... i will provide feedback as i implement these ideas and performance on queries

thanks much.
0

Featured Post

Complete Microsoft Windows PC® & Mac Backup

Backup and recovery solutions to protect all your PCs & Mac– on-premises or in remote locations. Acronis backs up entire PC or Mac with patented reliable disk imaging technology and you will be able to restore workstations to a new, dissimilar hardware in minutes.

Join & Write a Comment

How to leverage one TLS certificate to encrypt Microsoft SQL traffic and Remote Desktop Services, versus creating multiple tickets for the same server.
Ever needed a SQL 2008 Database replicated/mirrored/log shipped on another server but you can't take the downtime inflicted by initial snapshot or disconnect while T-logs are restored or mirror applied? You can use SQL Server Initialize from Backup…
It is a freely distributed piece of software for such tasks as photo retouching, image composition and image authoring. It works on many operating systems, in many languages.
This tutorial demonstrates a quick way of adding group price to multiple Magento products.

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now