Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people, just like you, are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
Solved

query in one big table vs many different tables

Posted on 2013-01-11
9
407 Views
Last Modified: 2013-02-05
for reporting purpose , in sql server, which scenario is better?

one server that has many records for different categories, with one column making the difference.. (indicator telling what category of row of record it is)

vs

creating a separate table for each category..

for example, the column that makes the difference could have data like
'car','bus','bike','suv','trailer'.. 15 more such entries.. they are all char type.

the table will result in millions of records regardless..

will one big table with proper index be the way to go? or is partitioning more beneficial for SELECT report queries?
0
Comment
Question by:25112
  • 2
  • 2
  • 2
  • +3
9 Comments
 
LVL 26

Assisted Solution

by:Chris Luttrell
Chris Luttrell earned 100 total points
ID: 38769182
As usual, "It Depends?"
Are you going to be selecting the records together comparing accross categories or what?
If yes, then in general for speed in reporting purposes denormalizing into fewer tables is faster.
If you are always going to only be working with a single category or if there are going to be huge differences in the related columns for each category, then you might should spilt them up.
Also look at DW principles.  A Star Schema design with a Fact table with Denormalized Dimension tables around it is usually the desired design for reporting.
0
 
LVL 21

Expert Comment

by:Alpesh Patel
ID: 38769548
In that case you have predefined value in that column then create a Filtered index.

It will improve performance.


If not the case then better to create diff table for good performance.

But maintainability is the drawback.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 38770240
PatelAlpesh,

I suspect you posted in the wrong thread as the comment you posted makes no sense in the context of the question posted by the author.
0
Best Practices: Disaster Recovery Testing

Besides backup, any IT division should have a disaster recovery plan. You will find a few tips below relating to the development of such a plan and to what issues one should pay special attention in the course of backup planning.

 
LVL 69

Assisted Solution

by:Scott Pletcher
Scott Pletcher earned 200 total points
ID: 38772403
If properly indexed, in general you will be able to get good performance from everything in one table.

Twenty+ of separate tables makes the code much harder to maintain.  And SQL must load a lot more table definitions to parse the code.
0
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 200 total points
ID: 38773118
It does depend.... Probably how many different categories apply to any one record.

If you simply have a large lookup of categories, then agree with ScottPletcher, check your indexes carefully and no real need to break it up.

You could use partitioning, but really, if the data to be always used is spread across multiple partitions, then you are not gaining the true benefit of partitioning. Think of a partition as "this is all I need to see at this point".

For example, partitioning works great on chronological data where you only want to see "current month" or review "january" etc. The partition can help reduce the overall volume. But if your partition is on something that really cannot be grouped properly, then there is no real advantage and can create other problems (IMO).

If you have an EAV (entity attribute value) model then it can get painful if you are using it to record multiple attributes. Some normalisation will help.

With an EAV model, you often end up with duplicate rows of that entity with the only variation being the attribute and value - the data set becomes vertical. That can make it difficult for reporting purposes (unless you crosstab, tablix, pivot etc).

In that case I would recommend serious investigation about your use of attributes and possibly converting them to individual tables.

Now, if not an EAV, but a row could have multiple categories, then you need to consider the data set in a similar way as the EAV type model as per CGluttrell's post.

Maybe you can elaborate a little more and show some (made up) examples of what you actually have if it is different to the above scenarios.
0
 
LVL 5

Author Comment

by:25112
ID: 38794416
>>Are you going to be selecting the records together comparing accross categories or what?
No. They will all be processed seperately.

>>huge differences in the related columns for each category
only one or rarely a few columns will be different, most of the time.

>>how many different categories apply to any one record.
only one category for each record.

>>Twenty+ of separate tables makes the code much harder to maintain.
If it were only 5 or 10, would you suggest seperating would be better.. would you draw a line in terms of how many tables will change the decision on that.? I realize this is where Mark was referring to partitioning benefits and disadvantages.

can you commend on this:

what if the category column can be coverted into a date field (an appropriate lookup is available with correlating category to a distinct date).. will that help in resolving a firm answer for this?

each category of data is 2 m records.

thank you-
0
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 200 total points
ID: 38794508
OK, let's consider the following query...

SELECT col1,col2,col3,categoryID
FROM tbl_datasource D
INNER JOIN tbl_categories C on D.categoryID = C.ID

If that is the type of query, then it is going to be difficult to stray away from a single table. The datasource has only one column for categoryID

If those categories are somehow reflective of a period where by the following query would implicitly limit the selection, then you might consider a different approach with dates

SELECT col1,col2,col3,categoryID
FROM tbl_datasource D
INNER JOIN tbl_categories C on D.categoryID = C.ID
WHERE D.somedate > '2013-01-01'

If however you can somehow group the categories based on "type" somehow (you mention making it about 5 tables) as in :

SELECT col1,col2,col3,categoryID
FROM tbl_datasource D
LEFT JOIN tbl_categories C1 on D.categoryID = C1.ID and C1.TYPE = D.COL3
LEFT JOIN tbl_categories C2 on D.categoryID = C2.ID and C1.TYPE = D.COL3
LEFT JOIN tbl_categories C3 on D.categoryID = C3.ID and C1.TYPE = D.COL3
WHERE D.somedate > '2013-01-01'

Then you might consider breaking up categories into as many different "types"

If you had your queries broken up something more like :

SELECT col1,col2,col3,categoryID
FROM tbl_datasource D
INNER JOIN tbl_categories C1 on D.categoryID = C1.ID and C1.TYPE = D.COL3
WHERE D.somedate > '2013-01-01'
AND D.COL3 = 'Cars'

Then you might consider either partitioning, or, breaking up into seperate tables so the above would become something more like :

SELECT col1,col2,col3,categoryID
FROM tbl_datasource D
INNER JOIN tbl_car_categories CARS on D.categoryID = CARS.ID
WHERE D.somedate > '2013-01-01'

So, you can see a few different types and styles of queries, and of course they might not match what you really need, but based on the comment only 1 category per record, that is what I am imagining at the moment.

The number of rows shouldnt be a problem as such, so long as the query plan uses indexes rather than table scans, then it should be OK.

Why do you see a need to maybe break up categories ?

Do you have a simplified query (something like the above examples) as to how it is currently being used ?
0
 
LVL 69

Accepted Solution

by:
Scott Pletcher earned 200 total points
ID: 38794569
>> If it were only 5 or 10, would you suggest seperating would be better <<

Nah.  Anything beyond a firm two, i.e. 3 or more, and you've got make a final decision, which for me is always a single table if possible.

I wouldn't object to creating separate views for each category, and using only those view names instead of the actual underlying table name(s) in queries.  SQL won't care, and it will give you more flexibility in the underlying table(s).


>> each category of data is 2 m records. <<

Even 40M rows is not that bad for modern servers if they are properly indexed, in particular the best clustering index.

So, make "category" the lead column in the index, followed by whatever best suits the data, and you should be fine.
0
 
LVL 5

Author Comment

by:25112
ID: 38858180
really appreciate your input... i will provide feedback as i implement these ideas and performance on queries

thanks much.
0

Featured Post

Master Your Team's Linux and Cloud Stack!

The average business loses $13.5M per year to ineffective training (per 1,000 employees). Keep ahead of the competition and combine in-person quality with online cost and flexibility by training with Linux Academy.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Job - date manual 1 35
AJAX pass along a variable 3 46
union query column need default text 2 15
Access join syntax when converting to T-SQL query 4 33
Confronted with some SQL you don't know can be a daunting task. It can be even more daunting if that SQL carries some of the old secret codes used in the Ye Olde query syntax, such as: (+)     as used in Oracle;     *=     =*    as used in Sybase …
Composite queries are used to retrieve the results from joining multiple queries after applying any filters. UNION, INTERSECT, MINUS, and UNION ALL are some of the operators used to get certain desired results.​
The Email Laundry PDF encryption service allows companies to send confidential encrypted  emails to anybody. The PDF document can also contain attachments that are embedded in the encrypted PDF. The password is randomly generated by The Email Laundr…
A short tutorial showing how to set up an email signature in Outlook on the Web (previously known as OWA). For free email signatures designs, visit https://www.mail-signatures.com/articles/signature-templates/?sts=6651 If you want to manage em…

860 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question