We help IT Professionals succeed at work.

We've partnered with Certified Experts, Carl Webster and Richard Faulkner, to bring you two Citrix podcasts. Learn about 2020 trends and get answers to your biggest Citrix questions!Listen Now

x

Table Schema Best practices

cdfllc
cdfllc asked
on
Medium Priority
382 Views
Last Modified: 2012-08-14
Hi, I have used mySQL before and never had any issues, however, mydatabase was small scale (<100,000 rows)
So I never really considered the impact of my table schema too much...

Lets take the example from phpbb - there are three important tables I am concerned with in this question (trying to keep it simple and somewhat theoretical):
user,topic,post

Lets say I have 1 million users, each user creates 10 topics, and each topic has 10 posts to it.

user table now has 1 million rows
topics table now has 10 million rows
posts table now has 100 million rows

Is this the common practice? I mean, is this the best way for performance?
Maybe in order to answer that question, you need more information - such as what kinds of queries I will want to do...

Here are some examples (pretty obvious, but...)
1. I would want to show all the users topics - so I would just do a select where (topic.user_id = loggedinuser)
2. Show the posts for a particular topic - so I would just do a select where (post.topic_id = selectedtopic)


I mean, is this the way very large sites are setup, like friendster, etc...?
Or is there another way out there that scales that is a "trade secret" for DBA's?


thanks!
Comment
Watch Question

Top Expert 2006
Commented:
This is a pretty standard normalized data structure, yes.  Indexing will play a key role in performance of the application that depends on this data.  In some cases, you might want to create a summary table for specific tasks (say, watched_topics, which can contain up to 25 different topics per user, etc.).

Not the solution you were looking for? Getting a personalized solution is easy.

Ask the Experts
Top Expert 2005
Commented:
A million rows isn't too big for MySQL. 10 million is useable, but you'll need to be careful with your indexes and consider each query carefully.  100 million might be getting a little big for a single table, but that depends more on the OS imposed limit for file sizes than MySQL's limits.  For that table, and possibly the topics table, consider using a MERGE table type, where the data is physically split among several different tables (and therefore files), but you can logically view them (or a subset of them) as one.  The manual at mysql.com explains how to set that up, and shows its limitations.

Author

Commented:
todd, so I imagine that the hypothetical "watched_topics" table contains this information: user_id, topic_id
So if I wanted to show the users watched topics,
then I would just:    select from topics where the topic_id in (SELECT from watched_topics where user_id = loggedinuser)

Something like that?

Is that faster than having a flag in the topics table to have something like this:
SELECT from topics where user_id = loggedinuser AND watch_flag = 1
Top Expert 2006

Commented:
Yes - you could also do this with a straight join instead of a subquery:

SELECT t.* FROM topics t INNER JOIN watched_topics w ON (w.topic_id=t.topic_id)
WHERE w.user_id= loggedinuser;

You don't want to have a watch_flag on the topics table because there should be only one entry in that table per distinct topic.  Whether a topic is watched by a user is specific to a user/topic combination, so you want to keep that out of the topics table.

In fact, you don't want to have a user_id column on the topics table, either.  Think of a topic in abstract terms - they are independent of users.  For example, if there is a topic of "Computer Hardware", is it any different for user_1 than user_2?  Not in general terms, no.  If you want to associate topics to users, you need another table (the watched_topics table is an example of this) to define the associations.  But keep the topics table pure - don't include information it does not need to include.

Author

Commented:
:) todd, you're right - I wasn't really thinking it through there... about the watched topics. I was more focused on the AND watch_flag = 1 part.
Which, I think I found another answer to this type of question in another post -
if you have the "AND watch_flag = 1" part - I think they said that it would have to do a table scan to find all matches where the flag = 1 -- is that correct?

I guess I am making it harder than it really is - it just seems too easy...
I just remember we tried to create something like this where I used to work - we had this huge join table (user_id, related_user_id, relationship_type)
and the DBA said it wouldn't scale - it's haunted me ever since :)
Top Expert 2006

Commented:
I don't think that adding the watch_flag = 1 would REQUIRE a full table scan - it depends on the indexes.  If you have appropriate indexes, you can do it without a full table scan.

Even extremely large tables can be joined efficiently if the SQL makes good use of indexes.  I cringe to think of a DBA saying that well-normalized table definitions won't scale well - I'd love to see the alternative!  There are times (particularly for reporting), where de-normalized data is more efficient.  But generally you want to use normalized data to generate de-normalized data in summary (rollup) tables for more efficient reporting, while keeping and using the normalized data in normal applications.

I'm sure this is how EE does it.  Check it out - the points summary tables in the left-hand column are likely generated from summary tables that are updated periodically.  If you look at an actual expert profile, the point total may be slightly different.  That data is pulled in real time, which is easy to do efficiently because you know what you are looking for (a particular expert's point total).  But the leaderboard is a different thing - you don't want to calculate the total points per a topic area for each user for a given period of time EVERY time a user visits this page.  Periodically refreshed summary data is sufficient.

Author

Commented:
yeah, the DBA couldn't ever give us any alternatives! ;)
I think it was an excuse for something else... anyways thanks for the help todd!
I am planning on splitting the points 400 for you and 100 for snoyes, since you both contributed...
Access more of Experts Exchange with a free account
Thanks for using Experts Exchange.

Create a free account to continue.

Limited access with a free account allows you to:

  • View three pieces of content (articles, solutions, posts, and videos)
  • Ask the experts questions (counted toward content limit)
  • Customize your dashboard and profile

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.