Database Design (Performance Choice)

I am building an app with another developer at the moment, and we are struggling to agree on the design of a particular function :-)

One part of our app allows users to comment and like posts (similar to facebook)

So we have the following tables (in relation to this part of the app) in our database:

- users (id, fullname, email, username, etc)
- posts (id, user_id, message, etc)
- replies (id, user_id, message, etc)
- likes (id, user_id, post_id, etc)

However, the other developer feels that in order to load the posts quicker, we should also store the username and number of likes in the posts table too (in addition to the other tables), so that the app loads the data faster, as it wouldn’t have to query all of the tables for the all the data for a post, so it could get the username, no. of likes from the 'posts' table alone.

I firmly believe that you should never store the same data twice in a database, and that you should use an id to link all of the pieces of data together from different tables.

What do you think is the best approach?
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

I'm on your side, but also I think it is better to store (posts, replies, likes) in one table with a record flag and parent id to point replies and likes to the posts or replies, this case with proper indexes there will be no much performance differences, plus maintaining  correct count in the table will be extra overhead
Chris StanyonWebDevCommented:
I think your approach is a more technically sound one, and the idea of no-repeat is a fundamental concept of database normalisation. That said, once you understand the principles of normalisation, there are instances when breaking the rules is a better choice.

As for performance, having your approach will still only require a single query to the database to select the post, username and number of likes -  you'd just have a JOINed query with an aggregate function to get the count, so performance is unlikely to be affected.

If you start breaking the rules of normalisation, you may very well find yourself heading down a dead-end street, meaning that future scalability is limited or even impossible

My thoughts -stick with your ideas

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
oo7mlAuthor Commented:
Thanks guys.

@farwest: can you elaborate on your thinking please.

Do you mean have one table called 'content' and with several fields to facilitate for posts, replies and likes?
Powerful Yet Easy-to-Use Network Monitoring

Identify excessive bandwidth utilization or unexpected application traffic with SolarWinds Bandwidth Analyzer Pack.

yes, this way you use a flag for content type for example
contType = 1 ' for posts
contType = 2 ' for replies
contType = 11 ' for likes
contType = 12 ' for dislikes
parentContentID to link replies, likes with posts

and  (of course same message text field is used for both posts and replies and even you can extend likes to let user enter the reason why he/she like it to dislike it )

this also will save development time in forms and reports and statistics since you need only to filter, and let you build tree structure with post, replies and even replies for replies
oo7mlAuthor Commented:

But then a like would end up having several extra fields that it does not need, but posts needs them?
Chris StanyonWebDevCommented:
Already starting to get away from a normalised database here - duplicating data / using magic numbers / multi-use tables etc. An hour working with that and you'd run into problems!!
I think this apply only on message text field
using Varchar types only consume space when needed,
no noticeable impact on DB is expected
@Chris in theory what you are saying is correct, but in my opinion for this specific case the is is the best normalization level I can think of to balance with performance and complicity
Jason C. LevineDon't talk to me.Commented:
I'm with Chris on this one but beware of over-joins.  Your example above is fine...simple and only one to three joins ever needed.  But if you are going to have dozens or hundreds of tables, all joined at once, monitor performance carefully.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today

From novice to tech pro — start learning today.