How to create a best scalable database design for an online product catalog website

I'm trying to build a website like Here if you click on
"Categories --> Electronics --> TV --> Plasma"
It will take you to a screen, where you can refine your search and also see products.

The website I'm trying to build will allow users to add products and I want these products to be browsable and searchable like

I'm a beginner in database design and this is my first attempt at working on something this big. I've created smaller websites with a max of 5 tables. Based on just browsing that website, I've come up with this design shown below. I'm only concerned about how to capture product attributes and have only shown necessary table fields below. Please let me know if i'm way off the track.

Any input will be appreciated. I've included my comments below each table to describe the use. Its a long post but I wanted to make it very clear what I'm trying to accomplish.
Thanks a lot.
Comments: This table will be pre-populated with all the categories, I plan to have in the site.
Table: Category
CategoryID | ParentID | Name
1	  | 0	    | Electronics
2          | 1        | TV
10	  | 0	    | Home
11	  | 10       | Furniture
12	  | 11       | Sofa
Comments: This table will be prepopulated with all the brand names for all categories
Table : Brand
BrandID | Name
200     | Sony
201     | Panasonic
202     | Samsung
300     | Ikea
301     | LazBoy
Comments: This table will capture, which brands belong to which categories. So on the site, when the user clicks on TV category, I will look in this table to see what are the different brands that belong to this category and list out all the brands. It will be a pre-populated table
Table: Category_Brand
CategoryID | BrandID
2	  | 200
2	  | 201
2	  | 202
12         | 300
Comments: This table will capture the different attributes for each category. The reason I added categoryID in this table is for the same reason - If the user clicks on TV category, I want to be able to present that refine search form at the top using which the user can narrow the search by Type or Screen Size etc. It will be a pre-populated table
Table: Attribute_Master
AttributeID | CategoryID | Name
10	   | 2	       | Type
11	   | 2	       | Screen Size
12	   | 2	       | Frequency
13	   | 2	       | Price
14	   | 2	       | Model
20	   | 12	       | Material
21	   | 12	       | Color
Comments: This table captures the different choices that each of those attributes can have. For instance , TV can be of types - LCD,Plasma etc. It will be a pre-populated table
Table: Attribute_Choices
ChoiceID | AttributeID	| choiceValue
100	| 10		| LCD
101	| 10		| Plasma
102	| 10		| Projection
103	| 11		| 39" or less
104	| 11		| 40" - 44"
105	| 11		| 50" - 54"
106	| 12		| 100 Hertz
107	| 12		| 60 Hertz
108	| 12		| 50 Hertz
109	| 13		| 1000 - 2000
110	| 13		| 2000 - 4000
200	| 20		| Metal
201	| 20		| Wooden
202	| 20		| Leather
Comments: Now when a user adds a new listing, this table will be populated. For instance if the user is entering a new TV listing, the form will require the user to select Category,Type,ScreenSize,Frequency etc from a prepopulated drop down list which will be populated using the attribute_choices table.
Table: Product
ProductID | CategoryID | BrandID | Name      | Price
1         | 2          | 200     | Sony LCD  | 1500
Comments: This table actually stores the choices selected by the user in the form. Like if he selected TV type to be LCD - a choice ID of 100 will be entered here.
Table: Product_Attribute_Choices
ProductID | AttributeID | ChoiceID
1	 | 10          | 100	  
1	 | 11	      | 104	  
1	 | 12	      | 106	  
1	 | 13	      | 109	  
Comments: Now for Category TV, if you look in Attribute_Master, it also has another attribute called Model - but Model does not have a pre-defined list of choices. The user will type that in the form. That kind of information will be stored in this table.
Table: Product_Attribute_Values
ProductID | AttributeID | AttributeValue
1	 | 14	      | XYZ

Open in new window

Who is Participating?

Improve company productivity with a Business Account.Sign Up

dqmqConnect With a Mentor Commented:
One of the main goals of the relational model is for the database design to enforce the business rules.  Your latest design omits a business rule. For example, it permits this:

   Color, White
   Color, Black

  Television, Black
  Refrigerator, White

   Television, White   <-- oops!

I realize that you could have edits in your code to prevent this, but that misses the point of a relational design, which is for the database structure to prevent it.

Consider the alternative that I have attached. The advantages of this design are that, while an attribute choice may apply to multiple categories,  a product and its attribute choices must be in the same category.   The design also assures that the product name is unique and that there are no duplicate values for an attribute choice.  

   AttributeID  (PK)
   Name         (UK)
   CategoryID    (PK, FK1 -->Category.CategoryID)
   AttributeID   (PK, FK2 -->Attribute.AttributeID)
   ChoiceID      (PK)
   AttributeID   (UK,  FK --> CategoryAttribute.AttributeID)
   CategoryID    (UK,  FK --> CategoryAttribute.CategoryID)
   ChoiceValue   (UK)
   ProductID   (PK, FK1 -->Product.ProductID)
   BrandID     (PK, FK1 -->Product.BrandID
   CategoryID  (PK, FK1 -->Product.CategoryID,
                    FK2 -->AttributeChoices.CategoryID)
   AttributeID (PK, FK2 -->AttributeChoices.AttributeID)
   ChoiceValue (PK, FK2 -->AttributeChoices.ChoiceValue)
  ProductID  (PK, UK1)
  CategoryID (UK1, FK --> CategoryBrand.CategoryID)
  BrandID    (UK1, FK --> CategoryBrand.BrandID)
  Name       (UK2)   

Open in new window

First, I strongly urge you to do a logical design before you jump into a table design.  After considering my suggestions, the reasons should be evident. In the logical design, it would be very helpful to identify the primary keys.

Based on my suppositions about primary keys, I do have some  suggestions:

1. The recursive relationship between category table and itself supports a one-many hierarchy between categories.  That's fine.  But wouldn't it be better to support a many-many between categories.  That way, for example, you can use the same category in multiple hierarchies:
   home -> furniture ->  chair
   office ->  furniture ->  chair

To do this, remove parentID from the category table and add another table with CategoryID and ParentID for the primary key. This allows you to represent a category with more than one parent.
2. You need to propogate CategorieID to it's child entities.  Attibute_Choices needs to inherit categoryID from Attribute Master.  Product_Attribute_Choices needs to inherit it from both Product and Attribute_Choices. Without that unification it would be possible to assign attribute_choices from one category to a product in another.  For example, in your sample data would incorrectly allow this:   An LCD TV with the attribute of a sofa.

Table: Product_Attribute_Choices
ProductID | AttributeID | ChoiceID
1       | 20             | 200

3. Do you not really have a many-many between Attribute and category?  I mean does not the color attribute apply to more than one category?  So, table Attribute_Master is not in 3NF.  I suggest removing CategoryID from that table. Create an intersection table called CategoryAttribute with CategoryID and AttributeID as the PK.  

4. Product Attribute Values also needs the CategoryID. Same reason as #2.

gubloooAuthor Commented:
Thanks for your valuable input Dennis - Really Appreciate it. As I'm a beginner, my responses might seem a little silly but please bear with me.

1) I like the idea about splitting the category table. Excellent Suggestion - Thanks

2) I'm a little confused about points 2 and 4. I'm going to break it down.
a) "Attibute_Choices needs to inherit categoryID from Attribute Master. "
-> So when you say inherit, your saying I shoud add categoryID to Attribute_Choices table
But why do I need to do that. I'm sure I'm overlooking something.
-If I want to know all the attributes of Category TV - I can get it from Attribute_Master table
select * from Attribute_Master where categoryID=2
-If I want to know the choices of these attributes - I can join Attribute_Master and Attribute_choices
select * from Attribute_Master AM, Attribute_Choices AC
where AM.AttributeID=AC.AttributeID
And categoryID=2
So what is the point of adding categoryID to Attribute_Choices table?

b) "Product_Attribute_Choices needs to inherit it from both Product and Attribute_Choices. "
I guess I'm a little confused when you say inherit. Are you saying I should be adding categoryID in product_Attribute_choices table as well.
Again I'm not sure why? - This table is populated when the user fills out the "Add Listing" form and submits. So in the add listing form, he selects the category TV- so once he selects the category TV, based on the number of attributes the TV category has in Attribute_Master table (5 in this case) - 5 new questions are posted on the form - Like "Type" - "Screen Size" etc and these questions will be prepopulated as a drop down list with values from Attribute_Choices table. So when he submits the form, I know the exact ChoiceID's he selected to populate the Attribute_Product_Choices table. So how will I end up with values like
Table: Product_Attribute_Choices
ProductID | AttributeID | ChoiceID
1       | 20             | 200

3) Makes Sense. Thats a very good point. That way an attribute can apply to more than one category.

Once again - thanks a lot Dennis. Its past midnight now - I will read your response again in the morning - and maybe everything will be clear then :)
Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

gubloooAuthor Commented:
Primary Keys for tables I have so far - without including the changes you mentioned

Category - CategoryID
Brand - BrandID
Attribute_Master - AttributeID
Attribute_Choices - ChoiceID
Product - ProductID
Product_Attribute_Choices - ProductID and AttributeID
Product_Attribute_Values - ProductID and AttributeID

>>>>a) "Attibute_Choices needs to inherit categoryID from Attribute Master. "
-> So when you say inherit, your saying I shoud add categoryID to Attribute_Choices table


>>>>So what is the point of adding categoryID to Attribute_Choices table?
The purpose of #2 and #4 are support referential integrity. Without those attributes, you are missing part of the "natural" keys and therefore cannot support a foreign key to them.  

It's may be difficult for you to appreciate this right now, but it's important not to drop natural key attributes when you introduce surrogate keys.  That is why I advised you to do a logical design before the physical design.  If you jump right to the physical design, then it's easy to miss some of the business relationships.

Consider this abbreviated logical design which does NOT have surrogate keys:

   CategoryID (PK)

   AttributeID (PK)
   CategoryID (FK--> Category.CategoryID)

  AttributeID (PK, FK --> Attribute.AttributeID)
  CategoryID(PK, FK --> Attribute.CategoryID)

  ProductID (PK, FK1 --> Product.ProductID)
  AttributeID (PK, FK2 -->Attribute.AttributeID)
  CategoryID (FK1 --> Product.CategoryID, FK2 -->AttributeChoice.CategoryID)
  ChoiceNo (FK2 --> AttributeChoice.ChoiceNo)

Notice how CategoryID propagates to child tables. Notice how CategoryID only appears once in the ProductAttributeChoice table, while at the same time it particpates in two foreign keys!!!  That's called "unification" and it asserts a business rule:  The attribute choices for a product must be from the same category as the product.

Still with me?  I have two more points before I climb off my soapbox.

First, I don't object to introducing surrogate keys (though, I'm not sure how much your design gains from it).  However, when introducing them, it's a mistake to discard (all or part of) the natural keys--they are still needed to assert the business rules they represent.  But, to keep the referential integrity, here's a sample of what I proposed:

  AttributeID  (PK)
  CategoryID  (PK,UK,FK -->Category.CategoryID)
  Name           (UK)

 The UK prevents a category from have two attributes with the same name
 The overloaded PK is one way to preserve the natural key without adding an index

  ChoiceID (PK, UK)
  AttributeID  (UK,  FK --> Attribute_Master.AttributeID)
  CategoryID  (UK, FK --> Attribute_Master.AttributeID)
  The UK supports RI to the Product_Attribute_Choice table.  In effect that supports the business rule that the attribute choices for a product must come from the same category as the product.
  I could just as well have made those three columns the PK (overloading the PK just as I did for the Attribute_Choices table).  I wanted to show you this way, which is technically more correct, but requires an additional index.

Second, you are correct that you don't need to propagate ChoiceID to satisfy retrieval requests because you can always find it by following the relationships.  (Though it does sometimes make queries easier).   You are also correct that the your application will "know" its ChoiceID context, so it's not needed for that either.

However, think about that for a minute:  if the application "knows" it's ChoiceID context when inserting the row but does not save it with the row, then you are in effect, you are in effect throwing away that fact.

Maybe this will shed some light.  What would the select look like that returns a product attribute choice and its category?  Well, you could write that select two different ways: one that joins to the category table through product or another that joins to the category table through the attribute_master.  With your design, it's possible to get different answers from (poor quality) data.  With the the natural keys preserved, it is not.


In summary, the reason to propagate ChoiceID is for the design to enforce a higher level of data quality as defined by the business rules.







gubloooAuthor Commented:
Thanks a bunch for your response. Really appreciate the time you took in explaining in detail - That helped me a lot in understanding. I think I'm one step away from finalizing my database approach.

So what we have now is:
   CategoryID (PK)

   AttributeID  (PK)
   CategoryID   (PK,UK,FK -->Category.CategoryID)
   Name         (UK)

   ChoiceID     (PK, UK)
   AttributeID  (UK,  FK --> Attribute_Master.AttributeID)
   CategoryID   (UK, FK --> Attribute_Master.AttributeID)

   ProductID    (PK, FK1 --> Product.ProductID)
   AttributeID  (PK, FK2 -->Attribute.AttributeID)
   CategoryID   (FK1 --> Product.CategoryID, FK2 -->AttributeChoice.CategoryID)
   ChoiceNo     (FK2 --> AttributeChoice.ChoiceNo)
I was looking at your first reply and one of the suggestions u made was removing categoryID from attribute_master table and creating an intersection table CategoryAttributes....If I follow that approach, the design changes as follows -

   AttributeID (PK)
   Name        (UK)
   CategoryID  (PK,FK -->Category.CategoryID)
   AttributeID (PK, FK -->Attribute.AttributeID)

Now should Attribute_choices change as follows
   ChoiceID     (PK, UK)
   AttributeID  (UK,  FK --> Attribute_Master.AttributeID)

And create another table called CategoryChoice
   CategoryID  (PK,FK -->CategoryAttribute.CategoryID)
   ChoiceID    (PK,FK -->Attribute_choices.ChoiceID)

The reason I thought of that was because as you mentioned, the color attribute can apply to more than 1 category say TV and Sofa - (TV comes in black and silver color and Sofa comes in black and brown). Similarly the choice black for attribute color can also apply to multiple categories. Am I on the right track.

Category and ProductAttributeChoice tables will stay the same except that CategoryID in ProductAttributeChoice table will be a foreign key referring to CategoryChoice Table...

Once again I really appreciate your feedback in guiding me in the right direction. Hopefully I will finalize my design after your response.
gubloooAuthor Commented:
Thankyou Dennis. I will move forward with your suggested approach. Appreciate the time you put in. thanks again
gubloooAuthor Commented:
Very Helpful
gubloooAuthor Commented:
Hello again,
Going back to my previous posts about having a seperate table to store all the choices and allowing an attribute choice to apply to multiple categories - I came up with this approach. I've eliminated the need for the seperate brand table.

   AttributeID  (PK)
   Name         (UK)

   ChoiceID (PK)
   AttributeID (PK, FK -->Attribute.AttributeID)
   ChoiceValue   (UK)
   CategoryID  (PK, FK1 -->Category.CategoryID)
   AttributeID   (PK, FK1 -->Attribute.AttributeID
                              FK2 --> Attribute_Choices.AttributeID
  ChoiceID       (      FK1 --> Attribute_Choices.ChoiceID)
   ProductID   (PK, FK1 -->Product.ProductID)
   CategoryID  (PK, FK1 -->Product.CategoryID,
                              FK2 -->CategoryAttributeChoices.CategoryID)
   AttributeID (PK, FK2 -->CategoryAttributeChoices.AttributeID)
   ChoiceID (PK, FK2 -->CategoryAttributeChoices.ChoiceID)
  ProductID  (PK, UK1)
  CategoryID (UK1, FK1 -->Product.CategoryID,
                              FK2 -->CategoryAttributeChoices.CategoryID)
  Name       (UK2)  

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.