Solved

How to create a best scalable database design for an online product catalog website

Posted on 2007-11-28
9
367 Views
Last Modified: 2010-08-05
I'm trying to build a website like kelkoo.co.uk. Here if you click on
"Categories --> Electronics --> TV --> Plasma"
It will take you to a screen, where you can refine your search and also see products.

The website I'm trying to build will allow users to add products and I want these products to be browsable and searchable like kelkoo.co.uk

I'm a beginner in database design and this is my first attempt at working on something this big. I've created smaller websites with a max of 5 tables. Based on just browsing that website, I've come up with this design shown below. I'm only concerned about how to capture product attributes and have only shown necessary table fields below. Please let me know if i'm way off the track.

Any input will be appreciated. I've included my comments below each table to describe the use. Its a long post but I wanted to make it very clear what I'm trying to accomplish.
Thanks a lot.
Comments: This table will be pre-populated with all the categories, I plan to have in the site.

 

Table: Category

CategoryID | ParentID | Name

1	  | 0	    | Electronics

2          | 1        | TV

10	  | 0	    | Home

11	  | 10       | Furniture

12	  | 11       | Sofa

----------------------------------------------------------

 

Comments: This table will be prepopulated with all the brand names for all categories

 

Table : Brand

BrandID | Name

200     | Sony

201     | Panasonic

202     | Samsung

300     | Ikea

301     | LazBoy

----------------------------------------------------------

 

Comments: This table will capture, which brands belong to which categories. So on the site, when the user clicks on TV category, I will look in this table to see what are the different brands that belong to this category and list out all the brands. It will be a pre-populated table

 

Table: Category_Brand

CategoryID | BrandID

2	  | 200

2	  | 201

2	  | 202

12         | 300

---------------------------------------------------------

 

Comments: This table will capture the different attributes for each category. The reason I added categoryID in this table is for the same reason - If the user clicks on TV category, I want to be able to present that refine search form at the top using which the user can narrow the search by Type or Screen Size etc. It will be a pre-populated table

 

Table: Attribute_Master

AttributeID | CategoryID | Name

10	   | 2	       | Type

11	   | 2	       | Screen Size

12	   | 2	       | Frequency

13	   | 2	       | Price

14	   | 2	       | Model

20	   | 12	       | Material

21	   | 12	       | Color

----------------------------------------------------------

 

Comments: This table captures the different choices that each of those attributes can have. For instance , TV can be of types - LCD,Plasma etc. It will be a pre-populated table

 

Table: Attribute_Choices

ChoiceID | AttributeID	| choiceValue

100	| 10		| LCD

101	| 10		| Plasma

102	| 10		| Projection

103	| 11		| 39" or less

104	| 11		| 40" - 44"

105	| 11		| 50" - 54"

106	| 12		| 100 Hertz

107	| 12		| 60 Hertz

108	| 12		| 50 Hertz

109	| 13		| 1000 - 2000

110	| 13		| 2000 - 4000

200	| 20		| Metal

201	| 20		| Wooden

202	| 20		| Leather

---------------------------------------------------------

 

Comments: Now when a user adds a new listing, this table will be populated. For instance if the user is entering a new TV listing, the form will require the user to select Category,Type,ScreenSize,Frequency etc from a prepopulated drop down list which will be populated using the attribute_choices table.

 

Table: Product

ProductID | CategoryID | BrandID | Name      | Price

1         | 2          | 200     | Sony LCD  | 1500

----------------------------------------------------------

 

Comments: This table actually stores the choices selected by the user in the form. Like if he selected TV type to be LCD - a choice ID of 100 will be entered here.

 

Table: Product_Attribute_Choices

ProductID | AttributeID | ChoiceID

1	 | 10          | 100	  

1	 | 11	      | 104	  

1	 | 12	      | 106	  

1	 | 13	      | 109	  

---------------------------------------------------------

 

Comments: Now for Category TV, if you look in Attribute_Master, it also has another attribute called Model - but Model does not have a pre-defined list of choices. The user will type that in the form. That kind of information will be stored in this table.

 

Table: Product_Attribute_Values

ProductID | AttributeID | AttributeValue

1	 | 14	      | XYZ

Open in new window

0
Comment
Question by:gublooo
  • 6
  • 3
9 Comments
 
LVL 42

Expert Comment

by:dqmq
ID: 20371516
First, I strongly urge you to do a logical design before you jump into a table design.  After considering my suggestions, the reasons should be evident. In the logical design, it would be very helpful to identify the primary keys.

Based on my suppositions about primary keys, I do have some  suggestions:

1. The recursive relationship between category table and itself supports a one-many hierarchy between categories.  That's fine.  But wouldn't it be better to support a many-many between categories.  That way, for example, you can use the same category in multiple hierarchies:
   home -> furniture ->  chair
   office ->  furniture ->  chair

To do this, remove parentID from the category table and add another table with CategoryID and ParentID for the primary key. This allows you to represent a category with more than one parent.
   
2. You need to propogate CategorieID to it's child entities.  Attibute_Choices needs to inherit categoryID from Attribute Master.  Product_Attribute_Choices needs to inherit it from both Product and Attribute_Choices. Without that unification it would be possible to assign attribute_choices from one category to a product in another.  For example, in your sample data would incorrectly allow this:   An LCD TV with the attribute of a sofa.

Table: Product_Attribute_Choices
ProductID | AttributeID | ChoiceID
1       | 20             | 200


3. Do you not really have a many-many between Attribute and category?  I mean does not the color attribute apply to more than one category?  So, table Attribute_Master is not in 3NF.  I suggest removing CategoryID from that table. Create an intersection table called CategoryAttribute with CategoryID and AttributeID as the PK.  

4. Product Attribute Values also needs the CategoryID. Same reason as #2.




0
 

Author Comment

by:gublooo
ID: 20371849
Thanks for your valuable input Dennis - Really Appreciate it. As I'm a beginner, my responses might seem a little silly but please bear with me.

1) I like the idea about splitting the category table. Excellent Suggestion - Thanks

2) I'm a little confused about points 2 and 4. I'm going to break it down.
a) "Attibute_Choices needs to inherit categoryID from Attribute Master. "
-> So when you say inherit, your saying I shoud add categoryID to Attribute_Choices table
But why do I need to do that. I'm sure I'm overlooking something.
-If I want to know all the attributes of Category TV - I can get it from Attribute_Master table
select * from Attribute_Master where categoryID=2
-If I want to know the choices of these attributes - I can join Attribute_Master and Attribute_choices
select * from Attribute_Master AM, Attribute_Choices AC
where AM.AttributeID=AC.AttributeID
And categoryID=2
So what is the point of adding categoryID to Attribute_Choices table?

b) "Product_Attribute_Choices needs to inherit it from both Product and Attribute_Choices. "
I guess I'm a little confused when you say inherit. Are you saying I should be adding categoryID in product_Attribute_choices table as well.
Again I'm not sure why? - This table is populated when the user fills out the "Add Listing" form and submits. So in the add listing form, he selects the category TV- so once he selects the category TV, based on the number of attributes the TV category has in Attribute_Master table (5 in this case) - 5 new questions are posted on the form - Like "Type" - "Screen Size" etc and these questions will be prepopulated as a drop down list with values from Attribute_Choices table. So when he submits the form, I know the exact ChoiceID's he selected to populate the Attribute_Product_Choices table. So how will I end up with values like
Table: Product_Attribute_Choices
ProductID | AttributeID | ChoiceID
1       | 20             | 200

3) Makes Sense. Thats a very good point. That way an attribute can apply to more than one category.

Once again - thanks a lot Dennis. Its past midnight now - I will read your response again in the morning - and maybe everything will be clear then :)
Cheers
0
 

Author Comment

by:gublooo
ID: 20375141
Primary Keys for tables I have so far - without including the changes you mentioned

Category - CategoryID
Brand - BrandID
Attribute_Master - AttributeID
Attribute_Choices - ChoiceID
Product - ProductID
Product_Attribute_Choices - ProductID and AttributeID
Product_Attribute_Values - ProductID and AttributeID

Thanks
0
 
LVL 42

Expert Comment

by:dqmq
ID: 20392006
>>>>a) "Attibute_Choices needs to inherit categoryID from Attribute Master. "
-> So when you say inherit, your saying I shoud add categoryID to Attribute_Choices table

Yes

>>>>So what is the point of adding categoryID to Attribute_Choices table?
The purpose of #2 and #4 are support referential integrity. Without those attributes, you are missing part of the "natural" keys and therefore cannot support a foreign key to them.  

It's may be difficult for you to appreciate this right now, but it's important not to drop natural key attributes when you introduce surrogate keys.  That is why I advised you to do a logical design before the physical design.  If you jump right to the physical design, then it's easy to miss some of the business relationships.

Consider this abbreviated logical design which does NOT have surrogate keys:

Category
   CategoryID (PK)
   Name

Attribute
   AttributeID (PK)
   CategoryID (FK--> Category.CategoryID)
   Name

AttributeChoice
  AttributeID (PK, FK --> Attribute.AttributeID)
  CategoryID(PK, FK --> Attribute.CategoryID)
  ChoiceNo(PK)
  ChoiceValue

ProductAttributeChoice
  ProductID (PK, FK1 --> Product.ProductID)
  AttributeID (PK, FK2 -->Attribute.AttributeID)
  CategoryID (FK1 --> Product.CategoryID, FK2 -->AttributeChoice.CategoryID)
  ChoiceNo (FK2 --> AttributeChoice.ChoiceNo)

Notice how CategoryID propagates to child tables. Notice how CategoryID only appears once in the ProductAttributeChoice table, while at the same time it particpates in two foreign keys!!!  That's called "unification" and it asserts a business rule:  The attribute choices for a product must be from the same category as the product.

Still with me?  I have two more points before I climb off my soapbox.

---------------------------------------
First, I don't object to introducing surrogate keys (though, I'm not sure how much your design gains from it).  However, when introducing them, it's a mistake to discard (all or part of) the natural keys--they are still needed to assert the business rules they represent.  But, to keep the referential integrity, here's a sample of what I proposed:

Attribute_Master
  AttributeID  (PK)
  CategoryID  (PK,UK,FK -->Category.CategoryID)
  Name           (UK)

Note:
 The UK prevents a category from have two attributes with the same name
 The overloaded PK is one way to preserve the natural key without adding an index

Attribute_Choices
  ChoiceID (PK, UK)
  AttributeID  (UK,  FK --> Attribute_Master.AttributeID)
  CategoryID  (UK, FK --> Attribute_Master.AttributeID)
  ChoiceValue
 
Note:
  The UK supports RI to the Product_Attribute_Choice table.  In effect that supports the business rule that the attribute choices for a product must come from the same category as the product.
     
  I could just as well have made those three columns the PK (overloading the PK just as I did for the Attribute_Choices table).  I wanted to show you this way, which is technically more correct, but requires an additional index.

------------------
Second, you are correct that you don't need to propagate ChoiceID to satisfy retrieval requests because you can always find it by following the relationships.  (Though it does sometimes make queries easier).   You are also correct that the your application will "know" its ChoiceID context, so it's not needed for that either.

However, think about that for a minute:  if the application "knows" it's ChoiceID context when inserting the row but does not save it with the row, then you are in effect, you are in effect throwing away that fact.

Maybe this will shed some light.  What would the select look like that returns a product attribute choice and its category?  Well, you could write that select two different ways: one that joins to the category table through product or another that joins to the category table through the attribute_master.  With your design, it's possible to get different answers from (poor quality) data.  With the the natural keys preserved, it is not.

-------------

In summary, the reason to propagate ChoiceID is for the design to enforce a higher level of data quality as defined by the business rules.


--dQmQ

 
     





 







 
   















 



 




0
Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

 

Author Comment

by:gublooo
ID: 20409168
Thanks a bunch for your response. Really appreciate the time you took in explaining in detail - That helped me a lot in understanding. I think I'm one step away from finalizing my database approach.

So what we have now is:
Category
   CategoryID (PK)
   Name

Attribute_Master
   AttributeID  (PK)
   CategoryID   (PK,UK,FK -->Category.CategoryID)
   Name         (UK)

Attribute_Choices
   ChoiceID     (PK, UK)
   AttributeID  (UK,  FK --> Attribute_Master.AttributeID)
   CategoryID   (UK, FK --> Attribute_Master.AttributeID)
   ChoiceValue

ProductAttributeChoice
   ProductID    (PK, FK1 --> Product.ProductID)
   AttributeID  (PK, FK2 -->Attribute.AttributeID)
   CategoryID   (FK1 --> Product.CategoryID, FK2 -->AttributeChoice.CategoryID)
   ChoiceNo     (FK2 --> AttributeChoice.ChoiceNo)
************************************************************************************
I was looking at your first reply and one of the suggestions u made was removing categoryID from attribute_master table and creating an intersection table CategoryAttributes....If I follow that approach, the design changes as follows -

Attribute_Master
   AttributeID (PK)
   Name        (UK)
 
CategoryAttribute
   CategoryID  (PK,FK -->Category.CategoryID)
   AttributeID (PK, FK -->Attribute.AttributeID)

Now should Attribute_choices change as follows
   ChoiceID     (PK, UK)
   AttributeID  (UK,  FK --> Attribute_Master.AttributeID)
   ChoiceValue

And create another table called CategoryChoice
CategoryChoice
   CategoryID  (PK,FK -->CategoryAttribute.CategoryID)
   ChoiceID    (PK,FK -->Attribute_choices.ChoiceID)

The reason I thought of that was because as you mentioned, the color attribute can apply to more than 1 category say TV and Sofa - (TV comes in black and silver color and Sofa comes in black and brown). Similarly the choice black for attribute color can also apply to multiple categories. Am I on the right track.

Category and ProductAttributeChoice tables will stay the same except that CategoryID in ProductAttributeChoice table will be a foreign key referring to CategoryChoice Table...

Once again I really appreciate your feedback in guiding me in the right direction. Hopefully I will finalize my design after your response.
Thankyou
0
 
LVL 42

Accepted Solution

by:
dqmq earned 500 total points
ID: 20413929
One of the main goals of the relational model is for the database design to enforce the business rules.  Your latest design omits a business rule. For example, it permits this:

AttributeChoice
   Color, White
   Color, Black

CategoryChoice
  Television, Black
  Refrigerator, White

ProductAttributeChoice
   Television, White   <-- oops!

I realize that you could have edits in your code to prevent this, but that misses the point of a relational design, which is for the database structure to prevent it.

Consider the alternative that I have attached. The advantages of this design are that, while an attribute choice may apply to multiple categories,  a product and its attribute choices must be in the same category.   The design also assures that the product name is unique and that there are no duplicate values for an attribute choice.  





Attribute_Master

   AttributeID  (PK)

   Name         (UK)
 

CategoryAttribute

   CategoryID    (PK, FK1 -->Category.CategoryID)

   AttributeID   (PK, FK2 -->Attribute.AttributeID)
 

AttributeChoices

   ChoiceID      (PK)

   AttributeID   (UK,  FK --> CategoryAttribute.AttributeID)

   CategoryID    (UK,  FK --> CategoryAttribute.CategoryID)

   ChoiceValue   (UK)
 

ProductAttributeChoice

   ProductID   (PK, FK1 -->Product.ProductID)

   BrandID     (PK, FK1 -->Product.BrandID

   CategoryID  (PK, FK1 -->Product.CategoryID,

                    FK2 -->AttributeChoices.CategoryID)

   AttributeID (PK, FK2 -->AttributeChoices.AttributeID)

   ChoiceValue (PK, FK2 -->AttributeChoices.ChoiceValue)
 

Product

  ProductID  (PK, UK1)

  CategoryID (UK1, FK --> CategoryBrand.CategoryID)

  BrandID    (UK1, FK --> CategoryBrand.BrandID)

  Name       (UK2)   

  

  

Open in new window

0
 

Author Comment

by:gublooo
ID: 20417721
Thankyou Dennis. I will move forward with your suggested approach. Appreciate the time you put in. thanks again
0
 

Author Closing Comment

by:gublooo
ID: 31411566
Very Helpful
0
 

Author Comment

by:gublooo
ID: 20423912
Hello again,
Going back to my previous posts about having a seperate table to store all the choices and allowing an attribute choice to apply to multiple categories - I came up with this approach. I've eliminated the need for the seperate brand table.

Attribute_Master
   AttributeID  (PK)
   Name         (UK)

Attribute_Choices
   ChoiceID (PK)
   AttributeID (PK, FK -->Attribute.AttributeID)
   ChoiceValue   (UK)
 
CategoryAttributeChoices
   CategoryID  (PK, FK1 -->Category.CategoryID)
   AttributeID   (PK, FK1 -->Attribute.AttributeID
                              FK2 --> Attribute_Choices.AttributeID
  ChoiceID       (      FK1 --> Attribute_Choices.ChoiceID)
 
ProductAttributeChoice
   ProductID   (PK, FK1 -->Product.ProductID)
   CategoryID  (PK, FK1 -->Product.CategoryID,
                              FK2 -->CategoryAttributeChoices.CategoryID)
   AttributeID (PK, FK2 -->CategoryAttributeChoices.AttributeID)
   ChoiceID (PK, FK2 -->CategoryAttributeChoices.ChoiceID)
 
Product
  ProductID  (PK, UK1)
  CategoryID (UK1, FK1 -->Product.CategoryID,
                              FK2 -->CategoryAttributeChoices.CategoryID)
  Name       (UK2)  

Thanks
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Suggested Solutions

Introduction HyperText Transfer Protocol (http://www.ietf.org/rfc/rfc2616.txt) or "HTTP" is the underpinning of internet communication.  As a teacher of web development I have heard many questions, mostly from my younger students who have come to t…
Foolproof security solutions has become one of the key necessities of every e-commerce or Internet banking website. If you too own an online shopping site then its vital for you to equip your web portal with customer security features that can allow…
Learn how to set-up custom confirmation messages to users who complete your Wufoo form. Include inputs from fields in your form, webpage redirects, and more with Wufoo’s confirmation options.
Learn how to set-up PayPal payment integration in your Wufoo form. Allow your users to remit payment through PayPal upon completion of your online form. This is helpful for collecting membership payments, customer payments, donations, and more.

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now