Solved

Categories & Subcategories for ecommerce database

Posted on 2007-04-03
19
2,670 Views
Last Modified: 2007-04-05
Hi database experts,

I need a little hand-holding here. I am designing a SQL Server 2005 database for an online store and I'm struggling with my Products & Categories tables design and how everything should work together.

I have a Products table with a sampling of columns show here

Products
 prod_name
 prod_price
 prod_type
 prod_cat
 prod_subcat

Category
 catID
 cat_name

SubCat
 subcatID
 subcat_name

I need help here with linking this all together. It's easy that the "Products Table and the Category Table get linked together. But what about subcategories A category/subcategory sample would be like so:

Electronics= Category
 Chips= SubCat
 Modules= SubCat
 Tuners= SubCat

Filters= Category
 Filters/Elements= SubCat
 Pre-Filters= SubCat
 Filter Inserts= SubCat

My question is this: should each subcategory be related to the main Category in the database? IOW, when someone enters a new "Products" record, selecting a Category of "Electronics", is there a way to restrict him only to relevant subcategories of "Chips, Modules, or Tuners"?

I hope I am making myself clear. The categories and subcategories will be used as linking navigation in the final website.
0
Comment
Question by:thinkwelldesigns
  • 9
  • 5
  • 3
  • +2
19 Comments
 
LVL 2

Expert Comment

by:kathik
ID: 18847167
The SubCat table needs a foreign key, CatID, that points back to the Category table.   When you build your app, make sure that the Subcat dropdown box is dynamically filtered based on the category that was chosen.
0
 
LVL 42

Expert Comment

by:dqmq
ID: 18847255
Products
 prodID(PK)
 prod_name (UK)
 prod_price
 prodtypeID(FK)
 
Category
 catID(PK)
 cat_name(UK)

SubCat
 subcatID(PK)
 subcat_name (UK)
 catID (FK)

ProductCategory
  ProductSubCategoryID(PK)
   prodID(FK1,UK1)
   subCatID(FK2,UK1)

ProductType
   ProdTypeID(PK)
   ProdType_Name (UK)
 
 
That structure allows you to drill down from Category to SubCategory.  It also allows the same product to be put in two different subcategories.

PS. Expect this to be the tip of the iceberg before you are done.

   

 


0
 
LVL 2

Expert Comment

by:poogy21
ID: 18849150
I've spent days, weeks and months over such questions myself..  And you'll soon find that it really doesn't matter.   Every shopping cart can be programed differently for different schemas..

But since you're looking for an answer, I'll give you 3! (how's that for your money?! ;)

1.  The Simple Solution:
Your Product Data Goes Here:
ProductsTable
- ProductID (PK)
- ProductDescription
- DepartmentID
- CategoryID

Your Discriptive (Definition Tables) Go Here.
DepartmentsTable
- DepartmentID (PK)
- Description

CagegoryTable
- CategoryID (PK)
- Description.

The above example is a one (product) to one (category + department) relationship.


Another Example:
Parent Child Solution:

ProductTable
- ProductID (PK)
- ProductDescription
- CategoryID

CagegoryTable
- CategoryID (PK)
- Description
- ParentCategory


This example will allow you to link your product to one category, which is nested in a parent category.

Lastly, a many-to-many example:

ProductTable
- ProductID
- Description

CategoryTable
- CategoryID
- Description

ProductCategoryLink
- ProductID
- CategoryID

in this example I can link whatever product to what ever category..  
a category can represent a department, sub category, or special section, special product types.. etc..  
it's just harder to manage in my experiance..

(I know I said 3 solutions.. but I couldn't resist giving you one more alternative.. )
Try using XML for site category structure..

<site>
<department name="Electronics" categoryid="1000">
  <category name="audio" categoryid="1001">
     <category name="video" categoryid="1002">
  </category>
<department>
<department name="Music" categoryid="2000">
  <category name="jazz" categoryid="2001">
     <category name="soft jazz" categoryid="2002">
     <category name="hard-core jazz" categoryid="2003">
  </category>
  <category name="rock" categoryid="2011">
     <category name="soft rock" categoryid="2012">
     <category name="hard-core rock" categoryid="2013">
  </category>
</department>
</site>


Than, just link products to categories in many to many!


Enjoy!

0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 18849393
I believe most of the problem you are having comes from the fact that your schema is not normalized...

Instead of the design you have been using, I suggest you consider the following design

PK: Primary Key
SK: Surrogate Key
FK: Foreign Key
NN:Not Null

Products
 prod_name (PK UNIQUE, NN)
 prod_price (NN)
 prod_type (FK --> Product_Type)

Product_Type
ProductTypeName(PK, NN UNIQUE)

Category
catid (SK counters - UNIQUE, NN)
cat_name (Natural PK - UNIQUE, NN)

CategoryBelong
catid_parent (FK --> Category  --> id of the upper level category, NN)
catid_child (FK --> Category, NN)

Now the answer to your questions...

<<should each subcategory be related to the main Category in the database?>>
No.  Representing hierarchies in relational modeling supposes a 1:0 cardinality between the element and the lineage.

<< IOW, when someone enters a new "Products" record, selecting a Category of "Electronics", is there a way to restrict him only to relevant subcategories of "Chips, Modules, or Tuners"?>>
That's a presentation issue not a db design issue.  You can load in a data adapter all the CategoryBelong dataset then load the correponding child records on *catid_child*.


And yes...Do not use XML: it is the worst thing you can do in db design.

Hope this helps...
0
 
LVL 9

Author Comment

by:thinkwelldesigns
ID: 18849758
Racimo, I have a little question here on the table structure you recommended:

Products
 prod_name (PK UNIQUE, NN)
 prod_price (NN)
 prod_type (FK --> Product_Type)

Product_Type
ProductTypeName(PK, NN UNIQUE)

Are you saying that the Product Type Table only hold the Name of the product type and not a numerical PK column as well? I was assuming that all tables should have a numerical PK which is the related column.

I'll probably have more questions on your answer.
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 18849789
<<Are you saying that the Product Type Table only hold the Name of the product type and not a numerical PK column as well?>>
Yes.

<<I was assuming that all tables should have a numerical PK which is the related column.>>
Then you were assuming wrongly.  Primary keys is a logical concept not to be mistaken with incremential numeric indexes which are at most surrogate key implementations.  It certainly does not have to be numeric but people choose numeric values mostly for convenience and performance when the pk gets too long.  In any case, when the key is not too long (say below varchar(50) as a rule of thumb) it is better to choose a natural key over a surrogate key.  What matters most is that you implement however unicity over the natural primary key and make sure it is never NULL (else it is not a good candidate key)...

Hope this helps...
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 18849800
<<I have a little question here on the table structure you recommended:>>Actually this is not a table structure but a logical design to be implemented.  The table design (a physical implementation) would include data types and specify all constraints to be applied...
0
 
LVL 9

Author Comment

by:thinkwelldesigns
ID: 18849835
Thanks for that explanation of PK's and table structure vs. table design. I'll have to go work on something else now, but I'll be back!

Thanks for your help everyone. It's becoming clearer to me how this is to work.
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 18849949
poogy21,

The solution proposed for representing hierarchies in RM is not correct (it's a known problem).  

CagegoryTable
- CategoryID (PK)
- Description
- ParentCategory

Explanation: the relationship beween some element and a sub element is a 1:0 cardinality because the highest level does *not* have parent but the schema you suggest forces a parent for the highest level.  As you generally have to put a NULL or an incoherent value in ParentCategory it basically breaks the Principle of Closed World in rules of design...

Hope this clarifies some tricky aspects with hierarchies...
0
Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

 
LVL 42

Expert Comment

by:dqmq
ID: 18853334
Time for me to chime back in.

Poogy21 says>
>I've spent days, weeks and months over such questions myself..  And you'll soon find that it really doesn't matter.

I don't know about "weeks and months" , but such design issues do deserve a lot of attention.  They deserve attention because they help us understand the business and they help us deliver a database that supports it.  I couldn't disagree more about how much it matters!  The database design constrains the business. Doing it well is an important step that leads to developing a system that supports the business, grows with the business, and lowers TCO. Changing data designs after-the-fact is expensive and often leads to compensating mechanisms in the programming layer (also expensive) or in the business itself.

Racimo,
While the question specifically asked for table designs, not logical designs, you are right to steer the discussion to the latter.  We need to understand the business requirements first and that is what logical design is all about. I've noticed (especially in the SQL Server world) many practitioners skip the logical design and start constructing tables without really understanding the business implications.  The trend is disturbing, to say the least.

As for the logical design you proposed, I have this feedback.  Your model abandons the two-level category/subcategory hierarchy in the sense it was proposed.  Instead, you have proposed a "network" ontology that supports unlimited subcategory levels and subcategories that belong to multiple categories.  And it avoids the problem of permitting NULLS/special value in the parent FK of the top most categories.  Very nice, very flexible.

The bad news is that your design overlooks the relationship between Product and Category which is the essence of what needs to happen.  I suggest anticipating the business requirement for a Product assigned to multiple categories.  That means we need adding a ProductBelong entity to the design:

ProductBelong
  CatID (PK,FK1)
  Prod_Name (PK,FK2)

Also, we need a simlar kind of PK on CategoryBelong:

CategoryBelong
catid_parent (PK,FK1 --> Category  --> id of the upper level category, NN)
catid_child (PK,FK2 --> Category, NN)

Finally, a question which I am interested in for my own benefit. I noticed in your logical design that you DID use a surrogate key on CategoryBelong.  Could you explain why, especially after your followup assertions about SK not needed in logical design (for Product_Type).  The issue I'm struggling with is when/when not to introduce surrogate keys in the logical model as opposed to deferring that matter to the physical model.  If you want to take that question offline, my email is MillerDQ@verizon.net

 




0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 18856582
<<While the question specifically asked for table designs, not logical designs, you are right to steer the discussion to the latter.  We need to understand the business requirements first and that is what logical design is all about. I've noticed (especially in the SQL Server world) many practitioners skip the logical design and start constructing tables without really understanding the business implications.  The trend is disturbing, to say the least.>>
If you ask me, I believe this *trend* is no news.  The non separation between the logical and physical layer at design time is simply a consequence of ignorance of data fundamentals.  As CJ DATE stated:

*Intellectual agility and experience may replace fundamental knowledge of database fundamentals only in a *limited* manner*

<<As for the logical design you proposed, I have this feedback.  Your model abandons the two-level category/subcategory hierarchy in the sense it was proposed.  Instead, you have proposed a "network" ontology that supports unlimited subcategory levels and subcategories that belong to multiple categories.  And it avoids the problem of permitting NULLS/special value in the parent FK of the top most categories.  Very nice, very flexible.>>
Representing all types of hierarchies/networks RM is easy.  The proposed method is known and demonstrate the superiority of RM over any other type of modeling approach.

<<The bad news is that your design overlooks the relationship between Product and Category which is the essence of what needs to happen.  I suggest anticipating the business requirement for a Product assigned to multiple categories.  That means we need adding a ProductBelong entity to the design:>>
You are correct, thanks for pointing that out.  I forgot the foreign key on Product.  I do not however   recall the questionner stating the necessity for establishing an M:N cardinality between Products and Category (namely can a Product belong to several Categories).   So assuming M:N would necessarily be a subjective interpretation.  I believe in fact that only the questionner can answer as only he has a fulll knowledge of business requirements.  

Based on what's written I would say that a category  FK on Product is sufficient.

Products
 prod_name (PK UNIQUE, NN)
 prod_price (NN)
 prod_type (FK --> Product_Type)
 prod_catid(FK --> Category)

Note that my main intent here was merely to provide general guidelines to the questionner to allow him to avoid the mistake of putting the parent column as a self referential fk on category (which basically break 1NF).

Giving db design advice online can *at best* be informative because online media is not effective into communicating exhaustive business requirement required information to support precise design.  Given the little information provided by the questionner and the complexity of formalizing segments of reality into a decent conceptual model, evaluating the quality of design online has limited interest for that precise reason: unless one knows all requirements it is impossible to give sound objective and precise advice online.

<<Finally, a question which I am interested in for my own benefit. I noticed in your logical design that you DID use a surrogate key on CategoryBelong.  Could you explain why, especially after your followup assertions about SK not needed in logical design (for Product_Type).  The issue I'm struggling with is when/when not to introduce surrogate keys in the logical model as opposed to deferring that matter to the physical model.  If you want to take that question offline, my email is MillerDQ@verizon.net>>
I do not see what allows you to draw the conclusion that I used a surrogate key on CategoryBelong. (I would have in that case probably created an additional attribute named CategoryBelongid).

The CategoryBelong PK is obviously a natural key being the concatenation of (catid_parent+ catid_child).
A surrogate key is not the same as a concatenated key.

<<Could you explain why, especially after your followup assertions about SK not needed in logical design (for Product_Type).  The issue I'm struggling with is when/when not to introduce surrogate keys in the logical model as opposed to deferring that matter to the physical model.  If you want to take that question offline, my email is MillerDQ@verizon.net>>
I believe I have already answered that question but I will rephrase it more explicitly.  On a purely *logical* standpoint, sk's have the advantage of *familiarity* over cumbersome natural pk's (that maybe the concatenation of several attributes or single too long attribute).  Therefore, in logical perspective, there is no *obligation* to use sk over natural pk.

On a *physical* standpoint however, one *must* take in consideration the limitation of the technology that implements the design.  In direct image systems (ORACLE, DB2 and SQL Server), establishing concatenated and excessively long natural pk's can severely degrade performance.  Therefore using systems generated sk (generally incremential counters) may help alleviate such performance problems.
 In such physical perspective, it is hard to say however to say whether it's an obligation or a preference.

Hope this helps...
0
 
LVL 9

Author Comment

by:thinkwelldesigns
ID: 18858458
<< I do not however   recall the questionner stating the necessity for establishing an M:N cardinality between Products and Category (namely can a Product belong to several Categories)."

One product will reside in one category.


<<Giving db design advice online can *at best* be informative because online media is not effective into communicating exhaustive business requirement required information to support precise design.

I understand what you're saying, Racimo. What you've been offering is very helpful and I'm reviewing it and polishing up my understanding of normalization to try to make sure I'm following you.

This whole project has suffered an enormous amount of feature creep, and I'm finding it a bit of heavy sledding. :-|
0
 
LVL 9

Author Comment

by:thinkwelldesigns
ID: 18858495
I think I have gotten my database in better normalization shape. In fact, I know I have. But I have one more question from dqmq's post earlier.

I understand what he's saying in each table except:

Products
 prodID(PK)
 prod_name (UK)
 prod_price
 prodtypeID(FK)
 
Category
 catID(PK)
 cat_name(UK)

SubCat
 subcatID(PK)
 subcat_name (UK)
 catID (FK)

ProductCategory
  ProductSubCategoryID(PK)
   prodID(FK1,UK1)
   subCatID(FK2,UK1)

ProductType
   ProdTypeID(PK)
   ProdType_Name (UK)


THIS ONE:
ProductCategory
  ProductSubCategoryID(PK)
   prodID(FK1,UK1)
   subCatID(FK2,UK1)
 
Why should this be a table? Is it even necessary at all to be included if I make the Products Table:


Products
 prodID(PK)
 prod_name (UK)
 prod_price
 prodtypeID(FK)
 prodcat(FK)
 prodsubcat(FK)
 
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 18858666
<<One product will reside in one category.>>
In that case a M:1 cardinality is to be applied.  It is safe to say based onto what you have stated that you should get to the following schema.

Products
 prod_name (PK UNIQUE, NN)
 prod_price (NN)
 prod_type (FK --> Product_Type)
prod_categoryid (FK--> Category)

Product_Type
ProductTypeName(PK, NN UNIQUE)

Category
catid (SK counters - UNIQUE, NN)
cat_name (Natural PK - UNIQUE, NN)

CategoryBelong
catid_parent (FK --> Category  --> id of the upper level category, NN)
catid_child (FK --> Category, NN)


<<I think I have gotten my database in better normalization shape. In fact, I know I have. But I have one more question from dqmq's post earlier. >>
I am afraid there is no such thing as a *normalization shape*.  Normalization is a precise set of rules to be applied in design and your schema is either normalized or it is not( There is no *in middle*).  Normalization = optimization.  In your case you really do *not* need a subcategory table.  It would create additional unecessary complexity.

I have posted a solution that is known and that can be considered as a correct logical solution to the problem you described..  Up to you to apply the guidelines I provided you.

Good luck with da dezign...
0
 
LVL 23

Accepted Solution

by:
Racim BOUDJAKDJI earned 500 total points
ID: 18858838
Just before I forget.  If your Category and ProductType name is not too long (below 30 chars), you may also consider the folliowing design...It will be much simpler than alternative designs.

Products
   prod_name (PK UNIQUE, NN)
   prod_price (NN)
   prod_typename(NN)
   prod_categoryname(NN)

CategoryBelong
   categoryname_parent (NN)
   categoryname_child(NN)

(PK = categoryname_parent+ categoryname_child)

Hope this helps...
0
 
LVL 9

Author Comment

by:thinkwelldesigns
ID: 18859271
You've gotten me up and going, Racimo. I thank you very much for your great assistance.
0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 18859467
Glad I could help.  Designing poor db's is easy, designing correct db's *is* a difficult task.

Regards...
0
 
LVL 42

Expert Comment

by:dqmq
ID: 18859699
thinkwell,
The purpose of ProductCategory is to allow a product to be in more than one sub-category.  For example, you might have:
  Electronics  -->  Camera Supplies  -->  AAA Battery
  Electronics  -->  Radio Supplies  -->  AAA Battery


Doing it like you suggest:
Products
 prodID(PK)
 prod_name (UK)
 prod_price
 prodtypeID(FK)
 prodcat(FK)
 prodsubcat(FK)

Has a couple business implications.  
1. For my batteries example above, you would need multiple products for the same AAA Battery if you want it to be in multiple subcategories.  I realize, that you did not call out that requirement, so it goes beyond your stated business need.  However, sometimes it's good to think ahead; there is little disadvantage to providing that flexibilty now, even if it's not immediately needed.

2. You still miss the point of having prodcat and prodsubcat both in the Product table. If prodsubcat has a prodcat, then you put that relationship in ProdSubCat, not in Product.  As you designed it prodcat and prodsubcat are independent classifications.  

3. That design also overlooks Racimo's design change. He suggests that neither prodsubcat nor prodcat be in the Products table.  In otherwords, his design allows you to put the same subcategory in multiple categories.  While, you didn't call out that requirement, either, I think it is forward looking, much like ProductCategory (or ProductBelong in my variation to Racimo's design).

Racimo,
I understand the difference between composite key and surrogate key. I just made a mistake in my question.  I meant to ask, why did you put a surrogate key on Category, when for the reasons you gave it is not needed in the logical design?





0
 
LVL 23

Expert Comment

by:Racim BOUDJAKDJI
ID: 18859854
<<I understand the difference between composite key and surrogate key. I just made a mistake in my question.  I meant to ask, why did you put a surrogate key on Category, when for the reasons you gave it is not needed in the logical design?>>
Because, even if logical design does not put any obligation onto using a sk, I assumed a Category may be cumbersome to use it would be preferable to use a sk.  As I stated, on a logicalstandpoint sk have the only advantage of familiarity.  I also subconsciously assumed that maximum length of Category attribute would be a long varchar(you can call it a bias).  Therefore, an sk in logical design would almost become almost an obligation one implemented under SQL Server.  As a rule of thumb on SQLServer, I systematically use sk's on natural pk that have chances of being above 30 characters.  As you can see the choice is also motivated by the limitations of SQL Server.

Regards...
0

Featured Post

Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

Join & Write a Comment

In this article we will get to know that how can we recover deleted data if it happens accidently. We really can recover deleted rows if we know the time when data is deleted by using the transaction log.
Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.
Viewers will learn how to use the UPDATE and DELETE statements to change or remove existing data from their tables. Make a table: Update a specific column given a specific row using the UPDATE statement: Remove a set of values using the DELETE s…

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

22 Experts available now in Live!

Get 1:1 Help Now