Database Design

I have a very unusual situation and wanted to get some thoughts from people on how they might handle this. Normally when you build a product database you have a table of items and in that table you might have fields that have the following information:

Manufacturer (int)

And then you might create another table with manufacturers and then use the PK_ID field to fill the manufacturers field in the table above.

However, what if multiple manufacturers make an item. I have a situation where a table consists of about 5.9 million rows. The first table has all the basic fields in it like Name, Size, Weight, Color and that is consistant across all the items regardless of manufacturer. However, there can be a number of manufacturers and they all have unique data of their own that must be displayed in a single result set. So a release date might be a field, since maybe the manufacturers released the items at different times. So far I have had to have two 5.9 million row tables join each other by a PK and a FK in order to have multiple manufacturers for given item, but this is incredibly slow.

Anybody ever encounter this issue before? Any thoughts on how this should be setup.
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Peter HartCommented:
sounds like you are trying to design a 'one size fits all' database for all your manufacturers.
maybe each manufacturer has its own database(table) design since they all seem to have different criteria and variables.
Hello tarrigo,

I'm no db expert but my thoughts are you would have three tables..

Table 1 -  Manufacturer ID, Manufacturer Name
Table 2 - Manufacturer ID, Item ID
Table 3 - Item ID, Item Name, Size, Weight, Color

Say you have the same item let's say item 23 that is made by three manufacturers you won't duplicate any rows....The item exists once and each manufacturer exists once.  Then Table 2 links those tables to be useful.  Your queries, reports, etc will have to take this into account or alternately you could use a view to abstract this from anyone outside of the dba.



Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Seems to me you have two issues.  One easy, one more challenging.

The original tables you described represent a 1:many relationship between manufacturer and item--a manufacture produces many items, but an item only has one manufacturer.   But what you want is a many:many--a manufacturer produces many items and and item is produced by many manufacturers.  To represent that, use an "intersection" or "associative" table (two names for the same thing).  It looks like this:



That design is standard practice and very common. It's also the easy answer. Hard answer to follow.

The Ultimate Tool Kit for Technolgy Solution Provi

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy for valuable how-to assets including sample agreements, checklists, flowcharts, and more!

tarrigoAuthor Commented:
Yeah, I am in a way. You should know that the table has nothing to do with color and sizes and so on, just creating a shortened example. I am just a bit perplexed by it because I have never come across an item in the real world like a car that is made by multiple manufacturers. Usually it is just one manufacturer and so you could easily ask through a query to see all cars that match a given manufacturer. However the result that needs to come back would show all the size, name, and then all the specifics about the manufacturer.

So you might see a result like.

Name                                 Color            Weight                    Release Date
Chevrolet Sierra                Blue             10,000 lbs.              March 2008
Chevrolet Tahoe                Blue             10,000 lbs.              December 2007
From what it sounds like, you have a Product table where some products only differ by their manufacturer id? Well in that case extract the manufacturer id from the table and create a third table called ProductJoinManufacturer consisting of a unique index on ProductID and ManufacturerID. If this is truly the case, this should cut down the size of your intial Product table seeming that it would be more normalized.
tarrigoAuthor Commented:
In response to dgmg. A linking table was my first choice, but the amount of items being 5.9+ plus is makeing it too slow. When I bring the manufacturer data into the main table, it is very fast because there is no join. I am considering denormalizing for performance.
The harder part of your question is this.  What if you want to store different information about some items than others?  And what if the information varies by manufacturer?  Those are interesting and challenging design problems.

Your example about release date is a case where different manufacturers have a different value for the same item.  That's easy--you put that field in the association table.

But what if some items have fields that others do not.  For example, a cell phone has battery life, but a land phone does not.  There are several ways to handle it:

1.  Generalization hierarchy
2.  Nullable fields (some people avoid at all costs)
3.  User defined fields (also adamantly avoided in some circles)
4.  More granular entity definitions

>In response to dgmg. A linking table was my first choice, but the amount of items being 5.9+ plus is makeing it too slow. When I bring the manufacturer data into the main table, it is very fast because there is no join. I am considering denormalizing for performance.

I seriously doubt you need to denormalize that for performance. There are better ways to address the performance issue.  With good indexing, decent SQL, and possibly partitioning, the joins should perform quite adequately.

In fact, I rather suspect that kind of denormalization may hinder performance for some kinds of access--certainly for updates of product information where you then have to propagate changes to multiple rows.  Consider that you will be repeating all the common product information for each manufacturer that produces the product.

It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Internet / Email Software

From novice to tech pro — start learning today.