asked on

database/table design and storing hierarchy information

I'm about to design a table (hierarchy | tree) since will be dealing with products where each product line has different levels and format. Also, I must add that going to use hierarchyid as data type.

Just as an example:

product line # 1 can break into:

1)division e.g. division A
1-1)group
1-2)sub-group
1-3)set
1-4)sub-set
1-5)style
1-n) rest of the lower levles are attributes

another division can break into:

1)division e.g. division B
1-1)matrials
1-2)set
1-3)sub-set
1-n) again, rest of the lower levles are attribute

what worries me that this will led us to a huge size of data/table sepecailly because each product has its own attributes which are broken into few groups e.g. evnironmental, mechanical and etc (listed as 1-n in above). Most of the division have a general set of attributs that can be access via a lookup table. So I'm thinking of storing these attributes into seprate (normal tables or non-hierarchical) this way tree level can decrease.

I will appreciate it if experts could advise me on this topics.

Regards,
amit

Nico Bontenbal

If I understand correctly you have a ProductHierarchy table that contains the tree with the Division/materials/groups/sub-groups/sets etc. At some (or all) levels of this tree you can have attributes attached. But there attributes are non-hierarchical. If that's so I wouldn't use a hierarchyid for this. But an extra table for the Attributes with a foreign key to the ProductHierarchy table.

Mark Wills

Hmmm... Interesting...

I tend to agree with Nicobo.

The hierarchy pattern is really well suited to the BOM type contructs whereby a Bill points to another Bill (and so on) when ultimately points to a specific item. So, the Bill is really nothing more than a collection or an ownership chain.

From what I can read into your example, it is more like a collection of attributes, and really, sub-group should probably belong to group (so we only need sub-group as an attribute) and the same again for set and sub-set. While it might be relatively easy to consider the set / sub-set / item to be a bill, it is a little more difficult to see that same hierarchical pattern for the model you have described so far. Which means that maybe we need a bit more discussion to better understand your model.

But is is sounding a bit more like starting at a unique product level and then trying to use hierarchies to drill down into different collections of attributes... Maybe a more simplified EAV type relationship is required ? On first glance it sounds a bit more like an XML style structure being converted to hierarchy, so maybe a more traditional approach is better suited.

Especially if that is what you are already considering for some of the division attributes - but then which node in your hierarchy represents a division so you can then grab the appropriate attributes at the appropriate level (that is always fun to put in a join). In other words, mixed models might become more difficult and maybe size in that case is a reasonable cost for the overall complexity of the model. And that is what you really need to consider. How easy / hard / responsive will the model be when it comes down to user demand to get access to the data.

Most importantly, if you really consider the design alternatives, if your model is really well suited to hierarchy patterns, then, it probably means less storage. Think of the number of different tables, each with primary (or natural) keys, data, and constriants for every different kind of attribute... If you are thinking that the hierarchical model you are exploring is going to be hugely expensive in terms of space, then maybe it isnt well suited after all.

Where the hierarchyID really pays dividends is when you really do need to traverse a tree to get to various nodes. Remember that the complete path / hierarchy is stored as the hierarchyID so things like getancestor and getdescendant are pretty quick and efficient.

But then there is a few new query terms that you have to start to introduce, and that can make relatively simple tasks a bit more involved for all your subsequent accesses. A small "gotcha" is making sure that there are no orphans, so, you normally have to put in a referential constraint to make sure a heirachy is either root or belongs to another. That sounds fine in principle, but then go delete an entry which does have descendants...

I think there is more to your designs, but so far, it doesnt really sound very hierarchical...

There is a great example comparing two approaches and a general discussion : http://msdn.microsoft.com/en-us/magazine/cc794278.aspx

And a must read is from BOL where it talks about relative strengths and examples (and links) : http://technet.microsoft.com/en-us/library/bb677173.aspx

akohan

ASKER

Hello Nicobo,

Thanks for your comment, exactly that is my plan storing attributes in different table(s). However, one remaining question is that what would be the foreign key?

This is my plan but indeed need an expert like you to comment on that. I'm using hierarchyid as a data type in the table. Let's say an item could be traced as ;

item A (its hierarchy-id is 0x1)
item A1 (its hierarchy-id is 0x6)
item A11 (its hierarchy-id is 0x12)
item A111 (its hierarchy-id is 0x32)
item A1111 (its hierarchy-id is 0x50)

now, I'm thinking of storing 0x16123250 in attribute table as foreign key so this way I would know what attribute belongs to which item.

Can you give me some insight into this if I'm on a right path? this is the first time I'm using hierarchy id.

Regards,
ak

Nico Bontenbal

Yes, I think that's the correct way to do this. You'll have a foreign key in the attributes table of type hierarchyid. In this column you store the hierarchyid of the item the attribute belongs to. I suppose the hierarchyid is the primary key of the item table, and a foreign key usually refers to the primary key in the foreign table.

Mark Wills

Well, that is kinda what happens when you use Hierarchies. It builds the heirarchy id *knowing* where it lives. It is not a simple matter of insert. You must take control.

While you can ignore the relationships, the *norm* is to insert the new node by looking or nominating its parent (or another child of the parent). You bassically use the getdescendant() function to generate the new hierarchy ID for the node being inserted.

Have a quick look at the tutorial : http://technet.microsoft.com/en-us/library/bb677174(v=sql.100).aspx

Mark Wills

Oh, and that example (the MSDN link) I posted previously did have / does show a foreign key set up...

akohan

ASKER

Yes, I have gone over the example you gave me this last Friday and give me ideas since. just as last question in this regard ... the article claims that hierarchyid might not be unique!

here is the statement: (under Limitations of hierarchyid)

It is up to the application to manage concurrency in generating and assigning hierarchyid values. There is no guarantee that hierarchyid values in a column are unique unless the application uses a unique key constraint or enforces uniqueness itself through its own logic.

Reference: http://msdn.microsoft.com/en-us/library/bb677173%28v=sql.105%29.aspx

So I'm creating two indices on the table as we see in the link http://msdn.microsoft.com/en-us/library/bb677173%28v=sql.105%29.aspx (following code)

CREATE UNIQUE INDEX ProductID_Depth_First 
ON Product(ProductID) ;
GO

Open in new window

I'm hoping this will avoid having duplication on productID since I need to use have a unique piece of data. of course, it is not primary key but still unique.

Any comments on this?

Regards.

Mark Wills

I would be inclined to create a clustered index primary key using a surrogate (like identity) and then index on your hierarchy. Using heirarchy as the product id you will have to manage it and make very sure the code is always inserting by finding the correct place to insert.

Having the unique constraint there will throw an error if there is a potential dupe, but all that means is you need to then find the correct place to insert, if in fact it is something new. So, you end up having to do the coding anyway.

And I would be inclined to do it as a stored procedure so you can get the max(id) as the current known max child and use that to make the the new (getdescandant) ID.

That's kinda what I was saying earlier that all of a sudden there is a fair bit of SQL that you need to consider. For example, it is not a straight insert, you must insert with respect to the hierarchyID of the existing / known entries.

Does that make sense ?

akohan

ASKER

I guess it does. However, I might need a confirmation on index part, so you are saying it would be more efficient to create a Primary Key on identity field (in my case, product-id) then create a 2nd index the the field which is hierarchyid type.

Assuming this my table structure:

CREATE TABLE Product
(
Node hierarchyid,
Level AS Node.GetLevel(),
Product_Id int UNIQUE NOT NULL,
Product_Name varchar(50) NOT NULL,
Product_Style tinyint
) ;

am I correct? if not please advise.

Thanks.

ASKER CERTIFIED SOLUTION

Mark Wills

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

akohan

ASKER

Thank you Mark,

Just to discuss few things:

>>Wouldnt worry about getlevel() if it is needed in a query, then use that function as/when needed rather than a calculated column all the time. GOOD POINT I will remove it.

>>And if Product_ID is unique and not null then make it the primary key. That is another concern I have. The product I'm using is nothing but an integer data type and it is incremental (identity 1,1) so will be important to make it primary key? I was thinking making Node column primary key since there is no guarantee (by nature) it will stay unique. any comments on this?

In my case, the real part number being used is the concatenation of Name column after going to depth (8 levels)or more ( in some product lines) so your guess was right and Product_ID is a not a natural key and I already have made Product_ID identity 1,1 which I have done it but not primary key so I make this primary key then I won't be able to make Node column primary key ... any advice?

Regards,
akohan

Mark Wills

OK, if product_id is identity, then make it the primary key and clustered.

Then you add your index for "Node". That index will use the PK information to point to data - and - because PK is clustered, the data is held on the leaf of the clustered index.

When accessing via "Node" it is indexed and will have the row pointer directly to the data, so, happy days :)

akohan

ASKER

thanks to all.