Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17


database/table design and storing hierarchy information

Posted on 2012-04-12
Medium Priority
Last Modified: 2012-06-27
I'm about to design a table (hierarchy | tree) since will be dealing with products where each product line has different levels and format. Also, I must add that going to use hierarchyid as data type.

Just as an example:

product line # 1 can break into:

1)division e.g. division A
1-n) rest of the lower levles are attributes

another division can break into:

1)division e.g. division B
1-n) again, rest of the lower levles are attribute

what worries me that this will led us to a huge size of data/table sepecailly because each product has its own attributes which are broken into few groups e.g. evnironmental, mechanical and etc (listed as 1-n in above). Most of the division have a general set of attributs that can be access via a lookup table. So I'm thinking of storing these attributes into seprate (normal tables or non-hierarchical) this way tree level can decrease.

I will appreciate it if experts could advise me on this topics.

Question by:akohan
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
  • 2
LVL 22

Expert Comment

by:Nico Bontenbal
ID: 37841605
If I understand correctly you have a ProductHierarchy table that contains the tree with the Division/materials/groups/sub-groups/sets etc. At some (or all) levels of this tree you can have attributes attached. But there attributes are non-hierarchical. If that's so I wouldn't use a hierarchyid for this. But an extra table for the Attributes with a foreign key to the ProductHierarchy table.
LVL 51

Expert Comment

by:Mark Wills
ID: 37842874
Hmmm... Interesting...

I tend to agree with Nicobo.

The hierarchy pattern is really well suited to the BOM type contructs whereby a Bill points to another Bill (and so on) when ultimately points to a specific item. So, the Bill is really nothing more than a collection or an ownership chain.

From what I can read into your example, it is more like a collection of attributes, and really, sub-group should probably belong to group (so we only need sub-group as an attribute) and the same again for set and sub-set. While it might be relatively easy to consider the set / sub-set / item to be a bill, it is a little more difficult to see that same hierarchical pattern for the model you have described so far. Which means that maybe we need a bit more discussion to better understand your model.

But is is sounding a bit more like starting at a unique product level and then trying to use hierarchies to drill down into different collections of attributes... Maybe a more simplified EAV type relationship is required ? On first glance it sounds a bit more like an XML style structure being converted to hierarchy, so maybe a more traditional approach is better suited.

Especially if that is what you are already considering for some of the division attributes - but then which node in your hierarchy represents a division so you can then grab the appropriate attributes at the appropriate level (that is always fun to put in a join). In other words, mixed models might become more difficult and maybe size in that case is a reasonable cost for the overall complexity of the model. And that is what you really need to consider. How easy / hard / responsive will the model be when it comes down to user demand to get access to the data.

Most importantly, if you really consider the design alternatives, if your model is really well suited to hierarchy patterns, then, it probably means less storage. Think of the number of different tables, each with primary (or natural) keys, data, and constriants for every different kind of attribute... If you are thinking that the hierarchical model you are exploring is going to be hugely expensive in terms of space, then maybe it isnt well suited after all.

Where the hierarchyID really pays dividends is when you really do need to traverse a tree to get to various nodes. Remember that the complete path / hierarchy is stored as the hierarchyID so things like getancestor and getdescendant are pretty quick and efficient.

But then there is a few new query terms that you have to start to introduce, and that can make relatively simple tasks a bit more involved for all your subsequent accesses. A small "gotcha" is making sure that there are no orphans, so, you normally have to put in a referential constraint to make sure a heirachy is either root or belongs to another. That sounds fine in principle, but then go delete an entry which does have descendants...

I think there is more to your designs, but so far, it doesnt really sound very hierarchical...

There is a great example comparing two approaches and a general discussion :

And a must read is from BOL where it talks about relative strengths and examples (and links) :

Author Comment

ID: 37852167
Hello Nicobo,

Thanks for your comment, exactly that is my plan storing attributes in  different table(s). However, one remaining question is that what would be the foreign key?

This is my plan but indeed need an expert like you to comment on that. I'm using hierarchyid as a data type in the table. Let's say an item could be traced as ;

item A     (its hierarchy-id is 0x1)
   item A1     (its hierarchy-id is 0x6)
      item A11    (its hierarchy-id is 0x12)
         item A111    (its hierarchy-id is 0x32)
            item A1111    (its hierarchy-id is 0x50)

now, I'm thinking of storing 0x16123250 in attribute table as foreign key so this way I would know what attribute belongs to which item.

Can you give me some insight into this if I'm on a right path? this is the first time I'm using hierarchy id.

Moving data to the cloud? Find out if you’re ready

Before moving to the cloud, it is important to carefully define your db needs, plan for the migration & understand prod. environment. This wp explains how to define what you need from a cloud provider, plan for the migration & what putting a cloud solution into practice entails.

LVL 22

Expert Comment

by:Nico Bontenbal
ID: 37852661
Yes, I think that's the correct way to do this. You'll have a foreign key in the attributes table of type hierarchyid. In this column you store the hierarchyid of the item the attribute belongs to. I suppose the hierarchyid is the primary key of the item table, and a foreign key usually refers to the primary key in the foreign table.
LVL 51

Expert Comment

by:Mark Wills
ID: 37853216
Well, that is kinda what happens when you use Hierarchies. It builds the heirarchy id *knowing* where it lives. It is not a simple matter of insert. You must take control.

While you can ignore the relationships, the *norm* is to insert the new node by looking or nominating its parent (or another child of the parent). You bassically use the getdescendant() function to generate the new hierarchy ID for the node being inserted.

Have a quick look at the tutorial :
LVL 51

Expert Comment

by:Mark Wills
ID: 37853226
Oh, and that example (the MSDN link) I posted previously did have / does show a foreign key set up...

Author Comment

ID: 37853863
Yes, I have gone over the example you gave me this last Friday and give me ideas since.  just as last question in this regard ... the article claims that hierarchyid might not be unique!

here is the statement: (under Limitations of hierarchyid)

It is up to the application to manage concurrency in generating and assigning hierarchyid values. There is no guarantee that hierarchyid values in a column are unique unless the application uses a unique key constraint or enforces uniqueness itself through its own logic.


So I'm creating two indices on the table as we see in the link (following code)

ON Product(ProductID) ;

Open in new window

I'm hoping this will avoid having duplication on productID since I need to use have a unique piece of data.  of course, it is not primary key but still unique.

Any comments on this?

LVL 51

Expert Comment

by:Mark Wills
ID: 37854029
I would be inclined to create a clustered index primary key using a surrogate (like identity) and then index on your hierarchy. Using heirarchy as the product id you will have to manage it and make very sure the code is always inserting by finding the correct place to insert.

Having the unique constraint there will throw an error if there is a potential dupe, but all that means is you need to then find the correct place to insert, if in fact it is something new. So, you end up having to do the coding anyway.

And I would be inclined to do it as a stored procedure so you can get the max(id) as the current known max child and use that to make the the new (getdescandant) ID.

That's kinda what I was saying earlier that all of a sudden there is a fair bit of SQL that you need to consider. For example, it is not a straight insert, you must insert with respect to the hierarchyID of the existing / known entries.

Does that make sense ?

Author Comment

ID: 37856967
I guess it does. However, I might need a confirmation on index part, so you are saying it would be more efficient to create a Primary Key on identity field (in my case, product-id) then create a 2nd index the the field which is hierarchyid type.

Assuming this my table structure:

   Node hierarchyid,
   Level AS Node.GetLevel(),
   Product_Id int UNIQUE NOT NULL,
   Product_Name varchar(50) NOT NULL,
   Product_Style tinyint
) ;

am I correct? if not please advise.

LVL 51

Accepted Solution

Mark Wills earned 1000 total points
ID: 37858567
Yep, that is more like it.

Wouldnt worry about getlevel() if it is needed in a query, then use that function as/when needed rather than a calculated column all the time.

And if Product_ID is unique and not null then make it the primary key.

If it is not a natural key (like a product code that people remember and likely to use) then you might consider making it a sequence or identity. In which case, also make it clustered.

Author Comment

ID: 37858886
Thank you Mark,

Just to discuss few things:

>>Wouldnt worry about getlevel() if it is needed in a query, then use that function as/when needed rather than a calculated column all the time. GOOD POINT I will remove it.

>>And if Product_ID is unique and not null then make it the primary key. That is another concern I have. The product I'm using is nothing  but an integer data type and it is incremental (identity 1,1)  so will be important to make it primary key? I was thinking making Node column primary key since there is no guarantee (by nature) it will stay unique. any comments on this?

In my case, the real part number being used is the concatenation of Name column after going to depth (8 levels)or more ( in some product lines) so your guess was right and Product_ID is a not a natural key and I already have made Product_ID identity 1,1 which I have done it but not primary key so I make this primary key then I won't be able to make Node column primary key ... any advice?

LVL 51

Expert Comment

by:Mark Wills
ID: 37859027
OK, if product_id is identity, then make it the primary key and clustered.

Then you add your index for "Node". That index will use the PK information to point to data - and - because PK is clustered, the data is held on the leaf of the clustered index.

When accessing via "Node" it is indexed and will have the row pointer directly to the data, so, happy days :)

Author Closing Comment

ID: 37876502
thanks to all.

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this article we will learn how to fix  “Cannot install SQL Server 2014 Service Pack 2: Unable to install windows installer msi file” error ?
A Stored Procedure in Microsoft SQL Server is a powerful feature that it can be used to execute the Data Manipulation Language (DML) or Data Definition Language (DDL). Depending on business requirements, a single Stored Procedure can return differe…
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function

670 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question