Solved

Retrieving hierarchical data from stored procedure

Posted on 2004-04-14
48
1,123 Views
Last Modified: 2010-05-18
I have a table where I am maintaing hierarchy like icategoryid and iparentID.
I want to create stored procedure which will give records in order the tree should be seen.
e.g. table contains values as -
icatid iparentid
1      NULL
2      NULL
3      1
4      2
5      3

I want output like
icatid iparentid
1      NULL
3      1
5      3
2      NULL
4      2

Thanks in advance
0
Comment
Question by:AratiBogam
  • 15
  • 11
  • 9
  • +3
48 Comments
 
LVL 6

Accepted Solution

by:
billy21 earned 50 total points
Comment Utility
Select icatid, iparentid
From MyTable
Order By icatid, IsNull(iparentid,-100)
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
Where I have used -100 you have to make sure that is smaller than any other number that may appear for iparentid.


0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
Actually now I look more closely, I can't see any order in your result.

If you want to specify the records in a specific order, the fields you have provided us with don't show that order.  You will have to add a new field and manually enter the order you wish to see that record in.  Then order by that field.
0
 
LVL 39

Expert Comment

by:appari
Comment Utility

what is the criteria of the output?
is there some specified rule to get the output, explain.
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
icatid iparentid  Order_By
1      NULL           1
3      1                2
5      3                3
2      NULL           4
4      2                5

Unless the order can be somehow calculated you need to include a new field to show the order.

Then the query is:
Select icatid, iparentid
From MyTable
Order By Order_By
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
I can see the iparentid is the parent record's icatid.  I can see that if the iparentid is null then it's a top level object.  The order of the records is based on its relatives.  When the last relative is entered you need to start from another top level record.

I don't see how we could do this without populating a temporary table although I'm looking into another way.


0
 

Author Comment

by:AratiBogam
Comment Utility
Yes, What you have understood is right. I am trying to get parent and all its child nodes from the table. The difficult part is order in which I am trying to get results. It is ok if I can get output using temporary table.
I think we can use recursive method or stored procedure but not able to figure out how this can be done.
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
Sorry, as there is no way of calculating what the record before the current record was this cannot be done with a query.  You need to populate a temporary table.
0
 

Author Comment

by:AratiBogam
Comment Utility
Ok. How can we get these results using temporary table? I am looking for a stored procedure.
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
I'm working on it now...
0
 
LVL 39

Expert Comment

by:appari
Comment Utility

whats your sqlserver's version. is it is SQL 2000? then we can try creating UDF.
0
 

Author Comment

by:AratiBogam
Comment Utility
Yes it is SQL 2000 server.
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
Your choice we can create a temp table or a udf that returns a table.  the code will be basically the same.
0
 
LVL 39

Assisted Solution

by:appari
appari earned 50 total points
Comment Utility
try this

create function GetParentID(@CatID int) returns int as
begin
declare @PID int

select @PID=isnull(parentID,0) from test where CatID=@CatID

while @PID<>0
begin
  select @CatID=@PID
  select @PID=isnull(parentID,0) from test where CatID=@CatID
end
return @CatID
end


select catID, parentID ,
dbo.GetParentID(catID)
from test
order by 3 asc,2


i supposed both catid and parentid are int type. if not change the types to appropriate ones.
0
 

Author Comment

by:AratiBogam
Comment Utility
Ok. I will prefer udf. Can you please give me the code?
0
 
LVL 39

Expert Comment

by:appari
Comment Utility

i think in your table field names are icatID and iParentID and table name i dont know, change them to your names.
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
Create Procedure ProcTest
AS
      Create Table #Temp
      (
            icatID Int,
            iParentID Int
      )


      Declare @ParentID Int
      Declare @CatID Int

      Declare C1 Cursor
      For
      Select Distinct iparentid
      From MyTable
      Order by IsNull(iParentID,0)

      Fetch Next From C1
      Into @ParentID
      WHILE @@FETCH_STATUS = 0
      Begin
            Declare C2 Cursor
            For
            Select icatid
            From MyTable
            Where iParentID = @ParentID
            Order By icatID
            Fetch Next From C2
            Into @CatID

            While @@Fetch_Status = 0
            Begin
      
                  Insert Into #temp(icatid, Iparentid)
                  Values(@ParentID,@CatID)
                  Fetch Next From C2
                  Into @CatID
            End
            deallocate C2
            Fetch Next From C1
            INTO @ParentID
      End
      Deallocate C1`

      Select * From #Temp
Go
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
I don't think the previously posted UDF will work.  My SP had a bug. The insert into statement was inserting parent id into the catid column and vice versa.  I've corrected it here...

Create Procedure ProcTest
AS
     Create Table #Temp
     (
          icatID Int,
          iParentID Int
     )


     Declare @ParentID Int
     Declare @CatID Int

     Declare C1 Cursor
     For
     Select Distinct iparentid
     From MyTable
     Order by IsNull(iParentID,0)

     Fetch Next From C1
     Into @ParentID
     WHILE @@FETCH_STATUS = 0
     Begin
          Declare C2 Cursor
          For
          Select icatid
          From MyTable
          Where iParentID = @ParentID
          Order By icatID
          Fetch Next From C2
          Into @CatID

          While @@Fetch_Status = 0
          Begin
     
               Insert Into #temp(icatid, Iparentid)
               Values(@CatID,@ParentID)
               Fetch Next From C2
               Into @CatID
          End
          deallocate C2
          Fetch Next From C1
          INTO @ParentID
     End
     Deallocate C1`

     Select * From #Temp
Go
0
 
LVL 11

Assisted Solution

by:vc01778
vc01778 earned 50 total points
Comment Utility
You can traverse the tree (actually you have more trees than one) without using a cursor like this:


-- Your data:
create table t1(icategoryid int, iparentID int)
insert into t1 values(1, NULL)
insert into t1 values(2, NULL)
insert into t1 values(3, 1)
insert into t1 values(4, 2)
insert into t1 values(5, 3)


-- The procedure

create procedure traverse as
 set nocount on
 declare @icategoryid int, @iparentID int, @level int, @root int
 declare @stack table(icategoryid int, iparentID int, level int)
 declare @tmp table(icategoryid int, iparentID int)
 
 set @root = -1
 while 1=1 begin
   select top 1 @root = icategoryid from t1
   where iparentID is null and icategoryid not in (select icategoryid from @tmp)
   if @@rowcount = 0 break
   insert into @stack(icategoryid, iparentID, level) values(@root, null, 1)

   -- find all the children
   set @level = 1
   while @level > 0 begin
      if exists (select 1 from @stack where level = @level) begin
          select top 1 @icategoryid = icategoryid, @iparentID=iparentID
          from @stack
          where level=@level
          insert into @tmp values(@icategoryid, @iparentID)
          delete from @stack
          where level = @level and icategoryid = @icategoryid
          insert into @stack
          select icategoryid, iparentID, @level + 1
          from t1
          where iparentID = @icategoryid
          if @@rowcount > 0 set @level= @level + 1
      end
      else set @level = @level - 1
   end
   
 end

 -- show the result
 select * from @tmp

-------------------------- end of procedure


-- execute:

traverse

-- Result:

1      NULL
3      1
5      3
2      NULL
4      2


VC
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
I've seen people do such things to avoid cursors.  I know cursors should be avoided but I'd have to question whether loading data into a table and deleting it row by row is any more efficient.

Looking at the UDF posted earlier, I think it can work.

Problem is the order by clause.  It should be changed as so:

select catID, parentID ,
dbo.GetParentID(catID)
from test
order by 3 asc,1

But then you have to question whether firing a new query this number of times per resulting row is any more efficient than a cursor or the method VC suggests.  I'd say not.
0
 
LVL 11

Expert Comment

by:vc01778
Comment Utility
@billy21,

The problem with the UDF as well as with you code is that none of them works.

The UDF solution relies on the parentid implicit order and will break,  for example,  on this:

--
1      NULL
2      NULL
3      1
25      2
5      25
6      5

-- result *wrong

1      NULL      1
3      1      1
2      NULL      2
25      2      2
6      5      2
5      25      2


-- correct result:
1      NULL
3      1
2      NULL
25      2
5      25
6      5


VC
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
You're right about the udf.  But what is wrong with my sp?  The logic looks flawless to me.  If there is an issue surely it is merely syntactical.  I didn't attempt to run it here and I rarely ever use cursors.
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
You're right, mine doesn't work at all.  I just ran it here.  Points should go to VC.
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
Does this go more than 3 levels deep?
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
For 3 (or 4, perhaps even 5 or 6 on a small and/or well-indexed table), you can something like this single query:


SELECT yt1.iCatId, yt1.iParentId
FROM yourTable yt1
LEFT OUTER JOIN yourTable yt2 ON yt1.iParentId = yt2.iCatId
LEFT OUTER JOIN yourTable yt3 ON yt2.iParentId = yt3.iCatId
ORDER BY COALESCE(yt3.iCatId, yt2.iCatId, yt1.iCatId),
      CASE WHEN yt3.iCatId IS NULL OR yt2.iCatId IS NULL THEN yt1.iCatId ELSE yt2.iCatId END,
      yt1.iCatId
0
 
LVL 11

Expert Comment

by:vc01778
Comment Utility
@


For the dataset:

--
1     NULL
2     NULL
3     1
25     2
5     25
6     5


Here's the output from your query which obviously incorrect:

1      NULL
3      1
2      NULL
5      25
25      2
6      5

-- correct result:
1     NULL
3     1
2     NULL
25     2
5     25
6     5


VC
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
OOPS ... the ORDER BY conditions may need some more revision.  But a single query will still be *much* more efficient than a cursor or some type of loop.
0
 
LVL 11

Expert Comment

by:vc01778
Comment Utility
@ ScottPletcher,

I would not probably spend much time on it since what you are trying to do is theoretically impossible (getting the desired result in one query).  

VC
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
CORRECTION  to ORDER BY:


SELECT yt1.iCatId, yt1.iParentId,
    COALESCE(yt3.iCatId, yt2.iCatId, yt1.iCatId)
FROM yourTable yt1
LEFT OUTER JOIN yourTable yt2 ON yt1.iParentId = yt2.iCatId
LEFT OUTER JOIN yourTable yt3 ON yt2.iParentId = yt3.iCatId
ORDER BY
    COALESCE(yt3.iCatId, yt2.iCatId, yt1.iCatId),
      CASE WHEN yt3.iCatId IS NULL AND yt2.iCatId IS NULL THEN NULL
           WHEN yt3.iCatId IS NULL THEN yt1.iCatId ELSE yt2.iCatid END,
      CASE WHEN yt3.iCatId IS NULL OR yt2.iCatId IS NULL THEN NULL ELSE yt1.iCatId END
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
>> is theoretically impossible (getting the desired result in one query). <<

What are you basing that on?  That's not a sarcastic question.  If there is some rule or something that will let me know when a query can't be written, that would be very helpful indeed so that I don't waste my time.  Personally, I haven't spent enough time on this to state this from "brute force" and I don't know of any method to allow me to determine that heuristically or inductively or deductively.
0
 
LVL 69

Assisted Solution

by:ScottPletcher
ScottPletcher earned 50 total points
Comment Utility
By the way, remember that I qualified "3 levels deep".  Each additional level would require an additional LOJ and ORDER BY logic.
0
 
LVL 11

Expert Comment

by:vc01778
Comment Utility
Pls. consider this:

create table t1(icategoryid int, iparentID int)
insert into t1 values(1, NULL)
insert into t1 values(70, NULL)
insert into t1 values(3, 1)
insert into t1 values(25, 70)
insert into t1 values(5, 25)
insert into t1 values(6, 5)

-- your result:

1      NULL      1
3      1      1
6      5      25
70      NULL      70
25      70      70
5      25      70

-- correct result:

1      NULL
3      1
70      NULL
25      70
5      25
6      5

VC
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
That goes FOUR levels.  I twice explicitly stated that the code would not work for anything beyond three levels (although an additional LOJ could be added that would).
0
 
LVL 11

Expert Comment

by:vc01778
Comment Utility
@ScottPletcher,

Ah,  sorry, I missed the 3 level limitation although you mentioned it *twice*.  Of course,  I meant the theoretical impossibilty for arbitrary nesting.

Rgds.

VC
0
 
LVL 11

Expert Comment

by:vc01778
Comment Utility
@ScottPletcher,

I am not sure why you need two outer joins if you limit the depth to three levels only.  One outer join should suffice:

SELECT yt1.iCatId,   yt1.iparentID l2
FROM yourTable yt1  left join yourTable yt2 ON yt1.iparentID = yt2.iCatId
order by
         coalesce(yt2.iparentID, yt1.iparentID, yt1.iCatId),
         coalesce(yt2.iparentID, yt1.iparentID) ,
         case when yt2.iparentID is not null then yt1.iparentID end


VC
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
That may be true.  Great, that makes 4-5 levels more feasible :-).
0
 
LVL 11

Expert Comment

by:vc01778
Comment Utility
@ScottPletcher,

Well,  it's not 'may be',  it is true.  That's what was confusing about your solution -- I thoght you were attempting to traverse 4 levels,  not three.

<quote>
"Great, that makes 4-5 levels more feasible "
</quote>

I doubt it very much because instead of one traversal you'll be doing 3 - 4 outer joins with horrible performance.

VC
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
I think the most sensible solution would be to employe a numbering system telling you the exact order you want these in and update it using a cursor but actually store the data in an order_by field.  Using a UDF you could get it into a single query but it won't be any more efficient than using a cursor.  in fact I'd say it would be less efficient.  For every iteration through the cursor you will have entire result sets being returned using the UDF idea and a Loop is absolutely unavoidable no matter what system you employ.
0
 

Expert Comment

by:csharp_is_best
Comment Utility
User-defined fucntions are certainly the way to go, but cursors and manual processing of each child row are way too inefficient.  I know of a way to handle unlimited level s of hierarchy, sorting properly by parent/child relatinship, etc.  See if this helps:

http://robgamble.com/robgamble/tree/tree.html
0
 
LVL 6

Expert Comment

by:billy21
Comment Utility
Something nobody seems to have considered is storing the data hierarchically instead of relationally.  Why not store this data in XML format?  XML files can be queried in SQL server using OpenXML if you need to do that too.
0
 

Expert Comment

by:csharp_is_best
Comment Utility
Sure I considered it.  It doesn't perform well.  The example used in this thread is tiny, but some tree hierarchies are hundreds or thousands of rows tall.  Imagine if you will an organization chart of a major corporation with 5000 employees.

The reason the aforementioned techniques are flawed is that they assume you must gather up each row in the "temp" table one-by-one, like you would in a VB or Java application.  The technique I outlined only makes 4 queries if the tre is 4 levels deep, 6 queries if the tree is 6 levels deep.  You could pull back the entire IBM corporate hierarchy instantaneously with this technique, with keys for sorting properly.

Take a look, you may find this helpful:

http://robgamble.com/robgamble/tree/tree.html
0
 
LVL 11

Expert Comment

by:vc01778
Comment Utility
@csharp_is_best,

I looked at you example and there are several problems with your approach.

1.  The original SP and the respective UDF perform a breadth-first traversal.  That's not what the original poster wanted.  His goal was to perform a depth-first traversal (that's what probably majority of applications use).

2.  The UDF solution uses a table in memory instead of a temporary table (which is impossible).  For large hierarchies,  one can simply run out of memory.

3.  You do suggest a way to convert your DFS into BFS by using the 'lineage' column (commonly called 'path').
But,  although you claim tha "The technique I outlined only makes 4 queries if the tre is 4 levels deep, 6 queries if the tree is 6 levels deep.",  it's quite misleading since,  in fact,  your SQL performs a join and two correlated 'exists' subqueries.  In addition,  an extra sort step is required which by itself may be quite costly.  In short,  the approach you describe is hardly more efficient than the trivial SP I showed earlier.


VC
0
 
LVL 69

Expert Comment

by:ScottPletcher
Comment Utility
>> 2.  The UDF solution uses a table in memory instead of a temporary table (which is impossible).  For large hierarchies,  one can simply run out of memory. <<

SQL will automatically page to disk as needed; running out of memory won't prevent the result from being built, although it will affect performance considerably.
0
 
LVL 11

Expert Comment

by:vc01778
Comment Utility
@ScottPletcher

I agree.  But one can also run out of page space ;).

Seriously,  the performance may suffer not just considerably but the system wil be doing nothing but paging (thrashing) -- witnessed personally.

Regards.

VC
0
 

Assisted Solution

by:csharp_is_best
csharp_is_best earned 50 total points
Comment Utility
VC,

Good observations, thanks for taking the time to give some feedback on the article.

Your proc handles the case where there are multiple root nodes in a tree, where my code can't.  It assumes the user will be interested in one node and its children /  parents.  This makes my approach completely useless for a report where more than one tree is required ( as in the original question ).

>>  The UDF solution uses a table in memory instead of a temporary table (which
>>  is impossible).  For large hierarchies,  one can simply run out of memory

I noticed that your original stored proc posted used TABLE variables as well, and would have
suffered all the same memory constriants as the UDFs.  

>>  But, although you claim that "The technique I outlined only makes 4 queries
>>  if the tree is 4 levels deep, 6 queries if the tree is 6 levels deep.",  it's quite
>>  misleading since, in fact, your SQL performs a join and two correlated 'exists'
>>  subqueries.  In addition, an extra sort step is required which by itself may be
>>  quite costly.  In short,  the approach you describe is hardly more efficient
>>  than the trivial SP I showed earlier.

I figured you had a good point, so I put the code to a real-life test.  I created a table, filled
it with dozens of levels of hierarchical data (random number of children per node).  Then I
tested the table with your procedure and my procedure.  I made sure both procedures used table variables, and my version used the lineage column for sorting.

Both procedures acheived instantaneous response times for trees with up to 500 records.  Both
procedures took 3 seconds to process trees with 1000.  3000 records = 13 seconds.  5000 records = 75 seconds.  You get the picture.  When Queries took in excess of 4+ minutes I quit comparing.  Both procedrues were fairly equivalent in performance (+/- 5%).

Then I decided to index.  And it was good.

I added an index to ParentID for the tree data.  This caused your procedure to process huge (6000 items) hierarchies in 5 seconds, where mine puttered along at 4+ minutes.  Wow!  This is because you seek data primarily based on the parent ID, where my "misleading joins" were still doing table scans on the Record IDs.

I then added the "primary key" attribute to the Record ID field in both our TABLE variables, which has the side effect of adding a clustered index (as you know, this is the only kind of index you can get on TABLE variables).  This made my procedure jump in performance due to better joins.  Everything under 3000 rows came back in less than a second (truly instantaneous UI response time), where your procedure took 1 - 2 seconds.

45,000 items took yours 17 seconds, where mine took 7 seconds.  The final test showed that with 100,000 items in a tree, your procedure took 38 seconds to return, and mine took 22 seconds.  Without indexing the two approaches are roughly equivalent in your favor.  *WITH* indexing the two approaches are roughly equivalent in my favor.

The net sum of my research was that your technique can handle multiple trees, and my technique is really no more efficient.  At least the "extra sort step" isn't so costly after all.

0
 
LVL 11

Expert Comment

by:vc01778
Comment Utility
Very good indeed.  I appreciate your taking time to benchmark the stuff.

By the way,  you may be interested to know that DB2 implements a BFS with its 'WITH' construct.  The same 'with' statement has been promised in Yukon.

vc
0

Featured Post

Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

Join & Write a Comment

For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now