Solved

I need a SQL statement to select a person and their buddies and their buddies buddies

Posted on 2010-09-17
21
524 Views
Last Modified: 2013-12-19
I have a table in Oracle that contains buddy list information. It contains the user's buddies. The columns are "username" and "buddy". Data looks like this (listing Mary and Sue's buddies:

USERNAME   BUDDY
Mary              Bob
Mary              Sue
Mary              Bill
Sue                Bill
Sue                Joe

 I am trying to write a query that starts with a specific username and finds all of their buddies. Then I want to take the result set and find all the buddies for each person in the result set; and possibly do it once more. I would like to do this in a single SQL statement. I thought it might be a CONNECT BY query, but this isn't really a hierarchy, since everyone could be a parent and a child of everyone else.
0
Comment
Question by:rostara
  • 8
  • 5
  • 3
  • +3
21 Comments
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 33705199
You can use a recursive common table expression (cte -- WITH).
http://iggyfernandez.wordpress.com/2010/02/28/i-heart-recursive-subquery-factoring/

Kdo wrote this EE article for DB2, but the explanation of CTE recursion should be portable to Oracle and aid in your understanding of the concept.
http://www.experts-exchange.com/Database/DB2/A_3618-Recursive-SQL-in-DB2-Converting-rows-to-columns.html

See below.  Note you can change ub.username to bl.username if you want the results to come reflect all your buddies as friends of whomever you selected first, but given you are selecting by a specific username you would probably just do a final select of buddy anyway...

You can add in a level column that starts at 0 or 1 for the first select and then in the recursive piece do level+1 so you can control how many levels deep you go.  I am more of a MS SQL person where there is a MAXRECURSION option that I can't remember if is available for Oracle.

Hope that makes sense.
with buddyList as (
   select username, buddy
   from user_buddies
   where username = 'Mary'

   union all 

   select ub.username, ub.buddy
   from buddyList bl
   join user_buddies ub on ub.username = bl.buddy
)
select distinct buddy
from buddyList
;

Open in new window

0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 33705245
And as Kdo indicates in if you think there is a possibility for an infinite loop just be careful.  For example if Mary is the buddy of one of her buddies this could end up in a loop.

At least with top level person it is easy to filter as you can add something like this:
with buddyList as (
   select username, buddy
   from user_buddies
   where username = 'Mary'

   union all 

   select ub.username, ub.buddy
   from buddyList bl
   join user_buddies ub on ub.username = bl.buddy
   and ub.buddy <> bl.username
)
select distinct buddy
from buddyList
;

Open in new window

0
 
LVL 3

Expert Comment

by:rgeers
ID: 33705246
You mean something like:

select BUDDY from BUDDYLIST where USERNAME=(select BUDDY from BUDDYLIST where USERNAME =$USER)
0
3 Use Cases for Connected Systems

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, testing some more, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us.

 
LVL 15

Expert Comment

by:Devinder Singh Virdi
ID: 33705833
What about this:-

select user_name, buddy from user_buddies where user_name ='Sue'
union all
select user_name, buddy from user_buddies where user_name in
  (select buddy from user_buddies where user_name ='Sue' )

0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33706485
unfortunately, recursive WITH clauses aren't supported in Oracle until 11gR2 and this is a 9i question

since this IS a hierarchy you can do it with CONNECT BY.


SELECT buddy
  FROM buddies
  START WITH  username = 'Mary'
  CONNECT BY username  = PRIOR buddy;


however, this assumes you won't have any loops in your data
for instance,  if Bob is Mary's buddy,  is Mary  Bob's buddy?

assuming she is, or a loop might be caused later such as Joe having Mary as a buddy.

Normally I'd suggest using CONNECT BY NOCYCLE to address this issue, but NOCYCLE doesn't exist until 10g and this is a 9i question

But, this is also easily solved, simply remove your starting user from the connect by clause, and limiting the level which you wan't to
do anyway since you are only going as deep as buddies' buddies  (level 2)


and that leaves you with this...



SELECT DISTINCT buddy
      FROM buddies
START WITH username = 'Mary'
CONNECT BY buddy != 'Mary' AND username = PRIOR buddy AND LEVEL <= 2

Open in new window

0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 33706499
Very nice, Sean! Thanks for chiming in, I didn't spot the 9i zone there which is important as you pointed out.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33706522
thanks, I saw this question come across my phone a few hours ago but didn't get a chance to get back to it until now.
I remembered the asker saying connect by couldn't be used and it stuck in my head "why not?"  
So I had to check in and see if the thread was still open or not.
0
 
LVL 58

Expert Comment

by:cyberkiwi
ID: 33709460
http:#a33705833 would be my pick for being dead simple and nothing sophisticated or complicated (requiring specific features of this or that Oracle server version or other DBMS).
Just plain Ansi SQL.

Don't know why Mary became Sue, but I would change the Union All to just Union; Union removes duplicate results, e.g Mary->Sue->Mary.

select username, buddy from tbl_user_buddy where username ='Mary'
union
select username, buddy from tbl_user_buddy where username in
  (select buddy from tbl_user_buddy where username ='Mary' )
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33709739
The query in 33705833 requires 3 accesses to the same table,  it's an IO hog.
Indexes may help, but those could be used by the connect by too.

Since this is an Oracle question, trying to accomodate the limitations of other platforms seems counterproductive.
Plus,  the oracle syntax is hardly complicated,  4 lines vs 4 lines.  Seems pretty simple to me

However, I'll never discourage testing.  
Definitely try all suggestions,  remember to check resource consumption as well as clock time so you can get a better idea of scaling.
The connect by version does require more sorts, but those are almost always going to be less expensive than extra io.
0
 
LVL 58

Expert Comment

by:cyberkiwi
ID: 33710179
Sean,

Since you are the resident Oracle expert on this question, could you explain why you would comment that 33705833 (and 33709460 for that matter) "accesses" the table 3 times?  I thought that is by definition what CONNECT BY.. PRIOR does.  The PRIOR means that it is working off a *PRIOR* stage of the result set.  Although it is elegantly written, yet it also accesses the table 3 times unless I am mistaken?

Table access 1 : Start by "mary" - index on username (seed)
Table access 2 : Collect Mary's buddies (level 1)
Table access 3 : Collect buddies of Mary's buddies (level 2)
0
 
LVL 58

Expert Comment

by:cyberkiwi
ID: 33710289
I was thinking about this while driving (as you do), and I can see now why CONNECT BY would work better.
In summary, it accesses the table twice, right?

Table access 1 : Collect Mary's buddies (level 1)
Table access 2 : Collect buddies of Mary's buddies (level 2) - except Mary
DISTINCT (in case Mary's buddies share the same buddies)

Using 33709460, you get

Table access 1 : Collect Mary's buddies
Table access 2 : Collect Mary's buddies ( again - as subquery for 3 )  <<< extra
Table access 3 : Collect the buddies of the subquery in #2
UNION (removes duplicates)

The 2nd step is required without CONNECT BY, to generate a subquery for the next level.  The query didn't exclude Mary anyway. For completeness only:

select username, buddy from tbl_user_buddy where username ='Mary'
union
select username, buddy from tbl_user_buddy where username in
  (select buddy from tbl_user_buddy where username ='Mary' )
and buddy != 'Mary'
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33711467
In     33705833  I say it acceses the table 3 times because there are 3 distinct select statements

select user_name, buddy from user_buddies where user_name ='Sue'   -----   1
union all
select user_name, buddy from user_buddies                                          ---- 2
where user_name in  
  (select buddy from user_buddies where user_name ='Sue' )                ------- 3

even those #1 and #3 are very similar they are not identical and when I do an explain plan I see 3 accesses

the connect by only accesses the table once.  the "connect by" is an internal sorting operation.  It's certainly not a "free" operation by any means.
That's why I encourage testing all the options.  In my tests, on my database,  the connect by did less work, and corroborated my expectations. It seemed fairly obviously superior.

With the asker's data volumes, memory,hash area and indexes the other version "might" be better.  But I don't think so. It's pretty clear the triple query version
will always do more IO (consistent gets);  and I "think" that will be the greatest performance/scaling inhibitor.  I could be wrong.

I might be the "resident expert" but I've had my fair share of misreads, brain farts, typos and other mistakes; and I expect I will in the future too, so by all means test everything I say.  :)



0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 33711491
Good points, gentlemen.  Another reason, why I had started down the path of recursive CTE (i.e., a more complex approach) is that the Author says that the levels may be increased.  For the union approach, you would have to keep growing the number of subqueries/accesses where as recursion or the connect by can handle this growth gracefully...maybe performance would be same, but efficiency of the programmer and code maintenance I believe to be better.
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 33711503
"...finds all of their buddies. Then I want to take the result set and find all the buddies for each person in the result set; and possibly do it once more."

I took that as:
(1) find Mary's buddies;
(2) find Mary's buddies' buddies;
(3) find Mary's buddies' buddies' buddies!

phew! That is why recursion was first to come to mind, but as Sean said for 9i connect by has to be used and this is a valid example of hierarchy.  Yes, the above can be written with UNION, but why when later we may add (4).
0
 

Author Comment

by:rostara
ID: 33711562
Thanks to everyone for the numerous comments and variety of approaches. I inadvertently listed this as a 9i question, when in fact it is a 10g environment, so the CONNECT BY PRIOR NOCYCLE approach may be the most appropriate option. I will test it this afternoon and post the results.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33712287
If you might be going beyond 2 levels then you MUST use NOCYCLE on the connect by.

If you don't, then you need to use some other strategies, in 9i code this could be very tricky to go beyond 2 levels
0
 

Author Comment

by:rostara
ID: 33716174
The SQL in 33706485 seems to do the trick, but my table structure is a little different than the original set up. There are in fact, 2 tables. The first is an ACCOUNT table which lists an account_id and username for each user. the second is the BUDDY table, which has a username (the user) and an account_id (which is the buddy's account_id and FK). The structure of the table seems a little backward to me, but I'm stuck with it. Here is a sample:

ACCOUNT
ACCOUNT_ID    USERNAME
            1                MARY
            2                 BOB
            3                 SUE
            4                 BILL
            5                  JOE

BUDDY
ACCOUNT_ID     USERNAME
            2                   MARY
            3                   MARY
            4                   MARY
            4                   SUE
            5                   SUE

So, Mary has Bob, Sue, and Bill (2,3,4) as buddies and Sue has Bill and Joe (4,5) as buddies.

I'm having trouble appying the CONNECT BY in the join of these tables.
0
 
LVL 73

Accepted Solution

by:
sdstuber earned 500 total points
ID: 33716406
where I have "buddylist" above  put this...

(select b.username, a.username buddy
from account a, buddy b
where a.account_id = b.account_id)
SELECT DISTINCT buddy
FROM (SELECT b.username, a.username buddy
      FROM account a, buddy b
      WHERE a.account_id = b.account_id)
START WITH username = 'MARY'
CONNECT BY buddy != 'MARY' AND username = PRIOR buddy AND LEVEL <= 2

Open in new window

0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33716417
oops,  "buddies"  not "buddylist"
0
 

Author Comment

by:rostara
ID: 33986328
I have a similar, but more complex scenario this time. I'm trying to find the door a person went through and then all the other people that went through those doors, and then the people that went through the doors that the people above them did. Here is the sample tables:

DOOR    PERSON
3       a
3       b
3       c
3       d

4       a
4       c
4       x
4       y

5       b
5       d

6       x
6       z

7       c

So we start with Person "a" going through door 3.
Who else went through door 3? "b, c, d"
Now we want to know what other doors "b, c, and d" went through. "b and d" went thorough 4 and "c" went through 7.
Continuing the iteration, we want to know what doors "x & y" went through. And so on... I hope this makes sense.

This may be another representation:

a --> b <--> d
   --> c <--> a
             --> x --> z
             --> y
   --> d
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33988798
please click the "ask a related question" link and open a new question for this addendum
0

Featured Post

Courses: Start Training Online With Pros, Today

Brush up on the basics or master the advanced techniques required to earn essential industry certifications, with Courses. Enroll in a course and start learning today. Training topics range from Android App Dev to the Xen Virtualization Platform.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I'm trying, I really am. But I've seen so many wrong approaches involving date(time) boundaries I despair about my inability to explain it. I've seen quite a few recently that define a non-leap year as 364 days, or 366 days and the list goes on. …
Using SQL Scripts we can save all the SQL queries as files that we use very frequently on our database later point of time. This is one of the feature present under SQL Workshop in Oracle Application Express.
This video shows setup options and the basic steps and syntax for duplicating (cloning) a database from one instance to another. Examples are given for duplicating to the same machine and to different machines
This video explains what a user managed backup is and shows how to take one, providing a couple of simple example scripts.

785 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question