We help IT Professionals succeed at work.

I need a SQL statement to select a person and their buddies and their buddies buddies

rostara
rostara asked
on
551 Views
Last Modified: 2013-12-19
I have a table in Oracle that contains buddy list information. It contains the user's buddies. The columns are "username" and "buddy". Data looks like this (listing Mary and Sue's buddies:

USERNAME   BUDDY
Mary              Bob
Mary              Sue
Mary              Bill
Sue                Bill
Sue                Joe

 I am trying to write a query that starts with a specific username and finds all of their buddies. Then I want to take the result set and find all the buddies for each person in the result set; and possibly do it once more. I would like to do this in a single SQL statement. I thought it might be a CONNECT BY query, but this isn't really a hierarchy, since everyone could be a parent and a child of everyone else.
Comment
Watch Question

Kevin CrossChief Technology Officer
CERTIFIED EXPERT
Most Valuable Expert 2011

Commented:
You can use a recursive common table expression (cte -- WITH).
http://iggyfernandez.wordpress.com/2010/02/28/i-heart-recursive-subquery-factoring/

Kdo wrote this EE article for DB2, but the explanation of CTE recursion should be portable to Oracle and aid in your understanding of the concept.
https://www.experts-exchange.com/Database/DB2/A_3618-Recursive-SQL-in-DB2-Converting-rows-to-columns.html

See below.  Note you can change ub.username to bl.username if you want the results to come reflect all your buddies as friends of whomever you selected first, but given you are selecting by a specific username you would probably just do a final select of buddy anyway...

You can add in a level column that starts at 0 or 1 for the first select and then in the recursive piece do level+1 so you can control how many levels deep you go.  I am more of a MS SQL person where there is a MAXRECURSION option that I can't remember if is available for Oracle.

Hope that makes sense.
with buddyList as (
   select username, buddy
   from user_buddies
   where username = 'Mary'

   union all 

   select ub.username, ub.buddy
   from buddyList bl
   join user_buddies ub on ub.username = bl.buddy
)
select distinct buddy
from buddyList
;

Open in new window

Kevin CrossChief Technology Officer
CERTIFIED EXPERT
Most Valuable Expert 2011

Commented:
And as Kdo indicates in if you think there is a possibility for an infinite loop just be careful.  For example if Mary is the buddy of one of her buddies this could end up in a loop.

At least with top level person it is easy to filter as you can add something like this:
with buddyList as (
   select username, buddy
   from user_buddies
   where username = 'Mary'

   union all 

   select ub.username, ub.buddy
   from buddyList bl
   join user_buddies ub on ub.username = bl.buddy
   and ub.buddy <> bl.username
)
select distinct buddy
from buddyList
;

Open in new window

Commented:
You mean something like:

select BUDDY from BUDDYLIST where USERNAME=(select BUDDY from BUDDYLIST where USERNAME =$USER)
Devinder Singh VirdiLead Oracle DBA Team

Commented:
What about this:-

select user_name, buddy from user_buddies where user_name ='Sue'
union all
select user_name, buddy from user_buddies where user_name in
  (select buddy from user_buddies where user_name ='Sue' )

Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
unfortunately, recursive WITH clauses aren't supported in Oracle until 11gR2 and this is a 9i question

since this IS a hierarchy you can do it with CONNECT BY.


SELECT buddy
  FROM buddies
  START WITH  username = 'Mary'
  CONNECT BY username  = PRIOR buddy;


however, this assumes you won't have any loops in your data
for instance,  if Bob is Mary's buddy,  is Mary  Bob's buddy?

assuming she is, or a loop might be caused later such as Joe having Mary as a buddy.

Normally I'd suggest using CONNECT BY NOCYCLE to address this issue, but NOCYCLE doesn't exist until 10g and this is a 9i question

But, this is also easily solved, simply remove your starting user from the connect by clause, and limiting the level which you wan't to
do anyway since you are only going as deep as buddies' buddies  (level 2)


and that leaves you with this...



SELECT DISTINCT buddy
      FROM buddies
START WITH username = 'Mary'
CONNECT BY buddy != 'Mary' AND username = PRIOR buddy AND LEVEL <= 2

Open in new window

Kevin CrossChief Technology Officer
CERTIFIED EXPERT
Most Valuable Expert 2011

Commented:
Very nice, Sean! Thanks for chiming in, I didn't spot the 9i zone there which is important as you pointed out.
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
thanks, I saw this question come across my phone a few hours ago but didn't get a chance to get back to it until now.
I remembered the asker saying connect by couldn't be used and it stuck in my head "why not?"  
So I had to check in and see if the thread was still open or not.
CERTIFIED EXPERT
Expert of the Quarter 2010
Expert of the Year 2010

Commented:
http:#a33705833 would be my pick for being dead simple and nothing sophisticated or complicated (requiring specific features of this or that Oracle server version or other DBMS).
Just plain Ansi SQL.

Don't know why Mary became Sue, but I would change the Union All to just Union; Union removes duplicate results, e.g Mary->Sue->Mary.

select username, buddy from tbl_user_buddy where username ='Mary'
union
select username, buddy from tbl_user_buddy where username in
  (select buddy from tbl_user_buddy where username ='Mary' )
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
The query in 33705833 requires 3 accesses to the same table,  it's an IO hog.
Indexes may help, but those could be used by the connect by too.

Since this is an Oracle question, trying to accomodate the limitations of other platforms seems counterproductive.
Plus,  the oracle syntax is hardly complicated,  4 lines vs 4 lines.  Seems pretty simple to me

However, I'll never discourage testing.  
Definitely try all suggestions,  remember to check resource consumption as well as clock time so you can get a better idea of scaling.
The connect by version does require more sorts, but those are almost always going to be less expensive than extra io.
CERTIFIED EXPERT
Expert of the Quarter 2010
Expert of the Year 2010

Commented:
Sean,

Since you are the resident Oracle expert on this question, could you explain why you would comment that 33705833 (and 33709460 for that matter) "accesses" the table 3 times?  I thought that is by definition what CONNECT BY.. PRIOR does.  The PRIOR means that it is working off a *PRIOR* stage of the result set.  Although it is elegantly written, yet it also accesses the table 3 times unless I am mistaken?

Table access 1 : Start by "mary" - index on username (seed)
Table access 2 : Collect Mary's buddies (level 1)
Table access 3 : Collect buddies of Mary's buddies (level 2)
CERTIFIED EXPERT
Expert of the Quarter 2010
Expert of the Year 2010

Commented:
I was thinking about this while driving (as you do), and I can see now why CONNECT BY would work better.
In summary, it accesses the table twice, right?

Table access 1 : Collect Mary's buddies (level 1)
Table access 2 : Collect buddies of Mary's buddies (level 2) - except Mary
DISTINCT (in case Mary's buddies share the same buddies)

Using 33709460, you get

Table access 1 : Collect Mary's buddies
Table access 2 : Collect Mary's buddies ( again - as subquery for 3 )  <<< extra
Table access 3 : Collect the buddies of the subquery in #2
UNION (removes duplicates)

The 2nd step is required without CONNECT BY, to generate a subquery for the next level.  The query didn't exclude Mary anyway. For completeness only:

select username, buddy from tbl_user_buddy where username ='Mary'
union
select username, buddy from tbl_user_buddy where username in
  (select buddy from tbl_user_buddy where username ='Mary' )
and buddy != 'Mary'
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
In     33705833  I say it acceses the table 3 times because there are 3 distinct select statements

select user_name, buddy from user_buddies where user_name ='Sue'   -----   1
union all
select user_name, buddy from user_buddies                                          ---- 2
where user_name in  
  (select buddy from user_buddies where user_name ='Sue' )                ------- 3

even those #1 and #3 are very similar they are not identical and when I do an explain plan I see 3 accesses

the connect by only accesses the table once.  the "connect by" is an internal sorting operation.  It's certainly not a "free" operation by any means.
That's why I encourage testing all the options.  In my tests, on my database,  the connect by did less work, and corroborated my expectations. It seemed fairly obviously superior.

With the asker's data volumes, memory,hash area and indexes the other version "might" be better.  But I don't think so. It's pretty clear the triple query version
will always do more IO (consistent gets);  and I "think" that will be the greatest performance/scaling inhibitor.  I could be wrong.

I might be the "resident expert" but I've had my fair share of misreads, brain farts, typos and other mistakes; and I expect I will in the future too, so by all means test everything I say.  :)



Kevin CrossChief Technology Officer
CERTIFIED EXPERT
Most Valuable Expert 2011

Commented:
Good points, gentlemen.  Another reason, why I had started down the path of recursive CTE (i.e., a more complex approach) is that the Author says that the levels may be increased.  For the union approach, you would have to keep growing the number of subqueries/accesses where as recursion or the connect by can handle this growth gracefully...maybe performance would be same, but efficiency of the programmer and code maintenance I believe to be better.
Kevin CrossChief Technology Officer
CERTIFIED EXPERT
Most Valuable Expert 2011

Commented:
"...finds all of their buddies. Then I want to take the result set and find all the buddies for each person in the result set; and possibly do it once more."

I took that as:
(1) find Mary's buddies;
(2) find Mary's buddies' buddies;
(3) find Mary's buddies' buddies' buddies!

phew! That is why recursion was first to come to mind, but as Sean said for 9i connect by has to be used and this is a valid example of hierarchy.  Yes, the above can be written with UNION, but why when later we may add (4).

Author

Commented:
Thanks to everyone for the numerous comments and variety of approaches. I inadvertently listed this as a 9i question, when in fact it is a 10g environment, so the CONNECT BY PRIOR NOCYCLE approach may be the most appropriate option. I will test it this afternoon and post the results.
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
If you might be going beyond 2 levels then you MUST use NOCYCLE on the connect by.

If you don't, then you need to use some other strategies, in 9i code this could be very tricky to go beyond 2 levels

Author

Commented:
The SQL in 33706485 seems to do the trick, but my table structure is a little different than the original set up. There are in fact, 2 tables. The first is an ACCOUNT table which lists an account_id and username for each user. the second is the BUDDY table, which has a username (the user) and an account_id (which is the buddy's account_id and FK). The structure of the table seems a little backward to me, but I'm stuck with it. Here is a sample:

ACCOUNT
ACCOUNT_ID    USERNAME
            1                MARY
            2                 BOB
            3                 SUE
            4                 BILL
            5                  JOE

BUDDY
ACCOUNT_ID     USERNAME
            2                   MARY
            3                   MARY
            4                   MARY
            4                   SUE
            5                   SUE

So, Mary has Bob, Sue, and Bill (2,3,4) as buddies and Sue has Bill and Joe (4,5) as buddies.

I'm having trouble appying the CONNECT BY in the join of these tables.
Database Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012
Commented:
Unlock this solution and get a sample of our free trial.
(No credit card required)
UNLOCK SOLUTION
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
oops,  "buddies"  not "buddylist"

Author

Commented:
I have a similar, but more complex scenario this time. I'm trying to find the door a person went through and then all the other people that went through those doors, and then the people that went through the doors that the people above them did. Here is the sample tables:

DOOR    PERSON
3       a
3       b
3       c
3       d

4       a
4       c
4       x
4       y

5       b
5       d

6       x
6       z

7       c

So we start with Person "a" going through door 3.
Who else went through door 3? "b, c, d"
Now we want to know what other doors "b, c, and d" went through. "b and d" went thorough 4 and "c" went through 7.
Continuing the iteration, we want to know what doors "x & y" went through. And so on... I hope this makes sense.

This may be another representation:

a --> b <--> d
   --> c <--> a
             --> x --> z
             --> y
   --> d
Sean StuberDatabase Developer & Administrator
CERTIFIED EXPERT
Most Valuable Expert 2011
Top Expert 2012

Commented:
please click the "ask a related question" link and open a new question for this addendum
Unlock the solution to this question.
Thanks for using Experts Exchange.

Please provide your email to receive a sample view!

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.