Solved

I need a SQL statement to select a person and their buddies and their buddies buddies

Posted on 2010-09-17
21
519 Views
Last Modified: 2013-12-19
I have a table in Oracle that contains buddy list information. It contains the user's buddies. The columns are "username" and "buddy". Data looks like this (listing Mary and Sue's buddies:

USERNAME   BUDDY
Mary              Bob
Mary              Sue
Mary              Bill
Sue                Bill
Sue                Joe

 I am trying to write a query that starts with a specific username and finds all of their buddies. Then I want to take the result set and find all the buddies for each person in the result set; and possibly do it once more. I would like to do this in a single SQL statement. I thought it might be a CONNECT BY query, but this isn't really a hierarchy, since everyone could be a parent and a child of everyone else.
0
Comment
Question by:rostara
  • 8
  • 5
  • 3
  • +3
21 Comments
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 33705199
You can use a recursive common table expression (cte -- WITH).
http://iggyfernandez.wordpress.com/2010/02/28/i-heart-recursive-subquery-factoring/

Kdo wrote this EE article for DB2, but the explanation of CTE recursion should be portable to Oracle and aid in your understanding of the concept.
http://www.experts-exchange.com/Database/DB2/A_3618-Recursive-SQL-in-DB2-Converting-rows-to-columns.html

See below.  Note you can change ub.username to bl.username if you want the results to come reflect all your buddies as friends of whomever you selected first, but given you are selecting by a specific username you would probably just do a final select of buddy anyway...

You can add in a level column that starts at 0 or 1 for the first select and then in the recursive piece do level+1 so you can control how many levels deep you go.  I am more of a MS SQL person where there is a MAXRECURSION option that I can't remember if is available for Oracle.

Hope that makes sense.
with buddyList as (
   select username, buddy
   from user_buddies
   where username = 'Mary'

   union all 

   select ub.username, ub.buddy
   from buddyList bl
   join user_buddies ub on ub.username = bl.buddy
)
select distinct buddy
from buddyList
;

Open in new window

0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 33705245
And as Kdo indicates in if you think there is a possibility for an infinite loop just be careful.  For example if Mary is the buddy of one of her buddies this could end up in a loop.

At least with top level person it is easy to filter as you can add something like this:
with buddyList as (
   select username, buddy
   from user_buddies
   where username = 'Mary'

   union all 

   select ub.username, ub.buddy
   from buddyList bl
   join user_buddies ub on ub.username = bl.buddy
   and ub.buddy <> bl.username
)
select distinct buddy
from buddyList
;

Open in new window

0
 
LVL 3

Expert Comment

by:rgeers
ID: 33705246
You mean something like:

select BUDDY from BUDDYLIST where USERNAME=(select BUDDY from BUDDYLIST where USERNAME =$USER)
0
 
LVL 15

Expert Comment

by:Devinder Singh Virdi
ID: 33705833
What about this:-

select user_name, buddy from user_buddies where user_name ='Sue'
union all
select user_name, buddy from user_buddies where user_name in
  (select buddy from user_buddies where user_name ='Sue' )

0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33706485
unfortunately, recursive WITH clauses aren't supported in Oracle until 11gR2 and this is a 9i question

since this IS a hierarchy you can do it with CONNECT BY.


SELECT buddy
  FROM buddies
  START WITH  username = 'Mary'
  CONNECT BY username  = PRIOR buddy;


however, this assumes you won't have any loops in your data
for instance,  if Bob is Mary's buddy,  is Mary  Bob's buddy?

assuming she is, or a loop might be caused later such as Joe having Mary as a buddy.

Normally I'd suggest using CONNECT BY NOCYCLE to address this issue, but NOCYCLE doesn't exist until 10g and this is a 9i question

But, this is also easily solved, simply remove your starting user from the connect by clause, and limiting the level which you wan't to
do anyway since you are only going as deep as buddies' buddies  (level 2)


and that leaves you with this...



SELECT DISTINCT buddy

      FROM buddies

START WITH username = 'Mary'

CONNECT BY buddy != 'Mary' AND username = PRIOR buddy AND LEVEL <= 2

Open in new window

0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 33706499
Very nice, Sean! Thanks for chiming in, I didn't spot the 9i zone there which is important as you pointed out.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33706522
thanks, I saw this question come across my phone a few hours ago but didn't get a chance to get back to it until now.
I remembered the asker saying connect by couldn't be used and it stuck in my head "why not?"  
So I had to check in and see if the thread was still open or not.
0
 
LVL 58

Expert Comment

by:cyberkiwi
ID: 33709460
http:#a33705833 would be my pick for being dead simple and nothing sophisticated or complicated (requiring specific features of this or that Oracle server version or other DBMS).
Just plain Ansi SQL.

Don't know why Mary became Sue, but I would change the Union All to just Union; Union removes duplicate results, e.g Mary->Sue->Mary.

select username, buddy from tbl_user_buddy where username ='Mary'
union
select username, buddy from tbl_user_buddy where username in
  (select buddy from tbl_user_buddy where username ='Mary' )
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33709739
The query in 33705833 requires 3 accesses to the same table,  it's an IO hog.
Indexes may help, but those could be used by the connect by too.

Since this is an Oracle question, trying to accomodate the limitations of other platforms seems counterproductive.
Plus,  the oracle syntax is hardly complicated,  4 lines vs 4 lines.  Seems pretty simple to me

However, I'll never discourage testing.  
Definitely try all suggestions,  remember to check resource consumption as well as clock time so you can get a better idea of scaling.
The connect by version does require more sorts, but those are almost always going to be less expensive than extra io.
0
 
LVL 58

Expert Comment

by:cyberkiwi
ID: 33710179
Sean,

Since you are the resident Oracle expert on this question, could you explain why you would comment that 33705833 (and 33709460 for that matter) "accesses" the table 3 times?  I thought that is by definition what CONNECT BY.. PRIOR does.  The PRIOR means that it is working off a *PRIOR* stage of the result set.  Although it is elegantly written, yet it also accesses the table 3 times unless I am mistaken?

Table access 1 : Start by "mary" - index on username (seed)
Table access 2 : Collect Mary's buddies (level 1)
Table access 3 : Collect buddies of Mary's buddies (level 2)
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 58

Expert Comment

by:cyberkiwi
ID: 33710289
I was thinking about this while driving (as you do), and I can see now why CONNECT BY would work better.
In summary, it accesses the table twice, right?

Table access 1 : Collect Mary's buddies (level 1)
Table access 2 : Collect buddies of Mary's buddies (level 2) - except Mary
DISTINCT (in case Mary's buddies share the same buddies)

Using 33709460, you get

Table access 1 : Collect Mary's buddies
Table access 2 : Collect Mary's buddies ( again - as subquery for 3 )  <<< extra
Table access 3 : Collect the buddies of the subquery in #2
UNION (removes duplicates)

The 2nd step is required without CONNECT BY, to generate a subquery for the next level.  The query didn't exclude Mary anyway. For completeness only:

select username, buddy from tbl_user_buddy where username ='Mary'
union
select username, buddy from tbl_user_buddy where username in
  (select buddy from tbl_user_buddy where username ='Mary' )
and buddy != 'Mary'
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33711467
In     33705833  I say it acceses the table 3 times because there are 3 distinct select statements

select user_name, buddy from user_buddies where user_name ='Sue'   -----   1
union all
select user_name, buddy from user_buddies                                          ---- 2
where user_name in  
  (select buddy from user_buddies where user_name ='Sue' )                ------- 3

even those #1 and #3 are very similar they are not identical and when I do an explain plan I see 3 accesses

the connect by only accesses the table once.  the "connect by" is an internal sorting operation.  It's certainly not a "free" operation by any means.
That's why I encourage testing all the options.  In my tests, on my database,  the connect by did less work, and corroborated my expectations. It seemed fairly obviously superior.

With the asker's data volumes, memory,hash area and indexes the other version "might" be better.  But I don't think so. It's pretty clear the triple query version
will always do more IO (consistent gets);  and I "think" that will be the greatest performance/scaling inhibitor.  I could be wrong.

I might be the "resident expert" but I've had my fair share of misreads, brain farts, typos and other mistakes; and I expect I will in the future too, so by all means test everything I say.  :)



0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 33711491
Good points, gentlemen.  Another reason, why I had started down the path of recursive CTE (i.e., a more complex approach) is that the Author says that the levels may be increased.  For the union approach, you would have to keep growing the number of subqueries/accesses where as recursion or the connect by can handle this growth gracefully...maybe performance would be same, but efficiency of the programmer and code maintenance I believe to be better.
0
 
LVL 59

Expert Comment

by:Kevin Cross
ID: 33711503
"...finds all of their buddies. Then I want to take the result set and find all the buddies for each person in the result set; and possibly do it once more."

I took that as:
(1) find Mary's buddies;
(2) find Mary's buddies' buddies;
(3) find Mary's buddies' buddies' buddies!

phew! That is why recursion was first to come to mind, but as Sean said for 9i connect by has to be used and this is a valid example of hierarchy.  Yes, the above can be written with UNION, but why when later we may add (4).
0
 

Author Comment

by:rostara
ID: 33711562
Thanks to everyone for the numerous comments and variety of approaches. I inadvertently listed this as a 9i question, when in fact it is a 10g environment, so the CONNECT BY PRIOR NOCYCLE approach may be the most appropriate option. I will test it this afternoon and post the results.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33712287
If you might be going beyond 2 levels then you MUST use NOCYCLE on the connect by.

If you don't, then you need to use some other strategies, in 9i code this could be very tricky to go beyond 2 levels
0
 

Author Comment

by:rostara
ID: 33716174
The SQL in 33706485 seems to do the trick, but my table structure is a little different than the original set up. There are in fact, 2 tables. The first is an ACCOUNT table which lists an account_id and username for each user. the second is the BUDDY table, which has a username (the user) and an account_id (which is the buddy's account_id and FK). The structure of the table seems a little backward to me, but I'm stuck with it. Here is a sample:

ACCOUNT
ACCOUNT_ID    USERNAME
            1                MARY
            2                 BOB
            3                 SUE
            4                 BILL
            5                  JOE

BUDDY
ACCOUNT_ID     USERNAME
            2                   MARY
            3                   MARY
            4                   MARY
            4                   SUE
            5                   SUE

So, Mary has Bob, Sue, and Bill (2,3,4) as buddies and Sue has Bill and Joe (4,5) as buddies.

I'm having trouble appying the CONNECT BY in the join of these tables.
0
 
LVL 73

Accepted Solution

by:
sdstuber earned 500 total points
ID: 33716406
where I have "buddylist" above  put this...

(select b.username, a.username buddy
from account a, buddy b
where a.account_id = b.account_id)
SELECT DISTINCT buddy

FROM (SELECT b.username, a.username buddy

      FROM account a, buddy b

      WHERE a.account_id = b.account_id)

START WITH username = 'MARY'

CONNECT BY buddy != 'MARY' AND username = PRIOR buddy AND LEVEL <= 2

Open in new window

0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33716417
oops,  "buddies"  not "buddylist"
0
 

Author Comment

by:rostara
ID: 33986328
I have a similar, but more complex scenario this time. I'm trying to find the door a person went through and then all the other people that went through those doors, and then the people that went through the doors that the people above them did. Here is the sample tables:

DOOR    PERSON
3       a
3       b
3       c
3       d

4       a
4       c
4       x
4       y

5       b
5       d

6       x
6       z

7       c

So we start with Person "a" going through door 3.
Who else went through door 3? "b, c, d"
Now we want to know what other doors "b, c, and d" went through. "b and d" went thorough 4 and "c" went through 7.
Continuing the iteration, we want to know what doors "x & y" went through. And so on... I hope this makes sense.

This may be another representation:

a --> b <--> d
   --> c <--> a
             --> x --> z
             --> y
   --> d
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 33988798
please click the "ask a related question" link and open a new question for this addendum
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

Have you ever had to make fundamental changes to a table in Oracle, but haven't been able to get any downtime?  I'm talking things like: * Dropping columns * Shrinking allocated space * Removing chained blocks and restoring the PCTFREE * Re-or…
From implementing a password expiration date, to datatype conversions and file export options, these are some useful settings I've found in Jasper Server.
This video shows setup options and the basic steps and syntax for duplicating (cloning) a database from one instance to another. Examples are given for duplicating to the same machine and to different machines
This video shows how to Export data from an Oracle database using the Datapump Export Utility.  The corresponding Datapump Import utility is also discussed and demonstrated.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now