Solved

SQL Server Self Join on View

Posted on 2014-02-26
8
301 Views
Last Modified: 2014-02-27
Experts, I am attempting to execute a query that uses a self join on a view. The view is not an indexed view.

The query runs just fine. However it produces more rows than actually exist in the data.

select a.*
      from [dbo].[vwALL_SOURCES] a
      --LEFT join [dbo].[vwALL_SOURCES] b
      --ON a.resortid = b.resortid
      where a.resortid = '296'
      AND (a.points is not null)
      AND (a.points > 0)
      AND (a.price is not null)
      AND      (a.price > 0)

When the query is run as above with the join commented out it returns 13 rows - that's what's actually in the data.

However, when the join code is added to the query, it returns 2964 rows or 228 rows for each of the 13 rows in the data.
 
It makes no difference whether the query is run as LEFT JOIN, INNER JOIN, or RIGHT JOIN. The result set is the same.

What is likely to be causing this to happen?


Thanks in advance.
0
Comment
Question by:bobinorlando
8 Comments
 
LVL 22

Expert Comment

by:Steve Wales
ID: 39890672
What is the value returned from this query:

select count(*) from vwALL_SOURCES where resortid = '296'

Is it 228 by chance ?

If so, I'm pretty sure that your 13 rows returned from the first part of the query are joining with each of the 228 rows of your join condition to get you  your 2964 rows.

I tested by doing this:

create table vwALL_SOURCES (resortid char(3), price numeric, points numeric)
go
insert into vwALL_SOURCES values ('296',1,2)
go 13
insert into vwALL_SOURCES values ('296',null,null)
go 215

Open in new window


This created me 228 rows, 13 where there are a value in price/points and 215 with nulls for 228 total.

Running your query without the join got me 13 rows.
Running your query with the join got me 2964 rows.

Is there some other column in the view you can use to restrict the join condition ?

A join will match each row in the first table/view in the join with each row returned in the result set for the second table/view in the join.  Since you aren't restricting anything in what is returned on the "b" part of the join, you have 13 in the "a" part and 228 in the "b" part.

13 x 228 = 2964.

Either make your join condition more restrictive or add extra conditions to the where part of the query on table b to restrict your results returned - but even then, unless you can be more restricting on the join, it's still going to multiply the 13 in the first with however many you return from the second view unless you can make your join condition more distinct.
0
 
LVL 48

Assisted Solution

by:PortletPaul
PortletPaul earned 100 total points
ID: 39890717
what do you hope to achieve by the self join?
0
 
LVL 1

Author Comment

by:bobinorlando
ID: 39890930
You said:

>>>I'm pretty sure that your 13 rows returned from the first part of the query are joining with each of the 228 rows of your join condition to get you  your 2964 rows.<<<

Yes you are correct that is what is happening.  What I don't understand is why. For example,  Only a RIGHT JOIN or RIGHT OUTER JOIN should return all rows in b that match the rows in table a. A LEFT JOIN or an INNER JOIN condition should restrict the b table rows to only those that match the a table rows. JOINING ON a.resortid = b.resortid should do the restricting to prevent getting all 228 rows from b for every row in table a. But it is not working that way.
0
 
LVL 22

Accepted Solution

by:
Steve Wales earned 300 total points
ID: 39890946
If there are 228 rows from b that have resortid = 296 and 13 rows from a that have resortid=296 then the join is working correctly.

A join matches every row from "a" with every row from "b" on the join condition.

Since your join condition is just the resortid, then you get 13 x 228.

If there was a date or something else involved, where you could narrow the match down to a 1:1 then you'd get a different result.

Also, you seem to misunderstand what LEFT/RIGHT join is.

Go back to elementary school mathematics and think of a Venn Diagram.  You have two intersecting circles.  We'll call the left circle A, the right circle B and the intersection C.

A JOIN (also known as INNER JOIN) returns C - those rows appearing in the intersection only.

A LEFT JOIN (aka LEFT OUTER JOIN) will return C (the intersection) and the rest of A that doesn't appear in C.

A RIGHT JOIN (aka RIGHT OUTER JOIN) will return C (the intersection) and the rest of B that doesn't appear in C.

A FULL JOIN (FULL OUTER JOIN) returns C, the part of A that doesn't match and the part of C that doesn't match.

(I actually blogged about this a couple of years ago, with examples if you want to see how joins work : http://www.dba-in-exile.com/2011/10/another-way-of-looking-at-joins.html)

Since you don't have another condition to limit the scope of what's returned from the join, the result set is exactly what it is supposed to be.

Some extra references.

The Wikipedia Article on JOIN (for what it's worth) (link) defines an inner join as:

Inner join creates a new result table by combining column values of two tables (A and B) based upon the join-predicate. The query compares each row of A with each row of B to find all pairs of rows which satisfy the join-predicate. When the join-predicate is satisfied, column values for each matched pair of rows of A and B are combined into a result row.

Another reference is this link: http://www.w3schools.com/sql/sql_join.asp

It notes (emphasis mine): INNER JOIN: Returns all rows when there is at least one match in BOTH tables
0
How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

 
LVL 8

Assisted Solution

by:Andrei Fomitchev
Andrei Fomitchev earned 100 total points
ID: 39891111
The issue came from the fact that a.resortid is not an unique id of a resort.

If you use self join for 1:1 it works.
If you use self join for 10:10 - it produces 10*10 = 100 rows.

In your case a.resortid = '296' produces 13 rows for a.
If you add b.resortid = '296' it will produce 13 rows for b.
JOIN will produce 13*13 = 169 rows.

Without b.resortid = '296' b will produce ~200 rows.
JOIN will produce 13*200 = 2600 rows.

Try this:

SELECT * FROM table b WHERE b.resortid IN (
      SELECT DISTINCT a.resortid FROM table a WHERE <coditions on a>
)
0
 
LVL 1

Author Comment

by:bobinorlando
ID: 39891580
Steve, thank you but as I indicated, regardless of what  type of join I specify in the query, for each row in a I'm getting all the rows in b for the same resortid in other words (and consistent with the definitions you kindly provided as well), a RIGHT OUTER JOIN.

Fomand  - <<The issue came from the fact that a.resortid is not a unique id of a resort.>>

So is it happening because I'm using a view not an actual table and the view is not indexed (i.e. there is no primary key, no uniqueid, no identity col for each row, no clustered index on the view) and therefore nothing unique to join against?

>>
Try this:

SELECT * FROM table b WHERE b.resortid IN (
      SELECT DISTINCT a.resortid FROM table a WHERE <coditions on a>
)
<<

This is my version of your query.
When I run only the sub query alone I get 296 as expected.
When I run the query with where conditions only on a  I get 228 rows.
When I add the same where conditions on b as on a (other than specifying the resortid), I get 13 rows as desired.

SELECT *
      FROM [dbo].[vwALL_SOURCES] b
      WHERE b.resortid IN (
                                     SELECT DISTINCT a.resortid
                                     FROM [dbo].[vwALL_SOURCES] a
                                      where a.resortid = '296'
                                     AND (a.points is not null)
                                     AND (a.points > 0)
                                     AND (a.price is not null)
                                     AND (a.price > 0)
                                     )

AND (b.points is not null)
AND (b.points > 0)
AND (b.price is not null)
AND (b.price > 0)
0
 
LVL 22

Expert Comment

by:Steve Wales
ID: 39891766
Bob,

It's not happening because you're using a view.  Check my examples in my first reply - I did that by creating a table.

The bottom line is that the query as specified isn't unique enough on the join.

PortletPaul asked in his reply way about, "What do you hope to achieve with the self join?".

Maybe answering that can give us a better understanding of the requirement.

What is the actual requirement of your query?  What are the other columns in the view?

It seems to be based upon what you've explained so far that the join is redundant - unless there's extra here you're not telling us, you're not gaining anything by doing a join for this query.  You can get the information from the table without the join.

The key piece here that you don't seem to understand is how the join works.

The results you're getting is exactly how a relational join works.  You need to be able to get the join condition down to a state where you can have a 1:1 or 1:many match or else you'll get this multiplication effect in your final result set.

Since you have a many to many relationship in your join condition what you're getting is the expected behavior - it's just how databases work.

Can you explain, then, what it is that you are trying to do and what other columns there are in the view that could help narrow down your a vs b result sets.

Oh and it's not a RIGHT OUTER JOIN that you're getting.  I'll go back to the Wikipedia definition of Inner Join for a moment and finish the paragraph that I quoted earlier.

The result of the join can be defined as the outcome of first taking the Cartesian product (or Cross join) of all records in the tables (combining every record in table A with every record in table B) and then returning all records which satisfy the join predicate.

This is exactly what's happening in your case since you have a multi:multi join condition, you're getting a cartesian product result (a * b).
0
 
LVL 1

Author Comment

by:bobinorlando
ID: 39892368
Ok thanks Steve I appreciate that.

As to the purpose of the join, I am debugging a larger query in which this join appears.
I now see that eliminating the reference to the b table for the one column and taking that column from a solves the issue with the multiple data and eliminates the need for the join.

Thanks all for the help.
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Introduced in Microsoft SQL Server 2005, the Copy Database Wizard (http://msdn.microsoft.com/en-us/library/ms188664.aspx) is useful in copying databases and associated objects between SQL instances; therefore, it is a good migration and upgrade tool…
Having an SQL database can be a big investment for a small company. Hardware, setup and of course, the price of software all add up to a big bill that some companies may not be able to absorb.  Luckily, there is a free version SQL Express, but does …
Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now