Link to home
Start Free TrialLog in
Avatar of bobinorlando
bobinorlando

asked on

SQL Server Self Join on View

Experts, I am attempting to execute a query that uses a self join on a view. The view is not an indexed view.

The query runs just fine. However it produces more rows than actually exist in the data.

select a.*
      from [dbo].[vwALL_SOURCES] a
      --LEFT join [dbo].[vwALL_SOURCES] b
      --ON a.resortid = b.resortid
      where a.resortid = '296'
      AND (a.points is not null)
      AND (a.points > 0)
      AND (a.price is not null)
      AND      (a.price > 0)

When the query is run as above with the join commented out it returns 13 rows - that's what's actually in the data.

However, when the join code is added to the query, it returns 2964 rows or 228 rows for each of the 13 rows in the data.
 
It makes no difference whether the query is run as LEFT JOIN, INNER JOIN, or RIGHT JOIN. The result set is the same.

What is likely to be causing this to happen?


Thanks in advance.
Avatar of Steve Wales
Steve Wales
Flag of United States of America image

What is the value returned from this query:

select count(*) from vwALL_SOURCES where resortid = '296'

Is it 228 by chance ?

If so, I'm pretty sure that your 13 rows returned from the first part of the query are joining with each of the 228 rows of your join condition to get you  your 2964 rows.

I tested by doing this:

create table vwALL_SOURCES (resortid char(3), price numeric, points numeric)
go
insert into vwALL_SOURCES values ('296',1,2)
go 13
insert into vwALL_SOURCES values ('296',null,null)
go 215

Open in new window


This created me 228 rows, 13 where there are a value in price/points and 215 with nulls for 228 total.

Running your query without the join got me 13 rows.
Running your query with the join got me 2964 rows.

Is there some other column in the view you can use to restrict the join condition ?

A join will match each row in the first table/view in the join with each row returned in the result set for the second table/view in the join.  Since you aren't restricting anything in what is returned on the "b" part of the join, you have 13 in the "a" part and 228 in the "b" part.

13 x 228 = 2964.

Either make your join condition more restrictive or add extra conditions to the where part of the query on table b to restrict your results returned - but even then, unless you can be more restricting on the join, it's still going to multiply the 13 in the first with however many you return from the second view unless you can make your join condition more distinct.
SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of bobinorlando
bobinorlando

ASKER

You said:

>>>I'm pretty sure that your 13 rows returned from the first part of the query are joining with each of the 228 rows of your join condition to get you  your 2964 rows.<<<

Yes you are correct that is what is happening.  What I don't understand is why. For example,  Only a RIGHT JOIN or RIGHT OUTER JOIN should return all rows in b that match the rows in table a. A LEFT JOIN or an INNER JOIN condition should restrict the b table rows to only those that match the a table rows. JOINING ON a.resortid = b.resortid should do the restricting to prevent getting all 228 rows from b for every row in table a. But it is not working that way.
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Steve, thank you but as I indicated, regardless of what  type of join I specify in the query, for each row in a I'm getting all the rows in b for the same resortid in other words (and consistent with the definitions you kindly provided as well), a RIGHT OUTER JOIN.

Fomand  - <<The issue came from the fact that a.resortid is not a unique id of a resort.>>

So is it happening because I'm using a view not an actual table and the view is not indexed (i.e. there is no primary key, no uniqueid, no identity col for each row, no clustered index on the view) and therefore nothing unique to join against?

>>
Try this:

SELECT * FROM table b WHERE b.resortid IN (
      SELECT DISTINCT a.resortid FROM table a WHERE <coditions on a>
)
<<

This is my version of your query.
When I run only the sub query alone I get 296 as expected.
When I run the query with where conditions only on a  I get 228 rows.
When I add the same where conditions on b as on a (other than specifying the resortid), I get 13 rows as desired.

SELECT *
      FROM [dbo].[vwALL_SOURCES] b
      WHERE b.resortid IN (
                                     SELECT DISTINCT a.resortid
                                     FROM [dbo].[vwALL_SOURCES] a
                                      where a.resortid = '296'
                                     AND (a.points is not null)
                                     AND (a.points > 0)
                                     AND (a.price is not null)
                                     AND (a.price > 0)
                                     )

AND (b.points is not null)
AND (b.points > 0)
AND (b.price is not null)
AND (b.price > 0)
Bob,

It's not happening because you're using a view.  Check my examples in my first reply - I did that by creating a table.

The bottom line is that the query as specified isn't unique enough on the join.

PortletPaul asked in his reply way about, "What do you hope to achieve with the self join?".

Maybe answering that can give us a better understanding of the requirement.

What is the actual requirement of your query?  What are the other columns in the view?

It seems to be based upon what you've explained so far that the join is redundant - unless there's extra here you're not telling us, you're not gaining anything by doing a join for this query.  You can get the information from the table without the join.

The key piece here that you don't seem to understand is how the join works.

The results you're getting is exactly how a relational join works.  You need to be able to get the join condition down to a state where you can have a 1:1 or 1:many match or else you'll get this multiplication effect in your final result set.

Since you have a many to many relationship in your join condition what you're getting is the expected behavior - it's just how databases work.

Can you explain, then, what it is that you are trying to do and what other columns there are in the view that could help narrow down your a vs b result sets.

Oh and it's not a RIGHT OUTER JOIN that you're getting.  I'll go back to the Wikipedia definition of Inner Join for a moment and finish the paragraph that I quoted earlier.

The result of the join can be defined as the outcome of first taking the Cartesian product (or Cross join) of all records in the tables (combining every record in table A with every record in table B) and then returning all records which satisfy the join predicate.

This is exactly what's happening in your case since you have a multi:multi join condition, you're getting a cartesian product result (a * b).
Ok thanks Steve I appreciate that.

As to the purpose of the join, I am debugging a larger query in which this join appears.
I now see that eliminating the reference to the b table for the one column and taking that column from a solves the issue with the multiple data and eliminates the need for the join.

Thanks all for the help.