Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win


SQL Server Self Join on View

Posted on 2014-02-26
Medium Priority
Last Modified: 2014-02-27
Experts, I am attempting to execute a query that uses a self join on a view. The view is not an indexed view.

The query runs just fine. However it produces more rows than actually exist in the data.

select a.*
      from [dbo].[vwALL_SOURCES] a
      --LEFT join [dbo].[vwALL_SOURCES] b
      --ON a.resortid = b.resortid
      where a.resortid = '296'
      AND (a.points is not null)
      AND (a.points > 0)
      AND (a.price is not null)
      AND      (a.price > 0)

When the query is run as above with the join commented out it returns 13 rows - that's what's actually in the data.

However, when the join code is added to the query, it returns 2964 rows or 228 rows for each of the 13 rows in the data.
It makes no difference whether the query is run as LEFT JOIN, INNER JOIN, or RIGHT JOIN. The result set is the same.

What is likely to be causing this to happen?

Thanks in advance.
Question by:bobinorlando
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
LVL 23

Expert Comment

by:Steve Wales
ID: 39890672
What is the value returned from this query:

select count(*) from vwALL_SOURCES where resortid = '296'

Is it 228 by chance ?

If so, I'm pretty sure that your 13 rows returned from the first part of the query are joining with each of the 228 rows of your join condition to get you  your 2964 rows.

I tested by doing this:

create table vwALL_SOURCES (resortid char(3), price numeric, points numeric)
insert into vwALL_SOURCES values ('296',1,2)
go 13
insert into vwALL_SOURCES values ('296',null,null)
go 215

Open in new window

This created me 228 rows, 13 where there are a value in price/points and 215 with nulls for 228 total.

Running your query without the join got me 13 rows.
Running your query with the join got me 2964 rows.

Is there some other column in the view you can use to restrict the join condition ?

A join will match each row in the first table/view in the join with each row returned in the result set for the second table/view in the join.  Since you aren't restricting anything in what is returned on the "b" part of the join, you have 13 in the "a" part and 228 in the "b" part.

13 x 228 = 2964.

Either make your join condition more restrictive or add extra conditions to the where part of the query on table b to restrict your results returned - but even then, unless you can be more restricting on the join, it's still going to multiply the 13 in the first with however many you return from the second view unless you can make your join condition more distinct.
LVL 49

Assisted Solution

PortletPaul earned 400 total points
ID: 39890717
what do you hope to achieve by the self join?

Author Comment

ID: 39890930
You said:

>>>I'm pretty sure that your 13 rows returned from the first part of the query are joining with each of the 228 rows of your join condition to get you  your 2964 rows.<<<

Yes you are correct that is what is happening.  What I don't understand is why. For example,  Only a RIGHT JOIN or RIGHT OUTER JOIN should return all rows in b that match the rows in table a. A LEFT JOIN or an INNER JOIN condition should restrict the b table rows to only those that match the a table rows. JOINING ON a.resortid = b.resortid should do the restricting to prevent getting all 228 rows from b for every row in table a. But it is not working that way.
Learn Veeam advantages over legacy backup

Every day, more and more legacy backup customers switch to Veeam. Technologies designed for the client-server era cannot restore any IT service running in the hybrid cloud within seconds. Learn top Veeam advantages over legacy backup and get Veeam for the price of your renewal

LVL 23

Accepted Solution

Steve Wales earned 1200 total points
ID: 39890946
If there are 228 rows from b that have resortid = 296 and 13 rows from a that have resortid=296 then the join is working correctly.

A join matches every row from "a" with every row from "b" on the join condition.

Since your join condition is just the resortid, then you get 13 x 228.

If there was a date or something else involved, where you could narrow the match down to a 1:1 then you'd get a different result.

Also, you seem to misunderstand what LEFT/RIGHT join is.

Go back to elementary school mathematics and think of a Venn Diagram.  You have two intersecting circles.  We'll call the left circle A, the right circle B and the intersection C.

A JOIN (also known as INNER JOIN) returns C - those rows appearing in the intersection only.

A LEFT JOIN (aka LEFT OUTER JOIN) will return C (the intersection) and the rest of A that doesn't appear in C.

A RIGHT JOIN (aka RIGHT OUTER JOIN) will return C (the intersection) and the rest of B that doesn't appear in C.

A FULL JOIN (FULL OUTER JOIN) returns C, the part of A that doesn't match and the part of C that doesn't match.

(I actually blogged about this a couple of years ago, with examples if you want to see how joins work : http://www.dba-in-exile.com/2011/10/another-way-of-looking-at-joins.html)

Since you don't have another condition to limit the scope of what's returned from the join, the result set is exactly what it is supposed to be.

Some extra references.

The Wikipedia Article on JOIN (for what it's worth) (link) defines an inner join as:

Inner join creates a new result table by combining column values of two tables (A and B) based upon the join-predicate. The query compares each row of A with each row of B to find all pairs of rows which satisfy the join-predicate. When the join-predicate is satisfied, column values for each matched pair of rows of A and B are combined into a result row.

Another reference is this link: http://www.w3schools.com/sql/sql_join.asp

It notes (emphasis mine): INNER JOIN: Returns all rows when there is at least one match in BOTH tables

Assisted Solution

by:Andrei Fomitchev
Andrei Fomitchev earned 400 total points
ID: 39891111
The issue came from the fact that a.resortid is not an unique id of a resort.

If you use self join for 1:1 it works.
If you use self join for 10:10 - it produces 10*10 = 100 rows.

In your case a.resortid = '296' produces 13 rows for a.
If you add b.resortid = '296' it will produce 13 rows for b.
JOIN will produce 13*13 = 169 rows.

Without b.resortid = '296' b will produce ~200 rows.
JOIN will produce 13*200 = 2600 rows.

Try this:

SELECT * FROM table b WHERE b.resortid IN (
      SELECT DISTINCT a.resortid FROM table a WHERE <coditions on a>

Author Comment

ID: 39891580
Steve, thank you but as I indicated, regardless of what  type of join I specify in the query, for each row in a I'm getting all the rows in b for the same resortid in other words (and consistent with the definitions you kindly provided as well), a RIGHT OUTER JOIN.

Fomand  - <<The issue came from the fact that a.resortid is not a unique id of a resort.>>

So is it happening because I'm using a view not an actual table and the view is not indexed (i.e. there is no primary key, no uniqueid, no identity col for each row, no clustered index on the view) and therefore nothing unique to join against?

Try this:

SELECT * FROM table b WHERE b.resortid IN (
      SELECT DISTINCT a.resortid FROM table a WHERE <coditions on a>

This is my version of your query.
When I run only the sub query alone I get 296 as expected.
When I run the query with where conditions only on a  I get 228 rows.
When I add the same where conditions on b as on a (other than specifying the resortid), I get 13 rows as desired.

      FROM [dbo].[vwALL_SOURCES] b
      WHERE b.resortid IN (
                                     SELECT DISTINCT a.resortid
                                     FROM [dbo].[vwALL_SOURCES] a
                                      where a.resortid = '296'
                                     AND (a.points is not null)
                                     AND (a.points > 0)
                                     AND (a.price is not null)
                                     AND (a.price > 0)

AND (b.points is not null)
AND (b.points > 0)
AND (b.price is not null)
AND (b.price > 0)
LVL 23

Expert Comment

by:Steve Wales
ID: 39891766

It's not happening because you're using a view.  Check my examples in my first reply - I did that by creating a table.

The bottom line is that the query as specified isn't unique enough on the join.

PortletPaul asked in his reply way about, "What do you hope to achieve with the self join?".

Maybe answering that can give us a better understanding of the requirement.

What is the actual requirement of your query?  What are the other columns in the view?

It seems to be based upon what you've explained so far that the join is redundant - unless there's extra here you're not telling us, you're not gaining anything by doing a join for this query.  You can get the information from the table without the join.

The key piece here that you don't seem to understand is how the join works.

The results you're getting is exactly how a relational join works.  You need to be able to get the join condition down to a state where you can have a 1:1 or 1:many match or else you'll get this multiplication effect in your final result set.

Since you have a many to many relationship in your join condition what you're getting is the expected behavior - it's just how databases work.

Can you explain, then, what it is that you are trying to do and what other columns there are in the view that could help narrow down your a vs b result sets.

Oh and it's not a RIGHT OUTER JOIN that you're getting.  I'll go back to the Wikipedia definition of Inner Join for a moment and finish the paragraph that I quoted earlier.

The result of the join can be defined as the outcome of first taking the Cartesian product (or Cross join) of all records in the tables (combining every record in table A with every record in table B) and then returning all records which satisfy the join predicate.

This is exactly what's happening in your case since you have a multi:multi join condition, you're getting a cartesian product result (a * b).

Author Comment

ID: 39892368
Ok thanks Steve I appreciate that.

As to the purpose of the join, I am debugging a larger query in which this join appears.
I now see that eliminating the reference to the b table for the one column and taking that column from a solves the issue with the multiple data and eliminates the need for the join.

Thanks all for the help.

Featured Post

Free learning courses: Active Directory Deep Dive

Get a firm grasp on your IT environment when you learn Active Directory best practices with Veeam! Watch all, or choose any amount, of this three-part webinar series to improve your skills. From the basics to virtualization and backup, we got you covered.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A Stored Procedure in Microsoft SQL Server is a powerful feature that it can be used to execute the Data Manipulation Language (DML) or Data Definition Language (DDL). Depending on business requirements, a single Stored Procedure can return differe…
It is possible to export the data of a SQL Table in SSMS and generate INSERT statements. It's neatly tucked away in the generate scripts option of a database.
Familiarize people with the process of retrieving data from SQL Server using an Access pass-thru query. Microsoft Access is a very powerful client/server development tool. One of the ways that you can retrieve data from a SQL Server is by using a pa…
Viewers will learn how the fundamental information of how to create a table.

598 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question