• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 418
  • Last Modified:

Using Exist vs Join give different results

I have included 2 queries below that seems to me like they should give the same results, but the "Join" query gives returns more records (I believe duplidates) than the Exists query. I am more interested in the theory as to why ths is so than solving this particular example, it seems to me like they both shoud return the same records. This is querying an Oracle DB btw.
Exists query:
SELECT *
FROM   orgmstee org,
       prdmstee prd
WHERE   EXISTS
       ( SELECT NULL
       FROM    prdfxsee fxs,
               fmtpdsee fmt
       WHERE   fxs.org_lvl_child = org.org_lvl_child
           AND fmt.prd_lvl_child = prd.prd_lvl_child
           AND fmt.prd_afh_key   = fxs.prd_afh_key
       )
 
 
  "Join" Query    
SELECT *
FROM   prdfxsee fxs          ,
       fmtpdsee fmt          ,
       ( SELECT org_lvl_child,
               prd_lvl_child 
       FROM    orgmstee org,
               prdmstee prd
       ) t1
WHERE
       (
              fxs.org_lvl_child = t1.org_lvl_child
          AND fmt.prd_lvl_child = t1.prd_lvl_child
          AND fmt.prd_afh_key   = fxs.prd_afh_key
       )

Open in new window

0
bmutch
Asked:
bmutch
3 Solutions
 
lwadwellCommented:
Hi bmutch,

An EXISTS check for 'at least' one row of data being returned from the sub SQL - it is either true or false.  There may be more than one row 'returnable' - but it only takes one row to make the EXISTS true.

A JOIN joins to every row in the sub SQL.  So if there more than one row - each row gets returned.

does this make sense?


lwadwell
0
 
bmutchAuthor Commented:
hello lwadwell,

but you can see that the join clause and the exists clause are identical:

WHERE   fxs.org_lvl_child = org.org_lvl_child
           AND fmt.prd_lvl_child = prd.prd_lvl_child
           AND fmt.prd_afh_key   = fxs.prd_afh_key
0
 
sdstuberCommented:
Your results are different because you're querying different objects.
Yes, the same 4 objects in both queries, but in the first query you're pulling from org and prd with a condition

In the second you're pulling from all 4 with conditions.

I don't know why would expect them to be the same,  it's not a join vs Exists,  they are obviously trying to return different results.

but I'll try to explain it what happening by using a simple example.  I'll use 4 views that each have 2 rows in them (see snippet below)

Every join is a cartesian join until the join conditions filter rows out.
So, the join multiplies the number of rows.  4 views, 2 rows in each is 2 x 2 x 2 x 2=16 rows. Which are then filtered down to 2

The "exists" query only returns rows from a join of 2 views of 2 rows each so 2 x 2 = 4 rows.
Of these 4 rows we then filter them by the exists clause which reduces the result set to 1 row.

So, without any filtering the exists version is already a smaller result set.
Apply filters via join conditions or exists clause and some data drops out.


Play around with the sample data and you can see how the results change.

One way to think about the filtering vs multiplying action...
In the exists,  once a row "exists" it's included and not included again, even if it satisfies the exits clause multiple times.
Where as, in the join, if a row in prd matches 2 rows in fmt (for example) then it's included twice unless some other filter kicks one out.




WITH orgmstee AS (SELECT   1 org_lvl_child FROM DUAL
                  UNION ALL
                  SELECT   2 FROM DUAL),
    prdmstee AS (SELECT   1 prd_lvl_child FROM DUAL
                 UNION ALL
                 SELECT   2 FROM DUAL),
    prdfxsee AS (SELECT   1 prd_afh_key, 1 prd_lvl_child, 1 org_lvl_child FROM DUAL
                 UNION ALL
                 SELECT   2, 1, 1 FROM DUAL),
    fmtpdsee AS (SELECT   1 prd_afh_key, 1 prd_lvl_child FROM DUAL
                 UNION ALL
                 SELECT   2, 1 FROM DUAL)

Open in new window

0
Configuration Guide and Best Practices

Read the guide to learn how to orchestrate Data ONTAP, create application-consistent backups and enable fast recovery from NetApp storage snapshots. Version 9.5 also contains performance and scalability enhancements to meet the needs of the largest enterprise environments.

 
dportasCommented:
bmutch,

A correlated EXISTS subquery like yours behaves pretty much like a semi-join in relational algebra. It returns only the rows from table "A" that match another table "B". Therefore such a query will never return more rows than exist in A.

A SQL "equi-join" on the other hand combines data from the joined tables such that "A join B" can return at most A*B rows. Also, your second query returns a different set of columns to your first, so even if you use DISTINCT to eliminate duplicate rows the results won't be the same.
0
 
bmutchAuthor Commented:
ok, thanks all. I think I was expecting there to be a one-to-one rel. between some of the tables and there wasn't so that was throwing me.
Seems like sdstuber's answer was the most extensive so I'll grant him the majority.

thanks.
0
 
sdstuberCommented:
glad we could help
0

Featured Post

Transaction-level recovery for Oracle database

Veeam Explore for Oracle delivers low RTOs and RPOs with agentless transaction log backup and transaction-level recovery of Oracle databases. You can restore the database to a precise point in time, even to a specific transaction.

Tackle projects and never again get stuck behind a technical roadblock.
Join Now