Link to home
Start Free TrialLog in
Avatar of Russell Scheinberg, MCSE Business Intelligence
Russell Scheinberg, MCSE Business IntelligenceFlag for United States of America

asked on

"NOT IN" Query Returns No Rows

I have been playing with this query for some time now and can't figure out what I am missing. It is probably something simple but I can't for the life of me see it. I have a query in which I am simply trying to get the a list of values that are not in another list. I am using CTEs as much of this is sub-querying the same view.

The query looks like this:

;WITH CTE_NONULLS AS (
SELECT MAX(CabinetID) AS CabinetID, TIN FROM [Trades].[dbo].[vw_CabinetNG_New] 
WHERE (Source = 'DST' OR Source = 'DAZL') 
  AND (Address1 IS NOT NULL) AND (TIN IS NOT NULL AND TIN <> '000-00-0000')
  GROUP BY TIN),
CTE_NULLS AS (
SELECT MAX(CabinetID) AS CABINETID, FirstName, TIN FROM [Trades].[dbo].[vw_CabinetNG_New]
WHERE (Source = 'DST' OR Source = 'DAZL') 
  AND (Address1 IS NOT NULL) AND ((FIRSTNAME IS NOT NULL AND FIRSTNAME <> '') OR (LASTNAME IS NOT NULL AND LASTNAME <> ''))
  AND (TIN IS NULL OR TIN = '000-00-0000')
  GROUP BY FirstName, TIN),
CTE_FC AS (
SELECT TIN FROM [dbo].[vw_CabinetNG_New] 
WHERE Source = 'FirstClearing'
GROUP BY TIN
)
SELECT TIN
	  , CASE WHEN [FirstName] IS NOT NULL THEN [FirstName] + ' ' ELSE '' END
	   + CASE WHEN [LastName] IS NOT NULL THEN [LastName] ELSE '' END  
	   AS [Full Name]
	  ,CASE WHEN [AdditionalName] IS NULL THEN '' ELSE [AdditionalName] END AS [AdditionalName]
      ,CASE WHEN [Address1] IS NULL THEN '' ELSE [Address1] END AS [Address1]
      ,CASE WHEN [Address2] IS NULL THEN '' ELSE [Address2] END AS [Address2]
      ,CASE WHEN [Address3] IS NULL THEN '' ELSE [Address3] END AS [Address3]
      ,[City]
      ,[State]
      ,[ZipCode]
	  ,LastUpdated
	  ,[Source]
FROM [Trades].[dbo].[vw_CabinetNG_New] 
WHERE CabinetID IN
(SELECT CabinetID FROM CTE_NONULLS UNION ALL SELECT CabinetID FROM CTE_NULLS)
AND TIN NOT IN (SELECT TIN FROM CTE_FC) 
ORDER BY TIN

Open in new window


Pretty simple. If I end the query at the UNION ALL join and leave out the "AND TIN NOT IN...." I get about 5100 records. If I add the "AND TIN NOT IN...." I get 0 records. However, if I change the "NOT IN" to "IN" I get about 2300 matches, which is what it should be. So what happens to my 2800 records that do not match when I use the "NOT IN" instead of the "IN"?

Like I said it is probably going to be something crazy simple but I am stumped.

Thanks ahead of time for you help.
Avatar of David Todd
David Todd
Flag of New Zealand image

Hi,

Firstly, because of the sorting in cte_fc, that is likely to be expensive.

what does this query give?
select tin, count( * )
from dbo.vw_CabinetNG_New
group by
    tin
order by
    count( * ) desc
;

Open in new window


Then do a count of each of the lines from cte_nonulls and cte_nulls.

HTH
  David

PS Doing a union all of cte_nonulls and cte_nulls LOGICALLY defeats the purpose of the exercise!

I suspect that your query can be simplified to:
use Trades
go

select 
	v.TIN
	, isnull( v.FirstName + ' ', '' ) = isnull( v.LastName, '' ) as [Full Name]
	, isnull( v.AdditionalName, '' ) as AdditionalName
	, isnull( v.Address1, '' ) as Address1
	, isnull( v.Address2, '' ) as Address2
	, isnull( v.Address3, '' ) as Address3
	, v.City
	, v.State
	, v.ZipCode
	, v.LastUpdated
	, v.Source
from dbo.vw_CabinetNG_New  v
where 
	v.Source = 'FirstClearing' 
order by 
	v.TIN 
;

Open in new window

Avatar of magarity
magarity

"NOT IN" can lead to performance problems; try to use join syntax instead:

...
FROM [Trades].[dbo].[vw_CabinetNG_New] LEFT OUTER JOIN CTE_FC
ON vw_CabinetNG_New.TIN = CTE_FC.TIN
WHERE CabinetID IN
(SELECT CabinetID FROM CTE_NONULLS UNION ALL SELECT CabinetID FROM CTE_NULLS)
AND CTE_FC.TIN IS NULL
Whilst you are re-using the same table, you are not re-using any single CTE, so there is no particular performance advantage by using the CTEs as you have them now. e.g.this would work without the third CTE
      AND TIN NOT IN (
                  SELECT
                        TIN
                  FROM [dbo].[vw_CabinetNG_New]
                  WHERE Source = 'FirstClearing'
                  GROUP BY
                        TIN
      )

Open in new window

So you are asking for TIN NOT IN (list of TIN where Source = 'FirstClearing')

why not simply use
AND SOURCE <> 'FirstClearing'
instead
I think it boils down to:
SELECT
*
FROM [Trades].[dbo].[vw_CabinetNG_New]
WHERE (Source = 'DST' OR Source = 'DAZL')
AND Address1 IS NOT NULL
AND (
     (TIN IS NOT NULL AND TIN <> '000-00-0000')
  OR 
     ( 
               (
                 (FIRSTNAME IS NOT NULL AND FIRSTNAME <> '')
            OR 
                 (LASTNAME IS NOT NULL AND LASTNAME <> '')
                )
            AND (TIN IS NULL OR TIN = '000-00-0000')
     )
   )
AND Source <> 'FirstClearing'
ORDER BY
      TIN

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Vikas Garg
Vikas Garg
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Russell Scheinberg, MCSE Business Intelligence

ASKER

Thanks everyone for your help. I actually needed the grouping, which I know is expensive, to deal with the fact that I needed the last TIN entry of many in the table. And this was a query that will not be run on a regular basis. Once a year, so I was not too worried about the cost. That being said the fact that their were null values in the CTE_FC was the culprit. I have had this issue of null values biting me in the past so I am not surprised that it did so again. When I took out the null values in the CTE_FC, I got my 2814 rows as expected. Thanks again for your help. Experts Exchange rocks!!
>>"When I took out the null values in the CTE_FC, I got my 2814 rows as expected."
that's great

but you don't need CTE_FC

CREATE TABLE Table1
	([TIN] int, [Source] varchar(14))
;
	
INSERT INTO Table1
	([TIN], [Source])
VALUES
	(1, 'FirstClearing'),
	(2, 'FirstClearing'),
	(3, 'FirstClearing'),
	(4, 'FirstClearing'),
	(5, 'FirstClearing'),
	(6, 'FirstClearing'),
	(7, 'FirstClearing'),
	(8, 'FirstClearing'),
	(9, 'FirstClearing'),
	(10, 'FirstClearing'),
	(11, 'SecondClearing'),
	(12, 'SecondClearing'),
	(13, 'SecondClearing'),
	(14, 'SecondClearing'),
	(15, 'SecondClearing'),
	(16, 'SecondClearing'),
	(17, 'SecondClearing'),
	(18, 'SecondClearing'),
	(19, 'SecondClearing'),
	(20, 'SecondClearing')
;

Open in new window

    WITH cte_fc AS ( SELECT TIN FROM table1 WHERE SOURCE = 'FirstClearing')
    
    SELECT
    *
    FROM table1
    WHERE TIN NOT IN (SELECT TIN FROM cte_fc)
    

**[Results][2]**:
    
    | TIN |         SOURCE |
    |-----|----------------|
    |  11 | SecondClearing |
    |  12 | SecondClearing |
    |  13 | SecondClearing |
    |  14 | SecondClearing |
    |  15 | SecondClearing |
    |  16 | SecondClearing |
    |  17 | SecondClearing |
    |  18 | SecondClearing |
    |  19 | SecondClearing |
    |  20 | SecondClearing |


**Query 2**:

    SELECT
    *
    FROM table1
    WHERE SOURCE <> 'FirstClearing'
    

**[Results][3]**:
    
    | TIN |         SOURCE |
    |-----|----------------|
    |  11 | SecondClearing |
    |  12 | SecondClearing |
    |  13 | SecondClearing |
    |  14 | SecondClearing |
    |  15 | SecondClearing |
    |  16 | SecondClearing |
    |  17 | SecondClearing |
    |  18 | SecondClearing |
    |  19 | SecondClearing |
    |  20 | SecondClearing |



  [1]: http://sqlfiddle.com/#!3/0dd65/4

Open in new window