"NOT IN" Query Returns No Rows
I have been playing with this query for some time now and can't figure out what I am missing. It is probably something simple but I can't for the life of me see it. I have a query in which I am simply trying to get the a list of values that are not in another list. I am using CTEs as much of this is sub-querying the same view.
The query looks like this:
Pretty simple. If I end the query at the UNION ALL join and leave out the "AND TIN NOT IN...." I get about 5100 records. If I add the "AND TIN NOT IN...." I get 0 records. However, if I change the "NOT IN" to "IN" I get about 2300 matches, which is what it should be. So what happens to my 2800 records that do not match when I use the "NOT IN" instead of the "IN"?
Like I said it is probably going to be something crazy simple but I am stumped.
Thanks ahead of time for you help.
The query looks like this:
;WITH CTE_NONULLS AS (
SELECT MAX(CabinetID) AS CabinetID, TIN FROM [Trades].[dbo].[vw_CabinetNG_New]
WHERE (Source = 'DST' OR Source = 'DAZL')
AND (Address1 IS NOT NULL) AND (TIN IS NOT NULL AND TIN <> '000-00-0000')
GROUP BY TIN),
CTE_NULLS AS (
SELECT MAX(CabinetID) AS CABINETID, FirstName, TIN FROM [Trades].[dbo].[vw_CabinetNG_New]
WHERE (Source = 'DST' OR Source = 'DAZL')
AND (Address1 IS NOT NULL) AND ((FIRSTNAME IS NOT NULL AND FIRSTNAME <> '') OR (LASTNAME IS NOT NULL AND LASTNAME <> ''))
AND (TIN IS NULL OR TIN = '000-00-0000')
GROUP BY FirstName, TIN),
CTE_FC AS (
SELECT TIN FROM [dbo].[vw_CabinetNG_New]
WHERE Source = 'FirstClearing'
GROUP BY TIN
)
SELECT TIN
, CASE WHEN [FirstName] IS NOT NULL THEN [FirstName] + ' ' ELSE '' END
+ CASE WHEN [LastName] IS NOT NULL THEN [LastName] ELSE '' END
AS [Full Name]
,CASE WHEN [AdditionalName] IS NULL THEN '' ELSE [AdditionalName] END AS [AdditionalName]
,CASE WHEN [Address1] IS NULL THEN '' ELSE [Address1] END AS [Address1]
,CASE WHEN [Address2] IS NULL THEN '' ELSE [Address2] END AS [Address2]
,CASE WHEN [Address3] IS NULL THEN '' ELSE [Address3] END AS [Address3]
,[City]
,[State]
,[ZipCode]
,LastUpdated
,[Source]
FROM [Trades].[dbo].[vw_CabinetNG_New]
WHERE CabinetID IN
(SELECT CabinetID FROM CTE_NONULLS UNION ALL SELECT CabinetID FROM CTE_NULLS)
AND TIN NOT IN (SELECT TIN FROM CTE_FC)
ORDER BY TIN
Pretty simple. If I end the query at the UNION ALL join and leave out the "AND TIN NOT IN...." I get about 5100 records. If I add the "AND TIN NOT IN...." I get 0 records. However, if I change the "NOT IN" to "IN" I get about 2300 matches, which is what it should be. So what happens to my 2800 records that do not match when I use the "NOT IN" instead of the "IN"?
Like I said it is probably going to be something crazy simple but I am stumped.
Thanks ahead of time for you help.
"NOT IN" can lead to performance problems; try to use join syntax instead:
...
FROM [Trades].[dbo].[vw_Cabinet NG_New] LEFT OUTER JOIN CTE_FC
ON vw_CabinetNG_New.TIN = CTE_FC.TIN
WHERE CabinetID IN
(SELECT CabinetID FROM CTE_NONULLS UNION ALL SELECT CabinetID FROM CTE_NULLS)
AND CTE_FC.TIN IS NULL
...
FROM [Trades].[dbo].[vw_Cabinet
ON vw_CabinetNG_New.TIN = CTE_FC.TIN
WHERE CabinetID IN
(SELECT CabinetID FROM CTE_NONULLS UNION ALL SELECT CabinetID FROM CTE_NULLS)
AND CTE_FC.TIN IS NULL
Whilst you are re-using the same table, you are not re-using any single CTE, so there is no particular performance advantage by using the CTEs as you have them now. e.g.this would work without the third CTE
why not simply use
AND SOURCE <> 'FirstClearing'
instead
AND TIN NOT IN (
SELECT
TIN
FROM [dbo].[vw_CabinetNG_New]
WHERE Source = 'FirstClearing'
GROUP BY
TIN
)
So you are asking for TIN NOT IN (list of TIN where Source = 'FirstClearing')why not simply use
AND SOURCE <> 'FirstClearing'
instead
I think it boils down to:
SELECT
*
FROM [Trades].[dbo].[vw_CabinetNG_New]
WHERE (Source = 'DST' OR Source = 'DAZL')
AND Address1 IS NOT NULL
AND (
(TIN IS NOT NULL AND TIN <> '000-00-0000')
OR
(
(
(FIRSTNAME IS NOT NULL AND FIRSTNAME <> '')
OR
(LASTNAME IS NOT NULL AND LASTNAME <> '')
)
AND (TIN IS NULL OR TIN = '000-00-0000')
)
)
AND Source <> 'FirstClearing'
ORDER BY
TIN
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks everyone for your help. I actually needed the grouping, which I know is expensive, to deal with the fact that I needed the last TIN entry of many in the table. And this was a query that will not be run on a regular basis. Once a year, so I was not too worried about the cost. That being said the fact that their were null values in the CTE_FC was the culprit. I have had this issue of null values biting me in the past so I am not surprised that it did so again. When I took out the null values in the CTE_FC, I got my 2814 rows as expected. Thanks again for your help. Experts Exchange rocks!!
>>"When I took out the null values in the CTE_FC, I got my 2814 rows as expected."
that's great
but you don't need CTE_FC
that's great
but you don't need CTE_FC
CREATE TABLE Table1
([TIN] int, [Source] varchar(14))
;
INSERT INTO Table1
([TIN], [Source])
VALUES
(1, 'FirstClearing'),
(2, 'FirstClearing'),
(3, 'FirstClearing'),
(4, 'FirstClearing'),
(5, 'FirstClearing'),
(6, 'FirstClearing'),
(7, 'FirstClearing'),
(8, 'FirstClearing'),
(9, 'FirstClearing'),
(10, 'FirstClearing'),
(11, 'SecondClearing'),
(12, 'SecondClearing'),
(13, 'SecondClearing'),
(14, 'SecondClearing'),
(15, 'SecondClearing'),
(16, 'SecondClearing'),
(17, 'SecondClearing'),
(18, 'SecondClearing'),
(19, 'SecondClearing'),
(20, 'SecondClearing')
;
WITH cte_fc AS ( SELECT TIN FROM table1 WHERE SOURCE = 'FirstClearing')
SELECT
*
FROM table1
WHERE TIN NOT IN (SELECT TIN FROM cte_fc)
**[Results][2]**:
| TIN | SOURCE |
|-----|----------------|
| 11 | SecondClearing |
| 12 | SecondClearing |
| 13 | SecondClearing |
| 14 | SecondClearing |
| 15 | SecondClearing |
| 16 | SecondClearing |
| 17 | SecondClearing |
| 18 | SecondClearing |
| 19 | SecondClearing |
| 20 | SecondClearing |
**Query 2**:
SELECT
*
FROM table1
WHERE SOURCE <> 'FirstClearing'
**[Results][3]**:
| TIN | SOURCE |
|-----|----------------|
| 11 | SecondClearing |
| 12 | SecondClearing |
| 13 | SecondClearing |
| 14 | SecondClearing |
| 15 | SecondClearing |
| 16 | SecondClearing |
| 17 | SecondClearing |
| 18 | SecondClearing |
| 19 | SecondClearing |
| 20 | SecondClearing |
[1]: http://sqlfiddle.com/#!3/0dd65/4
Firstly, because of the sorting in cte_fc, that is likely to be expensive.
what does this query give?
Open in new window
Then do a count of each of the lines from cte_nonulls and cte_nulls.
HTH
David
PS Doing a union all of cte_nonulls and cte_nulls LOGICALLY defeats the purpose of the exercise!
I suspect that your query can be simplified to:
Open in new window