peterjerry asked on # EXCEPT STATEMENT IN T-SQL

Hi,

I am trying to convert a SAS script to T-SQL but somehow SQL SERVER 2000 doesnt accept except statement. Is there a better way to get the script to work in T-SQL. SAS Script attached...

I am trying to convert a SAS script to T-SQL but somehow SQL SERVER 2000 doesnt accept except statement. Is there a better way to get the script to work in T-SQL. SAS Script attached...

```
proc sql;
create table tmp3 as
select * from tmp1
except
select * from tmp2;
quit;
```

Microsoft SQL Server

Lexie

select * from tmp1 where <field> NOT IN (select * from tmp2;)

Lexie

Oeps,

select * from tmp1 where <field> NOT IN (select <field> from tmp2)

The query optimizer will probably use the same plan as the NOT EXISTS.

select * from tmp1 where <field> NOT IN (select <field> from tmp2)

The query optimizer will probably use the same plan as the NOT EXISTS.

All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat

William Peck

peterjerry

Unfortunately there is no unique id on these tables

Lexie

Hm, may be concatenate all fields behind each other...

select a, b, c from tmp1 where a + b + c NOT IN (select a + b + c from tmp2)

select a, b, c from tmp1 where a + b + c NOT IN (select a + b + c from tmp2)

Lexie

Or...

SELECT a, b, c FROM

(SELECT a, b,c, 1 AS d FROM tmp1 UNION ALL SELECT a, b,c, -1 AS d FROM tmp2)

GROUP BY a, b, c

HAVING SUM(d) > 0

SELECT a, b, c FROM

(SELECT a, b,c, 1 AS d FROM tmp1 UNION ALL SELECT a, b,c, -1 AS d FROM tmp2)

GROUP BY a, b, c

HAVING SUM(d) > 0

Get an unlimited membership to EE for less than $4 a week.

Unlimited question asking, solutions, articles and more.

peterjerry

Hi Lexie,

I had tried your first option which was a bit laborious with quite a number of fields in there and it still didnt give the exact number of records coming out. Let me try your next option and will let you know the outcome.

Regards,

I had tried your first option which was a bit laborious with quite a number of fields in there and it still didnt give the exact number of records coming out. Let me try your next option and will let you know the outcome.

Regards,

Lexie

Oh, that gave an error:

SELECT a, b, c FROM

(SELECT a, b,c, 1 AS d FROM tmp1 UNION ALL SELECT a, b,c, -1 AS d FROM tmp2) AS tabel

GROUP BY a, b, c

HAVING SUM(d) > 0

SELECT a, b, c FROM

(SELECT a, b,c, 1 AS d FROM tmp1 UNION ALL SELECT a, b,c, -1 AS d FROM tmp2) AS tabel

GROUP BY a, b, c

HAVING SUM(d) > 0

Guy Hengel [angelIII / a3]

hi,

note: I don't like the EXCEPT statement (or MINUS like in other databases), as the operation, like DISTINCT, works on ALL the columns data, so once you add or remove a column from the list (or to the table when you use "select *"), the results will eventually differ !

the underlying issue is that you don't have a primary key to match the rows between the 2 tables.

however, that would be the most effective way.

the second best is to create a hash value for each row, index that and join using that field (first).

http://www.fotia.co.uk/fotia/FA.03.Sql2KHashIndexes.01.aspx

the function CHECKSUM() can be applied on a single value, so you might just concatenate all the values (NOTE: take care of the NULL values!!!) for the hashed value.

so: add a computed column to that table with the checksum result, index it, and join on that column (use the NOT EXISTS version from matrix_aash's first comment)

you can then add more fields to be checked by, the index of the hashed column should still be used and make the query most efficient.

note: I don't like the EXCEPT statement (or MINUS like in other databases), as the operation, like DISTINCT, works on ALL the columns data, so once you add or remove a column from the list (or to the table when you use "select *"), the results will eventually differ !

the underlying issue is that you don't have a primary key to match the rows between the 2 tables.

however, that would be the most effective way.

the second best is to create a hash value for each row, index that and join using that field (first).

http://www.fotia.co.uk/fotia/FA.03.Sql2KHashIndexes.01.aspx

the function CHECKSUM() can be applied on a single value, so you might just concatenate all the values (NOTE: take care of the NULL values!!!) for the hashed value.

so: add a computed column to that table with the checksum result, index it, and join on that column (use the NOT EXISTS version from matrix_aash's first comment)

you can then add more fields to be checked by, the index of the hashed column should still be used and make the query most efficient.

Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!

Walt Forbes

peterjerry

angell,

Will get back after trying this

Will get back after trying this

peterjerry

Lexie,

I tried this

SELECT a, b, c FROM

(SELECT a, b,c, 1 AS d FROM tmp1 UNION ALL SELECT a, b,c, -1 AS d FROM tmp2) AS tabel

GROUP BY a, b, c

HAVING SUM(d) > 0

and it seems to have done it. Can you explain the principle behind this.

Regards,

I tried this

SELECT a, b, c FROM

(SELECT a, b,c, 1 AS d FROM tmp1 UNION ALL SELECT a, b,c, -1 AS d FROM tmp2) AS tabel

GROUP BY a, b, c

HAVING SUM(d) > 0

and it seems to have done it. Can you explain the principle behind this.

Regards,

Guy Hengel [angelIII / a3]

>Can you explain the principle behind this.

see the subselect, it makes a UNION of the 2 tables, with 1 additional field, with 1 and -1 in the 2 tables.

the GROUP BY in the outer query will return 1 row for the "distinct" values names (a,b,c), and then check if the sum() of the additional field return > 0.

say the a,b,c is only in tmp1, then sum(d) will return 1

say the a,b,c is only in tmp2, then sum(d) will return -1

say the a,b,c is in both tmp1 and tmp2, then sum(d) will return 0

so, you want > 0 cases to be returned

see the subselect, it makes a UNION of the 2 tables, with 1 additional field, with 1 and -1 in the 2 tables.

the GROUP BY in the outer query will return 1 row for the "distinct" values names (a,b,c), and then check if the sum() of the additional field return > 0.

say the a,b,c is only in tmp1, then sum(d) will return 1

say the a,b,c is only in tmp2, then sum(d) will return -1

say the a,b,c is in both tmp1 and tmp2, then sum(d) will return 0

so, you want > 0 cases to be returned

Get an unlimited membership to EE for less than $4 a week.

Unlimited question asking, solutions, articles and more.

Log in or sign up to see answer

Become an EE member today7-DAY FREE TRIAL

Members can start a 7-Day Free trial then enjoy unlimited access to the platform

or

Learn why we charge membership fees

We get it - no one likes a content blocker. Take one extra minute and find out why we block content.

Not exactly the question you had in mind?

Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.

ask a question
Select * from tmp1 t1 where not exists (select * from tmp2 t2 where t1.fieldname = t2.fieldname)

Hope this helps.

Cheers.