asked on

Different between these 2 sql statements

I had this question after viewing Simple SQL - Only select row with specific status.

I got help from the experts in the link above. I'm running the sql against the real database and I get a difference in number of rows I get back.

This is SQL 2014

This gives me 59152 rows

 WITH cteLoadS AS
(
	SELECT  CommSalesOrder_CommSalesOrderId, loadStatus, COUNT(CommSalesOrder_CommSalesOrderId) OVER(PARTITION BY CommSalesOrder_CommSalesOrderId, loadStatus) AS CountByStatus,
		COUNT(CommSalesOrder_CommSalesOrderId) OVER(PARTITION BY CommSalesOrder_CommSalesOrderId) AS CountByCommSalesId
	FROM loads
)
SELECT DISTINCT C.*, Loadstatus
FROM cteLoadS AS cte
INNER JOIN CommSalesOrders AS C
	ON cte.CommSalesOrder_CommSalesOrderId = C.CommSalesOrderId
WHERE cte.CountByCommSalesId = cte.CountByStatus
	AND cte.loadStatus = 4

Open in new window

This one gives me 64449 rows. I'd think they should return the same number of rows. Just looking at the 2 sql statements, why would be different as far as the number of rows it's bringing?

select c.*
from (
    select CommSalesOrder_CommSalesOrderId
    from loads
    group by CommSalesOrder_CommSalesOrderId, loadstatus
    having max(case when loadStatus !=4 then 0 else 1 end) = 1
) as l
inner join CommSalesOrders c on c.CommSalesOrderId = l.CommSalesOrder_CommSalesOrderId

Open in new window

Arana (G.P.)

Have you identified any different result ? you can compare both queries to find out which ones are not showing in the other query.

I see you are using the HAVING clause, I personally try to avoid it, for performance reasons , that is why in your previous question i used EXISTS (did you try it?) it is way faster than 'having' as it doesnt process all the results, uses indexes and other optimizations.

Can you try it with the example I gave you and see how many rows you get?

Camillia

ASKER

Your sql in the related question didn't work...it didn't run. I can try it again but I got an error when I tried it with the sample table I posted.

I'll dump the results in 2 different tables and compare and see what's missing

Arana (G.P.)

hmm weird as i tested it in ideone.com as i just took away the #'s and it worked for me

create table commsales
(
  id int,
  Seller varchar(20)

);


Create table loads
(
  Id int,
  CommSalesId int,
  loadStatus varchar(1)

);


insert into commsales
select  1, 'ABC';

insert into commsales
select  2, 'XYZ';

insert into loads
select  1, 1, 'S';

insert into loads
select  2, 1, 'S';

insert into loads
select  3, 1, 'N';

insert into loads
select  1, 2, 'S';

insert into loads
select  1, 2, 'S';
go

select distinct * from
(
select *  from commsales c
 inner join loads l on c.id = l.CommSalesId
  and not exists
(select 1 from  loads where loads.loadstatus='N' and c.id=loads.commsalesid));

Open in new window

Camillia

ASKER

I'll try it in the morning when I go to work. Thanks

ste5an

It's for sure the combination of DISTINCT and the INNER JOIN. Cause you run DISTINCT over *. The separate GROUP BY is reducing the number of rows, this behavior is not the same with your DISTINCT.

As a rule of thumb: Don't use DISTINCT on complex queries.

So as a test: Just use a GROUP BY instead of the DISTINCT.

Arana (G.P.)

Camillia, in my previous code I tried now in sql server and i get errors, I dont know why it works on ideone.com but not on sqlserver itself, so i modfied this part a bit:

select distinct * from #commsales c
inner join #loads l on c.id = l.CommSalesId
and not exists
(select 1 from #loads where loadstatus='N' and c.id=commsalesid)

it gives the result you expected and should be faster and give consistent results with the first example you have.

Camillia

ASKER

ste5an -

So as a test: Just use a GROUP BY instead of the DISTINCT.

So, change this to use "Group by" as a test?

 WITH cteLoadS AS
(
	SELECT  CommSalesOrder_CommSalesOrderId, loadStatus, COUNT(CommSalesOrder_CommSalesOrderId) OVER(PARTITION BY CommSalesOrder_CommSalesOrderId, loadStatus) AS CountByStatus,
		COUNT(CommSalesOrder_CommSalesOrderId) OVER(PARTITION BY CommSalesOrder_CommSalesOrderId) AS CountByCommSalesId
	FROM loads
)
SELECT DISTINCT C.*, Loadstatus
FROM cteLoadS AS cte
INNER JOIN CommSalesOrders AS C
	ON cte.CommSalesOrder_CommSalesOrderId = C.CommSalesOrderId
WHERE cte.CountByCommSalesId = cte.CountByStatus
	AND cte.loadStatus = 4

Open in new window

Chris Luttrell

all that stuff about distinct and group by differences are true and part of your problem here
but forget all those and try this query, much simpler and will give you want you wanted from your original question. There you wanted to only output orders and sellers with all load statuses = 'S' which I take is equivalent to the 4 here? But I may have translated your samples incorrectly from the old question to this one. Anyway, the query needed should be much easier to do than any of those complex examples above.

select CSO.*, L.* 
FROM dbo.CommSalesOrders CSO
OUTER APPLY (SELECT TOP 1 'anything' unfilledLoads FROM dbo.loads L WHERE L.CommSalesOrder_CommSalesOrderId = CSO.CommSalesOrderId AND L.loadStatus <> 4) Loads
INNER JOIN dbo.loads L ON L.CommSalesOrder_CommSalesOrderId = CSO.CommSalesOrderId
WHERE Loads.unfilledLoads IS NULL

Open in new window

Results like this: