Camillia
asked on
Different between these 2 sql statements
I had this question after viewing Simple SQL - Only select row with specific status.
I got help from the experts in the link above. I'm running the sql against the real database and I get a difference in number of rows I get back.
This is SQL 2014
This gives me 59152 rows
This one gives me 64449 rows. I'd think they should return the same number of rows. Just looking at the 2 sql statements, why would be different as far as the number of rows it's bringing?
I got help from the experts in the link above. I'm running the sql against the real database and I get a difference in number of rows I get back.
This is SQL 2014
This gives me 59152 rows
WITH cteLoadS AS
(
SELECT CommSalesOrder_CommSalesOrderId, loadStatus, COUNT(CommSalesOrder_CommSalesOrderId) OVER(PARTITION BY CommSalesOrder_CommSalesOrderId, loadStatus) AS CountByStatus,
COUNT(CommSalesOrder_CommSalesOrderId) OVER(PARTITION BY CommSalesOrder_CommSalesOrderId) AS CountByCommSalesId
FROM loads
)
SELECT DISTINCT C.*, Loadstatus
FROM cteLoadS AS cte
INNER JOIN CommSalesOrders AS C
ON cte.CommSalesOrder_CommSalesOrderId = C.CommSalesOrderId
WHERE cte.CountByCommSalesId = cte.CountByStatus
AND cte.loadStatus = 4
This one gives me 64449 rows. I'd think they should return the same number of rows. Just looking at the 2 sql statements, why would be different as far as the number of rows it's bringing?
select c.*
from (
select CommSalesOrder_CommSalesOrderId
from loads
group by CommSalesOrder_CommSalesOrderId, loadstatus
having max(case when loadStatus !=4 then 0 else 1 end) = 1
) as l
inner join CommSalesOrders c on c.CommSalesOrderId = l.CommSalesOrder_CommSalesOrderId
ASKER
Your sql in the related question didn't work...it didn't run. I can try it again but I got an error when I tried it with the sample table I posted.
I'll dump the results in 2 different tables and compare and see what's missing
I'll dump the results in 2 different tables and compare and see what's missing
hmm weird as i tested it in ideone.com as i just took away the #'s and it worked for me
create table commsales
(
id int,
Seller varchar(20)
);
Create table loads
(
Id int,
CommSalesId int,
loadStatus varchar(1)
);
insert into commsales
select 1, 'ABC';
insert into commsales
select 2, 'XYZ';
insert into loads
select 1, 1, 'S';
insert into loads
select 2, 1, 'S';
insert into loads
select 3, 1, 'N';
insert into loads
select 1, 2, 'S';
insert into loads
select 1, 2, 'S';
go
select distinct * from
(
select * from commsales c
inner join loads l on c.id = l.CommSalesId
and not exists
(select 1 from loads where loads.loadstatus='N' and c.id=loads.commsalesid));
ASKER
I'll try it in the morning when I go to work. Thanks
It's for sure the combination of DISTINCT and the INNER JOIN. Cause you run DISTINCT over *. The separate GROUP BY is reducing the number of rows, this behavior is not the same with your DISTINCT.
As a rule of thumb: Don't use DISTINCT on complex queries.
So as a test: Just use a GROUP BY instead of the DISTINCT.
As a rule of thumb: Don't use DISTINCT on complex queries.
So as a test: Just use a GROUP BY instead of the DISTINCT.
Camillia, in my previous code I tried now in sql server and i get errors, I dont know why it works on ideone.com but not on sqlserver itself, so i modfied this part a bit:
select distinct * from #commsales c
inner join #loads l on c.id = l.CommSalesId
and not exists
(select 1 from #loads where loadstatus='N' and c.id=commsalesid)
it gives the result you expected and should be faster and give consistent results with the first example you have.
select distinct * from #commsales c
inner join #loads l on c.id = l.CommSalesId
and not exists
(select 1 from #loads where loadstatus='N' and c.id=commsalesid)
it gives the result you expected and should be faster and give consistent results with the first example you have.
ASKER
ste5an -
So, change this to use "Group by" as a test?
So as a test: Just use a GROUP BY instead of the DISTINCT.
So, change this to use "Group by" as a test?
WITH cteLoadS AS
(
SELECT CommSalesOrder_CommSalesOrderId, loadStatus, COUNT(CommSalesOrder_CommSalesOrderId) OVER(PARTITION BY CommSalesOrder_CommSalesOrderId, loadStatus) AS CountByStatus,
COUNT(CommSalesOrder_CommSalesOrderId) OVER(PARTITION BY CommSalesOrder_CommSalesOrderId) AS CountByCommSalesId
FROM loads
)
SELECT DISTINCT C.*, Loadstatus
FROM cteLoadS AS cte
INNER JOIN CommSalesOrders AS C
ON cte.CommSalesOrder_CommSalesOrderId = C.CommSalesOrderId
WHERE cte.CountByCommSalesId = cte.CountByStatus
AND cte.loadStatus = 4
all that stuff about distinct and group by differences are true and part of your problem here
but forget all those and try this query, much simpler and will give you want you wanted from your original question. There you wanted to only output orders and sellers with all load statuses = 'S' which I take is equivalent to the 4 here? But I may have translated your samples incorrectly from the old question to this one. Anyway, the query needed should be much easier to do than any of those complex examples above.
but forget all those and try this query, much simpler and will give you want you wanted from your original question. There you wanted to only output orders and sellers with all load statuses = 'S' which I take is equivalent to the 4 here? But I may have translated your samples incorrectly from the old question to this one. Anyway, the query needed should be much easier to do than any of those complex examples above.
select CSO.*, L.*
FROM dbo.CommSalesOrders CSO
OUTER APPLY (SELECT TOP 1 'anything' unfilledLoads FROM dbo.loads L WHERE L.CommSalesOrder_CommSalesOrderId = CSO.CommSalesOrderId AND L.loadStatus <> 4) Loads
INNER JOIN dbo.loads L ON L.CommSalesOrder_CommSalesOrderId = CSO.CommSalesOrderId
WHERE Loads.unfilledLoads IS NULL
Results like this:ASKER
Thanks, Chris. I'll try it in the morning.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks, Paul. I'm at work now. Let me see.
I see you are using the HAVING clause, I personally try to avoid it, for performance reasons , that is why in your previous question i used EXISTS (did you try it?) it is way faster than 'having' as it doesnt process all the results, uses indexes and other optimizations.
Can you try it with the example I gave you and see how many rows you get?