Link to home
Start Free TrialLog in
Avatar of LeTay
LeTay

asked on

Optimise a SQL query

I have two SQL tables, says T1 and T2
T1 has PK1 as primary key and a column C1
T2 has K1 as foreign key PK1 in T1 and a column C1
I need to count the number of T1.PK1 for which  T1.C1 has a given value (3) and for which there is at least on T2.C1 > 0
My current select is

select   count(distinct T1.PK1)
from    T1, T2
where  T1.PK1 = T2.K1
and      T1.C1 = 3
and      T2.C1 > 0

I think that it is possible to optimise this as I presume that the entire T2 table is scanned for those rows where T2.C1 = T1.C1 (and T1.C1 = 3) ...
Avatar of Bitsqueezer
Bitsqueezer
Flag of Germany image

Hi,

normally you would use a FROM T1 INNER JOIN T2 ON T1.PK1 = T2.K1 instead of using a CROSS JOIN and a WHERE. But modern query machines should find that out by themselves and create an identical execution plan in both cases.

Instead of creating the WHERE at the end you could reduce the number of rows before joining like this:
SELECT COUNT(distinct Q1.PK1)
FROM ( SELECT T1.PK1 FROM T1 WHERE T1.C1 = 3) AS Q1
INNER JOIN (SELECT T2.K1 FROM T2 WHERE T2.C1 > 0) AS Q2
ON Q1.PK1 = Q2.K1

Open in new window


But I think also this is something the query engine will find out by itself and optimize on it's own. Just create an execution plan for both and look at the result to see if there's any difference. You can also activate the execution time statistics (both in case of SQL Server, don't know which database you use).

Cheers,

Christian
Avatar of LeTay
LeTay

ASKER

Thanks
Will SQL stop looking in T2 as soon as it finds one T2.C1 > 0 for each PK1 candidate ?
Depending on the cardinalities involved, you need the follwoing two indices: T1(C1) and T2(C1) to support the predicates for T1.C1=3 and T2.C1>0
Avatar of LeTay

ASKER

No problem with the predicates
But my question remains : will SQL stop looking in T2 as soon as it sees a T2.C1 > 0 for each PK involved ?
Avatar of LeTay

ASKER

For info, the table T1 is very small and the number of rows in T2 for each T1.PK is large (2.000 per PK)
The primary key in T2 is (K1,C1)
Sure.

After rereading your post again, the index on T2 should be T2(K1, C1). Cause it should give you an index seek instead of a table scan.

When T2(K1) is indexed, a common practice, then you should get a index seek on this index with an key lookup for C1, which is slower, cause it requires the additional lookup.
ASKER CERTIFIED SOLUTION
Avatar of Pavel Celba
Pavel Celba
Flag of Czechia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial