Optimise a SQL query

I have two SQL tables, says T1 and T2
T1 has PK1 as primary key and a column C1
T2 has K1 as foreign key PK1 in T1 and a column C1
I need to count the number of T1.PK1 for which  T1.C1 has a given value (3) and for which there is at least on T2.C1 > 0
My current select is

select   count(distinct T1.PK1)
from    T1, T2
where  T1.PK1 = T2.K1
and      T1.C1 = 3
and      T2.C1 > 0

I think that it is possible to optimise this as I presume that the entire T2 table is scanned for those rows where T2.C1 = T1.C1 (and T1.C1 = 3) ...
LVL 1
LeTayAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

BitsqueezerCommented:
Hi,

normally you would use a FROM T1 INNER JOIN T2 ON T1.PK1 = T2.K1 instead of using a CROSS JOIN and a WHERE. But modern query machines should find that out by themselves and create an identical execution plan in both cases.

Instead of creating the WHERE at the end you could reduce the number of rows before joining like this:
SELECT COUNT(distinct Q1.PK1)
FROM ( SELECT T1.PK1 FROM T1 WHERE T1.C1 = 3) AS Q1
INNER JOIN (SELECT T2.K1 FROM T2 WHERE T2.C1 > 0) AS Q2
ON Q1.PK1 = Q2.K1

Open in new window


But I think also this is something the query engine will find out by itself and optimize on it's own. Just create an execution plan for both and look at the result to see if there's any difference. You can also activate the execution time statistics (both in case of SQL Server, don't know which database you use).

Cheers,

Christian
LeTayAuthor Commented:
Thanks
Will SQL stop looking in T2 as soon as it finds one T2.C1 > 0 for each PK1 candidate ?
ste5anSenior DeveloperCommented:
Depending on the cardinalities involved, you need the follwoing two indices: T1(C1) and T2(C1) to support the predicates for T1.C1=3 and T2.C1>0
Angular Fundamentals

Learn the fundamentals of Angular 2, a JavaScript framework for developing dynamic single page applications.

LeTayAuthor Commented:
No problem with the predicates
But my question remains : will SQL stop looking in T2 as soon as it sees a T2.C1 > 0 for each PK involved ?
LeTayAuthor Commented:
For info, the table T1 is very small and the number of rows in T2 for each T1.PK is large (2.000 per PK)
The primary key in T2 is (K1,C1)
ste5anSenior DeveloperCommented:
Sure.

After rereading your post again, the index on T2 should be T2(K1, C1). Cause it should give you an index seek instead of a table scan.

When T2(K1) is indexed, a common practice, then you should get a index seek on this index with an key lookup for C1, which is slower, cause it requires the additional lookup.
pcelbaCommented:
The question was Whether the whole T2 table is scanned.
The answer is NO because your indexes are designed correctly.

SQL Server recognizes the T1 as much smaller table than T2 and even filtered by c1 =  3 so it retrieves all the rows fulfilling this condition and then seeks for the appropriate index entry existence in T2. SQL Server does not need to retrieve T2 table data because the index entry existence is sufficient to calculate the COUNT().

The execution is exactly same as if you were wrote the query the following way:
SELECT COUNT(*) 
  FROM T1
 WHERE T1.C1 = 33 AND EXISTS (SELECT * FROM T2 WHERE T2.K1 = T1.PK1 AND T2.c1 > 0)

Open in new window

To optimize the query you could add an index on C1 column in T1 table. This index would be useful when the T1 table is large enough - let say over 10000 rows.

Everything is visible in execution plans.

If you don't like to decode execution plans then you may do much simpler thing:

SET STATISTICS TIME, IO ON

and execute your queries in SSMS. The Messages tab should contain the query execution time and even more important number of logical and physical reads. The more is query optimized the less number of reads is reported. You may see differences if you drop some of the indexes or write the query different way.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
SQL

From novice to tech pro — start learning today.