Link to home
Start Free TrialLog in
Avatar of Staudte
StaudteFlag for Germany

asked on

SQL SELECT performance with JOIN and COUNT

Hi Experts,

I have a MS SQL Server 2008 database with the following structure:

• table Persons with a PersonID and some data fields, e.g. Name and State,
• table Projects with Person1, Person2, Person3 and some data fields, e.g. ProjectTitle. I have 3 separate indexes on Person1, 2 and 3.

A person's ID may appear as Person1, Person2 or Person3 of the Projects table, when the person has participated in the project. There each several tens of thousands persons and projects in the database.

I would now like to SELECT all persons that match a certain criteria and count the number of projects they are in. So far I have used

SELECT COUNT(Projects.ProjectID) AS NumProjects,
Persons.Name
FROM Persons
LEFT JOIN Projects ON (Persons.PersonID = Projects.Person1 OR Persons.PersonID = Projects.Person2 OR Persons.PersonID = Projects.Person3)
WHERE State='CA'
GROUP BY Persons.Name

This takes a very long time: About 1 minute to return about 10.000 records. If I remove the 3rd condition and only compare to Person1 or 2, time is about the same. If I only compare to Person1, time is less than a second!

Unfortunately, I need to compare to all Person1, 2 and 3 - could you suggest I way I can cut the cost of the query down?

Thanks a lot for your advice,
Thomas
Avatar of santhimurthyd
santhimurthyd
Flag of United States of America image

Check this Querry

SELECT COUNT(Projects.ProjectID) AS NumProjects, Persons.Name
FROM Persons
LEFT JOIN Projects ON Persons.PersonID in(Projects.Person1 , Projects.Person2 , Projects.Person3)
WHERE State='CA' GROUP BY Persons.Name

Avatar of Staudte

ASKER

Sorry, no change. Actually even a bit slower - but that might by caused by other server load. Having 2 or 3 fields in the IN-set takes ~1 minute, having just one field returns instantaneously.
Try if this works better
SELECT
	P.NumProjects
	,Persons.Name
FROM
	Persons CROSS APPLY
	(SELECT COUNT(*) AS NumProjects
	 FROM Projects
	 WHERE Persons.PersonID = Projects.Person1 OR Persons.PersonID = Projects.Person2 OR Persons.PersonID = Projects.Person3
	) P
WHERE State='CA'
GROUP BY Persons.Name

Open in new window

hi

create an index on the STATE columns which includes (use the include clause) the PersonID,name columns.

add another index that is based on the Person1 (or any of the other columns that are more likly to have the value in them) and include the other columns+ ProjectID.

or you could try adding three new indexes on the Person1 ,Person2 ,Person3 on every on indevidualy
adding the ProjectID column as an includeed column
look at the execution plan you could find that the optimizer uses only the indexes and skip the tables since the data required by the query is in the indexes.
ASKER CERTIFIED SOLUTION
Avatar of santhimurthyd
santhimurthyd
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Staudte

ASKER

Uhuh... you guys are faster than I can try... just to let you know I'm still on it, but can't reply in real-time :-) Thomasian's approach works lightning fast, but doesn't quite return the same figures as my (slow) approach - there are subtle differences in the COUNTs return by the statement. I will come back here once I have figured out where exactly the differences are and crome from. Also, I'll check out the other two comments. THANKS so much for looking into this!
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Staudte

ASKER

Thomasian: That makes no difference. Here's my query
SELECT 
COUNT(*) as numprojects,
Personen.Nachname,Personen.Vorname,
Personen.PersonID
FROM Personen 
LEFT JOIN Projekte ON Personen.PersonID IN (Projekte.Teilnehmer1, Projekte.Teilnehmer2) 
WHERE FunktionTeilnehmer = 'x' AND (Right(Benutzername, 4) = '2012' OR Left(Right(Benutzername, 5), 4) = '2012')
GROUP BY personen.Nachname,Personen.Vorname, Personen.PersonID
ORDER BY numprojects desc, Nachname asc, vorname asc

Open in new window


and the results:

9	Brahm	Bastian	6103CC40-66D5-4D69-9348-50ED361FA327
4	Christi	Johannes	5D1331DF-5462-49DA-AAFD-3B3A67217BA7
4	Domes	Marian	3AD12839-F651-44AC-A992-DF64C32C653D
3	Bergenthal	Gesa	CEED9E8D-F4A0-44C0-BC7F-AA597CCA1DDE
3	Drabik	Filip	54EE8E6D-53F5-4C3F-8169-D760504E8777
3	Heuer	Eva-Maria	1BE35305-5119-48CE-BB01-09AC2915D6BE
3	Inözü	Görkem	C1EE813B-4A89-40DC-951E-74776270D29F
3	Klamka	Andrea Carolin	37DF318F-5C04-42F5-A824-A5E9D3D3B194
3	Kreiner	Paul	E0E82EAE-8D08-4641-A209-3A2A8AA518B5
3	Krüger	Nina	27B3F535-009B-4C1A-8321-7C3259357444
3	Meier	Marlin	FC09FFA2-E70D-413D-A0B9-58DF5ECBD32B
3	Mittelstaedt	Frédérique	FD52DDBB-B909-4969-AD86-ED06FC4D786D
3	Päzolt	Maximilian	2897CE4A-F2C0-4EF1-9B86-D17C867A2550
3	Reinhardt	Nana	6E51B054-C478-4E62-B81E-846E097FE30C
3	Robben	Anna	1F2F75B3-A014-49C4-983E-D5C7F92450DA

Open in new window


Your query:

SELECT 
P.numprojects,
Personen.Nachname, Personen.Vorname, Personen.PersonID
FROM Personen cross apply (
SELECT COUNT(*) AS numprojects FROM projekte WHERE Personen.PersonID=Projekte.Teilnehmer1 OR Personen.PersonID=Projekte.Teilnehmer2 OR Personen.PersonID=Projekte.Teilnehmer3) P
WHERE FunktionTeilnehmer = 'x' AND (Right(Benutzername, 4) = '2012' OR Left(Right(Benutzername, 5), 4) = '2012')
ORDER BY numprojects DESC, Nachname ASC, vorname ASC

Open in new window


returns

9	Brahm	Bastian	6103CC40-66D5-4D69-9348-50ED361FA327
4	Christi	Johannes	5D1331DF-5462-49DA-AAFD-3B3A67217BA7
4	Domes	Marian	3AD12839-F651-44AC-A992-DF64C32C653D
4	Riederle	Tobias	9325A52F-7345-4A31-9982-F65A72BFE1B4
3	Bergenthal	Gesa	CEED9E8D-F4A0-44C0-BC7F-AA597CCA1DDE
3	Drabik	Filip	54EE8E6D-53F5-4C3F-8169-D760504E8777
3	Fellerer	Konstantin	0494B7FA-DC58-4A58-83C5-44763BAA04ED
3	Heuer	Eva-Maria	1BE35305-5119-48CE-BB01-09AC2915D6BE
3	Inözü	Görkem	C1EE813B-4A89-40DC-951E-74776270D29F
3	Klamka	Andrea Carolin	37DF318F-5C04-42F5-A824-A5E9D3D3B194
3	Kreiner	Paul	E0E82EAE-8D08-4641-A209-3A2A8AA518B5
3	Krüger	Nina	27B3F535-009B-4C1A-8321-7C3259357444
3	Meier	Marlin	FC09FFA2-E70D-413D-A0B9-58DF5ECBD32B
3	Mittelstaedt	Frédérique	FD52DDBB-B909-4969-AD86-ED06FC4D786D
3	Oberländer	Nora	4AB5FEA1-14F7-4928-9F6C-D0F9015AFF85
3	Päzolt	Maximilian	2897CE4A-F2C0-4EF1-9B86-D17C867A2550
3	Plischke	Janina	E832EC76-75C9-4584-9BB6-0ECEB2F6B72F
3	Reinhardt	Nana	6E51B054-C478-4E62-B81E-846E097FE30C
3	Robben	Anna	1F2F75B3-A014-49C4-983E-D5C7F92450DA
3	Schild	Jonathan	1EF864BE-38DF-49E6-94EF-F49814ACE39A

Open in new window


The (first) difference is Riederle, Tobias (line 4 of your results) whom you show having 4 projects, while I find only 1 project (way down in my list)

So far the fact - let me do my homework to see where this comes from. I don't want you to do ALL of my work :-) Thanks so much!
In your query, you did not include Projekte.Teilnehmer3
Avatar of Staudte

ASKER

Uuuppps... that came when I removed and reinserted these criteria for tests...
Avatar of Staudte

ASKER

Hi Thomasian and Santhimurthyd,

both of your approaches work perfectly fine, both running at the same speed. (I understand Santhimurthyd's approach better, but that's just due to my personal lack of understanding CROSS APPLY - I need to look this up in the manuals.) (Sorry, ashilo, I haven't tried your suggestions, as they involve significant rebuilding of my indexes - something I'll look into at a later time, but right now the problem is solved by just modifying my query.)

Your replies were worth much more than just to solution of an isolated problem - they have improved my awareness of performance impacts of JOINS quite a bit.

I would love to give you both 1.000 points - if I could. Hey: I'll open another question and would like to invite both of you to answer very briefly.