Staudte
asked on
SQL SELECT performance with JOIN and COUNT
Hi Experts,
I have a MS SQL Server 2008 database with the following structure:
• table Persons with a PersonID and some data fields, e.g. Name and State,
• table Projects with Person1, Person2, Person3 and some data fields, e.g. ProjectTitle. I have 3 separate indexes on Person1, 2 and 3.
A person's ID may appear as Person1, Person2 or Person3 of the Projects table, when the person has participated in the project. There each several tens of thousands persons and projects in the database.
I would now like to SELECT all persons that match a certain criteria and count the number of projects they are in. So far I have used
SELECT COUNT(Projects.ProjectID) AS NumProjects,
Persons.Name
FROM Persons
LEFT JOIN Projects ON (Persons.PersonID = Projects.Person1 OR Persons.PersonID = Projects.Person2 OR Persons.PersonID = Projects.Person3)
WHERE State='CA'
GROUP BY Persons.Name
This takes a very long time: About 1 minute to return about 10.000 records. If I remove the 3rd condition and only compare to Person1 or 2, time is about the same. If I only compare to Person1, time is less than a second!
Unfortunately, I need to compare to all Person1, 2 and 3 - could you suggest I way I can cut the cost of the query down?
Thanks a lot for your advice,
Thomas
I have a MS SQL Server 2008 database with the following structure:
• table Persons with a PersonID and some data fields, e.g. Name and State,
• table Projects with Person1, Person2, Person3 and some data fields, e.g. ProjectTitle. I have 3 separate indexes on Person1, 2 and 3.
A person's ID may appear as Person1, Person2 or Person3 of the Projects table, when the person has participated in the project. There each several tens of thousands persons and projects in the database.
I would now like to SELECT all persons that match a certain criteria and count the number of projects they are in. So far I have used
SELECT COUNT(Projects.ProjectID) AS NumProjects,
Persons.Name
FROM Persons
LEFT JOIN Projects ON (Persons.PersonID = Projects.Person1 OR Persons.PersonID = Projects.Person2 OR Persons.PersonID = Projects.Person3)
WHERE State='CA'
GROUP BY Persons.Name
This takes a very long time: About 1 minute to return about 10.000 records. If I remove the 3rd condition and only compare to Person1 or 2, time is about the same. If I only compare to Person1, time is less than a second!
Unfortunately, I need to compare to all Person1, 2 and 3 - could you suggest I way I can cut the cost of the query down?
Thanks a lot for your advice,
Thomas
ASKER
Sorry, no change. Actually even a bit slower - but that might by caused by other server load. Having 2 or 3 fields in the IN-set takes ~1 minute, having just one field returns instantaneously.
Try if this works better
SELECT
P.NumProjects
,Persons.Name
FROM
Persons CROSS APPLY
(SELECT COUNT(*) AS NumProjects
FROM Projects
WHERE Persons.PersonID = Projects.Person1 OR Persons.PersonID = Projects.Person2 OR Persons.PersonID = Projects.Person3
) P
WHERE State='CA'
GROUP BY Persons.Name
hi
create an index on the STATE columns which includes (use the include clause) the PersonID,name columns.
add another index that is based on the Person1 (or any of the other columns that are more likly to have the value in them) and include the other columns+ ProjectID.
or you could try adding three new indexes on the Person1 ,Person2 ,Person3 on every on indevidualy
adding the ProjectID column as an includeed column
look at the execution plan you could find that the optimizer uses only the indexes and skip the tables since the data required by the query is in the indexes.
create an index on the STATE columns which includes (use the include clause) the PersonID,name columns.
add another index that is based on the Person1 (or any of the other columns that are more likly to have the value in them) and include the other columns+ ProjectID.
or you could try adding three new indexes on the Person1 ,Person2 ,Person3 on every on indevidualy
adding the ProjectID column as an includeed column
look at the execution plan you could find that the optimizer uses only the indexes and skip the tables since the data required by the query is in the indexes.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Uhuh... you guys are faster than I can try... just to let you know I'm still on it, but can't reply in real-time :-) Thomasian's approach works lightning fast, but doesn't quite return the same figures as my (slow) approach - there are subtle differences in the COUNTs return by the statement. I will come back here once I have figured out where exactly the differences are and crome from. Also, I'll check out the other two comments. THANKS so much for looking into this!
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thomasian: That makes no difference. Here's my query
and the results:
Your query:
returns
The (first) difference is Riederle, Tobias (line 4 of your results) whom you show having 4 projects, while I find only 1 project (way down in my list)
So far the fact - let me do my homework to see where this comes from. I don't want you to do ALL of my work :-) Thanks so much!
SELECT
COUNT(*) as numprojects,
Personen.Nachname,Personen.Vorname,
Personen.PersonID
FROM Personen
LEFT JOIN Projekte ON Personen.PersonID IN (Projekte.Teilnehmer1, Projekte.Teilnehmer2)
WHERE FunktionTeilnehmer = 'x' AND (Right(Benutzername, 4) = '2012' OR Left(Right(Benutzername, 5), 4) = '2012')
GROUP BY personen.Nachname,Personen.Vorname, Personen.PersonID
ORDER BY numprojects desc, Nachname asc, vorname asc
and the results:
9 Brahm Bastian 6103CC40-66D5-4D69-9348-50ED361FA327
4 Christi Johannes 5D1331DF-5462-49DA-AAFD-3B3A67217BA7
4 Domes Marian 3AD12839-F651-44AC-A992-DF64C32C653D
3 Bergenthal Gesa CEED9E8D-F4A0-44C0-BC7F-AA597CCA1DDE
3 Drabik Filip 54EE8E6D-53F5-4C3F-8169-D760504E8777
3 Heuer Eva-Maria 1BE35305-5119-48CE-BB01-09AC2915D6BE
3 Inözü Görkem C1EE813B-4A89-40DC-951E-74776270D29F
3 Klamka Andrea Carolin 37DF318F-5C04-42F5-A824-A5E9D3D3B194
3 Kreiner Paul E0E82EAE-8D08-4641-A209-3A2A8AA518B5
3 Krüger Nina 27B3F535-009B-4C1A-8321-7C3259357444
3 Meier Marlin FC09FFA2-E70D-413D-A0B9-58DF5ECBD32B
3 Mittelstaedt Frédérique FD52DDBB-B909-4969-AD86-ED06FC4D786D
3 Päzolt Maximilian 2897CE4A-F2C0-4EF1-9B86-D17C867A2550
3 Reinhardt Nana 6E51B054-C478-4E62-B81E-846E097FE30C
3 Robben Anna 1F2F75B3-A014-49C4-983E-D5C7F92450DA
Your query:
SELECT
P.numprojects,
Personen.Nachname, Personen.Vorname, Personen.PersonID
FROM Personen cross apply (
SELECT COUNT(*) AS numprojects FROM projekte WHERE Personen.PersonID=Projekte.Teilnehmer1 OR Personen.PersonID=Projekte.Teilnehmer2 OR Personen.PersonID=Projekte.Teilnehmer3) P
WHERE FunktionTeilnehmer = 'x' AND (Right(Benutzername, 4) = '2012' OR Left(Right(Benutzername, 5), 4) = '2012')
ORDER BY numprojects DESC, Nachname ASC, vorname ASC
returns
9 Brahm Bastian 6103CC40-66D5-4D69-9348-50ED361FA327
4 Christi Johannes 5D1331DF-5462-49DA-AAFD-3B3A67217BA7
4 Domes Marian 3AD12839-F651-44AC-A992-DF64C32C653D
4 Riederle Tobias 9325A52F-7345-4A31-9982-F65A72BFE1B4
3 Bergenthal Gesa CEED9E8D-F4A0-44C0-BC7F-AA597CCA1DDE
3 Drabik Filip 54EE8E6D-53F5-4C3F-8169-D760504E8777
3 Fellerer Konstantin 0494B7FA-DC58-4A58-83C5-44763BAA04ED
3 Heuer Eva-Maria 1BE35305-5119-48CE-BB01-09AC2915D6BE
3 Inözü Görkem C1EE813B-4A89-40DC-951E-74776270D29F
3 Klamka Andrea Carolin 37DF318F-5C04-42F5-A824-A5E9D3D3B194
3 Kreiner Paul E0E82EAE-8D08-4641-A209-3A2A8AA518B5
3 Krüger Nina 27B3F535-009B-4C1A-8321-7C3259357444
3 Meier Marlin FC09FFA2-E70D-413D-A0B9-58DF5ECBD32B
3 Mittelstaedt Frédérique FD52DDBB-B909-4969-AD86-ED06FC4D786D
3 Oberländer Nora 4AB5FEA1-14F7-4928-9F6C-D0F9015AFF85
3 Päzolt Maximilian 2897CE4A-F2C0-4EF1-9B86-D17C867A2550
3 Plischke Janina E832EC76-75C9-4584-9BB6-0ECEB2F6B72F
3 Reinhardt Nana 6E51B054-C478-4E62-B81E-846E097FE30C
3 Robben Anna 1F2F75B3-A014-49C4-983E-D5C7F92450DA
3 Schild Jonathan 1EF864BE-38DF-49E6-94EF-F49814ACE39A
The (first) difference is Riederle, Tobias (line 4 of your results) whom you show having 4 projects, while I find only 1 project (way down in my list)
So far the fact - let me do my homework to see where this comes from. I don't want you to do ALL of my work :-) Thanks so much!
In your query, you did not include Projekte.Teilnehmer3
ASKER
Uuuppps... that came when I removed and reinserted these criteria for tests...
ASKER
Hi Thomasian and Santhimurthyd,
both of your approaches work perfectly fine, both running at the same speed. (I understand Santhimurthyd's approach better, but that's just due to my personal lack of understanding CROSS APPLY - I need to look this up in the manuals.) (Sorry, ashilo, I haven't tried your suggestions, as they involve significant rebuilding of my indexes - something I'll look into at a later time, but right now the problem is solved by just modifying my query.)
Your replies were worth much more than just to solution of an isolated problem - they have improved my awareness of performance impacts of JOINS quite a bit.
I would love to give you both 1.000 points - if I could. Hey: I'll open another question and would like to invite both of you to answer very briefly.
both of your approaches work perfectly fine, both running at the same speed. (I understand Santhimurthyd's approach better, but that's just due to my personal lack of understanding CROSS APPLY - I need to look this up in the manuals.) (Sorry, ashilo, I haven't tried your suggestions, as they involve significant rebuilding of my indexes - something I'll look into at a later time, but right now the problem is solved by just modifying my query.)
Your replies were worth much more than just to solution of an isolated problem - they have improved my awareness of performance impacts of JOINS quite a bit.
I would love to give you both 1.000 points - if I could. Hey: I'll open another question and would like to invite both of you to answer very briefly.
SELECT COUNT(Projects.ProjectID) AS NumProjects, Persons.Name
FROM Persons
LEFT JOIN Projects ON Persons.PersonID in(Projects.Person1 , Projects.Person2 , Projects.Person3)
WHERE State='CA' GROUP BY Persons.Name