SQL GROUP BY With Multiple Columns

Hey there,

I'm trying to select all of the records from MyTable where EmployeeID is a duplicate and Name is not the same.

For Example, given this table:
------------------------------------------------------
Name  EmployeeNumber
------------------------------------------------------
Jim             0001
Joe              0002
Jay              0002
Kim            0003
------------------------------------------------------

I want to return Joe and Jay

I tried this:

SELECT Name, EmployeeID
FROM MyTable
GROUP BY Name, EmployeeID HAVING COUNT (EmployeeID) > 1

But the only thing that does is show me I don't really understand the GROUP BY clause!  :P

Any help would be greatly appreciated.  

Thanks!
LVL 1
ttist25Asked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

unknown_routineCommented:
Try this:

SELECT Name, EmployeeID,count(*)
FROM MyTable
GROUP BY Name, EmployeeID
 HAVING COUNT (EmployeeID) > 1

Open in new window



What is  the output of above for you?
0
Simone BSenior E-Commerce AnalystCommented:
Take "Name" out of the group by:

SELECT Name, EmployeeID
FROM MyTable
GROUP BY EmployeeID HAVING COUNT (EmployeeID) > 1

Open in new window

0
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
Copy-paste the below code into your SSMS, verify that it works, then modify to meet your needs.  
CREATE TABLE #MyTable (name varchar(3), EmployeeID varchar(4))

INSERT INTO #MyTable (name, EmployeeID)
VALUES ('Jim', '0001'), ('Joe', '0002'), ('Jay', '0002'), ('Kim', '0003')

SELECT #MyTable.Name, #MyTable.EmployeeID
FROM #MyTable
	JOIN (	SELECT EmployeeID, COUNT(EmployeeID) as dups 
			FROM #MyTable
			GROUP BY EmployeeID 
			HAVING COUNT(EmployeeID) > 1) dups ON #MyTable.EmployeeID = dups.EmployeeID

Open in new window


>But the only thing that does is show me I don't really understand the GROUP BY clause!
Eyeball your SELECT clause.  Any column name that does not participate in an aggregate such as SUM(), COUNT(), etc. needs to be in the GROUP BY clause.
0
The Ultimate Tool Kit for Technolgy Solution Provi

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy for valuable how-to assets including sample agreements, checklists, flowcharts, and more!

Simone BSenior E-Commerce AnalystCommented:
Sorry, forgot something in my previous post:

select name, employeeID from MyTable where employeeID in 
(SELECT EmployeeID
FROM MyTable
GROUP BY EmployeeID HAVING COUNT (EmployeeID) > 1)

Open in new window

0
Scott PletcherSenior DBACommented:
DISTINCT makes this easy and accurate:


SELECT mt.*
FROM MyTable mt
INNER JOIN (
    SELECT EmployeeID
    FROM MyTable
    GROUP BY EmployeeID
    HAVING COUNT (DISTINCT Name) > 1
) AS mt2 ON
    mt2.EmployeeID = mt.EmployeeID
ORDER BY
    mt.EmployeeID, mt.Name
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
ttist25Author Commented:
Ok guys - first - thanks so much for all of the responses.  Good to know you're there!

I used jimhorn's CREATE TABLE code to make a test table (thanks jim for the #table trick - you've got a way of always teaching me something I didn't ask for and I appreciate it).

Then I ran all of the suggestions against it and they all returned the expected result (except for unknownroutine's which returned nothing).

As you might have guessed - these aren't the actual field names and table names I'm working with so, I modified the suggestions to match my real data and ran them and, interestingly, Buttercup and Jim's suggestions returned the same number of records and Scott's didn't return any.  Hmmm?

Eyeing up the records returned from Jim and Buttercup, they seem to be returning records that I don't want; that is, they are returning EmployeeId's that appear to have only a single unique name.  

So, what to do?  I've got way too many records to eye it all up - is Scott's giving me the correct result (there are no duplicate EmployeeID's with more than one name)?

SQL makes my brain hurt :P

Thanks again for all of your help!
0
ttist25Author Commented:
PS - If I say the equivalent of :

SELECT name FROM #MyTable GROUP BY name COUNT(*)>1

I get a significant number of records returned.
0
Scott PletcherSenior DBACommented:
Run the subquery in my code by itself:

    SELECT EmployeeID
    FROM MyTable
    GROUP BY EmployeeID
    HAVING COUNT (DISTINCT Name) > 1

If that returns no rows, then there are no EmployeeIDs with different names (although there could be some Ids with Names that are NULL (not empty, but NULL)).
0
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
>I want to return Joe and Jay
>they seem to be returning records that I don't want; that is, they are returning EmployeeId's that appear to have only a single unique name.

'I want to return Joe and Jay' should have been defined as 'I want to return all names and employee ID's where the ID's are the same but the names are different.

To state the obvious, the better requirements can be explained in the original question, the more efficient experts can be in providing a solution.
0
Scott PletcherSenior DBACommented:
>> PS - If I say the equivalent of :
SELECT name FROM #MyTable GROUP BY name COUNT(*)>1
<<

Somewhat odd, but that's still a fundamentally different query.

For example, two people could each be named 'John Smith' but with different employeeIDs.

My query won't show them, since they don't overlap, but the query above will.
0
Scott PletcherSenior DBACommented:
@jimhorn:

I believe that's a quite unfair criticism.  The problem was clearly stated in the opening sentence of the original q:

"
I'm trying to select all of the records from MyTable where EmployeeID is a duplicate and Name is not the same.
"
0
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
The below code works, throwing in a bunch of duplicate Kim/0003 rows and then verifying that they are not returned, although I'm guessing a more elegant solution can be found.
DROP TABLE #MyTable

CREATE TABLE #MyTable (name varchar(3), EmployeeID varchar(4))

INSERT INTO #MyTable (name, EmployeeID)
VALUES ('Jim', '0001'), ('Joe', '0002'), ('Jay', '0002'), ('Kim', '0003'), ('Kim', '0003'), ('Kim', '0003')

SELECT #MyTable.Name, #MyTable.EmployeeID
FROM #MyTable
	JOIN (	SELECT DISTINCT a.EmployeeID, COUNT(a.EmployeeID) as dups 
			FROM ( SELECT DISTINCT Name, EmployeeID FROM #MyTable) a
			GROUP BY a.EmployeeID 
			HAVING COUNT(a.EmployeeID) > 1) dups ON #MyTable.EmployeeID = dups.EmployeeID

Open in new window

0
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
@Scott - So it would appear.  I stand corrected.
0
ttist25Author Commented:
Point taken Jim, I apologize.  My request was not clear and I'm not sure why I always try to obfuscate using different field names etc., it used to be OK but as things get more complicated, it definitely becomes problematic.  

The "name" field equivalent is actually SSN.  Sorry.  

So to be more clear, I am trying to find instances where an EmployeeID has been assigned to more than one person.  I'm using SSN to identify a person.  

Scott - when I run the subquery I get 0 records returned.  

Given that, is it safe to assume that there are no EmployeeID's assigned to more than one person (SSN)?  

Again I apologize and will try to be more clear with future questions.  

Thanks again.
0
Scott PletcherSenior DBACommented:
I believe the data is clean if the subquery returned no rows.

As I noted above, I think using DISTINCT makes it easier to accept that the query results are accurate.
0
Simone BSenior E-Commerce AnalystCommented:
Would your data could have records like Jim and Jim with the same employeeid? And if so, you want to exclude them, but include Jim (just once) and Jane?

------------------------------------------------------
Name  EmployeeNumber
------------------------------------------------------
Jim             0001
Jim             0001
Jane            0001
Joe              0002
Jay              0002
Kim            0003
------------------------------------------------------

So the desired result would be:

------------------------------------------------------
Name  EmployeeNumber
------------------------------------------------------
Jim             0001
Jane            0001
Joe              0002
Jay              0002
------------------------------------------------------
0
Simone BSenior E-Commerce AnalystCommented:
Try this. It also uses Jim's idea of a temp table.

DROP TABLE #MyTable

SELECT DISTINCT Name, EmployeeID INTO #MyTable FROM MyTable

select name, employeeID from #MyTable where employeeID in
(SELECT EmployeeID
FROM #MyTable
GROUP BY EmployeeID HAVING COUNT (EmployeeID) > 1)
0
ttist25Author Commented:
Buttercup - I'm trying to identify records where an EmployeeID has been assigned to more than one person (name/SSN - yeesh - sorry again about  that).  

Jim and Scott, I ran Jim's second query on my actual data and got 0 records.  So, given everything, I think it's safe to say there are no EmployeeID's that have been assigned to more than one person (SSN), would you agree?  

Thanks again.
0
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
>, would you agree?  
Can't say without having access to your SSN numbers, so really the only way to validate that aside from sending us your data would be for you to add known dups into the source data, and test to make sure the T-SQL returns them.
0
ttist25Author Commented:
Great idea.  How would you like them?  Just copy and paste them here?  :P

I'll dump everything into a test table, add dups, and see what happens.  

I'll let you know what happens.
0
PortletPaulfreelancerCommented:
I've used the sample data by Buttercup1 ID: 39468258 where only Kim/0003 is unique

Name  EmployeeNumber
Jim             0001
Jim             0001
Jane            0001
Joe              0002
Jay              0002
Kim            0003

here are 2 alternates (and I'm not suggesting these are performance improvements) but they may help.

This one will list all records except Kim/0003 and both rows of Jim/001 are shown:
SELECT
       mytable.name, mytable.employeenumber
FROM mytable
INNER JOIN (
            SELECT DISTINCT employeenumber
            FROM (
                  SELECT
                          employeenumber
                        , count(*) over (partition BY employeenumber) AS cn
                        , count(*) over (partition BY name) AS ce
                  FROM MyTable
                 ) x
            WHERE cn > 1 OR ce > 1
           ) dup
       ON mytable.employeenumber = dup.employeenumber
ORDER BY mytable.name, mytable.employeenumber
;

Open in new window

The next one also excludes Kim/003 but only shows one row for Jim
SELECT
       name, employeenumber
FROM mytable

--MINUS -- ora
EXCEPT --mssql

SELECT
       name, employeenumber
FROM (
      SELECT
              name
            , employeenumber
            , count(*) over (partition BY employeenumber) AS cn
            , count(*) over (partition BY name) AS ce
      FROM MyTable
    ) x
WHERE cn = 1 AND ce = 1
ORDER BY name, employeenumber
;

Open in new window

nb: tested on Oracle only http://sqlfiddle.com/#!4/0e9f0/1
{+ edit} added 'except'
0
ttist25Author Commented:
Thanks for the help everyone.  I really did learn something from each of you, thus the split.  

Thanks again.
0
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
Thanks for the split.  Good luck with your project.  -Jim
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.