Solved

TSQL question

Posted on 2012-04-04
Last Modified: 2012-04-05
I hope I explain this sufficiently.   I am working on MS SQL Server 2008.  I am looking to code an ANSI TSQL statement(s) that will do the following.

The table being accessed is defined as such (I have cut out non-relevant columns):
CREATE TABLE PSExtract.temp.RJZ_Temp
(KeyID INT NOT NULL
,DedupID INT NOT NULL
,DistinctID VARCHAR(61) NULL
,EmployeeCount INT
,OSLevelID int NULL)

KeyID identifies a unique company,

DedupID defines a unique individual within a unique company

DistinctID = KeyID & ' - ' & DedupID

EmployeeCount is the total # of employees in a unique company (keyed on KeyID)

OSLevelID is the level of the employee in the food chain at the company; the lower the number, the higher the employee is at the company.

What I am trying to do is extract a list from this table. I want to select only 10,000 rows (the table has 150,000+ rows in it).

I want to select the companies that have the highest # of employees first
BUT I only want 3 employees per company
AND the 3 employees for each company are the highest level employees there  (lowest OSLevelID value) (I realize that there may be employees of the same level who are not selected).

I am only 3 weeks into this job as a junior TSQL programmer so...  don't assume I know anything.

Thanks folks,
Rich
Question by:RichNH
8 Comments
 
LVL 6

Expert Comment

by:Patrick Tallarico
ID: 37808728
So in your table, are all the employees listed by DedupID, and does each row have an accurate count of the total number of employees for that company in the EmployeeCount field?

I suspect you would need some sort of process running to update the EmployeeCount field whenever new data is entered?

Either way, I would think you could count the rows per KeyID, group by company to find the companies with the greatest number of employees, then cycle through those results using a cursor.  I realize that cursors can be quite slow, so you would have to see how it performs on your data.

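DECLARE @counter INT, @rownum INT, @empcnt INT, @keyid INT;
/* Create a table to dump the final records into (add any other report columns you need) */
CREATE TABLE #temptable (CompanyRank INT, KeyID INT, DedupID INT, OSLevelID INT);
/* Use a counter of inserted rows to stop at 10,000 */
SET @counter = 0;
/* Cursor over the companies, ranked by number of rows (employees), largest first */
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC, KeyID) AS rn,
           COUNT(*) AS EmployeeCount,
           KeyID AS CompanyId
    FROM PSExtract.temp.RJZ_Temp
    GROUP BY KeyID
    ORDER BY rn;
/* Cycle through the ranked companies using the cursor */
OPEN c;
FETCH NEXT FROM c INTO @rownum, @empcnt, @keyid;
WHILE @@FETCH_STATUS = 0 AND @counter < 10000
BEGIN
    /* Insert this company's 3 highest-level employees (lowest OSLevelID) */
    INSERT INTO #temptable (CompanyRank, KeyID, DedupID, OSLevelID)
    SELECT TOP (3) @rownum, KeyID, DedupID, OSLevelID
    FROM PSExtract.temp.RJZ_Temp
    WHERE KeyID = @keyid
    ORDER BY OSLevelID, DedupID;
    /* Add the number of rows just inserted to the counter */
    SET @counter = @counter + @@ROWCOUNT;
    /* Move to the next company returned by the ranked query */
    FETCH NEXT FROM c INTO @rownum, @empcnt, @keyid;
END
CLOSE c;
DEALLOCATE c;
/* The last company can push the total slightly past 10,000, so trim here */
SELECT TOP (10000) * FROM #temptable ORDER BY CompanyRank, OSLevelID;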

Forgive me, my syntax may be a bit off, as I haven't had access to my test environment, but I hope what I've written could point you to a possible solution.
 
LVL 7

Accepted Solution

by:
Anoo S Pillai earned 500 total points
ID: 37809367
A correlated subquery will do the trick in this context. The following query should give you a quick start:

SELECT      TOP 10000 *
FROM      RJZ_Temp Employee
WHERE      Employee.DistinctID  IN  
            (
            SELECT      TOP 3 DistinctID
            FROM      RJZ_Temp Top3Emp
            WHERE      Top3Emp.KeyID =  Employee.KeyID
            ORDER BY OSLevelID ASC )  
ORDER BY EmployeeCount DESC

The table and data I used to test follow:

CREATE TABLE RJZ_Temp
(KeyID INT NOT NULL
,DedupID INT NOT NULL
,DistinctID VARCHAR(61) NULL
,EmployeeCount INT
,OSLevelID int NULL)
GO
INSERT INTO RJZ_Temp VALUES ( 1 , 1 , '1-1' , 10 , 1 )
INSERT INTO RJZ_Temp VALUES ( 1 , 2 , '1-2' , 10 , 5 )
INSERT INTO RJZ_Temp VALUES ( 1 , 3 , '1-3' , 10 , 4 )
INSERT INTO RJZ_Temp VALUES ( 1 , 4 , '1-4' , 10 , 2 )
INSERT INTO RJZ_Temp VALUES ( 1 , 5 , '1-5' , 10 , 3 )
GO
INSERT INTO RJZ_Temp VALUES ( 2 , 1 , '2-1' , 20 , 2 )
INSERT INTO RJZ_Temp VALUES ( 2 , 2 , '2-2' , 20 , 1 )
INSERT INTO RJZ_Temp VALUES ( 2 , 3 , '2-3' , 20 , 3 )
INSERT INTO RJZ_Temp VALUES ( 2 , 4 , '2-4' , 20 , 4 )
INSERT INTO RJZ_Temp VALUES ( 2 , 5 , '2-5' , 20 , 5 )
GO
INSERT INTO RJZ_Temp VALUES ( 3 , 1 , '3-1' , 100 , 2 )
INSERT INTO RJZ_Temp VALUES ( 3 , 2 , '3-2' , 100 , 1 )
INSERT INTO RJZ_Temp VALUES ( 3 , 3 , '3-3' , 100 , 3 )
INSERT INTO RJZ_Temp VALUES ( 3 , 4 , '3-4' , 100 , 4 )
INSERT INTO RJZ_Temp VALUES ( 3 , 5 , '3-5' , 100 , 5 )
GO

As the number of rows is large, I would prefer to convert this query into an equivalent JOIN statement. If you need help with that, please post back.
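
One possible shape for such a JOIN version is sketched below. It was not posted in the thread, so treat it as an illustration rather than the answerer's own follow-up. It numbers each company's rows with ROW_NUMBER() in a derived table and joins back to the base table. The alias names Ranked and rn are made up for the example, and it assumes that (KeyID, DedupID) identifies a row uniquely.

```sql
SELECT TOP (10000) Employee.*
FROM RJZ_Temp AS Employee
INNER JOIN
    (
    -- Number each company's employees, highest level (lowest OSLevelID) first.
    -- DedupID is only a tie-breaker so the numbering is repeatable.
    SELECT KeyID, DedupID,
           ROW_NUMBER() OVER (PARTITION BY KeyID
                              ORDER BY OSLevelID ASC, DedupID ASC) AS rn
    FROM RJZ_Temp
    ) AS Ranked
    ON  Ranked.KeyID   = Employee.KeyID
    AND Ranked.DedupID = Employee.DedupID
WHERE Ranked.rn <= 3   -- keep at most 3 rows per company
ORDER BY Employee.EmployeeCount DESC, Employee.KeyID ASC, Ranked.rn ASC;
```

Against the test data above this returns nine rows: DedupID 2, 1, 3 for company 3, then DedupID 2, 1, 3 for company 2, then DedupID 1, 4, 5 for company 1. Note that SQL Server sorts NULL before other values in ascending order, so rows with a NULL OSLevelID would be numbered first unless they are filtered out.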
 
LVL 32

Expert Comment

by:ewangoya
ID: 37809873
try

select top 10000 A.KeyID, B.DedupID, A.DistinctID, A.EmployeeCount, A.OSLevelID
from (select top 100 percent * from RJZ_Temp order by EmployeeCount desc) A
inner join (select top 3 DedupID from RJZ_Temp order by OSLevelID) B on B.DedupID = A.DedupID

 
LVL 32

Expert Comment

by:ewangoya
ID: 37809888
You could do a join with the DistinctID instead

select top 10000 A.KeyID, A.DedupID, B.DistinctID, A.EmployeeCount, A.OSLevelID
from (select top 100 percent * from RJZ_Temp order by EmployeeCount desc) A
inner join (select top 3 DedupID from RJZ_Temp order by OSLevelID) B on B.DistinctID= A.DistinctID

 
LVL 32

Expert Comment

by:ewangoya
ID: 37809890
correction

select top 10000 A.KeyID, A.DedupID, B.DistinctID, A.EmployeeCount, A.OSLevelID
from (select top 100 percent * from RJZ_Temp order by EmployeeCount desc) A
inner join (select top 3 DistinctID from RJZ_Temp order by OSLevelID) B on B.DistinctID= A.DistinctID

 
LVL 32

Expert Comment

by:ewangoya
ID: 37809899
Actually that seems a little bit too complicated but should work

Try this other one

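select top 10000 A.KeyID, B.DedupID, A.DistinctID, A.EmployeeCount, A.OSLevelID
from RJZ_Temp A
-- the top 3 must be picked per company, so the subquery is correlated on KeyID
cross apply (select top 3 T.DedupID from RJZ_Temp T
             where T.KeyID = A.KeyID order by T.OSLevelID, T.DedupID) B
where B.DedupID = A.DedupID
order by A.EmployeeCount DESC, A.KeyID ASC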
 

 
LVL 32

Expert Comment

by:ewangoya
ID: 37809917
Disregard my first three queries,

And here is yet another variation

select * from
(
  select top 10000 A.KeyID, B.DedupID, A.DistinctID, A.EmployeeCount, A.OSLevelID
  from RJZ_Temp A
  -- the top 3 must be picked per company, so the subquery is correlated on KeyID
  cross apply (select top 3 T.DedupID from RJZ_Temp T
               where T.KeyID = A.KeyID order by T.OSLevelID, T.DedupID) B
  where B.DedupID = A.DedupID
  order by A.EmployeeCount DESC, A.KeyID ASC
) X
order by KeyID, OSLevelID

 
LVL 1

Author Closing Comment

by:RichNH
ID: 37811392
Thank you, your solution correctly returned what I was looking for. I realized after the fact that there were other factors I hadn't considered, but they were easily surmounted once I was pointed in the correct direction.

If you have time, I'd love to see how this all worked as an inner join. FWIW, your solution went into a larger query that was three levels deep. The first two levels are an inner join; I'll look to make this level an inner join, but I sometimes struggle with this stuff.