Rob Rudloff
asked on
Combine alternate (related) rows into a single row ?
Hi. If there's an easy way to do this in SQL, it'll be a huge timesaver for me, however I can't seem to find a way ...
Simplifying my issue, I have employee attendance data, including Emp ID, an "IN" or "OUT" record, and the date/time. Assume they are sorted as such.
I would like to combine the "IN" and "OUT" records into a single record in a new table, so :
Thanks!
Simplifying my issue, I have employee attendance data, including Emp ID, an "IN" or "OUT" record, and the date/time. Assume they are sorted as such.
I would like to combine the "IN" and "OUT" records into a single record in a new table, so :
EMPID PUNCH PTIME
94 IN 3/9/2013 6:09:24
94 OUT 3/9/2013 15:16:48
94 IN 3/11/2013 6:10:56
94 OUT 3/11/2013 17:02:04
95 IN 2/22/2013 6:00:00
95 OUT 2/22/2013 17:06:32
95 IN 2/23/2013 5:38:02
95 OUT 2/23/2013 14:59:00
Would look like:EMPID PTIME-IN PTIME-OUT
94 3/9/2013 6:09:24 3/9/2013 15:16:48
94 3/11/2013 6:10:56 3/11/2013 17:02:04
95 2/22/2013 6:00:00 2/22/2013 17:06:32
95 2/23/2013 5:38:02 2/23/2013 14:59:00
There is another wrinkle that I am not illustrating here, but we'll tackle that later...Thanks!
something like this:
DECLARE @table TABLE (
EMPID INT,
PUNCH VARCHAR(10),
PTIME DATETIME)
INSERT INTO @table ( EMPID, PUNCH, PTIME )
VALUES
(94, 'IN', '3/9/2013 6:09:24'),
(94, 'OUT', '3/9/2013 15:16:48'),
(94, 'IN', '3/11/2013 6:10:56'),
(94, 'OUT', '3/11/2013 17:02:04'),
(95, 'IN', '2/22/2013 6:00:00'),
(95, 'OUT', '2/22/2013 17:06:32'),
(95, 'IN', '2/23/2013 5:38:02'),
(95, 'OUT', '2/23/2013 14:59:00')
SELECT * FROM @table
SELECT DISTINCT T1.EMPID, CAST(T1.PTIME AS DATE)
, (SELECT PTIME FROM @table WHERE EMPID = T1.EMPID AND PUNCH = 'in' AND CAST(PTIME AS DATE) = CAST(T1.PTIME AS date)) AS 'PTIME-IN'
, (SELECT PTIME FROM @table WHERE EMPID = T1.EMPID AND PUNCH = 'out' AND CAST(PTIME AS DATE) = CAST(T1.PTIME AS date)) AS 'PTIME-OUT'
FROM @table AS T1
Hi,
you can do that with sub-queries like this:
As I do not have test tables this is of course untested. But you should be aware that this would only work if the employee has exactly one ptime-in and one ptime-out for each day. If that happens more than once each day (and that is very likely) it would not work.
I don't see why you want to move that into a new table with two fields as that makes exactly this no longer possible. The original layout is perfect as it is.
Cheers,
Christian
you can do that with sub-queries like this:
WITH qryEmp AS
(SELECT EmpID, CAST(FLOOR(CAST(PTIME AS float)) AS int) AS PDATE
FROM @table AS P0)
SELECT DISTINCT EmpID,
(SELECT PTIME FROM @table AS P1
WHERE P1.EMPID = P.EMPID
AND CAST(FLOOR(CAST(PTIME AS float)) AS int) = P.PDATE
AND P1.PUNCH = 'IN') AS [PTIME-IN],
(SELECT PTIME FROM @table AS P2
WHERE P2.EMPID = P.EMPID
AND CAST(FLOOR(CAST(PTIME AS float)) AS int) = P.PDATE
AND P2.PUNCH = 'OUT') AS [PTIME-OUT]
FROM qryEmp AS P
As I do not have test tables this is of course untested. But you should be aware that this would only work if the employee has exactly one ptime-in and one ptime-out for each day. If that happens more than once each day (and that is very likely) it would not work.
I don't see why you want to move that into a new table with two fields as that makes exactly this no longer possible. The original layout is perfect as it is.
Cheers,
Christian
ASKER
I will give that code a try.
To elaborate on the "why" of this situation:
The original table is a pure "punch" table. There is .net code that loads this table into another "timesheet" table - -the code looks for the "in" and "out" records, and decides if there are any erroneous punches that should be ignored. Among other things, the "timesheet" table has the "in", the "out", and the calculated "hours worked" ( which is "out" minus "in"). I think the reason the timesheet table exists is because there can be erroneous punches.
This experiment is simply to preload the timesheet table with missing punches. But, if I can ultimately do all that in SQL, I could eliminate the the .NET program that does the same thing.
Here's the wrinkle that I alluded to: The data can have erroneous duplicate "in" or "out" punches, so the data can actaully look like this:
To elaborate on the "why" of this situation:
The original table is a pure "punch" table. There is .net code that loads this table into another "timesheet" table - -the code looks for the "in" and "out" records, and decides if there are any erroneous punches that should be ignored. Among other things, the "timesheet" table has the "in", the "out", and the calculated "hours worked" ( which is "out" minus "in"). I think the reason the timesheet table exists is because there can be erroneous punches.
This experiment is simply to preload the timesheet table with missing punches. But, if I can ultimately do all that in SQL, I could eliminate the the .NET program that does the same thing.
Here's the wrinkle that I alluded to: The data can have erroneous duplicate "in" or "out" punches, so the data can actaully look like this:
EMPID PUNCH PTIME
94 IN 3/9/2013 6:09:24
94 IN 3/9/2013 6:09:50
94 OUT 3/9/2013 15:16:48
94 IN 3/11/2013 6:10:56
94 OUT 3/11/2013 17:02:04
94 OUT 3/11/2013 17:02:44
95 IN 2/22/2013 6:00:00
The .NET code decides to throw away any duplicate/earlier "in" punch, and any duplicate/later "out" punch. So the result here would be:EMPID PTIME-IN PTIME-OUT
94 3/9/2013 6:09:50 3/9/2013 15:16:48
94 3/11/2013 6:10:56 3/11/2013 17:02:04
Change the query in my earlier post, from a LEFT JOIN to an INNER JOIN.
However, I'd strongly suggest testing this with real data to check for unexpected edge cases not shown in the sample data.
Results:
However, I'd strongly suggest testing this with real data to check for unexpected edge cases not shown in the sample data.
;WITH data
AS
(
SELECT EmpID, punch, ptime
, ROW_NUMBER() OVER(PARTITION BY EmpID ORDER BY ptime) AS RowNum
FROM #yourTable
)
SELECT it.EmpID, it.ptime AS PunchInTime, ot.ptime AS PunchOutTime
FROM data it INNER JOIN data ot
ON it.EmpID = ot.EmpID
AND ot.punch = 'OUT'
AND it.ptime < ot.ptime
AND it.RowNum = ot.RowNum - 1
WHERE it.punch = 'IN'
ORDER BY it.EmpId, it.ptime
Results:
EmpID PunchInTime PunchOutTime
94 2013-03-09 06:09:50.000 2013-03-09 15:16:48.000
94 2013-03-11 06:10:56.000 2013-03-11 17:02:04.000
for your "wrinkle", just add a MAX to my sub queries:
DECLARE @table TABLE (
EMPID INT,
PUNCH VARCHAR(10),
PTIME DATETIME)
INSERT INTO @table ( EMPID, PUNCH, PTIME )
VALUES
(94, 'IN', '3/9/2013 6:09:24'),
(94, 'IN', '3/9/2013 6:09:50'),
(94, 'OUT', '3/9/2013 15:16:48'),
(94, 'IN', '3/11/2013 6:10:56'),
(94, 'OUT', '3/11/2013 17:02:04'),
(95, 'IN', '2/22/2013 6:00:00'),
(95, 'OUT', '2/22/2013 17:06:32'),
(95, 'IN', '2/23/2013 5:38:02'),
(95, 'OUT', '2/23/2013 14:59:00')
SELECT * FROM @table
SELECT DISTINCT T1.EMPID, CAST(T1.PTIME AS DATE)
, (SELECT MAX(PTIME) FROM @table WHERE EMPID = T1.EMPID AND PUNCH = 'in' AND CAST(PTIME AS DATE) = CAST(T1.PTIME AS date)) AS 'PTIME-IN'
, (SELECT MAX(PTIME) FROM @table WHERE EMPID = T1.EMPID AND PUNCH = 'out' AND CAST(PTIME AS DATE) = CAST(T1.PTIME AS date)) AS 'PTIME-OUT'
FROM @table AS T1
ASKER
Hi.
Both solutions, by _agx_ and Éric Moreau seem to work with my original test data.
So I ran both on the actual table, and both solutions are real close to working on all the "weird" data situations.
The data below illustrates the weird/outlier situations, where there are:
The solution by _agx_ is real close, in that it handles the outlier records.
If I can get it to take the latest "out" record, rather than the first "out" record, then it's perfect.
The solution by Éric Moreau is also close, in that it gets that "later out record", but it doesn't handle the outliers.
Here's what I am running in SQL:
Any suggestions?
This is all greatly appreciated, I must say!
Thanks
Rob
Both solutions, by _agx_ and Éric Moreau seem to work with my original test data.
So I ran both on the actual table, and both solutions are real close to working on all the "weird" data situations.
The data below illustrates the weird/outlier situations, where there are:
- duplicate "outs" for a single "in"
- an "out" with no "in" (see first record)
- an "in" with no "outs" (see last record)
- an "out" on the next day from the "in" (see emp 100)
The solution by _agx_ is real close, in that it handles the outlier records.
If I can get it to take the latest "out" record, rather than the first "out" record, then it's perfect.
The solution by Éric Moreau is also close, in that it gets that "later out record", but it doesn't handle the outliers.
Here's what I am running in SQL:
INSERT INTO @MyTable ( EMP_ID, PunchCode, PunchTime )
VALUES
(94, 'OUT', '03/09/2013 06:09'),
(94, 'IN', '03/09/2013 07:10'),
(94, 'OUT', '03/09/2013 15:16'),
(95, 'IN', '02/22/2013 06:00'),
(95, 'OUT', '02/22/2013 17:06'),
(95, 'OUT', '02/22/2013 17:07'),
(95, 'IN', '02/23/2013 05:38'),
(95, 'OUT', '02/23/2013 14:59'),
(100, 'IN', '02/24/2013 05:38'),
(100, 'OUT', '02/25/2013 04:00'),
(100, 'IN', '02/25/2013 10:00')
select * from @MyTable
order by emp_id, PunchTime
--** Éric Moreau *****************
SELECT DISTINCT T1.EMP_ID, CAST(T1.PunchTime AS DATE),
(SELECT MAX(PunchTime) FROM @MyTable WHERE EMP_ID = T1.EMP_ID AND PunchCode = 'in' AND CAST(PunchTime AS DATE) = CAST(T1.PunchTime AS date)) AS 'PTIME-IN',
(SELECT MAX(PunchTime) FROM @MyTable WHERE EMP_ID = T1.EMP_ID AND PunchCode = 'out' AND CAST(PunchTime AS DATE) = CAST(T1.PunchTime AS date)) AS 'PTIME-OUT'
FROM @MyTable AS T1
ORDER BY t1.EMP_ID , CAST(T1.PunchTime AS DATE)
--**********************************************************
--** _agx_ *************************
;WITH data
AS
(SELECT EMP_ID, PunchCode, PunchTime,
ROW_NUMBER() OVER(PARTITION BY EMP_ID ORDER BY PunchTime) AS RowNum
FROM @MyTable
)
SELECT it.EMP_ID, it.PunchTime AS PunchInTime, ot.PunchTime AS PunchOutTime
FROM data it
INNER JOIN data ot
ON it.EMP_ID = ot.EMP_ID
AND ot.PunchCode = 'OUT'
AND it.PunchTime < ot.PunchTime
AND it.RowNum = ot.RowNum - 1
WHERE it.PunchCode = 'IN'
ORDER BY it.EMP_ID, it.PunchTime
--**********************************************************
The results are:
EMP_ID (No column name) PTIME-IN PTIME-OUT
94 2013-03-09 2013-03-09 07:10:00.000 2013-03-09 15:16:00.000
95 2013-02-22 2013-02-22 06:00:00.000 2013-02-22 17:07:00.000
95 2013-02-23 2013-02-23 05:38:00.000 2013-02-23 14:59:00.000
100 2013-02-24 2013-02-24 05:38:00.000 NULL ***
100 2013-02-25 2013-02-25 10:00:00.000 2013-02-25 04:00:00.000
EMP_ID PunchInTime PunchOutTime
94 2013-03-09 07:10:00.000 2013-03-09 15:16:00.000
95 2013-02-22 06:00:00.000 2013-02-22 17:06:00.000 ***
95 2013-02-23 05:38:00.000 2013-02-23 14:59:00.000
100 2013-02-24 05:38:00.000 2013-02-25 04:00:00.000
I see how you are both getting the good results, but I can't figure a way to fix the outliers in either solution.Any suggestions?
This is all greatly appreciated, I must say!
Thanks
Rob
Query:
;WITH data
AS
(
SELECT EmpID, punch, ptime
, ROW_NUMBER() OVER(PARTITION BY EmpID ORDER BY ptime) AS OverallRow
, ROW_NUMBER() OVER(PARTITION BY punch ORDER BY EmpID, ptime) AS GroupRow
FROM #yourTable
)
SELECT ci.EmpID
, ci.ptime AS PunchInTime
, co.PunchOutTime
-- current and next punch "IN" time
FROM data ci LEFT JOIN data ni
ON ci.EmpID = ni.EmpID
AND ci.punch = ni.punch
AND ci.GroupRow = ni.GroupRow - 1
CROSS APPLY (
-- latest punch OUT that's between curr/next punch "IN"
SELECT MAX(ot.ptime)
FROM data ot
WHERE ot.EmpID = ci.EmpID
AND ot.punch = 'OUT'
AND ot.OverallRow > ci.OverallRow
AND (ot.OverallRow < ni.OverallRow OR ni.OverallRow IS NULL)
) co(PunchOutTime)
WHERE ci.punch = 'IN'
AND co.PunchOutTime IS NOT NULL
ORDER BY ci.EmpID, ci.ptime
Results:EmpID PunchInTime PunchOutTime
94 2013-03-09 07:10:00.000 2013-03-09 15:16:00.000
95 2013-02-22 06:00:00.000 2013-02-22 17:07:00.000
95 2013-02-23 05:38:00.000 2013-02-23 14:59:00.000
100 2013-02-24 05:38:00.000 2013-02-25 04:00:00.000
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
with the latest dataset, what is your expected result?
ASKER
I ran it by the "deciders" here, and they say to use the latest IN punch and the latest OUT punch, and work-time can span over-nights).
So the data in would look like this the first set of times below, and the result would look like the last set. (It looks like _asx_ last solution and results) :
So the data in would look like this the first set of times below, and the result would look like the last set. (It looks like _asx_ last solution and results) :
(94, 'OUT', '03/09/2013 06:09'), <--- throw away
(94, 'IN', '03/09/2013 07:10'), <--- keep
(94, 'OUT', '03/09/2013 15:16'), <--- keep
(95, 'IN', '02/22/2013 06:00'), <--- keep
(95, 'OUT', '02/22/2013 17:06'), <--- throw away
(95, 'OUT', '02/22/2013 17:07'), <--- keep
(95, 'IN', '02/23/2013 05:38'), <--- keep
(95, 'OUT', '02/23/2013 14:59'), <--- keep
(100, 'IN', '02/24/2013 05:38'), <--- keep
(100, 'OUT', '02/25/2013 04:00'), <--- keep
(100, 'IN', '02/25/2013 10:00'), <--- throw away
EMP_ID PunchInTime PunchOutTime
94 2013-03-09 07:10:00.000 2013-03-09 15:16
95 2013-02-22 06:00:00.000 2013-02-22 17:07
95 2013-02-23 05:38:00.000 2013-02-23 14:59
100 2013-02-24 05:38:00.000 2013-02-25 04:00
<<Simplifying my issue, I have employee attendance data, including Emp ID, an "IN" or "OUT" record, and the date/time. >>
As a side comment having done more than a few time clock systems, I have found it simpler to have a single record with an in and out date and time rather than having a record for each transaction.
There are too many problems with trying to pair up the transactions afterwards with this type of design. It becomes especially confusing when employees forget to punch in or out.
If you are in control of the design, you might want to consider changing it.
Of course the time clock process would need to be updated as well. When an employee identifies themselves, you then need to figure out if they are clocking in (your going to start a new record), or punching out (updating the last record with an end date/time). Sometimes though depending on what your using for that, that might not be possible to do.
Jim.
As a side comment having done more than a few time clock systems, I have found it simpler to have a single record with an in and out date and time rather than having a record for each transaction.
There are too many problems with trying to pair up the transactions afterwards with this type of design. It becomes especially confusing when employees forget to punch in or out.
If you are in control of the design, you might want to consider changing it.
Of course the time clock process would need to be updated as well. When an employee identifies themselves, you then need to figure out if they are clocking in (your going to start a new record), or punching out (updating the last record with an end date/time). Sometimes though depending on what your using for that, that might not be possible to do.
Jim.
ASKER
That last solution from _agx_ seems to do the best job, both for the original sample data situations I put forth, and the actual data encountered in production. I am going to tweak it a bit, and run some tests. This is all to re-platform (and fix) our old timecock system.
-- thanks all!
-- Rob
-- thanks all!
-- Rob
Demo Table
Open in new window
Query:Open in new window
Results:
Open in new window