Link to home
Start Free TrialLog in
Avatar of Rob Rudloff
Rob RudloffFlag for United States of America

asked on

Combine alternate (related) rows into a single row ?

Hi.  If there's an easy way to do this in SQL, it'll be a huge timesaver for me, however I can't seem to find a way ...

Simplifying my issue, I have employee attendance data, including Emp ID, an "IN" or "OUT" record, and the date/time.   Assume they are sorted as such.
I would like to combine the "IN" and "OUT" records into a single record in a new table, so :
EMPID	PUNCH	PTIME
94		IN	3/9/2013 6:09:24
94		OUT	3/9/2013 15:16:48
94		IN	3/11/2013 6:10:56
94		OUT	3/11/2013 17:02:04
95		IN	2/22/2013 6:00:00
95		OUT	2/22/2013 17:06:32
95		IN	2/23/2013 5:38:02
95		OUT	2/23/2013 14:59:00

Open in new window

Would look like:
EMPID	PTIME-IN		PTIME-OUT
94		3/9/2013 6:09:24	3/9/2013 15:16:48
94		3/11/2013 6:10:56	3/11/2013 17:02:04
95		2/22/2013 6:00:00	2/22/2013 17:06:32
95		2/23/2013 5:38:02	2/23/2013 14:59:00

Open in new window

There is another wrinkle that I am not illustrating here, but we'll tackle that later...
Thanks!
Avatar of _agx_
_agx_
Flag of United States of America image

Maybe use row_number() to sort by EmpID and punch date. Then find pairs where check IN and OUT rows are consecutive?

Demo Table
create table #yourTable
(
empID int
, punch varchar(5)
, ptime datetime
)

insert into #yourTable
values 
(94,'IN','3/9/2013 6:09:24')
, (94,'OUT','3/9/2013 15:16:48')
, (94,'IN','3/11/2013 6:10:56')
, (94,'OUT','3/11/2013 17:02:04')
, (95,'IN','2/22/2013 6:00:00')
, (95,'OUT','2/22/2013 17:06:32')
, (95,'IN','2/23/2013 5:38:02')
, (95,'OUT','2/23/2013 14:59:00')
, (96,'IN','2/23/2013 5:38:02')
, (96,'IN','2/23/2013 6:38:02')
, (96,'OUT','2/23/2013 14:59:00')

Open in new window

Query:
WITH data
AS
(
	SELECT EmpID, punch, ptime
			, ROW_NUMBER() OVER(PARTITION BY EmpID ORDER BY ptime) AS RowNum
	FROM  #yourTable
)
SELECT it.EmpID, it.ptime AS PunchInTime, ot.ptime AS PunchOutTime
FROM   data it LEFT JOIN data ot 
		ON it.EmpID = ot.EmpID 
		AND ot.punch = 'OUT' 
		AND it.ptime < ot.ptime 
		AND it.RowNum = ot.RowNum - 1
WHERE  it.punch = 'IN'		
ORDER BY it.EmpId, it.ptime

Open in new window


Results:
EmpID	PunchInTime	PunchOutTime
94	2013-03-09 06:09:24.000	2013-03-09 15:16:48.000
94	2013-03-11 06:10:56.000	2013-03-11 17:02:04.000
95	2013-02-22 06:00:00.000	2013-02-22 17:06:32.000
95	2013-02-23 05:38:02.000	2013-02-23 14:59:00.000
96	2013-02-23 05:38:02.000	NULL
96	2013-02-23 06:38:02.000	2013-02-23 14:59:00.000

Open in new window

something like this:
DECLARE @table TABLE (
	EMPID	INT,
	PUNCH	VARCHAR(10),
	PTIME	DATETIME)

INSERT INTO @table ( EMPID, PUNCH, PTIME )
VALUES 
(94, 'IN', '3/9/2013 6:09:24'),
(94, 'OUT', '3/9/2013 15:16:48'),
(94, 'IN', '3/11/2013 6:10:56'),
(94, 'OUT', '3/11/2013 17:02:04'),
(95, 'IN', '2/22/2013 6:00:00'),
(95, 'OUT', '2/22/2013 17:06:32'),
(95, 'IN', '2/23/2013 5:38:02'),
(95, 'OUT', '2/23/2013 14:59:00')

SELECT * FROM @table

SELECT DISTINCT T1.EMPID, CAST(T1.PTIME AS DATE) 
, (SELECT PTIME FROM @table WHERE EMPID = T1.EMPID AND PUNCH = 'in' AND CAST(PTIME AS DATE) = CAST(T1.PTIME AS date)) AS 'PTIME-IN'
, (SELECT PTIME FROM @table WHERE EMPID = T1.EMPID AND PUNCH = 'out' AND CAST(PTIME AS DATE) = CAST(T1.PTIME AS date)) AS 'PTIME-OUT'
FROM @table AS T1

Open in new window

Hi,

you can do that with sub-queries like this:
WITH qryEmp AS
(SELECT EmpID, CAST(FLOOR(CAST(PTIME AS float)) AS int) AS PDATE
 FROM @table AS P0)
SELECT DISTINCT EmpID, 
       (SELECT PTIME FROM @table AS P1
        WHERE P1.EMPID = P.EMPID
        AND CAST(FLOOR(CAST(PTIME AS float)) AS int) = P.PDATE
        AND P1.PUNCH = 'IN') AS [PTIME-IN],
       (SELECT PTIME FROM @table AS P2
        WHERE P2.EMPID = P.EMPID
        AND CAST(FLOOR(CAST(PTIME AS float)) AS int) = P.PDATE
        AND P2.PUNCH = 'OUT') AS [PTIME-OUT]
FROM qryEmp AS P

Open in new window


As I do not have test tables this is of course untested. But you should be aware that this would only work if the employee has exactly one ptime-in and one ptime-out for each day. If that happens more than once each day (and that is very likely) it would not work.

I don't see why you want to move that into a new table with two fields as that makes exactly this no longer possible. The original layout is perfect as it is.

Cheers,

Christian
Avatar of Rob Rudloff

ASKER

I will give that code a try.  
To elaborate on the "why" of this situation:
The original table is a pure "punch" table.  There is .net code that loads this table into another "timesheet" table - -the code looks for the "in" and "out" records, and decides if there are any erroneous punches that should be ignored.   Among other things, the "timesheet" table has the "in", the "out", and the calculated "hours worked" ( which is "out" minus "in").  I think the reason the timesheet table exists is because there can be erroneous punches.  
This experiment is simply to preload the timesheet table with missing punches.  But, if I can ultimately do all that in SQL, I could eliminate the the .NET program that does the same thing.

Here's the wrinkle that I alluded to:   The data can have erroneous duplicate "in" or "out" punches, so the data can actaully look like this:
EMPID	PUNCH	PTIME
94		IN	3/9/2013 6:09:24
94		IN	3/9/2013 6:09:50
94		OUT	3/9/2013 15:16:48
94		IN	3/11/2013 6:10:56
94		OUT	3/11/2013 17:02:04
94		OUT	3/11/2013 17:02:44
95		IN	2/22/2013 6:00:00

Open in new window

The .NET code decides to throw away any duplicate/earlier "in" punch, and any duplicate/later "out" punch.  So the result here would be:
EMPID	PTIME-IN		PTIME-OUT
94		3/9/2013 6:09:50	3/9/2013 15:16:48
94		3/11/2013 6:10:56	3/11/2013 17:02:04

Open in new window

Change the query in my earlier post, from a LEFT JOIN to an INNER JOIN.

However, I'd strongly suggest testing this with real data to check for unexpected edge cases not shown in the sample data.

;WITH data
AS
(
	SELECT EmpID, punch, ptime
			, ROW_NUMBER() OVER(PARTITION BY EmpID ORDER BY ptime) AS RowNum
	FROM  #yourTable
)
SELECT it.EmpID, it.ptime AS PunchInTime, ot.ptime AS PunchOutTime
FROM   data it INNER JOIN data ot 
		ON it.EmpID = ot.EmpID 
		AND ot.punch = 'OUT' 
		AND it.ptime < ot.ptime 
		AND it.RowNum = ot.RowNum - 1
WHERE  it.punch = 'IN'		
ORDER BY it.EmpId, it.ptime

Open in new window


Results:
EmpID	PunchInTime	PunchOutTime
94	2013-03-09 06:09:50.000	2013-03-09 15:16:48.000
94	2013-03-11 06:10:56.000	2013-03-11 17:02:04.000

Open in new window

for your "wrinkle", just add a MAX to my sub queries:

DECLARE @table TABLE (
	EMPID	INT,
	PUNCH	VARCHAR(10),
	PTIME	DATETIME)

INSERT INTO @table ( EMPID, PUNCH, PTIME )
VALUES 
(94, 'IN', '3/9/2013 6:09:24'),
(94, 'IN', '3/9/2013 6:09:50'),
(94, 'OUT', '3/9/2013 15:16:48'),
(94, 'IN', '3/11/2013 6:10:56'),
(94, 'OUT', '3/11/2013 17:02:04'),
(95, 'IN', '2/22/2013 6:00:00'),
(95, 'OUT', '2/22/2013 17:06:32'),
(95, 'IN', '2/23/2013 5:38:02'),
(95, 'OUT', '2/23/2013 14:59:00')

SELECT * FROM @table

SELECT DISTINCT T1.EMPID, CAST(T1.PTIME AS DATE) 
, (SELECT MAX(PTIME) FROM @table WHERE EMPID = T1.EMPID AND PUNCH = 'in' AND CAST(PTIME AS DATE) = CAST(T1.PTIME AS date)) AS 'PTIME-IN'
, (SELECT MAX(PTIME) FROM @table WHERE EMPID = T1.EMPID AND PUNCH = 'out' AND CAST(PTIME AS DATE) = CAST(T1.PTIME AS date)) AS 'PTIME-OUT'
FROM @table AS T1

Open in new window

Hi.
Both solutions, by _agx_   and  Éric Moreau  seem to work with my original test data.

So I ran both on the actual table, and both solutions are real close to working on all the "weird" data situations.  
The data below illustrates the weird/outlier situations, where there are:
  • duplicate "outs" for a single "in"
  • an "out" with no "in"  (see first record)
  • an "in" with no "outs" (see last record)
  • an "out" on the next day from the "in"  (see emp 100)

The solution by _agx_  is real close, in that it handles the outlier records.  
If I can get it to take the latest "out" record, rather than the first "out" record, then it's perfect.

The solution by Éric Moreau is also close, in that it gets that "later out record", but it doesn't handle the outliers.

Here's what I am running in SQL:
INSERT INTO @MyTable ( EMP_ID, PunchCode, PunchTime )
VALUES 
(94, 'OUT',  '03/09/2013 06:09'),
(94, 'IN',   '03/09/2013 07:10'),
(94, 'OUT',  '03/09/2013 15:16'),
(95, 'IN',   '02/22/2013 06:00'),
(95, 'OUT',  '02/22/2013 17:06'),
(95, 'OUT',  '02/22/2013 17:07'),
(95, 'IN',   '02/23/2013 05:38'),
(95, 'OUT',  '02/23/2013 14:59'),
(100, 'IN',  '02/24/2013 05:38'),
(100, 'OUT', '02/25/2013 04:00'),
(100, 'IN',  '02/25/2013 10:00')


select * from @MyTable 
order by emp_id, PunchTime 


--** Éric Moreau  ***************** 
SELECT DISTINCT T1.EMP_ID, CAST(T1.PunchTime AS DATE), 
	(SELECT MAX(PunchTime) FROM @MyTable WHERE EMP_ID = T1.EMP_ID AND PunchCode = 'in'  AND CAST(PunchTime AS DATE) = CAST(T1.PunchTime AS date)) AS 'PTIME-IN',
	(SELECT MAX(PunchTime) FROM @MyTable WHERE EMP_ID = T1.EMP_ID AND PunchCode = 'out' AND CAST(PunchTime AS DATE) = CAST(T1.PunchTime AS date)) AS 'PTIME-OUT'
FROM @MyTable AS T1
ORDER BY  t1.EMP_ID , CAST(T1.PunchTime AS DATE) 
--**********************************************************


--** _agx_ *************************
;WITH data
AS
(SELECT	EMP_ID, PunchCode, PunchTime, 
        ROW_NUMBER() OVER(PARTITION BY EMP_ID ORDER BY PunchTime) AS RowNum
 FROM  @MyTable
)

SELECT it.EMP_ID, it.PunchTime AS PunchInTime, ot.PunchTime AS PunchOutTime
FROM data it
INNER JOIN data ot 
		ON it.EMP_ID = ot.EMP_ID 
		AND ot.PunchCode = 'OUT' 
		AND it.PunchTime < ot.PunchTime 
		AND it.RowNum = ot.RowNum - 1
   WHERE it.PunchCode = 'IN'		
   ORDER BY it.EMP_ID, it.PunchTime

--**********************************************************

Open in new window

The results are:
EMP_ID	(No column name)	PTIME-IN		PTIME-OUT
94		2013-03-09		2013-03-09 07:10:00.000	2013-03-09 15:16:00.000
95		2013-02-22		2013-02-22 06:00:00.000	2013-02-22 17:07:00.000
95		2013-02-23		2013-02-23 05:38:00.000	2013-02-23 14:59:00.000
100		2013-02-24		2013-02-24 05:38:00.000	NULL ***
100		2013-02-25		2013-02-25 10:00:00.000	2013-02-25 04:00:00.000

Open in new window

EMP_ID	PunchInTime			PunchOutTime
94		2013-03-09 07:10:00.000		2013-03-09 15:16:00.000
95		2013-02-22 06:00:00.000		2013-02-22 17:06:00.000 ***
95		2013-02-23 05:38:00.000		2013-02-23 14:59:00.000
100		2013-02-24 05:38:00.000		2013-02-25 04:00:00.000

Open in new window

I see how you are both getting the good results, but I can't figure a way to fix the outliers in either solution.
Any suggestions?
This is all greatly appreciated, I must say!
Thanks
Rob
Query:
;WITH data
AS
(

	SELECT EmpID, punch, ptime
           , ROW_NUMBER() OVER(PARTITION BY EmpID ORDER BY ptime) AS OverallRow
           , ROW_NUMBER() OVER(PARTITION BY punch ORDER BY EmpID, ptime) AS GroupRow
	FROM  #yourTable
)
SELECT  ci.EmpID
        , ci.ptime AS PunchInTime
        , co.PunchOutTime
--  current and next punch "IN" time
FROM   data ci LEFT JOIN data ni 
            ON ci.EmpID = ni.EmpID 
            AND ci.punch = ni.punch
            AND ci.GroupRow = ni.GroupRow - 1
         CROSS APPLY (
             -- latest punch OUT that's between curr/next punch "IN"
             SELECT MAX(ot.ptime)
             FROM   data ot
             WHERE  ot.EmpID = ci.EmpID 
             AND	   ot.punch = 'OUT' 
             AND    ot.OverallRow > ci.OverallRow
             AND    (ot.OverallRow < ni.OverallRow OR ni.OverallRow IS NULL) 
        ) co(PunchOutTime)
WHERE  ci.punch = 'IN'	
AND    co.PunchOutTime IS NOT NULL
ORDER BY ci.EmpID, ci.ptime

Open in new window

Results:

EmpID	PunchInTime	PunchOutTime
94	2013-03-09 07:10:00.000	2013-03-09 15:16:00.000
95	2013-02-22 06:00:00.000	2013-02-22 17:07:00.000
95	2013-02-23 05:38:00.000	2013-02-23 14:59:00.000
100	2013-02-24 05:38:00.000	2013-02-25 04:00:00.000

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of _agx_
_agx_
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
with the latest dataset, what is your expected result?
I ran it by the "deciders" here, and they say to use the latest IN punch and the latest OUT punch, and work-time can span over-nights).
So the data in would look like this the first set of times below, and the result would look like the last set.   (It looks like _asx_  last solution and results) :
(94, 'OUT',  '03/09/2013 06:09'),	<--- throw away
(94, 'IN',   '03/09/2013 07:10'),	<--- keep
(94, 'OUT',  '03/09/2013 15:16'),	<--- keep
(95, 'IN',   '02/22/2013 06:00'),	<--- keep
(95, 'OUT',  '02/22/2013 17:06'),	<--- throw away
(95, 'OUT',  '02/22/2013 17:07'),	<--- keep
(95, 'IN',   '02/23/2013 05:38'),	<--- keep
(95, 'OUT',  '02/23/2013 14:59'),	<--- keep
(100, 'IN',  '02/24/2013 05:38'),	<--- keep
(100, 'OUT', '02/25/2013 04:00'),	<--- keep
(100, 'IN',  '02/25/2013 10:00'),	<--- throw away

Open in new window

EMP_ID	PunchInTime			PunchOutTime
94		2013-03-09 07:10:00.000		2013-03-09 15:16
95		2013-02-22 06:00:00.000		2013-02-22 17:07
95		2013-02-23 05:38:00.000		2013-02-23 14:59
100		2013-02-24 05:38:00.000		2013-02-25 04:00

Open in new window

<<Simplifying my issue, I have employee attendance data, including Emp ID, an "IN" or "OUT" record, and the date/time.   >>

  As a side comment having done more than a few time clock systems, I have found it simpler to have a single record with an in and out date and time rather than having a record for each transaction.

  There are too many problems with trying  to pair up the transactions afterwards with this type of design.   It becomes especially confusing when employees forget to punch in or out.

  If you are in control of the design, you might want to consider changing it.

  Of course the time clock process would need to be updated as well.   When an employee identifies themselves, you then need to figure out if they are clocking in (your going to start a new record), or punching out (updating the last record with an end date/time).   Sometimes though depending on what your using for that, that might not be possible to do.

Jim.
That last solution from _agx_ seems to do the best job, both for the original sample data situations I put forth, and the actual data encountered in production.   I am going to tweak it a bit, and run some tests.   This is all to re-platform (and fix) our old timecock system.  
-- thanks all!
-- Rob