asked on

Combine alternate (related) rows into a single row ?

Hi. If there's an easy way to do this in SQL, it'll be a huge timesaver for me, however I can't seem to find a way ...

Simplifying my issue, I have employee attendance data, including Emp ID, an "IN" or "OUT" record, and the date/time. Assume they are sorted as such.
I would like to combine the "IN" and "OUT" records into a single record in a new table, so :

EMPID	PUNCH	PTIME
94		IN	3/9/2013 6:09:24
94		OUT	3/9/2013 15:16:48
94		IN	3/11/2013 6:10:56
94		OUT	3/11/2013 17:02:04
95		IN	2/22/2013 6:00:00
95		OUT	2/22/2013 17:06:32
95		IN	2/23/2013 5:38:02
95		OUT	2/23/2013 14:59:00

Open in new window

Would look like:

EMPID	PTIME-IN		PTIME-OUT
94		3/9/2013 6:09:24	3/9/2013 15:16:48
94		3/11/2013 6:10:56	3/11/2013 17:02:04
95		2/22/2013 6:00:00	2/22/2013 17:06:32
95		2/23/2013 5:38:02	2/23/2013 14:59:00

Open in new window

There is another wrinkle that I am not illustrating here, but we'll tackle that later...
Thanks!

_agx_

Maybe use row_number() to sort by EmpID and punch date. Then find pairs where check IN and OUT rows are consecutive?

Demo Table

create table #yourTable
(
empID int
, punch varchar(5)
, ptime datetime
)

insert into #yourTable
values 
(94,'IN','3/9/2013 6:09:24')
, (94,'OUT','3/9/2013 15:16:48')
, (94,'IN','3/11/2013 6:10:56')
, (94,'OUT','3/11/2013 17:02:04')
, (95,'IN','2/22/2013 6:00:00')
, (95,'OUT','2/22/2013 17:06:32')
, (95,'IN','2/23/2013 5:38:02')
, (95,'OUT','2/23/2013 14:59:00')
, (96,'IN','2/23/2013 5:38:02')
, (96,'IN','2/23/2013 6:38:02')
, (96,'OUT','2/23/2013 14:59:00')

Open in new window

Query:

WITH data
AS
(
	SELECT EmpID, punch, ptime
			, ROW_NUMBER() OVER(PARTITION BY EmpID ORDER BY ptime) AS RowNum
	FROM  #yourTable
)
SELECT it.EmpID, it.ptime AS PunchInTime, ot.ptime AS PunchOutTime
FROM   data it LEFT JOIN data ot 
		ON it.EmpID = ot.EmpID 
		AND ot.punch = 'OUT' 
		AND it.ptime < ot.ptime 
		AND it.RowNum = ot.RowNum - 1
WHERE  it.punch = 'IN'		
ORDER BY it.EmpId, it.ptime

Open in new window

Results:

EmpID	PunchInTime	PunchOutTime
94	2013-03-09 06:09:24.000	2013-03-09 15:16:48.000
94	2013-03-11 06:10:56.000	2013-03-11 17:02:04.000
95	2013-02-22 06:00:00.000	2013-02-22 17:06:32.000
95	2013-02-23 05:38:02.000	2013-02-23 14:59:00.000
96	2013-02-23 05:38:02.000	NULL
96	2013-02-23 06:38:02.000	2013-02-23 14:59:00.000

Open in new window

Éric Moreau

something like this:

DECLARE @table TABLE (
	EMPID	INT,
	PUNCH	VARCHAR(10),
	PTIME	DATETIME)

INSERT INTO @table ( EMPID, PUNCH, PTIME )
VALUES 
(94, 'IN', '3/9/2013 6:09:24'),
(94, 'OUT', '3/9/2013 15:16:48'),
(94, 'IN', '3/11/2013 6:10:56'),
(94, 'OUT', '3/11/2013 17:02:04'),
(95, 'IN', '2/22/2013 6:00:00'),
(95, 'OUT', '2/22/2013 17:06:32'),
(95, 'IN', '2/23/2013 5:38:02'),
(95, 'OUT', '2/23/2013 14:59:00')

SELECT * FROM @table

SELECT DISTINCT T1.EMPID, CAST(T1.PTIME AS DATE) 
, (SELECT PTIME FROM @table WHERE EMPID = T1.EMPID AND PUNCH = 'in' AND CAST(PTIME AS DATE) = CAST(T1.PTIME AS date)) AS 'PTIME-IN'
, (SELECT PTIME FROM @table WHERE EMPID = T1.EMPID AND PUNCH = 'out' AND CAST(PTIME AS DATE) = CAST(T1.PTIME AS date)) AS 'PTIME-OUT'
FROM @table AS T1

Open in new window

Bitsqueezer

Hi,

you can do that with sub-queries like this:

WITH qryEmp AS
(SELECT EmpID, CAST(FLOOR(CAST(PTIME AS float)) AS int) AS PDATE
 FROM @table AS P0)
SELECT DISTINCT EmpID, 
       (SELECT PTIME FROM @table AS P1
        WHERE P1.EMPID = P.EMPID
        AND CAST(FLOOR(CAST(PTIME AS float)) AS int) = P.PDATE
        AND P1.PUNCH = 'IN') AS [PTIME-IN],
       (SELECT PTIME FROM @table AS P2
        WHERE P2.EMPID = P.EMPID
        AND CAST(FLOOR(CAST(PTIME AS float)) AS int) = P.PDATE
        AND P2.PUNCH = 'OUT') AS [PTIME-OUT]
FROM qryEmp AS P

Open in new window

As I do not have test tables this is of course untested. But you should be aware that this would only work if the employee has exactly one ptime-in and one ptime-out for each day. If that happens more than once each day (and that is very likely) it would not work.

I don't see why you want to move that into a new table with two fields as that makes exactly this no longer possible. The original layout is perfect as it is.

Cheers,

Christian

Rob Rudloff

ASKER

I will give that code a try.
To elaborate on the "why" of this situation:
The original table is a pure "punch" table. There is .net code that loads this table into another "timesheet" table - -the code looks for the "in" and "out" records, and decides if there are any erroneous punches that should be ignored. Among other things, the "timesheet" table has the "in", the "out", and the calculated "hours worked" ( which is "out" minus "in"). I think the reason the timesheet table exists is because there can be erroneous punches.
This experiment is simply to preload the timesheet table with missing punches. But, if I can ultimately do all that in SQL, I could eliminate the the .NET program that does the same thing.

Here's the wrinkle that I alluded to: The data can have erroneous duplicate "in" or "out" punches, so the data can actaully look like this:

EMPID	PUNCH	PTIME
94		IN	3/9/2013 6:09:24
94		IN	3/9/2013 6:09:50
94		OUT	3/9/2013 15:16:48
94		IN	3/11/2013 6:10:56
94		OUT	3/11/2013 17:02:04
94		OUT	3/11/2013 17:02:44
95		IN	2/22/2013 6:00:00

Open in new window

The .NET code decides to throw away any duplicate/earlier "in" punch, and any duplicate/later "out" punch. So the result here would be:

EMPID	PTIME-IN		PTIME-OUT
94		3/9/2013 6:09:50	3/9/2013 15:16:48
94		3/11/2013 6:10:56	3/11/2013 17:02:04

Open in new window

_agx_

Change the query in my earlier post, from a LEFT JOIN to an INNER JOIN.

However, I'd strongly suggest testing this with real data to check for unexpected edge cases not shown in the sample data.

;WITH data
AS
(
	SELECT EmpID, punch, ptime
			, ROW_NUMBER() OVER(PARTITION BY EmpID ORDER BY ptime) AS RowNum
	FROM  #yourTable
)
SELECT it.EmpID, it.ptime AS PunchInTime, ot.ptime AS PunchOutTime
FROM   data it INNER JOIN data ot 
		ON it.EmpID = ot.EmpID 
		AND ot.punch = 'OUT' 
		AND it.ptime < ot.ptime 
		AND it.RowNum = ot.RowNum - 1
WHERE  it.punch = 'IN'		
ORDER BY it.EmpId, it.ptime

Open in new window

Results:

EmpID	PunchInTime	PunchOutTime
94	2013-03-09 06:09:50.000	2013-03-09 15:16:48.000
94	2013-03-11 06:10:56.000	2013-03-11 17:02:04.000

Open in new window

Éric Moreau

for your "wrinkle", just add a MAX to my sub queries:

DECLARE @table TABLE (
	EMPID	INT,
	PUNCH	VARCHAR(10),
	PTIME	DATETIME)

INSERT INTO @table ( EMPID, PUNCH, PTIME )
VALUES 
(94, 'IN', '3/9/2013 6:09:24'),
(94, 'IN', '3/9/2013 6:09:50'),
(94, 'OUT', '3/9/2013 15:16:48'),
(94, 'IN', '3/11/2013 6:10:56'),
(94, 'OUT', '3/11/2013 17:02:04'),
(95, 'IN', '2/22/2013 6:00:00'),
(95, 'OUT', '2/22/2013 17:06:32'),
(95, 'IN', '2/23/2013 5:38:02'),
(95, 'OUT', '2/23/2013 14:59:00')

SELECT * FROM @table

SELECT DISTINCT T1.EMPID, CAST(T1.PTIME AS DATE) 
, (SELECT MAX(PTIME) FROM @table WHERE EMPID = T1.EMPID AND PUNCH = 'in' AND CAST(PTIME AS DATE) = CAST(T1.PTIME AS date)) AS 'PTIME-IN'
, (SELECT MAX(PTIME) FROM @table WHERE EMPID = T1.EMPID AND PUNCH = 'out' AND CAST(PTIME AS DATE) = CAST(T1.PTIME AS date)) AS 'PTIME-OUT'
FROM @table AS T1

Open in new window

Rob Rudloff

ASKER

Hi.
Both solutions, by _agx_ and Éric Moreau seem to work with my original test data.

So I ran both on the actual table, and both solutions are real close to working on all the "weird" data situations.
The data below illustrates the weird/outlier situations, where there are:

duplicate "outs" for a single "in"
an "out" with no "in" (see first record)
an "in" with no "outs" (see last record)
an "out" on the next day from the "in" (see emp 100)

The solution by _agx_ is real close, in that it handles the outlier records.
If I can get it to take the latest "out" record, rather than the first "out" record, then it's perfect.

The solution by Éric Moreau is also close, in that it gets that "later out record", but it doesn't handle the outliers.

Here's what I am running in SQL:

INSERT INTO @MyTable ( EMP_ID, PunchCode, PunchTime )
VALUES 
(94, 'OUT',  '03/09/2013 06:09'),
(94, 'IN',   '03/09/2013 07:10'),
(94, 'OUT',  '03/09/2013 15:16'),
(95, 'IN',   '02/22/2013 06:00'),
(95, 'OUT',  '02/22/2013 17:06'),
(95, 'OUT',  '02/22/2013 17:07'),
(95, 'IN',   '02/23/2013 05:38'),
(95, 'OUT',  '02/23/2013 14:59'),
(100, 'IN',  '02/24/2013 05:38'),
(100, 'OUT', '02/25/2013 04:00'),
(100, 'IN',  '02/25/2013 10:00')


select * from @MyTable 
order by emp_id, PunchTime 


--** Éric Moreau  ***************** 
SELECT DISTINCT T1.EMP_ID, CAST(T1.PunchTime AS DATE), 
	(SELECT MAX(PunchTime) FROM @MyTable WHERE EMP_ID = T1.EMP_ID AND PunchCode = 'in'  AND CAST(PunchTime AS DATE) = CAST(T1.PunchTime AS date)) AS 'PTIME-IN',
	(SELECT MAX(PunchTime) FROM @MyTable WHERE EMP_ID = T1.EMP_ID AND PunchCode = 'out' AND CAST(PunchTime AS DATE) = CAST(T1.PunchTime AS date)) AS 'PTIME-OUT'
FROM @MyTable AS T1
ORDER BY  t1.EMP_ID , CAST(T1.PunchTime AS DATE) 
--**********************************************************


--** _agx_ *************************
;WITH data
AS
(SELECT	EMP_ID, PunchCode, PunchTime, 
        ROW_NUMBER() OVER(PARTITION BY EMP_ID ORDER BY PunchTime) AS RowNum
 FROM  @MyTable
)

SELECT it.EMP_ID, it.PunchTime AS PunchInTime, ot.PunchTime AS PunchOutTime
FROM data it
INNER JOIN data ot 
		ON it.EMP_ID = ot.EMP_ID 
		AND ot.PunchCode = 'OUT' 
		AND it.PunchTime < ot.PunchTime 
		AND it.RowNum = ot.RowNum - 1
   WHERE it.PunchCode = 'IN'		
   ORDER BY it.EMP_ID, it.PunchTime

--**********************************************************

Open in new window

The results are:

EMP_ID	(No column name)	PTIME-IN		PTIME-OUT
94		2013-03-09		2013-03-09 07:10:00.000	2013-03-09 15:16:00.000
95		2013-02-22		2013-02-22 06:00:00.000	2013-02-22 17:07:00.000
95		2013-02-23		2013-02-23 05:38:00.000	2013-02-23 14:59:00.000
100		2013-02-24		2013-02-24 05:38:00.000	NULL ***
100		2013-02-25		2013-02-25 10:00:00.000	2013-02-25 04:00:00.000

Open in new window

EMP_ID	PunchInTime			PunchOutTime
94		2013-03-09 07:10:00.000		2013-03-09 15:16:00.000
95		2013-02-22 06:00:00.000		2013-02-22 17:06:00.000 ***
95		2013-02-23 05:38:00.000		2013-02-23 14:59:00.000
100		2013-02-24 05:38:00.000		2013-02-25 04:00:00.000

Open in new window

I see how you are both getting the good results, but I can't figure a way to fix the outliers in either solution.
Any suggestions?
This is all greatly appreciated, I must say!
Thanks
Rob

_agx_

Query:

;WITH data
AS
(

	SELECT EmpID, punch, ptime
           , ROW_NUMBER() OVER(PARTITION BY EmpID ORDER BY ptime) AS OverallRow
           , ROW_NUMBER() OVER(PARTITION BY punch ORDER BY EmpID, ptime) AS GroupRow
	FROM  #yourTable
)
SELECT  ci.EmpID
        , ci.ptime AS PunchInTime
        , co.PunchOutTime
--  current and next punch "IN" time
FROM   data ci LEFT JOIN data ni 
            ON ci.EmpID = ni.EmpID 
            AND ci.punch = ni.punch
            AND ci.GroupRow = ni.GroupRow - 1
         CROSS APPLY (
             -- latest punch OUT that's between curr/next punch "IN"
             SELECT MAX(ot.ptime)
             FROM   data ot
             WHERE  ot.EmpID = ci.EmpID 
             AND	   ot.punch = 'OUT' 
             AND    ot.OverallRow > ci.OverallRow
             AND    (ot.OverallRow < ni.OverallRow OR ni.OverallRow IS NULL) 
        ) co(PunchOutTime)
WHERE  ci.punch = 'IN'	
AND    co.PunchOutTime IS NOT NULL
ORDER BY ci.EmpID, ci.ptime

Open in new window

Results:

EmpID	PunchInTime	PunchOutTime
94	2013-03-09 07:10:00.000	2013-03-09 15:16:00.000
95	2013-02-22 06:00:00.000	2013-02-22 17:07:00.000
95	2013-02-23 05:38:00.000	2013-02-23 14:59:00.000
100	2013-02-24 05:38:00.000	2013-02-25 04:00:00.000

Open in new window

ASKER CERTIFIED SOLUTION

_agx_

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Éric Moreau

with the latest dataset, what is your expected result?

Rob Rudloff

ASKER

I ran it by the "deciders" here, and they say to use the latest IN punch and the latest OUT punch, and work-time can span over-nights).
So the data in would look like this the first set of times below, and the result would look like the last set. (It looks like _asx_ last solution and results) :

(94, 'OUT',  '03/09/2013 06:09'),	<--- throw away
(94, 'IN',   '03/09/2013 07:10'),	<--- keep
(94, 'OUT',  '03/09/2013 15:16'),	<--- keep
(95, 'IN',   '02/22/2013 06:00'),	<--- keep
(95, 'OUT',  '02/22/2013 17:06'),	<--- throw away
(95, 'OUT',  '02/22/2013 17:07'),	<--- keep
(95, 'IN',   '02/23/2013 05:38'),	<--- keep
(95, 'OUT',  '02/23/2013 14:59'),	<--- keep
(100, 'IN',  '02/24/2013 05:38'),	<--- keep
(100, 'OUT', '02/25/2013 04:00'),	<--- keep
(100, 'IN',  '02/25/2013 10:00'),	<--- throw away

Open in new window

EMP_ID	PunchInTime			PunchOutTime
94		2013-03-09 07:10:00.000		2013-03-09 15:16
95		2013-02-22 06:00:00.000		2013-02-22 17:07
95		2013-02-23 05:38:00.000		2013-02-23 14:59
100		2013-02-24 05:38:00.000		2013-02-25 04:00

Open in new window

Jim Dettman (EE MVE)

<<Simplifying my issue, I have employee attendance data, including Emp ID, an "IN" or "OUT" record, and the date/time. >>

As a side comment having done more than a few time clock systems, I have found it simpler to have a single record with an in and out date and time rather than having a record for each transaction.

There are too many problems with trying to pair up the transactions afterwards with this type of design. It becomes especially confusing when employees forget to punch in or out.

If you are in control of the design, you might want to consider changing it.

Of course the time clock process would need to be updated as well. When an employee identifies themselves, you then need to figure out if they are clocking in (your going to start a new record), or punching out (updating the last record with an end date/time). Sometimes though depending on what your using for that, that might not be possible to do.

Jim.

Rob Rudloff

ASKER

That last solution from _agx_ seems to do the best job, both for the original sample data situations I put forth, and the actual data encountered in production. I am going to tweak it a bit, and run some tests. This is all to re-platform (and fix) our old timecock system.
-- thanks all!
-- Rob