Cluskitt
asked on
T-SQL to group contiguous overlapping dates
This is related to this question:
https://www.experts-exchange.com/questions/27943350/Tricky-T-SQL-query.html
The solution in there works fine for a table that has no overlapping dates. However, there is another table which has overlapping dates, so we either need a different solution or to perfect the one we have.
Basically, I have a table, in this case called tblTrabalho32 (FK_ID_intEmpresa int,FK_ID_intFuncionario int,FK_ID_intTipoTrabalho int,datHoraEntradaTrabalho smalldatetime,datHoraSaida Trabalho smalldatetime).
FK_ID_intEmpresa is part of a composite key, but in this case it will always be the same, so you can ignore it. FK_ID_intFuncionario is the key (the rest of the composite) and FK_ID_intTipoTrabalho is a foreign key which is important for grouping.
The goal is to group contiguous dates (which we can later check for duration of 30 days or more) and to group the same FK_ID_intTipoTrabalho when it's contiguous.
This is a small sample of the data:
Important things to consider:
1- Any contiguous dates from the same FK_ID_intTipoTrabalho have to be grouped in the same line, regardless of different ones overlapping or even being contained in that range.
2- All different FK_ID_intTipoTrabalho have to be listed, even if they overlap or are contained in another range.
3- Dates for the total contiguous period are required as they are the ones that actually define the range, even though they may be composed of more than one FK_ID_intTipoTrabalho.
The examples provided cover some of the more tricky cases as well as some basic ones. If necessary I can provide more, but I preferred not to include too many as that might confuse the issue.
https://www.experts-exchange.com/questions/27943350/Tricky-T-SQL-query.html
The solution in there works fine for a table that has no overlapping dates. However, there is another table which has overlapping dates, so we either need a different solution or to perfect the one we have.
Basically, I have a table, in this case called tblTrabalho32 (FK_ID_intEmpresa int,FK_ID_intFuncionario int,FK_ID_intTipoTrabalho int,datHoraEntradaTrabalho
FK_ID_intEmpresa is part of a composite key, but in this case it will always be the same, so you can ignore it. FK_ID_intFuncionario is the key (the rest of the composite) and FK_ID_intTipoTrabalho is a foreign key which is important for grouping.
The goal is to group contiguous dates (which we can later check for duration of 30 days or more) and to group the same FK_ID_intTipoTrabalho when it's contiguous.
This is a small sample of the data:
INSERT INTO tblTrabalho32 VALUES (32,128,'Doença','20120823','20120907')
INSERT INTO tblTrabalho32 VALUES (32,2376,'Doença','20120913','20120924')
INSERT INTO tblTrabalho32 VALUES (32,2376,'Doença','20120925','20121006')
INSERT INTO tblTrabalho32 VALUES (32,2376,'Parto','20121007','20121015')
INSERT INTO tblTrabalho32 VALUES (32,2376,'Parto','20121016','20121115')
INSERT INTO tblTrabalho32 VALUES (32,2376,'Parto','20121116','20121215')
INSERT INTO tblTrabalho32 VALUES (32,3746,'Assist. Familia','20120206','20120207')
INSERT INTO tblTrabalho32 VALUES (32,3746,'Assist. Familia','20121025','20121027')
INSERT INTO tblTrabalho32 VALUES (32,6136,'Doença','20120101','20120102')
INSERT INTO tblTrabalho32 VALUES (32,6136,'Doença','20120103','20120106')
INSERT INTO tblTrabalho32 VALUES (32,6136,'Doença','20120212','20120213')
INSERT INTO tblTrabalho32 VALUES (32,6136,'Doença','20120612','20120623')
INSERT INTO tblTrabalho32 VALUES (32,6136,'Doença','20120624','20120711')
INSERT INTO tblTrabalho32 VALUES (32,6136,'Doença','20120712','20120810')
INSERT INTO tblTrabalho32 VALUES (32,6136,'Doença','20120811','20120909')
INSERT INTO tblTrabalho32 VALUES (32,6136,'Doença','20120910','20121009')
INSERT INTO tblTrabalho32 VALUES (32,6136,'Doença','20121010','20121108')
INSERT INTO tblTrabalho32 VALUES (32,6745,'Assist. Familia','20120228','20120315')
INSERT INTO tblTrabalho32 VALUES (32,6745,'Assist. Familia','20120317','20120403')
INSERT INTO tblTrabalho32 VALUES (32,6745,'Acid. Trabalho','20120918','20121012')
INSERT INTO tblTrabalho32 VALUES (32,6853,'Doença','20120621','20120719')
INSERT INTO tblTrabalho32 VALUES (32,6853,'Doença','20120720','20120819')
INSERT INTO tblTrabalho32 VALUES (32,6853,'Doença','20120820','20120927')
INSERT INTO tblTrabalho32 VALUES (32,6853,'Parto','20120828','20120915')
INSERT INTO tblTrabalho32 VALUES (32,6853,'Parto','20120916','20121015')
INSERT INTO tblTrabalho32 VALUES (32,6853,'Parto','20121016','20121115')
INSERT INTO tblTrabalho32 VALUES (32,6853,'Parto','20121116','20121215')
INSERT INTO tblTrabalho32 VALUES (32,7331,'Doença','20120801','20120805')
INSERT INTO tblTrabalho32 VALUES (32,7331,'Doença','20120806','20120813')
INSERT INTO tblTrabalho32 VALUES (32,7331,'Doença','20120814','20120823')
INSERT INTO tblTrabalho32 VALUES (32,7331,'Doença','20120824','20120913')
INSERT INTO tblTrabalho32 VALUES (32,7331,'Doença','20120914','20121013')
INSERT INTO tblTrabalho32 VALUES (32,7331,'Doença','20121014','20121212')
INSERT INTO tblTrabalho32 VALUES (32,7331,'Lic. Parental','20121127','20121231')
INSERT INTO tblTrabalho32 VALUES (32,9045,'Doença','20120101','20120104')
INSERT INTO tblTrabalho32 VALUES (32,9045,'Lic. Parental','20120105','20120602')
INSERT INTO tblTrabalho32 VALUES (32,9045,'Parto','20120105','20120131')
INSERT INTO tblTrabalho32 VALUES (32,9045,'Parto','20120105','20120229')
INSERT INTO tblTrabalho32 VALUES (32,9045,'Parto','20120201','20120331')
INSERT INTO tblTrabalho32 VALUES (32,9045,'Parto','20120315','20120416')
INSERT INTO tblTrabalho32 VALUES (32,9280,'Doença','20120110','20120115')
INSERT INTO tblTrabalho32 VALUES (32,9280,'Doença','20120315','20120326')
INSERT INTO tblTrabalho32 VALUES (32,9280,'Doença','20120327','20120423')
INSERT INTO tblTrabalho32 VALUES (32,9280,'Doença','20120511','20120521')
INSERT INTO tblTrabalho32 VALUES (32,9280,'Doença','20120522','20120618')
INSERT INTO tblTrabalho32 VALUES (32,9280,'Doença','20120619','20120718')
INSERT INTO tblTrabalho32 VALUES (32,9280,'Doença','20120719','20120820')
INSERT INTO tblTrabalho32 VALUES (32,9280,'Doença','20120918','20121008')
INSERT INTO tblTrabalho32 VALUES (32,9280,'Doença','20121009','20121107')
INSERT INTO tblTrabalho32 VALUES (32,9280,'Lic. Parental','20121107','20121130')
INSERT INTO tblTrabalho32 VALUES (32,9280,'Lic. Parental','20121201','20121231')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Assist. Familia','20120101','20120125')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Assist. Familia','20120101','20120131')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Assist. Familia','20120201','20120224')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Doença','20120225','20120325')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Doença','20120326','20120424')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Assist. Familia','20120326','20120424')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Assist. Familia','20120425','20120524')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Assist. Familia','20120524','20120623')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Assist. Familia','20120624','20120723')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Assist. Familia','20120724','20120822')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Assist. Familia','20120823','20120921')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Assist. Familia','20120922','20121021')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Assist. Familia','20121022','20121120')
INSERT INTO tblTrabalho32 VALUES (32,13053,'Assist. Familia','20121121','20121220')
INSERT INTO tblTrabalho32 VALUES (32,13750,'Doença','20120112','20120122')
INSERT INTO tblTrabalho32 VALUES (32,13750,'Doença','20120123','20120131')
INSERT INTO tblTrabalho32 VALUES (32,13750,'Doença','20120201','20120302')
INSERT INTO tblTrabalho32 VALUES (32,13750,'Doença','20120326','20120406')
INSERT INTO tblTrabalho32 VALUES (32,13750,'Doença','20120407','20120418')
INSERT INTO tblTrabalho32 VALUES (32,13750,'Doença','20120419','20120518')
INSERT INTO tblTrabalho32 VALUES (32,13750,'Doença','20120519','20120617')
INSERT INTO tblTrabalho32 VALUES (32,13750,'Doença','20120618','20120717')
INSERT INTO tblTrabalho32 VALUES (32,13750,'Doença','20120718','20120726')
INSERT INTO tblTrabalho32 VALUES (32,13750,'Doença','20120727','20120825')
INSERT INTO tblTrabalho32 VALUES (32,13750,'Parto','20120914','20121015')
INSERT INTO tblTrabalho32 VALUES (32,13750,'Parto','20121116','20121215')
INSERT INTO tblTrabalho32 VALUES (32,14853,'Doença','20120102','20120113')
INSERT INTO tblTrabalho32 VALUES (32,14853,'Doença','20120114','20120114')
INSERT INTO tblTrabalho32 VALUES (32,14853,'Doença','20120129','20120205')
INSERT INTO tblTrabalho32 VALUES (32,14853,'Doença','20120302','20120326')
INSERT INTO tblTrabalho32 VALUES (32,14853,'Doença','20120327','20120425')
INSERT INTO tblTrabalho32 VALUES (32,14853,'Doença','20120426','20120525')
INSERT INTO tblTrabalho32 VALUES (32,14853,'Assist. Familia','20120505','20120511')
INSERT INTO tblTrabalho32 VALUES (32,14853,'Assist. Familia','20120512','20120531')
INSERT INTO tblTrabalho32 VALUES (32,14853,'Lic. Parental','20120601','20121101')
Expected results are:Emp Num Tipo Ent Sai Dias Periodo DiasTotal
32 128 Doença 23-08-2012 07-09-2012 16 23-08-2012 to 07-09-2012 16
32 2376 Doença 13-09-2012 06-10-2012 24 13-09-2012 to 15-12-2012 94
32 2376 Parto 07-10-2012 15-12-2012 70 13-09-2012 to 15-12-2012 94
32 3746 Assist. Familia 06-02-2012 07-02-2012 2 06-02-2012 to 07-02-2012 2
32 3746 Assist. Familia 25-10-2012 27-10-2012 3 25-10-2012 to 27-10-2012 3
32 6136 Doença 01-01-2012 06-01-2012 6 01-01-2012 to 06-01-2012 6
32 6136 Doença 12-02-2012 13-02-2012 2 12-02-2012 to 13-02-2012 2
32 6136 Doença 12-06-2012 08-11-2012 150 12-06-2012 to 08-11-2012 150
32 6745 Assist. Familia 28-02-2012 15-03-2012 17 28-02-2012 to 15-03-2012 17
32 6745 Assist. Familia 17-03-2012 03-04-2012 18 17-03-2012 to 03-04-2012 18
32 6745 Acid. Trabalho 18-09-2012 12-10-2012 25 18-09-2012 to 12-10-2012 25
32 6853 Doença 21-06-2012 27-09-2012 99 21-06-2012 to 15-12-2012 178
32 6853 Parto 28-08-2012 15-12-2012 110 21-06-2012 to 15-12-2012 178
32 7331 Doença 01-08-2012 12-12-2012 134 01-08-2012 to 31-12-2012 153
32 7331 Lic. Parental 27-11-2012 31-12-2012 35 01-08-2012 to 31-12-2012 153
32 9045 Doença 01-01-2012 04-01-2012 4 01-01-2012 to 02-06-2012 154
32 9045 Parto 05-01-2012 16-04-2012 103 01-01-2012 to 02-06-2012 154
32 9045 Lic. Parental 05-01-2012 02-06-2012 150 01-01-2012 to 02-06-2012 154
32 9280 Doença 10-01-2012 15-01-2012 6 10-01-2012 to 15-01-2012 6
32 9280 Doença 15-03-2012 23-04-2012 40 15-03-2012 to 23-04-2012 40
32 9280 Doença 11-05-2012 20-08-2012 102 11-05-2012 to 20-08-2012 102
32 9280 Doença 18-09-2012 07-11-2012 51 18-09-2012 to 31-12-2012 105
32 9280 Lic. Parental 07-11-2012 31-12-2012 55 18-09-2012 to 31-12-2012 105
32 13053 Assist. Familia 01-01-2012 24-02-2012 55 01-01-2012 to 20-12-2012 355
32 13053 Doença 25-02-2012 24-04-2012 60 01-01-2012 to 20-12-2012 355
32 13053 Assist. Familia 26-03-2012 20-12-2012 270 01-01-2012 to 20-12-2012 355
32 13750 Doença 12-01-2012 02-03-2012 51 12-01-2012 to 02-03-2012 51
32 13750 Doença 26-03-2012 25-08-2012 153 26-03-2012 to 25-08-2012 153
32 13750 Parto 14-09-2012 15-10-2012 32 14-09-2012 to 15-10-2012 32
32 13750 Parto 16-11-2012 15-12-2012 30 16-11-2012 to 15-12-2012 30
32 14853 Doença 02-01-2012 14-01-2012 13 02-01-2012 to 14-01-2012 13
32 14853 Doença 29-01-2012 05-02-2012 8 29-01-2012 to 05-02-2012 8
32 14853 Doença 02-03-2012 25-05-2012 85 02-03-2012 to 01-11-2012 55
32 14853 Assist. Familia 05-05-2012 31-05-2012 27 02-03-2012 to 01-11-2012 245
32 14853 Lic. Parental 01-06-2012 01-11-2012 154 02-03-2012 to 01-11-2012 245
I've used the same output that evolved from the previous question, but it does contain all the important info.Important things to consider:
1- Any contiguous dates from the same FK_ID_intTipoTrabalho have to be grouped in the same line, regardless of different ones overlapping or even being contained in that range.
2- All different FK_ID_intTipoTrabalho have to be listed, even if they overlap or are contained in another range.
3- Dates for the total contiguous period are required as they are the ones that actually define the range, even though they may be composed of more than one FK_ID_intTipoTrabalho.
The examples provided cover some of the more tricky cases as well as some basic ones. If necessary I can provide more, but I preferred not to include too many as that might confuse the issue.
ASKER
Ok, there are two problems with your code:
1- It doesn't check for total period. That is, if you have 20 days of one FK_ID_intTipoTrabalho and another 20 with a different one in contiguous (overlapping or not) dates, none will be selected.
2- It doesn't correctly group all records. I've commented the 30 days line, just so I can get grouped results, and there were results which weren't correctly returned. Namely, for these values:
Not only that, there should have been some column(s) that define(s) the total period. In this case, from 2012-08-01 to 2012-12-31. It is essential that I have the full period returned, as that is the most important information I require.
1- It doesn't check for total period. That is, if you have 20 days of one FK_ID_intTipoTrabalho and another 20 with a different one in contiguous (overlapping or not) dates, none will be selected.
2- It doesn't correctly group all records. I've commented the 30 days line, just so I can get grouped results, and there were results which weren't correctly returned. Namely, for these values:
32 7331 Doença 2012-08-01 00:00:00 2012-08-05 00:00:00
32 7331 Doença 2012-08-06 00:00:00 2012-08-13 00:00:00
32 7331 Doença 2012-08-14 00:00:00 2012-08-23 00:00:00
32 7331 Doença 2012-08-24 00:00:00 2012-09-13 00:00:00
32 7331 Doença 2012-09-14 00:00:00 2012-10-13 00:00:00
32 7331 Doença 2012-10-14 00:00:00 2012-12-12 00:00:00
32 7331 Lic. Parental 2012-11-27 00:00:00 2012-12-31 00:00:00
It returns:32 7331 Doença 2012-08-01 00:00:00 2012-10-13 00:00:00 73
32 7331 Doença 2012-10-14 00:00:00 2012-12-12 00:00:00 59
32 7331 Lic. Parental 2012-11-27 00:00:00 2012-12-31 00:00:00 34
The first two lines should have been grouped into one.Not only that, there should have been some column(s) that define(s) the total period. In this case, from 2012-08-01 to 2012-12-31. It is essential that I have the full period returned, as that is the most important information I require.
It is there. Please check my solution above again. Note also, I did not include the FK_ID_intFuncionario column because you said it is unique, which will cause it not to group rows. However, it is very possible I do not understand your data structure or how you want to group, so just adjust the query accordingly. If you do not understand where to change, please ask specific questions on syntax. The language of the columns makes it harder for me to communicate because I may not be expressing my intent. For example, Ent and Sai columns in my result represent the beginning and ending of the full period of any grouping. If you are trying to do this with detail, then you will need to look at the windowing functions with the OVER analytic clause.
ASKER
Actually, FK_ID_intEmpresa is the one I said is unique. I changed the group to FK_ID_intFuncionario but it took 32mins and it returned way too many cases:
In this case, it should have returned 2 rows. Also, I don't think you're quite understanding what I meant for total period. I will elaborate using this specific case. It has to return something like:
emp num Tipo Ent Sai diasTotal
32 7331 Doença 2012-08-14 00:00:00 2012-12-12 00:00:00 120
32 7331 Doença 2012-08-14 00:00:00 2012-12-12 00:00:00 120
32 7331 Doença 2012-08-01 00:00:00 2012-12-12 00:00:00 133
32 7331 Doença 2012-08-06 00:00:00 2012-12-12 00:00:00 128
32 7331 Doença 2012-08-14 00:00:00 2012-12-12 00:00:00 120
32 7331 Doença 2012-08-24 00:00:00 2012-12-12 00:00:00 110
32 7331 Doença 2012-08-06 00:00:00 2012-12-12 00:00:00 128
32 7331 Doença 2012-08-24 00:00:00 2012-12-12 00:00:00 110
32 7331 Doença 2012-08-06 00:00:00 2012-12-12 00:00:00 128
32 7331 Doença 2012-09-14 00:00:00 2012-12-12 00:00:00 89
32 7331 Doença 2012-08-14 00:00:00 2012-12-12 00:00:00 120
32 7331 Doença 2012-10-14 00:00:00 2012-12-12 00:00:00 59
32 7331 Lic. Parental 2012-11-27 00:00:00 2012-12-31 00:00:00 34
In this case, it should have returned 2 rows. Also, I don't think you're quite understanding what I meant for total period. I will elaborate using this specific case. It has to return something like:
Emp Num Tipo Start End Days TotalStart TotalEnd TotalDays
32 7331 Doença 2012-08-01 2012-12-12 132 2012-08-01 2012-12-31 152
32 7331 Lic. Parental 2012-11-27 2012-12-31 34 2012-08-01 2012-12-31 152
Notice that the total period is the same for both rows, because they are contiguous. That means that the total period has to be the start of the earliest until the end of the latest.
You have to change more than the GROUP BY. Remember PARTITION BY is the same logically as GROUP BY, so replace FK_ID_intEmpresa with FK_ID_intFuncionario in JOINs, PARTITION BYs, and GROUP BYs. In other words, try the following.
The unique value cannot be included in the results unless you want details, which is what the windowing functions would help with. However, from your descriptions above on the number of results, I suspect you do not want that.
;WITH rank_cte AS (
/* Rank rows by entry date for each Tipo. */
SELECT FK_ID_intEmpresa, FK_ID_intFuncionario, FK_ID_intTipoTrabalho, datHoraEntradaTrabalho, datHoraSaidaTrabalho
, ROW_NUMBER() OVER(PARTITION BY FK_ID_intFuncionario, FK_ID_intTipoTrabalho ORDER BY datHoraEntradaTrabalho) RN
FROM @tblTrabalho32 a
)
, recursive_cte AS (
/* Select rows with rank 1 from above. */
SELECT FK_ID_intEmpresa, FK_ID_intFuncionario, FK_ID_intTipoTrabalho, datHoraEntradaTrabalho, datHoraSaidaTrabalho, RN
, datHoraEntradaTrabalho AS groupHoraEntradaTrabalho
FROM rank_cte
WHERE RN = 1
UNION ALL
/* Evaluate rows with with 2+ to determine contiguous dates. */
SELECT nxt.FK_ID_intEmpresa, nxt.FK_ID_intFuncionario, nxt.FK_ID_intTipoTrabalho, nxt.datHoraEntradaTrabalho, nxt.datHoraSaidaTrabalho, nxt.RN
, CASE WHEN nxt.datHoraEntradaTrabalho <= DATEADD(DAY, +1, prv.datHoraSaidaTrabalho) THEN prv.groupHoraEntradaTrabalho ELSE nxt.datHoraEntradaTrabalho END
FROM rank_cte nxt
JOIN recursive_cte prv
ON prv.FK_ID_intFuncionario = nxt.FK_ID_intFuncionario AND prv.FK_ID_intTipoTrabalho = nxt.FK_ID_intTipoTrabalho
AND prv.RN = (nxt.RN - 1)
)
SELECT FK_ID_intFuncionario AS Num
, FK_ID_intTipoTrabalho AS Tipo
, MIN(datHoraEntradaTrabalho) AS Ent
, MAX(datHoraSaidaTrabalho) AS Sai
, DATEDIFF(DAY, MIN(datHoraEntradaTrabalho), MAX(datHoraSaidaTrabalho)) AS diasTotal
FROM recursive_cte
GROUP BY FK_ID_intFuncionario, FK_ID_intTipoTrabalho, groupHoraEntradaTrabalho
HAVING DATEDIFF(DAY, MIN(datHoraEntradaTrabalho), MAX(datHoraSaidaTrabalho)) >= 30
;
The unique value cannot be included in the results unless you want details, which is what the windowing functions would help with. However, from your descriptions above on the number of results, I suspect you do not want that.
ASKER
Actually, I should have been able to change the field on my own.
Anyway, your query returns the results partly right. That is, it groups for each FK_ID_intTipoTrabalho, but it doesn't identify any range that covers more than one FK_ID_intTipoTrabalho. Maybe I wasn't clear enough, but that is what I want with the TotalPeriod. That is, and using the same example again, the total running period for that FK_ID_intFuncionario (7331) is actually 2012-08-01 to 2012-12-31, composed of two different types. However, they're still contiguous.
So, what I need is:
1- Total period of contiguous dates for each FK_ID_intFuncionario (this may be composed of many rows or just one, and may have multiple FK_ID_intTipoTrabalho)
2- List of each period grouped FK_ID_intTipoTrabalho
If that wasn't clear in the description, I'm sorry. But that is what the expected results show, so I thought that would be enough.
Anyway, your query returns the results partly right. That is, it groups for each FK_ID_intTipoTrabalho, but it doesn't identify any range that covers more than one FK_ID_intTipoTrabalho. Maybe I wasn't clear enough, but that is what I want with the TotalPeriod. That is, and using the same example again, the total running period for that FK_ID_intFuncionario (7331) is actually 2012-08-01 to 2012-12-31, composed of two different types. However, they're still contiguous.
So, what I need is:
1- Total period of contiguous dates for each FK_ID_intFuncionario (this may be composed of many rows or just one, and may have multiple FK_ID_intTipoTrabalho)
2- List of each period grouped FK_ID_intTipoTrabalho
If that wasn't clear in the description, I'm sorry. But that is what the expected results show, so I thought that would be enough.
Yes. I probably missed the point early on and this could be much simpler, but we can layer with another CTE and use the windowing functions. Again, this may be much simpler but I do not want to try to rewrite and cause more confusion.
I just hope some of these techniques help you by getting your thoughts going.
Kevin
;WITH rank_cte AS (
/* Rank rows by entry date for each Tipo. */
SELECT FK_ID_intEmpresa, FK_ID_intFuncionario, FK_ID_intTipoTrabalho, datHoraEntradaTrabalho, datHoraSaidaTrabalho
, ROW_NUMBER() OVER(PARTITION BY FK_ID_intFuncionario, FK_ID_intTipoTrabalho ORDER BY datHoraEntradaTrabalho) RN
FROM @tblTrabalho32 a
)
, recursive_cte AS (
/* Select rows with rank 1 from above. */
SELECT FK_ID_intEmpresa, FK_ID_intFuncionario, FK_ID_intTipoTrabalho, datHoraEntradaTrabalho, datHoraSaidaTrabalho, RN
, datHoraEntradaTrabalho AS groupHoraEntradaTrabalho
FROM rank_cte
WHERE RN = 1
UNION ALL
/* Evaluate rows with with 2+ to determine contiguous dates. */
SELECT nxt.FK_ID_intEmpresa, nxt.FK_ID_intFuncionario, nxt.FK_ID_intTipoTrabalho, nxt.datHoraEntradaTrabalho, nxt.datHoraSaidaTrabalho, nxt.RN
, CASE WHEN nxt.datHoraEntradaTrabalho <= DATEADD(DAY, +1, prv.datHoraSaidaTrabalho) THEN prv.groupHoraEntradaTrabalho ELSE nxt.datHoraEntradaTrabalho END
FROM rank_cte nxt
JOIN recursive_cte prv
ON prv.FK_ID_intFuncionario = nxt.FK_ID_intFuncionario AND prv.FK_ID_intTipoTrabalho = nxt.FK_ID_intTipoTrabalho
AND prv.RN = (nxt.RN - 1)
)
, aggregate_cte AS (
SELECT FK_ID_intFuncionario AS Num
, FK_ID_intTipoTrabalho AS Tipo
, MIN(datHoraEntradaTrabalho) AS Ent
, MAX(datHoraSaidaTrabalho) AS Sai
, DATEDIFF(DAY, MIN(datHoraEntradaTrabalho), MAX(datHoraSaidaTrabalho)) AS Dias
FROM recursive_cte
GROUP BY FK_ID_intFuncionario, FK_ID_intTipoTrabalho, groupHoraEntradaTrabalho
)
SELECT *
, CONVERT(CHAR(10), MIN(Ent) OVER(PARTITION BY Num), 105) + ' to ' + CONVERT(CHAR(10), MAX(Sai) OVER(PARTITION BY Num), 105) AS Periodo
,DATEDIFF(DAY, MIN(Ent) OVER(PARTITION BY Num), MAX(Sai) OVER(PARTITION BY Num)) AS DiasTotal
FROM aggregate_cte
;
I just hope some of these techniques help you by getting your thoughts going.
Kevin
ASKER
I don't mind you rewriting or starting from scratch. It's more important to get it working reliably than me understanding it. Besides, once it's working, I'll have plenty of time to test it later.
It's close to the solution now, but some results aren't correct, like:
It's close to the solution now, but some results aren't correct, like:
Num Tipo Ent Sai Dias Periodo DiasTotal
13750 Doença 2012-01-12 00:00:00 2012-03-02 00:00:00 50 12-01-2012 to 15-12-2012 338
13750 Doença 2012-03-26 00:00:00 2012-08-25 00:00:00 152 12-01-2012 to 15-12-2012 338
13750 Parto 2012-09-14 00:00:00 2012-10-15 00:00:00 31 12-01-2012 to 15-12-2012 338
13750 Parto 2012-11-16 00:00:00 2012-12-15 00:00:00 29 12-01-2012 to 15-12-2012 338
The total period in this case should be the same as each individual period, seeing as they aren't all contiguous.
Ah. I think I understand better now. Sorry it took so long. I will look at this to rewrite, so it doesn't become a big add-on mess.
Here is my current understanding:
Group by Num and Tipo to determine contiguous dates.
Separately, group by Num to determine contiguous dates regardless of Tipo.
Show the results as details of the first grouping but include second on the same line.
Unless I hear otherwise, I will provide an example with the above understanding later.
Here is my current understanding:
Group by Num and Tipo to determine contiguous dates.
Separately, group by Num to determine contiguous dates regardless of Tipo.
Show the results as details of the first grouping but include second on the same line.
Unless I hear otherwise, I will provide an example with the above understanding later.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Very nice, Mark! Thank you for chiming in. I had some family commitments last night and again today, so I will not be online as planned; therefore, it is good to have another perspective Cluskitt.
ASKER
I won't have a chance to check on anything until monday, but thanks for the effort so far.
@cluskitt, No Problems... I can also go through the examples from the other thread.
@mwvisa1, Very good to see you here. I had a slight advantage because I was involved in the original question, and it took a while to digest. There must be better SQL out there, but kinda locked into an approach... Seems to work OK so might play with it (and if you want to please help tune it). Trust family and new bub are all doing fine :)
@mwvisa1, Very good to see you here. I had a slight advantage because I was involved in the original question, and it took a while to digest. There must be better SQL out there, but kinda locked into an approach... Seems to work OK so might play with it (and if you want to please help tune it). Trust family and new bub are all doing fine :)
ASKER
Ok, I've tested that query. On the table in question, it works fine. All results from the sample above work as expected and I have no reason to suspect that any might produce any unwanted results.
However, when I tested it on the original table from the original question, without any filters, I get this:
(20478 row(s) affected)
Msg 530, Level 16, State 1, Line 54
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
Is there a limit to how many rows the temp table can have? This might be important in the future. Right now, with filters on (most data on the original table is irrelevant to this, so I exclude it in the first select) it works fine. But there will come a time when there are many more records. So this will be an issue at some point in time. I'd rather have it dealt with now.
BTW, without filtering records, there are many overlapping records in the original table. Some are even exactly the same, except for Tipo.
However, when I tested it on the original table from the original question, without any filters, I get this:
(20478 row(s) affected)
Msg 530, Level 16, State 1, Line 54
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
Is there a limit to how many rows the temp table can have? This might be important in the future. Right now, with filters on (most data on the original table is irrelevant to this, so I exclude it in the first select) it works fine. But there will come a time when there are many more records. So this will be an issue at some point in time. I'd rather have it dealt with now.
BTW, without filtering records, there are many overlapping records in the original table. Some are even exactly the same, except for Tipo.
It isnt the temp table - it can have as many as you want...
The recursion (by default) is 100 - basically means there are another 100 rows in the sample set for that empresa + funcionario combo (in the first temp table) and/or for that empresa + funcionario + tipo combo in the second one.
Now, if it is possible to have more than 100, then have to use the query hint
OPTION (MAXRECURSION 0)
when doing the select from the CTE
In the fist temp table, we would not expect overlapping rows. In the second one, we would expect overlapping rows because we are looking for all instances of tipo.
Now, we probably should check (the same way as we do in the first one except we add tipo as the third key to check) because we are still only interested in contiguous dates.
That would also limit the number of rows in the CTE and would also limit the maxrecursion.
The recursion (by default) is 100 - basically means there are another 100 rows in the sample set for that empresa + funcionario combo (in the first temp table) and/or for that empresa + funcionario + tipo combo in the second one.
Now, if it is possible to have more than 100, then have to use the query hint
OPTION (MAXRECURSION 0)
when doing the select from the CTE
In the fist temp table, we would not expect overlapping rows. In the second one, we would expect overlapping rows because we are looking for all instances of tipo.
Now, we probably should check (the same way as we do in the first one except we add tipo as the third key to check) because we are still only interested in contiguous dates.
That would also limit the number of rows in the CTE and would also limit the maxrecursion.
ASKER
Using OPTION on both selects, it took 4:19 to return 14k rows. It could probably use some indexes. On the other hand, this is sort of an extreme example. There are many records here which aren't used for this purpose, but which I haven't filtered out so I could test an ultimate case scenario.
Other than that, it seems to work fine. The most complicated cases are returning correct values, the filtered select (which returns non-overlapping records) also works fine, so it seems that this question can be closed. I'm going to wait a while to see if there are any tweaks you guys want to make, though, just to be thorough.
Other than that, it seems to work fine. The most complicated cases are returning correct values, the filtered select (which returns non-overlapping records) also works fine, so it seems that this question can be closed. I'm going to wait a while to see if there are any tweaks you guys want to make, though, just to be thorough.
ASKER
OK, I found a strange result:
32 4339 Parto 2012-01-01 00:00:00 2012-03-31 00:00:00
32 4339 Parto 2012-01-01 00:00:00 2012-01-31 00:00:00
32 4339 Parto 2012-02-01 00:00:00 2012-02-29 00:00:00
returns:32 4339 Parto 01-01-2012 29-02-2012 60 De 01-01-2012 a 31-03-2012 91
The "inner" range isn't correct with the outer one.
ASKER
There are a few more cases. It seems that the total period is always correct, but not the grouped range. This also happens with non-overlapping records. One has two rows, from 2012-01-01 to 2012-01-15 and 2012-01-16 to 2012-02-01 and it's returning only the last one as the grouped "inner" range.
ASKER
I have adapted your code slightly to fit in with my requirements. I don't think there was any mistake when making changes (most was indenting and casing to my liking), but I'll post the current one:
USE [RHAjustes]
GO
/****** Object: StoredProcedure [dbo].[usp_UserMail] Script Date: 11/29/2012 11:46:37 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[usp_Baixas32]
@iAno char(4),
@iDias int
AS
DECLARE @Emp int
SET @Emp=32;
IF OBJECT_ID('tempdb..#tmp_tblTrabalho_Dates','U') IS NOT NULL DROP TABLE #tmp_tblTrabalho_Dates
IF OBJECT_ID('tempdb..#tmp_tblTrabalho_Tipo','U') IS NOT NULL DROP TABLE #tmp_tblTrabalho_Tipo;
-- extract rows to a working table
-- first we really just need contiguous dates by empresa and funcionario
SELECT ROW_NUMBER() OVER (PARTITION BY FK_ID_intEmpresa,FK_ID_intFuncionario
ORDER BY CASE WHEN YEAR(datHoraSaidaTrabalho)>@iAno
THEN CAST(@iAno+'1231' AS smalldatetime) ELSE datHoraSaidaTrabalho END DESC,
CASE WHEN YEAR(datHoraEntradaTrabalho)<@iAno
THEN CAST(@iAno+'0101' AS smalldatetime) ELSE datHoraEntradaTrabalho END DESC) DRN,
ROW_NUMBER() OVER (PARTITION BY FK_ID_intEmpresa,FK_ID_intFuncionario
ORDER BY CASE WHEN YEAR(datHoraEntradaTrabalho)<@iAno
THEN CAST(@iAno+'0101' AS smalldatetime) ELSE datHoraEntradaTrabalho END ASC,
CASE WHEN YEAR(datHoraSaidaTrabalho)>@iAno
THEN CAST(@iAno+'1231' AS smalldatetime) ELSE datHoraSaidaTrabalho END ASC) ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
CASE WHEN YEAR(datHoraEntradaTrabalho)<@iAno
THEN CAST(@iAno+'0101' AS smalldatetime)
ELSE datHoraEntradaTrabalho
END datHoraEntradaTrabalho,
CASE WHEN YEAR(datHoraSaidaTrabalho)>@iAno
THEN CAST(@iAno+'1231' AS smalldatetime)
ELSE datHoraSaidaTrabalho
END datHoraSaidaTrabalho,
CASE WHEN YEAR(datHoraEntradaTrabalho)<@iAno
THEN CAST(@iAno+'0101' AS smalldatetime)
ELSE datHoraEntradaTrabalho
END PeriodStart,
CASE WHEN YEAR(datHoraSaidaTrabalho)>@iAno
THEN CAST(@iAno+'1231' AS smalldatetime)
ELSE datHoraSaidaTrabalho
END PeriodEnd
INTO #tmp_tblTrabalho_Dates
FROM tblTrabalho32 T
WHERE FK_ID_intEmpresa=@Emp
AND (YEAR(datHoraEntradaTrabalho)=@iAno
OR YEAR(datHoraSaidaTrabalho)=@iAno)
AND NOT EXISTS (SELECT NULL -- exclude wholly contained dupe periods
FROM tblTrabalho32 T1
WHERE T.FK_ID_intEmpresa=T1.FK_ID_intEmpresa
AND T.FK_ID_intFuncionario=T1.FK_ID_intFuncionario
AND ((T.datHoraEntradaTrabalho>=T1.datHoraEntradaTrabalho
AND T.datHoraSaidaTrabalho<T1.datHoraSaidaTrabalho)
OR (T.datHoraEntradaTrabalho>T1.datHoraEntradaTrabalho
AND T.datHoraSaidaTrabalho<=T1.datHoraSaidaTrabalho)));
-- now traverse the tmp table to get "last dates" by DRN seq, then secondpart is to get "first dates" by ARn seq
WITH CTE_Period_Ends AS (
SELECT DRN,
ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
datHoraSaidaTrabalho NextEnd,
datHoraEntradaTrabalho PriorStart
FROM #tmp_tblTrabalho_Dates
WHERE DRN = 1
UNION ALL
SELECT T.DRN,
T.ARN,
T.FK_ID_intEmpresa,
T.FK_ID_intFuncionario,
T.datHoraEntradaTrabalho,
T.datHoraSaidaTrabalho,
CASE WHEN DATEDIFF(day,T.datHoraSaidaTrabalho,E.datHoraEntradaTrabalho)<2
THEN E.NextEnd ELSE T.datHoraSaidaTrabalho END,
T.datHoraEntradaTrabalho
FROM CTE_Period_Ends E
INNER JOIN #tmp_tblTrabalho_Dates T
ON T.FK_ID_intEmpresa=E.FK_ID_intEmpresa
AND T.FK_ID_intFuncionario=E.FK_ID_intFuncionario
AND T.DRN=E.DRN+1),
CTE_Period_Commence AS (
SELECT DRN,
ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
NextEnd,
PriorStart
FROM CTE_Period_Ends
WHERE ARN=1
UNION ALL
SELECT E.DRN,
E.ARN,
E.FK_ID_intEmpresa,
E.FK_ID_intFuncionario,
E.datHoraEntradaTrabalho,
E.datHoraSaidaTrabalho,
E.NextEnd,
CASE WHEN DATEDIFF(day,c.datHoraSaidaTrabalho,E.datHoraEntradaTrabalho)<2
THEN C.PriorStart ELSE E.datHoraEntradaTrabalho END
FROM CTE_Period_Commence C
INNER JOIN CTE_Period_Ends E
ON E.FK_ID_intEmpresa=C.FK_ID_intEmpresa
AND E.FK_ID_intFuncionario=C.FK_ID_intFuncionario
AND E.ARN=C.ARN+1),
CTE_Distinct_Periods AS (
SELECT DISTINCT FK_ID_intEmpresa,
FK_ID_intFuncionario,
PriorStart,
NextEnd
FROM CTE_Period_Commence
WHERE DATEDIFF(day,PriorStart,NextEnd)+1>=@iDias)
-- now we should have discrete distinct periods and have to find the detail rows by TipoTrabalho
-- then essentially go through the same process as above but at the deeper Tipo level.
SELECT ROW_NUMBER() OVER (PARTITION BY T1.FK_ID_intEmpresa,T1.FK_ID_intFuncionario,T1.FK_ID_intTipoTrabalho
ORDER BY datHoraSaidaTrabalho DESC,datHoraEntradaTrabalho DESC) DRN,
ROW_NUMBER() OVER (PARTITION BY T1.FK_ID_intEmpresa,T1.FK_ID_intFuncionario,T1.FK_ID_intTipoTrabalho
ORDER BY datHoraEntradaTrabalho ASC,datHoraSaidaTrabalho ASC) ARN,
T1.FK_ID_intEmpresa,
T1.FK_ID_intFuncionario,
T1.FK_ID_intTipoTrabalho,
T1.datHoraEntradaTrabalho,
T1.datHoraSaidaTrabalho,
T.PriorStart PeriodStart,
T.NextEnd PeriodEnd
INTO #tmp_tblTrabalho_Tipo
FROM tblTrabalho32 T1
INNER JOIN CTE_Distinct_Periods T
ON T1.FK_ID_intEmpresa=T.FK_ID_intEmpresa
AND T1.FK_ID_intFuncionario=T.FK_ID_intFuncionario
AND T1.datHoraEntradaTrabalho BETWEEN T.PriorStart AND T.NextEnd
AND T1.datHoraSaidaTrabalho BETWEEN T.PriorStart AND T.NextEnd;
WITH CTE_Period_Ends AS (
SELECT DRN,
ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
FK_ID_intTipoTrabalho,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
datHoraSaidaTrabalho NextEnd,
datHoraEntradaTrabalho PriorStart,
PeriodStart,
PeriodEnd
FROM #tmp_tblTrabalho_Tipo
WHERE DRN=1
UNION ALL
SELECT T.DRN,
T.ARN,
T.FK_ID_intEmpresa,
T.FK_ID_intFuncionario,
T.FK_ID_intTipoTrabalho,
T.datHoraEntradaTrabalho,
T.datHoraSaidaTrabalho,
CASE WHEN DATEDIFF(day,T.datHoraSaidaTrabalho,E.datHoraEntradaTrabalho)<2
THEN E.NextEnd ELSE T.datHoraSaidaTrabalho END,
T.datHoraEntradaTrabalho,
T.PeriodStart,
T.PeriodEnd
FROM CTE_Period_Ends E
INNER JOIN #tmp_tblTrabalho_Tipo T
ON T.FK_ID_intEmpresa=E.FK_ID_intEmpresa
AND T.FK_ID_intFuncionario=E.FK_ID_intFuncionario
AND T.FK_ID_intTipoTrabalho=E.FK_ID_intTipoTrabalho
AND T.DRN=E.DRN+1),
CTE_Period_Commence AS (
SELECT DRN,
ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
FK_ID_intTipoTrabalho,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
NextEnd,
PriorStart,
PeriodStart,
PeriodEnd
FROM CTE_Period_Ends
WHERE ARN=1
UNION ALL
SELECT E.DRN,
E.ARN,
E.FK_ID_intEmpresa,
E.FK_ID_intFuncionario,
E.FK_ID_intTipoTrabalho,
E.datHoraEntradaTrabalho,
E.datHoraSaidaTrabalho,
E.NextEnd,
CASE WHEN DATEDIFF(day,c.datHoraSaidaTrabalho,E.datHoraEntradaTrabalho)<2
THEN C.PriorStart ELSE E.datHoraEntradaTrabalho END,
E.PeriodStart,
E.PeriodEnd
FROM CTE_Period_Commence C
INNER JOIN CTE_Period_Ends E
ON E.FK_ID_intEmpresa=C.FK_ID_intEmpresa
AND E.FK_ID_intFuncionario=C.FK_ID_intFuncionario
AND E.FK_ID_intTipoTrabalho=C.FK_ID_intTipoTrabalho
AND E.ARN=C.ARN+1)
SELECT FK_ID_intFuncionario Num,
REPLACE(REPLACE(Regime,'F','FT'),'X','F') Reg,
FK_ID_intTipoTrabalho Tipo,
CONVERT(char(10),MIN(datHoraEntradaTrabalho),105) Ent,
CONVERT(char(10),MAX(datHoraSaidaTrabalho),105) Sai,
DATEDIFF(day,MIN(datHoraEntradaTrabalho),MAX(datHoraSaidaTrabalho))+1 Dias,
'De '+CONVERT(char(10),PeriodStart,105)+' a '+CONVERT(char(10),PeriodEnd,105) Periodo,
DATEDIFF(day,PeriodStart,periodEnd)+1 TotalDias
FROM CTE_Period_Commence
INNER JOIN PortalSGP..uv_Cadastro2
ON FK_ID_intEmpresa=LEFT(psa_emp_empresa,2)
AND FK_ID_intFuncionario=psa_num_empregado
GROUP BY FK_ID_intFuncionario,
FK_ID_intTipoTrabalho,
Regime,
PriorStart,
NextEnd,
PeriodStart,
PeriodEnd
ORDER BY FK_ID_intFuncionario,
Regime,
MIN(datHoraEntradaTrabalho),
MAX(datHoraSaidaTrabalho),
FK_ID_intTipoTrabalho;
IF OBJECT_ID('tempdb..#tmp_tblTrabalho_Dates','U') IS NOT NULL DROP TABLE #tmp_tblTrabalho_Dates
IF OBJECT_ID('tempdb..#tmp_tblTrabalho_Tipo','U') IS NOT NULL DROP TABLE #tmp_tblTrabalho_Tipo;
It should be noted that, even though the first select has all those cases in them, all the data I posted has already taken this into consideration. The data we've been working on is the actual data returned from the query, not the one it's on the table (lots of irrelevant records from earlier years).
ASKER
Nevermind, fixed it. I had forgotten to do a similar thing to the second select. I couldn't get a case working on the actual select, so I just changed the table into a subquery:
USE [RHAjustes]
GO
/****** Object: StoredProcedure [dbo].[usp_UserMail] Script Date: 11/29/2012 11:46:37 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[usp_Baixas32]
@iAno char(4),
@iDias int
AS
DECLARE @Emp int
SET @Emp=32;
IF OBJECT_ID('tempdb..#tmp_tblTrabalho_Dates','U') IS NOT NULL DROP TABLE #tmp_tblTrabalho_Dates
IF OBJECT_ID('tempdb..#tmp_tblTrabalho_Tipo','U') IS NOT NULL DROP TABLE #tmp_tblTrabalho_Tipo;
-- extract rows to a working table
-- first we really just need contiguous dates by empresa and funcionario
SELECT ROW_NUMBER() OVER (PARTITION BY FK_ID_intEmpresa,FK_ID_intFuncionario
ORDER BY CASE WHEN YEAR(datHoraSaidaTrabalho)>@iAno
THEN CAST(@iAno+'1231' AS smalldatetime) ELSE datHoraSaidaTrabalho END DESC,
CASE WHEN YEAR(datHoraEntradaTrabalho)<@iAno
THEN CAST(@iAno+'0101' AS smalldatetime) ELSE datHoraEntradaTrabalho END DESC) DRN,
ROW_NUMBER() OVER (PARTITION BY FK_ID_intEmpresa,FK_ID_intFuncionario
ORDER BY CASE WHEN YEAR(datHoraEntradaTrabalho)<@iAno
THEN CAST(@iAno+'0101' AS smalldatetime) ELSE datHoraEntradaTrabalho END ASC,
CASE WHEN YEAR(datHoraSaidaTrabalho)>@iAno
THEN CAST(@iAno+'1231' AS smalldatetime) ELSE datHoraSaidaTrabalho END ASC) ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
CASE WHEN YEAR(datHoraEntradaTrabalho)<@iAno
THEN CAST(@iAno+'0101' AS smalldatetime)
ELSE datHoraEntradaTrabalho
END datHoraEntradaTrabalho,
CASE WHEN YEAR(datHoraSaidaTrabalho)>@iAno
THEN CAST(@iAno+'1231' AS smalldatetime)
ELSE datHoraSaidaTrabalho
END datHoraSaidaTrabalho,
CASE WHEN YEAR(datHoraEntradaTrabalho)<@iAno
THEN CAST(@iAno+'0101' AS smalldatetime)
ELSE datHoraEntradaTrabalho
END PeriodStart,
CASE WHEN YEAR(datHoraSaidaTrabalho)>@iAno
THEN CAST(@iAno+'1231' AS smalldatetime)
ELSE datHoraSaidaTrabalho
END PeriodEnd
INTO #tmp_tblTrabalho_Dates
FROM tblTrabalho32 T
WHERE FK_ID_intEmpresa=@Emp
AND (YEAR(datHoraEntradaTrabalho)=@iAno
OR YEAR(datHoraSaidaTrabalho)=@iAno)
AND NOT EXISTS (SELECT NULL -- exclude wholly contained dupe periods
FROM tblTrabalho32 T1
WHERE T.FK_ID_intEmpresa=T1.FK_ID_intEmpresa
AND T.FK_ID_intFuncionario=T1.FK_ID_intFuncionario
AND ((T.datHoraEntradaTrabalho>=T1.datHoraEntradaTrabalho
AND T.datHoraSaidaTrabalho<T1.datHoraSaidaTrabalho)
OR (T.datHoraEntradaTrabalho>T1.datHoraEntradaTrabalho
AND T.datHoraSaidaTrabalho<=T1.datHoraSaidaTrabalho)));
-- now traverse the tmp table to get "last dates" by DRN seq, then secondpart is to get "first dates" by ARn seq
WITH CTE_Period_Ends AS (
SELECT DRN,
ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
datHoraSaidaTrabalho NextEnd,
datHoraEntradaTrabalho PriorStart
FROM #tmp_tblTrabalho_Dates
WHERE DRN = 1
UNION ALL
SELECT T.DRN,
T.ARN,
T.FK_ID_intEmpresa,
T.FK_ID_intFuncionario,
T.datHoraEntradaTrabalho,
T.datHoraSaidaTrabalho,
CASE WHEN DATEDIFF(day,T.datHoraSaidaTrabalho,E.datHoraEntradaTrabalho)<2
THEN E.NextEnd ELSE T.datHoraSaidaTrabalho END,
T.datHoraEntradaTrabalho
FROM CTE_Period_Ends E
INNER JOIN #tmp_tblTrabalho_Dates T
ON T.FK_ID_intEmpresa=E.FK_ID_intEmpresa
AND T.FK_ID_intFuncionario=E.FK_ID_intFuncionario
AND T.DRN=E.DRN+1),
CTE_Period_Commence AS (
SELECT DRN,
ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
NextEnd,
PriorStart
FROM CTE_Period_Ends
WHERE ARN=1
UNION ALL
SELECT E.DRN,
E.ARN,
E.FK_ID_intEmpresa,
E.FK_ID_intFuncionario,
E.datHoraEntradaTrabalho,
E.datHoraSaidaTrabalho,
E.NextEnd,
CASE WHEN DATEDIFF(day,c.datHoraSaidaTrabalho,E.datHoraEntradaTrabalho)<2
THEN C.PriorStart ELSE E.datHoraEntradaTrabalho END
FROM CTE_Period_Commence C
INNER JOIN CTE_Period_Ends E
ON E.FK_ID_intEmpresa=C.FK_ID_intEmpresa
AND E.FK_ID_intFuncionario=C.FK_ID_intFuncionario
AND E.ARN=C.ARN+1),
CTE_Distinct_Periods AS (
SELECT DISTINCT FK_ID_intEmpresa,
FK_ID_intFuncionario,
PriorStart,
NextEnd
FROM CTE_Period_Commence
WHERE DATEDIFF(day,PriorStart,NextEnd)+1>=@iDias)
-- now we should have discrete distinct periods and have to find the detail rows by TipoTrabalho
-- then essentially go through the same process as above but at the deeper Tipo level.
SELECT ROW_NUMBER() OVER (PARTITION BY T1.FK_ID_intEmpresa,T1.FK_ID_intFuncionario,T1.FK_ID_intTipoTrabalho
ORDER BY datHoraSaidaTrabalho DESC,datHoraEntradaTrabalho DESC) DRN,
ROW_NUMBER() OVER (PARTITION BY T1.FK_ID_intEmpresa,T1.FK_ID_intFuncionario,T1.FK_ID_intTipoTrabalho
ORDER BY datHoraEntradaTrabalho ASC,datHoraSaidaTrabalho ASC) ARN,
T1.FK_ID_intEmpresa,
T1.FK_ID_intFuncionario,
T1.FK_ID_intTipoTrabalho,
T1.datHoraEntradaTrabalho,
T1.datHoraSaidaTrabalho,
T.PriorStart PeriodStart,
T.NextEnd PeriodEnd
INTO #tmp_tblTrabalho_Tipo
FROM (SELECT FK_ID_intEmpresa,
FK_ID_intFuncionario,
FK_ID_intTipoTrabalho,
CASE WHEN YEAR(datHoraEntradaTrabalho)<@iAno
THEN CAST(@iAno+'0101' AS smalldatetime)
ELSE datHoraEntradaTrabalho
END datHoraEntradaTrabalho,
CASE WHEN YEAR(datHoraSaidaTrabalho)<@iAno
THEN CAST(@iAno+'1231' AS smalldatetime)
ELSE datHoraSaidaTrabalho
END datHoraSaidaTrabalho
FROM tblTrabalho32) T1
INNER JOIN CTE_Distinct_Periods T
ON T1.FK_ID_intEmpresa=T.FK_ID_intEmpresa
AND T1.FK_ID_intFuncionario=T.FK_ID_intFuncionario
AND T1.datHoraEntradaTrabalho BETWEEN T.PriorStart AND T.NextEnd
AND T1.datHoraSaidaTrabalho BETWEEN T.PriorStart AND T.NextEnd;
WITH CTE_Period_Ends AS (
SELECT DRN,
ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
FK_ID_intTipoTrabalho,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
datHoraSaidaTrabalho NextEnd,
datHoraEntradaTrabalho PriorStart,
PeriodStart,
PeriodEnd
FROM #tmp_tblTrabalho_Tipo
WHERE DRN=1
UNION ALL
SELECT T.DRN,
T.ARN,
T.FK_ID_intEmpresa,
T.FK_ID_intFuncionario,
T.FK_ID_intTipoTrabalho,
T.datHoraEntradaTrabalho,
T.datHoraSaidaTrabalho,
CASE WHEN DATEDIFF(day,T.datHoraSaidaTrabalho,E.datHoraEntradaTrabalho)<2
THEN E.NextEnd ELSE T.datHoraSaidaTrabalho END,
T.datHoraEntradaTrabalho,
T.PeriodStart,
T.PeriodEnd
FROM CTE_Period_Ends E
INNER JOIN #tmp_tblTrabalho_Tipo T
ON T.FK_ID_intEmpresa=E.FK_ID_intEmpresa
AND T.FK_ID_intFuncionario=E.FK_ID_intFuncionario
AND T.FK_ID_intTipoTrabalho=E.FK_ID_intTipoTrabalho
AND T.DRN=E.DRN+1),
CTE_Period_Commence AS (
SELECT DRN,
ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
FK_ID_intTipoTrabalho,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
NextEnd,
PriorStart,
PeriodStart,
PeriodEnd
FROM CTE_Period_Ends
WHERE ARN=1
UNION ALL
SELECT E.DRN,
E.ARN,
E.FK_ID_intEmpresa,
E.FK_ID_intFuncionario,
E.FK_ID_intTipoTrabalho,
E.datHoraEntradaTrabalho,
E.datHoraSaidaTrabalho,
E.NextEnd,
CASE WHEN DATEDIFF(day,c.datHoraSaidaTrabalho,E.datHoraEntradaTrabalho)<2
THEN C.PriorStart ELSE E.datHoraEntradaTrabalho END,
E.PeriodStart,
E.PeriodEnd
FROM CTE_Period_Commence C
INNER JOIN CTE_Period_Ends E
ON E.FK_ID_intEmpresa=C.FK_ID_intEmpresa
AND E.FK_ID_intFuncionario=C.FK_ID_intFuncionario
AND E.FK_ID_intTipoTrabalho=C.FK_ID_intTipoTrabalho
AND E.ARN=C.ARN+1)
SELECT FK_ID_intFuncionario Num,
REPLACE(REPLACE(Regime,'F','FT'),'X','F') Reg,
FK_ID_intTipoTrabalho Tipo,
CONVERT(char(10),MIN(datHoraEntradaTrabalho),105) Ent,
CONVERT(char(10),MAX(datHoraSaidaTrabalho),105) Sai,
DATEDIFF(day,MIN(datHoraEntradaTrabalho),MAX(datHoraSaidaTrabalho))+1 Dias,
'De '+CONVERT(char(10),PeriodStart,105)+' a '+CONVERT(char(10),PeriodEnd,105) Periodo,
DATEDIFF(day,PeriodStart,periodEnd)+1 TotalDias
FROM CTE_Period_Commence
INNER JOIN PortalSGP..uv_Cadastro2
ON FK_ID_intEmpresa=LEFT(psa_emp_empresa,2)
AND FK_ID_intFuncionario=psa_num_empregado
GROUP BY FK_ID_intFuncionario,
FK_ID_intTipoTrabalho,
Regime,
PriorStart,
NextEnd,
PeriodStart,
PeriodEnd
ORDER BY FK_ID_intFuncionario,
Regime,
MIN(datHoraEntradaTrabalho),
MAX(datHoraSaidaTrabalho),
FK_ID_intTipoTrabalho;
IF OBJECT_ID('tempdb..#tmp_tblTrabalho_Dates','U') IS NOT NULL DROP TABLE #tmp_tblTrabalho_Dates
IF OBJECT_ID('tempdb..#tmp_tblTrabalho_Tipo','U') IS NOT NULL DROP TABLE #tmp_tblTrabalho_Tipo;
Everything seems to be working fine.
ASKER
Sorry for spamming so much, but there was still a problem with the query as it was. For some reason, some end inner dates were being omitted. I'm not sure why, so I just placed the table inside a cte (had to place it twice) and it seems fine now. But I'd appreciate if someone would check this final query to make sure I'm not missing anything.
USE [RHAjustes]
GO
/****** Object: StoredProcedure [dbo].[usp_UserMail] Script Date: 11/29/2012 11:46:37 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[usp_Baixas32]
@iAno char(4),
@iDias int
AS
DECLARE @Emp int
SET @Emp=32;
IF OBJECT_ID('tempdb..#tmp_tblTrabalho_Dates','U') IS NOT NULL DROP TABLE #tmp_tblTrabalho_Dates
IF OBJECT_ID('tempdb..#tmp_tblTrabalho_Tipo','U') IS NOT NULL DROP TABLE #tmp_tblTrabalho_Tipo;
-- extract rows to a working table
-- first we really just need contiguous dates by empresa and funcionario
WITH tblTrab AS (
SELECT FK_ID_intEmpresa,
FK_ID_intFuncionario,
FK_ID_intTipoTrabalho,
CASE WHEN YEAR(datHoraEntradaTrabalho)<@iAno
THEN CAST(@iAno+'0101' AS smalldatetime)
ELSE datHoraEntradaTrabalho
END datHoraEntradaTrabalho,
CASE WHEN YEAR(datHoraSaidaTrabalho)>@iAno
THEN CAST(@iAno+'1231' AS smalldatetime)
ELSE datHoraSaidaTrabalho
END datHoraSaidaTrabalho
FROM tblTrabalho32)
SELECT ROW_NUMBER() OVER (PARTITION BY FK_ID_intEmpresa,FK_ID_intFuncionario
ORDER BY datHoraSaidaTrabalho DESC,datHoraEntradaTrabalho DESC) DRN,
ROW_NUMBER() OVER (PARTITION BY FK_ID_intEmpresa,FK_ID_intFuncionario
ORDER BY datHoraEntradaTrabalho ASC,datHoraSaidaTrabalho ASC) ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
datHoraEntradaTrabalho PeriodStart,
datHoraSaidaTrabalho PeriodEnd
INTO #tmp_tblTrabalho_Dates
FROM tblTrab T
WHERE FK_ID_intEmpresa=@Emp
AND (YEAR(datHoraEntradaTrabalho)=@iAno
OR YEAR(datHoraSaidaTrabalho)=@iAno)
AND NOT EXISTS (SELECT NULL -- exclude wholly contained dupe periods
FROM tblTrab T1
WHERE T.FK_ID_intEmpresa=T1.FK_ID_intEmpresa
AND T.FK_ID_intFuncionario=T1.FK_ID_intFuncionario
AND ((T.datHoraEntradaTrabalho>=T1.datHoraEntradaTrabalho
AND T.datHoraSaidaTrabalho<T1.datHoraSaidaTrabalho)
OR (T.datHoraEntradaTrabalho>T1.datHoraEntradaTrabalho
AND T.datHoraSaidaTrabalho<=T1.datHoraSaidaTrabalho)));
-- now traverse the tmp table to get "last dates" by DRN seq, then secondpart is to get "first dates" by ARn seq
WITH CTE_Period_Ends AS (
SELECT DRN,
ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
datHoraSaidaTrabalho NextEnd,
datHoraEntradaTrabalho PriorStart
FROM #tmp_tblTrabalho_Dates
WHERE DRN = 1
UNION ALL
SELECT T.DRN,
T.ARN,
T.FK_ID_intEmpresa,
T.FK_ID_intFuncionario,
T.datHoraEntradaTrabalho,
T.datHoraSaidaTrabalho,
CASE WHEN DATEDIFF(day,T.datHoraSaidaTrabalho,E.datHoraEntradaTrabalho)<2
THEN E.NextEnd ELSE T.datHoraSaidaTrabalho END,
T.datHoraEntradaTrabalho
FROM CTE_Period_Ends E
INNER JOIN #tmp_tblTrabalho_Dates T
ON T.FK_ID_intEmpresa=E.FK_ID_intEmpresa
AND T.FK_ID_intFuncionario=E.FK_ID_intFuncionario
AND T.DRN=E.DRN+1),
CTE_Period_Commence AS (
SELECT DRN,
ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
NextEnd,
PriorStart
FROM CTE_Period_Ends
WHERE ARN=1
UNION ALL
SELECT E.DRN,
E.ARN,
E.FK_ID_intEmpresa,
E.FK_ID_intFuncionario,
E.datHoraEntradaTrabalho,
E.datHoraSaidaTrabalho,
E.NextEnd,
CASE WHEN DATEDIFF(day,c.datHoraSaidaTrabalho,E.datHoraEntradaTrabalho)<2
THEN C.PriorStart ELSE E.datHoraEntradaTrabalho END
FROM CTE_Period_Commence C
INNER JOIN CTE_Period_Ends E
ON E.FK_ID_intEmpresa=C.FK_ID_intEmpresa
AND E.FK_ID_intFuncionario=C.FK_ID_intFuncionario
AND E.ARN=C.ARN+1),
CTE_Distinct_Periods AS (
SELECT DISTINCT FK_ID_intEmpresa,
FK_ID_intFuncionario,
PriorStart,
NextEnd
FROM CTE_Period_Commence
WHERE DATEDIFF(day,PriorStart,NextEnd)+1>=@iDias),
tblTrab AS (
SELECT FK_ID_intEmpresa,
FK_ID_intFuncionario,
FK_ID_intTipoTrabalho,
CASE WHEN YEAR(datHoraEntradaTrabalho)<@iAno
THEN CAST(@iAno+'0101' AS smalldatetime)
ELSE datHoraEntradaTrabalho
END datHoraEntradaTrabalho,
CASE WHEN YEAR(datHoraSaidaTrabalho)>@iAno
THEN CAST(@iAno+'1231' AS smalldatetime)
ELSE datHoraSaidaTrabalho
END datHoraSaidaTrabalho
FROM tblTrabalho32)
-- now we should have discrete distinct periods and have to find the detail rows by TipoTrabalho
-- then essentially go through the same process as above but at the deeper Tipo level.
SELECT ROW_NUMBER() OVER (PARTITION BY T1.FK_ID_intEmpresa,T1.FK_ID_intFuncionario,T1.FK_ID_intTipoTrabalho
ORDER BY datHoraSaidaTrabalho DESC,datHoraEntradaTrabalho DESC) DRN,
ROW_NUMBER() OVER (PARTITION BY T1.FK_ID_intEmpresa,T1.FK_ID_intFuncionario,T1.FK_ID_intTipoTrabalho
ORDER BY datHoraEntradaTrabalho ASC,datHoraSaidaTrabalho ASC) ARN,
T1.FK_ID_intEmpresa,
T1.FK_ID_intFuncionario,
T1.FK_ID_intTipoTrabalho,
T1.datHoraEntradaTrabalho,
T1.datHoraSaidaTrabalho,
T.PriorStart PeriodStart,
T.NextEnd PeriodEnd
INTO #tmp_tblTrabalho_Tipo
FROM tblTrab T1
INNER JOIN CTE_Distinct_Periods T
ON T1.FK_ID_intEmpresa=T.FK_ID_intEmpresa
AND T1.FK_ID_intFuncionario=T.FK_ID_intFuncionario
AND T1.datHoraEntradaTrabalho BETWEEN T.PriorStart AND T.NextEnd
AND T1.datHoraSaidaTrabalho BETWEEN T.PriorStart AND T.NextEnd;
WITH CTE_Period_Ends AS (
SELECT DRN,
ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
FK_ID_intTipoTrabalho,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
datHoraSaidaTrabalho NextEnd,
datHoraEntradaTrabalho PriorStart,
PeriodStart,
PeriodEnd
FROM #tmp_tblTrabalho_Tipo
WHERE DRN=1
UNION ALL
SELECT T.DRN,
T.ARN,
T.FK_ID_intEmpresa,
T.FK_ID_intFuncionario,
T.FK_ID_intTipoTrabalho,
T.datHoraEntradaTrabalho,
T.datHoraSaidaTrabalho,
CASE WHEN DATEDIFF(day,T.datHoraSaidaTrabalho,E.datHoraEntradaTrabalho)<2
THEN E.NextEnd ELSE T.datHoraSaidaTrabalho END,
T.datHoraEntradaTrabalho,
T.PeriodStart,
T.PeriodEnd
FROM CTE_Period_Ends E
INNER JOIN #tmp_tblTrabalho_Tipo T
ON T.FK_ID_intEmpresa=E.FK_ID_intEmpresa
AND T.FK_ID_intFuncionario=E.FK_ID_intFuncionario
AND T.FK_ID_intTipoTrabalho=E.FK_ID_intTipoTrabalho
AND T.DRN=E.DRN+1),
CTE_Period_Commence AS (
SELECT DRN,
ARN,
FK_ID_intEmpresa,
FK_ID_intFuncionario,
FK_ID_intTipoTrabalho,
datHoraEntradaTrabalho,
datHoraSaidaTrabalho,
NextEnd,
PriorStart,
PeriodStart,
PeriodEnd
FROM CTE_Period_Ends
WHERE ARN=1
UNION ALL
SELECT E.DRN,
E.ARN,
E.FK_ID_intEmpresa,
E.FK_ID_intFuncionario,
E.FK_ID_intTipoTrabalho,
E.datHoraEntradaTrabalho,
E.datHoraSaidaTrabalho,
E.NextEnd,
CASE WHEN DATEDIFF(day,c.datHoraSaidaTrabalho,E.datHoraEntradaTrabalho)<2
THEN C.PriorStart ELSE E.datHoraEntradaTrabalho END,
E.PeriodStart,
E.PeriodEnd
FROM CTE_Period_Commence C
INNER JOIN CTE_Period_Ends E
ON E.FK_ID_intEmpresa=C.FK_ID_intEmpresa
AND E.FK_ID_intFuncionario=C.FK_ID_intFuncionario
AND E.FK_ID_intTipoTrabalho=C.FK_ID_intTipoTrabalho
AND E.ARN=C.ARN+1)
SELECT FK_ID_intFuncionario Num,
REPLACE(REPLACE(Regime,'F','FT'),'X','F') Reg,
FK_ID_intTipoTrabalho Tipo,
CONVERT(char(10),MIN(datHoraEntradaTrabalho),105) Ent,
CONVERT(char(10),MAX(datHoraSaidaTrabalho),105) Sai,
DATEDIFF(day,MIN(datHoraEntradaTrabalho),MAX(datHoraSaidaTrabalho))+1 Dias,
'De '+CONVERT(char(10),PeriodStart,105)+' a '+CONVERT(char(10),PeriodEnd,105) Periodo,
DATEDIFF(day,PeriodStart,periodEnd)+1 TotalDias
FROM CTE_Period_Commence
INNER JOIN PortalSGP..uv_Cadastro2
ON FK_ID_intEmpresa=LEFT(psa_emp_empresa,2)
AND FK_ID_intFuncionario=psa_num_empregado
GROUP BY FK_ID_intFuncionario,
FK_ID_intTipoTrabalho,
Regime,
PriorStart,
NextEnd,
PeriodStart,
PeriodEnd
ORDER BY FK_ID_intFuncionario,
Regime,
MIN(datHoraEntradaTrabalho),
MAX(datHoraSaidaTrabalho),
FK_ID_intTipoTrabalho;
IF OBJECT_ID('tempdb..#tmp_tblTrabalho_Dates','U') IS NOT NULL DROP TABLE #tmp_tblTrabalho_Dates
IF OBJECT_ID('tempdb..#tmp_tblTrabalho_Tipo','U') IS NOT NULL DROP TABLE #tmp_tblTrabalho_Tipo;
Glad you got that sorted out, Cluskitt! Nice work, Mark!
Apologies, had not seen the email alerts...
That "inner range" is a wholly contained date period and can be excluded in the second temp table (in a similar way as we do the first one except down to / including tipo). The date range was seen as having a break so the group by would make sure that the three rows were reported, not just the one.
Not sure why the direct select into temp table didnt work - looks as if it should be fine, and can investigate if you like, but, what you have done also looks fine.
Will run up your code against some of the various test I was doing.
And yes, Indexes will help - on temp tables and on the original data source.
Think that covers it...
That "inner range" is a wholly contained date period and can be excluded in the second temp table (in a similar way as we do the first one except down to / including tipo). The date range was seen as having a break so the group by would make sure that the three rows were reported, not just the one.
Not sure why the direct select into temp table didnt work - looks as if it should be fine, and can investigate if you like, but, what you have done also looks fine.
Will run up your code against some of the various test I was doing.
And yes, Indexes will help - on temp tables and on the original data source.
Think that covers it...
Oh, and worked out why the dates would have been excluded...
Because of the row_number() function - needed the "corrected" dates, not the original dates.
Because of the row_number() function - needed the "corrected" dates, not the original dates.
ASKER
Thank you for your help.
Pleasure, happy to have been able to help :) And a pleasure to have been able to work with you. So, a big Thank you back at ya :)
I hope I have a good understanding of what you are trying to do. If so, here is an approach that may work as long as you do not have an excessive amount of records per Emp-Tipo combination.
Sample data setup:
Open in new window
I altered the columns a bit because you had VARCHAR data in the sample INSERT but INT in the table definition.Using the above table, the following is a sample solution.
Open in new window
The first common table expression sorts the records by Ent under each Emp-Tipo combination. The second table recursively looks at each ranked row to see if it is contiguous with the row previous to it, storing this evaluation in a column groupHoraEntradaTrabalho. The final selection then groups by Emp, Tipo, and groupHoraEntradaTrabalho to find the range of days. To complete the solution, I used a HAVING clause to filter to only those values where the total days is 30 or more.The results based on the sample data is:
Open in new window
I hope that helps!
Best regards and happy coding,
Kevin