We help IT Professionals succeed at work.

SQL query to tie contiguous months together

460 Views
Last Modified: 2008-01-09
I'm hoping someone has a better way of populating a table than we currently have.

This is all happing in a MS SQL 2005 db.

Let's say we have a table we'll call customer_months (millions of rows in the table) and it contains the following columns:

Customer_ID varchar(20)
Customer_Month smalldatetime (always MM/01/YYYY)
Month_Type varchar(50)

The rows look like this:
12345E1 01/01/2006 MonthType1
12345E1 02/01/2006 MonthType1
12345E1 03/01/2006 MonthType1
12345E1 04/01/2006 MonthType1
12345E1 03/01/2006 MonthType2
12345E1 04/01/2006 MonthType2
12345E1 08/01/2006 MonthType1
12345E1 09/01/2006 MonthType1
34404S1 02/01/2007 MonthType1
22237S1 08/01/2007 MonthType4
22237S1 09/01/2007 MonthType4
22237S1 03/01/2006 MonthType3

We then need to take this data and populate a table we'll call customer_month_spans containing the following columns:

Customer_ID varchar(20)
Customer_Month_Span_Start smalldatetime
Customer_Month_Span_End smalldatetime
Customer_Month_Span_Months_Count int
Month_Type varchar(50)


Using our example data the table customer_month_spans would look like this once populated:
12345E1 01/01/2006 04/01/2006 4 MonthType1
12345E1 08/01/2006 09/01/2006 2 MonthType1
12345E1 03/01/2006 04/01/2006 2 MonthType2
34404S1 02/01/2007 02/01/2007 1 MonthType1
22237S1 08/01/2007 09/01/2007 2 MonthType4
22237S1 03/01/2006 03/01/2006 1 MonthType3

A couple of things to note:

-The purpose of populating customer_month_spans is to show a rolled up version of what is going on with a customer over a date range.
-We only want contiguous spans. For example 12345E1 has a MonthType1 of for 6 months but in two separate contiguous spans.
-Sometimes a span is only 1 month.
-A contiguous span can span years like 2006 to 2007 and we would want to show it as such.  For example a span might be 09/01/2006 - 02/01/2008 which would produce 18 for Customer_Month_Span_Months_Count.


Our current way of populating Customer_Month_Spans is using a cursor to load a Customer_ID and a Month_Type then querying for the Min and Max Customer_Month where the Customer_ID = @Customer_ID and Month_Type = @Month_Type. Then figuring out if the spans are contiguous or not and inserting the proper row(s) into customer_month_spans based on what we find. As you can guess the process is extremely slow. I'm hoping someone has a query that is not cursor based.

Thank you.
Comment
Watch Question

Jai STech Arch

Commented:
this will give you your start and end month
select id,max(Customer_Month),min(Customer_Month) from yourTable
group by id,month_type

use a insert statement like
insert into somenewtable (colnames)
select id,max(Customer_Month),min(Customer_Month) from yourTable
group by id,month_type

but you also need Customer_Month_Span_Months_Count ...can you clarify this
based on your record
12345E1 01/01/2006 04/01/2006 4 MonthType1
12345E1 08/01/2006 09/01/2006 2 MonthType1

how do you say the first one is "4" and the second is "2" ...i mean how are you selecting the dates...on what basis ?explain the logic and you can get a better solution,..

Author

Commented:
Thanks jaiganeshsrinivasan. I've got the part about picking up the min and max date but it is the non contiguous spans that I can't come up with. For example if you run the T-SQL below you will see that the end result is 5 rows when we should really have 6 rows. The reason that we should have 6 rows is that customer 12345E1 has 6 months with a month type of MonthType1. However the months are not contiguous which means we should have 2 records for this customer in the table. One for each contiguous span. I'm not sure if that made sense but our rules say that a span is only a span if the months are contiguous (with out a gap). If there is a gap then we would need to treat it as two spans.

--************ START T-SQL ***********************

CREATE TABLE #customer_months (Customer_ID varchar(20),
Customer_Month smalldatetime,
Month_Type varchar(50))

INSERT #customer_months
SELECT '12345E1', '01/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '02/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '03/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '04/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '03/01/2006', 'MonthType2'

INSERT #customer_months
SELECT '12345E1', '04/01/2006', 'MonthType2'

INSERT #customer_months
SELECT '12345E1', '08/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '09/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '34404S1', '02/01/2007', 'MonthType1'

INSERT #customer_months
SELECT '22237S1', '08/01/2007', 'MonthType4'

INSERT #customer_months
SELECT '22237S1', '09/01/2007', 'MonthType4'

INSERT #customer_months
SELECT '22237S1', '03/01/2006', 'MonthType3'

CREATE TABLE #customer_month_spans (
Customer_ID varchar(20),
Customer_Month_Span_Start smalldatetime,
Customer_Month_Span_End smalldatetime,
Customer_Month_Span_Months_Count int,
Month_Type varchar(50),
Contiguous varchar(3)
)

--INSERT THE SPANS WE KNOW ARE CONTIGUOUS
INSERT #customer_month_spans
SELECT
Customer_ID ,
MIN(Customer_Month) AS Customer_Month_Span_Start ,
MAX(Customer_Month) AS Customer_Month_Span_End,
DATEDIFF(MONTH,MIN(Customer_Month),MAX(Customer_Month)) + 1 AS Customer_Month_Span_Months_Count,
Month_Type,
'YES'
FROM #customer_months
GROUP BY Customer_ID, Month_Type
HAVING COUNT(Customer_Month) = (DATEDIFF(MONTH,MIN(Customer_Month),MAX(Customer_Month)) + 1)

--INSERT THE SPANS WE KNOW ARE NOT CONTIGUOUS
INSERT #customer_month_spans
SELECT
Customer_ID ,
MIN(Customer_Month) AS Customer_Month_Span_Start ,
MAX(Customer_Month) AS Customer_Month_Span_End,
DATEDIFF(MONTH,MIN(Customer_Month),MAX(Customer_Month)) + 1 AS Customer_Month_Span_Months_Count,
Month_Type,
'NO'
FROM #customer_months
GROUP BY Customer_ID, Month_Type
HAVING COUNT(Customer_Month) <> (DATEDIFF(MONTH,MIN(Customer_Month),MAX(Customer_Month)) + 1)

--shows 5 rows when we should have 6
SELECT * FROM #customer_month_spans
ORDER BY CUSTOMER_ID, MONTH_TYPE, Customer_Month_Span_Start

DROP TABLE #customer_months
DROP TABLE #customer_month_spans


--************ END T-SQL ***********************

Author

Commented:
Just to throw out a different way of thinking about the problem. Let's say you are an online video rental company. John Doe signs up on 1/1/2006 as a customer and then cancels his service on 5/1/2006. Then he signs up again on 8/1/2006 and then cancels his service again on 12/1/2006. If I asked you what was John Doe's history with us would you say he was a customer from 1/1/2006 - 12/1/2006? The answer is no because that is not correct. He was a customer from 1/1/2006 through 5/1/2006 and then 8/1/2006 through 12/1/2006 which should be shown as 2 records. I hope that makes it a bit more clear. Thank you for the help. I'll be a happy man if this one is solved.
CERTIFIED EXPERT
Top Expert 2011

Commented:
try this ....

use tempdb
go
drop table #customer

Select *
  into #Customer
from (
select '12345E1' as customer_id, '20060101' as customer_month, 'MonthType1' as month_type
union select '12345E1', '20060201', 'MonthType1'
union select '12345E1' ,'20060301', 'MonthType1'
union select '12345E1', '20060401', 'MonthType1'
union select '12345E1' ,'20060301', 'MonthType2'
union select '12345E1', '20060401', 'MonthType2'
union select '12345E1', '20060801', 'MonthType1'
union select '12345E1', '20060901', 'MonthType1'
union select '34404S1', '20070201', 'MonthType1'
union select '22237S1', '20070801', 'MonthType4'
union select '22237S1', '20070901', 'MonthType4'
union select '22237S1', '20060301', 'MonthType3'
) as x

Select
  c.customer_id
 ,c.customer_month as customer_month_span_start
 ,coalesce(c1.customer_month,c.customer_month) as customer_month_span_end
,coalesce(datediff(m,c.customer_month,c1.customer_month)+1,1) as interval
 ,c.Month_type
 from #customer as c
 left outer Join #customer as c1
   on c.customer_id=c1.customer_id
  and c.month_type=c1.month_type
  and c.customer_month < c1.customer_month
 Where not exists (select customer_id
                     from #customer as x
                    where c.customer_id=x.customer_id
                      and c.month_type=x.month_type
                      and x.customer_month=dateadd(m,-1,c.customer_month)
                  )  
   and not exists (select customer_id
                     from #customer as x
                    where c.customer_id=x.customer_id
                      and c.month_type=x.month_type
                      and x.customer_month=dateadd(m,1,c1.customer_month)
                  )
   and (datediff(m,c.customer_month,coalesce(c1.customer_month,c.customer_month))+1
        = (select count(distinct customer_month)
             from #customer as x
            Where x.customer_id=c.customer_id
              and x.month_type=c.month_type
              and x.customer_month between c.customer_month and c1.customer_month
          )    
        or c1.customer_month is null
         )  
CERTIFIED EXPERT
Top Expert 2011
Commented:
Unlock this solution and get a sample of our free trial.
(No credit card required)
UNLOCK SOLUTION

Author

Commented:
Lowfatspread many many thanks. Also thank you for the comments in the SQL to help me understand the solution. I modified the query to use our real table name and column names and ran it against production data and it ran very quickly so I'm happy about that. More importantly I spot checked a number of records I would consider tricky and everything looks correct. I'm amazed at how quickly you put it together. Thanks again.
Unlock the solution to this question.
Thanks for using Experts Exchange.

Please provide your email to receive a sample view!

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.