• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 439
  • Last Modified:

SQL query to tie contiguous months together

I'm hoping someone has a better way of populating a table than we currently have.

This is all happing in a MS SQL 2005 db.

Let's say we have a table we'll call customer_months (millions of rows in the table) and it contains the following columns:

Customer_ID varchar(20)
Customer_Month smalldatetime (always MM/01/YYYY)
Month_Type varchar(50)

The rows look like this:
12345E1 01/01/2006 MonthType1
12345E1 02/01/2006 MonthType1
12345E1 03/01/2006 MonthType1
12345E1 04/01/2006 MonthType1
12345E1 03/01/2006 MonthType2
12345E1 04/01/2006 MonthType2
12345E1 08/01/2006 MonthType1
12345E1 09/01/2006 MonthType1
34404S1 02/01/2007 MonthType1
22237S1 08/01/2007 MonthType4
22237S1 09/01/2007 MonthType4
22237S1 03/01/2006 MonthType3

We then need to take this data and populate a table we'll call customer_month_spans containing the following columns:

Customer_ID varchar(20)
Customer_Month_Span_Start smalldatetime
Customer_Month_Span_End smalldatetime
Customer_Month_Span_Months_Count int
Month_Type varchar(50)


Using our example data the table customer_month_spans would look like this once populated:
12345E1 01/01/2006 04/01/2006 4 MonthType1
12345E1 08/01/2006 09/01/2006 2 MonthType1
12345E1 03/01/2006 04/01/2006 2 MonthType2
34404S1 02/01/2007 02/01/2007 1 MonthType1
22237S1 08/01/2007 09/01/2007 2 MonthType4
22237S1 03/01/2006 03/01/2006 1 MonthType3

A couple of things to note:

-The purpose of populating customer_month_spans is to show a rolled up version of what is going on with a customer over a date range.
-We only want contiguous spans. For example 12345E1 has a MonthType1 of for 6 months but in two separate contiguous spans.
-Sometimes a span is only 1 month.
-A contiguous span can span years like 2006 to 2007 and we would want to show it as such.  For example a span might be 09/01/2006 - 02/01/2008 which would produce 18 for Customer_Month_Span_Months_Count.


Our current way of populating Customer_Month_Spans is using a cursor to load a Customer_ID and a Month_Type then querying for the Min and Max Customer_Month where the Customer_ID = @Customer_ID and Month_Type = @Month_Type. Then figuring out if the spans are contiguous or not and inserting the proper row(s) into customer_month_spans based on what we find. As you can guess the process is extremely slow. I'm hoping someone has a query that is not cursor based.

Thank you.
0
ReconIT
Asked:
ReconIT
  • 3
  • 2
1 Solution
 
Jai STech ArchCommented:
this will give you your start and end month
select id,max(Customer_Month),min(Customer_Month) from yourTable
group by id,month_type

use a insert statement like
insert into somenewtable (colnames)
select id,max(Customer_Month),min(Customer_Month) from yourTable
group by id,month_type

but you also need Customer_Month_Span_Months_Count ...can you clarify this
based on your record
12345E1 01/01/2006 04/01/2006 4 MonthType1
12345E1 08/01/2006 09/01/2006 2 MonthType1

how do you say the first one is "4" and the second is "2" ...i mean how are you selecting the dates...on what basis ?explain the logic and you can get a better solution,..
0
 
ReconITAuthor Commented:
Thanks jaiganeshsrinivasan. I've got the part about picking up the min and max date but it is the non contiguous spans that I can't come up with. For example if you run the T-SQL below you will see that the end result is 5 rows when we should really have 6 rows. The reason that we should have 6 rows is that customer 12345E1 has 6 months with a month type of MonthType1. However the months are not contiguous which means we should have 2 records for this customer in the table. One for each contiguous span. I'm not sure if that made sense but our rules say that a span is only a span if the months are contiguous (with out a gap). If there is a gap then we would need to treat it as two spans.

--************ START T-SQL ***********************

CREATE TABLE #customer_months (Customer_ID varchar(20),
Customer_Month smalldatetime,
Month_Type varchar(50))

INSERT #customer_months
SELECT '12345E1', '01/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '02/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '03/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '04/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '03/01/2006', 'MonthType2'

INSERT #customer_months
SELECT '12345E1', '04/01/2006', 'MonthType2'

INSERT #customer_months
SELECT '12345E1', '08/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '09/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '34404S1', '02/01/2007', 'MonthType1'

INSERT #customer_months
SELECT '22237S1', '08/01/2007', 'MonthType4'

INSERT #customer_months
SELECT '22237S1', '09/01/2007', 'MonthType4'

INSERT #customer_months
SELECT '22237S1', '03/01/2006', 'MonthType3'

CREATE TABLE #customer_month_spans (
Customer_ID varchar(20),
Customer_Month_Span_Start smalldatetime,
Customer_Month_Span_End smalldatetime,
Customer_Month_Span_Months_Count int,
Month_Type varchar(50),
Contiguous varchar(3)
)

--INSERT THE SPANS WE KNOW ARE CONTIGUOUS
INSERT #customer_month_spans
SELECT
Customer_ID ,
MIN(Customer_Month) AS Customer_Month_Span_Start ,
MAX(Customer_Month) AS Customer_Month_Span_End,
DATEDIFF(MONTH,MIN(Customer_Month),MAX(Customer_Month)) + 1 AS Customer_Month_Span_Months_Count,
Month_Type,
'YES'
FROM #customer_months
GROUP BY Customer_ID, Month_Type
HAVING COUNT(Customer_Month) = (DATEDIFF(MONTH,MIN(Customer_Month),MAX(Customer_Month)) + 1)

--INSERT THE SPANS WE KNOW ARE NOT CONTIGUOUS
INSERT #customer_month_spans
SELECT
Customer_ID ,
MIN(Customer_Month) AS Customer_Month_Span_Start ,
MAX(Customer_Month) AS Customer_Month_Span_End,
DATEDIFF(MONTH,MIN(Customer_Month),MAX(Customer_Month)) + 1 AS Customer_Month_Span_Months_Count,
Month_Type,
'NO'
FROM #customer_months
GROUP BY Customer_ID, Month_Type
HAVING COUNT(Customer_Month) <> (DATEDIFF(MONTH,MIN(Customer_Month),MAX(Customer_Month)) + 1)

--shows 5 rows when we should have 6
SELECT * FROM #customer_month_spans
ORDER BY CUSTOMER_ID, MONTH_TYPE, Customer_Month_Span_Start

DROP TABLE #customer_months
DROP TABLE #customer_month_spans


--************ END T-SQL ***********************
0
 
ReconITAuthor Commented:
Just to throw out a different way of thinking about the problem. Let's say you are an online video rental company. John Doe signs up on 1/1/2006 as a customer and then cancels his service on 5/1/2006. Then he signs up again on 8/1/2006 and then cancels his service again on 12/1/2006. If I asked you what was John Doe's history with us would you say he was a customer from 1/1/2006 - 12/1/2006? The answer is no because that is not correct. He was a customer from 1/1/2006 through 5/1/2006 and then 8/1/2006 through 12/1/2006 which should be shown as 2 records. I hope that makes it a bit more clear. Thank you for the help. I'll be a happy man if this one is solved.
0
Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
LowfatspreadCommented:
try this ....

use tempdb
go
drop table #customer

Select *
  into #Customer
from (
select '12345E1' as customer_id, '20060101' as customer_month, 'MonthType1' as month_type
union select '12345E1', '20060201', 'MonthType1'
union select '12345E1' ,'20060301', 'MonthType1'
union select '12345E1', '20060401', 'MonthType1'
union select '12345E1' ,'20060301', 'MonthType2'
union select '12345E1', '20060401', 'MonthType2'
union select '12345E1', '20060801', 'MonthType1'
union select '12345E1', '20060901', 'MonthType1'
union select '34404S1', '20070201', 'MonthType1'
union select '22237S1', '20070801', 'MonthType4'
union select '22237S1', '20070901', 'MonthType4'
union select '22237S1', '20060301', 'MonthType3'
) as x

Select
  c.customer_id
 ,c.customer_month as customer_month_span_start
 ,coalesce(c1.customer_month,c.customer_month) as customer_month_span_end
,coalesce(datediff(m,c.customer_month,c1.customer_month)+1,1) as interval
 ,c.Month_type
 from #customer as c
 left outer Join #customer as c1
   on c.customer_id=c1.customer_id
  and c.month_type=c1.month_type
  and c.customer_month < c1.customer_month
 Where not exists (select customer_id
                     from #customer as x
                    where c.customer_id=x.customer_id
                      and c.month_type=x.month_type
                      and x.customer_month=dateadd(m,-1,c.customer_month)
                  )  
   and not exists (select customer_id
                     from #customer as x
                    where c.customer_id=x.customer_id
                      and c.month_type=x.month_type
                      and x.customer_month=dateadd(m,1,c1.customer_month)
                  )
   and (datediff(m,c.customer_month,coalesce(c1.customer_month,c.customer_month))+1
        = (select count(distinct customer_month)
             from #customer as x
            Where x.customer_id=c.customer_id
              and x.month_type=c.month_type
              and x.customer_month between c.customer_month and c1.customer_month
          )    
        or c1.customer_month is null
         )  
0
 
LowfatspreadCommented:
e.g.

Select
  c.customer_id
 ,c.customer_month as customer_month_span_start
 ,coalesce(c1.customer_month,c.customer_month) as customer_month_span_end
,coalesce(datediff(m,c.customer_month,c1.customer_month)+1,1) as interval
 ,c.Month_type

--join the table to itself such that later month are joined to the lower months...
 from #customer as c
 left outer Join #customer as c1
   on c.customer_id=c1.customer_id
  and c.month_type=c1.month_type
  and c.customer_month < c1.customer_month

-- Initial  condition is that there isnt a row for the prior month...
 Where not exists (select customer_id
                     from #customer as x
                    where c.customer_id=x.customer_id
                      and c.month_type=x.month_type
                      and x.customer_month=dateadd(m,-1,c.customer_month)
                  )  

-- Also Exclude rows
-- if the next month exists
   and not exists (select customer_id
                     from #customer as x
                    where c.customer_id=x.customer_id
                      and c.month_type=x.month_type
                      and x.customer_month=dateadd(m,1,c1.customer_month)
                  )


-- now for the contiguous month requirement
-- the difference in the dates must equal the number of records in the table between
-- the two dates...

--  remove the distinct from the count if duplicate date rows cannot exist in the table...
 
   and (datediff(m,c.customer_month,coalesce(c1.customer_month,c.customer_month))+1
        = (select count(distinct customer_month)
             from #customer as x
            Where x.customer_id=c.customer_id
              and x.month_type=c.month_type
              and x.customer_month between c.customer_month and c1.customer_month
          )    

-- deal with the last /single month spans

        or c1.customer_month is null
         )  


hth
0
 
ReconITAuthor Commented:
Lowfatspread many many thanks. Also thank you for the comments in the SQL to help me understand the solution. I modified the query to use our real table name and column names and ran it against production data and it ran very quickly so I'm happy about that. More importantly I spot checked a number of records I would consider tricky and everything looks correct. I'm amazed at how quickly you put it together. Thanks again.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 3
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now