?
Solved

SQL query to tie contiguous months together

Posted on 2007-10-08
6
Medium Priority
?
435 Views
Last Modified: 2008-01-09
I'm hoping someone has a better way of populating a table than we currently have.

This is all happing in a MS SQL 2005 db.

Let's say we have a table we'll call customer_months (millions of rows in the table) and it contains the following columns:

Customer_ID varchar(20)
Customer_Month smalldatetime (always MM/01/YYYY)
Month_Type varchar(50)

The rows look like this:
12345E1 01/01/2006 MonthType1
12345E1 02/01/2006 MonthType1
12345E1 03/01/2006 MonthType1
12345E1 04/01/2006 MonthType1
12345E1 03/01/2006 MonthType2
12345E1 04/01/2006 MonthType2
12345E1 08/01/2006 MonthType1
12345E1 09/01/2006 MonthType1
34404S1 02/01/2007 MonthType1
22237S1 08/01/2007 MonthType4
22237S1 09/01/2007 MonthType4
22237S1 03/01/2006 MonthType3

We then need to take this data and populate a table we'll call customer_month_spans containing the following columns:

Customer_ID varchar(20)
Customer_Month_Span_Start smalldatetime
Customer_Month_Span_End smalldatetime
Customer_Month_Span_Months_Count int
Month_Type varchar(50)


Using our example data the table customer_month_spans would look like this once populated:
12345E1 01/01/2006 04/01/2006 4 MonthType1
12345E1 08/01/2006 09/01/2006 2 MonthType1
12345E1 03/01/2006 04/01/2006 2 MonthType2
34404S1 02/01/2007 02/01/2007 1 MonthType1
22237S1 08/01/2007 09/01/2007 2 MonthType4
22237S1 03/01/2006 03/01/2006 1 MonthType3

A couple of things to note:

-The purpose of populating customer_month_spans is to show a rolled up version of what is going on with a customer over a date range.
-We only want contiguous spans. For example 12345E1 has a MonthType1 of for 6 months but in two separate contiguous spans.
-Sometimes a span is only 1 month.
-A contiguous span can span years like 2006 to 2007 and we would want to show it as such.  For example a span might be 09/01/2006 - 02/01/2008 which would produce 18 for Customer_Month_Span_Months_Count.


Our current way of populating Customer_Month_Spans is using a cursor to load a Customer_ID and a Month_Type then querying for the Min and Max Customer_Month where the Customer_ID = @Customer_ID and Month_Type = @Month_Type. Then figuring out if the spans are contiguous or not and inserting the proper row(s) into customer_month_spans based on what we find. As you can guess the process is extremely slow. I'm hoping someone has a query that is not cursor based.

Thank you.
0
Comment
Question by:ReconIT
  • 3
  • 2
6 Comments
 
LVL 14

Expert Comment

by:Jai S
ID: 20035108
this will give you your start and end month
select id,max(Customer_Month),min(Customer_Month) from yourTable
group by id,month_type

use a insert statement like
insert into somenewtable (colnames)
select id,max(Customer_Month),min(Customer_Month) from yourTable
group by id,month_type

but you also need Customer_Month_Span_Months_Count ...can you clarify this
based on your record
12345E1 01/01/2006 04/01/2006 4 MonthType1
12345E1 08/01/2006 09/01/2006 2 MonthType1

how do you say the first one is "4" and the second is "2" ...i mean how are you selecting the dates...on what basis ?explain the logic and you can get a better solution,..
0
 
LVL 1

Author Comment

by:ReconIT
ID: 20035362
Thanks jaiganeshsrinivasan. I've got the part about picking up the min and max date but it is the non contiguous spans that I can't come up with. For example if you run the T-SQL below you will see that the end result is 5 rows when we should really have 6 rows. The reason that we should have 6 rows is that customer 12345E1 has 6 months with a month type of MonthType1. However the months are not contiguous which means we should have 2 records for this customer in the table. One for each contiguous span. I'm not sure if that made sense but our rules say that a span is only a span if the months are contiguous (with out a gap). If there is a gap then we would need to treat it as two spans.

--************ START T-SQL ***********************

CREATE TABLE #customer_months (Customer_ID varchar(20),
Customer_Month smalldatetime,
Month_Type varchar(50))

INSERT #customer_months
SELECT '12345E1', '01/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '02/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '03/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '04/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '03/01/2006', 'MonthType2'

INSERT #customer_months
SELECT '12345E1', '04/01/2006', 'MonthType2'

INSERT #customer_months
SELECT '12345E1', '08/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '12345E1', '09/01/2006', 'MonthType1'

INSERT #customer_months
SELECT '34404S1', '02/01/2007', 'MonthType1'

INSERT #customer_months
SELECT '22237S1', '08/01/2007', 'MonthType4'

INSERT #customer_months
SELECT '22237S1', '09/01/2007', 'MonthType4'

INSERT #customer_months
SELECT '22237S1', '03/01/2006', 'MonthType3'

CREATE TABLE #customer_month_spans (
Customer_ID varchar(20),
Customer_Month_Span_Start smalldatetime,
Customer_Month_Span_End smalldatetime,
Customer_Month_Span_Months_Count int,
Month_Type varchar(50),
Contiguous varchar(3)
)

--INSERT THE SPANS WE KNOW ARE CONTIGUOUS
INSERT #customer_month_spans
SELECT
Customer_ID ,
MIN(Customer_Month) AS Customer_Month_Span_Start ,
MAX(Customer_Month) AS Customer_Month_Span_End,
DATEDIFF(MONTH,MIN(Customer_Month),MAX(Customer_Month)) + 1 AS Customer_Month_Span_Months_Count,
Month_Type,
'YES'
FROM #customer_months
GROUP BY Customer_ID, Month_Type
HAVING COUNT(Customer_Month) = (DATEDIFF(MONTH,MIN(Customer_Month),MAX(Customer_Month)) + 1)

--INSERT THE SPANS WE KNOW ARE NOT CONTIGUOUS
INSERT #customer_month_spans
SELECT
Customer_ID ,
MIN(Customer_Month) AS Customer_Month_Span_Start ,
MAX(Customer_Month) AS Customer_Month_Span_End,
DATEDIFF(MONTH,MIN(Customer_Month),MAX(Customer_Month)) + 1 AS Customer_Month_Span_Months_Count,
Month_Type,
'NO'
FROM #customer_months
GROUP BY Customer_ID, Month_Type
HAVING COUNT(Customer_Month) <> (DATEDIFF(MONTH,MIN(Customer_Month),MAX(Customer_Month)) + 1)

--shows 5 rows when we should have 6
SELECT * FROM #customer_month_spans
ORDER BY CUSTOMER_ID, MONTH_TYPE, Customer_Month_Span_Start

DROP TABLE #customer_months
DROP TABLE #customer_month_spans


--************ END T-SQL ***********************
0
 
LVL 1

Author Comment

by:ReconIT
ID: 20035495
Just to throw out a different way of thinking about the problem. Let's say you are an online video rental company. John Doe signs up on 1/1/2006 as a customer and then cancels his service on 5/1/2006. Then he signs up again on 8/1/2006 and then cancels his service again on 12/1/2006. If I asked you what was John Doe's history with us would you say he was a customer from 1/1/2006 - 12/1/2006? The answer is no because that is not correct. He was a customer from 1/1/2006 through 5/1/2006 and then 8/1/2006 through 12/1/2006 which should be shown as 2 records. I hope that makes it a bit more clear. Thank you for the help. I'll be a happy man if this one is solved.
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 50

Expert Comment

by:Lowfatspread
ID: 20035553
try this ....

use tempdb
go
drop table #customer

Select *
  into #Customer
from (
select '12345E1' as customer_id, '20060101' as customer_month, 'MonthType1' as month_type
union select '12345E1', '20060201', 'MonthType1'
union select '12345E1' ,'20060301', 'MonthType1'
union select '12345E1', '20060401', 'MonthType1'
union select '12345E1' ,'20060301', 'MonthType2'
union select '12345E1', '20060401', 'MonthType2'
union select '12345E1', '20060801', 'MonthType1'
union select '12345E1', '20060901', 'MonthType1'
union select '34404S1', '20070201', 'MonthType1'
union select '22237S1', '20070801', 'MonthType4'
union select '22237S1', '20070901', 'MonthType4'
union select '22237S1', '20060301', 'MonthType3'
) as x

Select
  c.customer_id
 ,c.customer_month as customer_month_span_start
 ,coalesce(c1.customer_month,c.customer_month) as customer_month_span_end
,coalesce(datediff(m,c.customer_month,c1.customer_month)+1,1) as interval
 ,c.Month_type
 from #customer as c
 left outer Join #customer as c1
   on c.customer_id=c1.customer_id
  and c.month_type=c1.month_type
  and c.customer_month < c1.customer_month
 Where not exists (select customer_id
                     from #customer as x
                    where c.customer_id=x.customer_id
                      and c.month_type=x.month_type
                      and x.customer_month=dateadd(m,-1,c.customer_month)
                  )  
   and not exists (select customer_id
                     from #customer as x
                    where c.customer_id=x.customer_id
                      and c.month_type=x.month_type
                      and x.customer_month=dateadd(m,1,c1.customer_month)
                  )
   and (datediff(m,c.customer_month,coalesce(c1.customer_month,c.customer_month))+1
        = (select count(distinct customer_month)
             from #customer as x
            Where x.customer_id=c.customer_id
              and x.month_type=c.month_type
              and x.customer_month between c.customer_month and c1.customer_month
          )    
        or c1.customer_month is null
         )  
0
 
LVL 50

Accepted Solution

by:
Lowfatspread earned 2000 total points
ID: 20035600
e.g.

Select
  c.customer_id
 ,c.customer_month as customer_month_span_start
 ,coalesce(c1.customer_month,c.customer_month) as customer_month_span_end
,coalesce(datediff(m,c.customer_month,c1.customer_month)+1,1) as interval
 ,c.Month_type

--join the table to itself such that later month are joined to the lower months...
 from #customer as c
 left outer Join #customer as c1
   on c.customer_id=c1.customer_id
  and c.month_type=c1.month_type
  and c.customer_month < c1.customer_month

-- Initial  condition is that there isnt a row for the prior month...
 Where not exists (select customer_id
                     from #customer as x
                    where c.customer_id=x.customer_id
                      and c.month_type=x.month_type
                      and x.customer_month=dateadd(m,-1,c.customer_month)
                  )  

-- Also Exclude rows
-- if the next month exists
   and not exists (select customer_id
                     from #customer as x
                    where c.customer_id=x.customer_id
                      and c.month_type=x.month_type
                      and x.customer_month=dateadd(m,1,c1.customer_month)
                  )


-- now for the contiguous month requirement
-- the difference in the dates must equal the number of records in the table between
-- the two dates...

--  remove the distinct from the count if duplicate date rows cannot exist in the table...
 
   and (datediff(m,c.customer_month,coalesce(c1.customer_month,c.customer_month))+1
        = (select count(distinct customer_month)
             from #customer as x
            Where x.customer_id=c.customer_id
              and x.month_type=c.month_type
              and x.customer_month between c.customer_month and c1.customer_month
          )    

-- deal with the last /single month spans

        or c1.customer_month is null
         )  


hth
0
 
LVL 1

Author Comment

by:ReconIT
ID: 20035861
Lowfatspread many many thanks. Also thank you for the comments in the SQL to help me understand the solution. I modified the query to use our real table name and column names and ran it against production data and it ran very quickly so I'm happy about that. More importantly I spot checked a number of records I would consider tricky and everything looks correct. I'm amazed at how quickly you put it together. Thanks again.
0

Featured Post

Vote for the Most Valuable Expert

It’s time to recognize experts that go above and beyond with helpful solutions and engagement on site. Choose from the top experts in the Hall of Fame or on the right rail of your favorite topic page. Look for the blue “Nominate” button on their profile to vote.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
This shares a stored procedure to retrieve permissions for a given user on the current database or across all databases on a server.
Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Suggested Courses

830 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question