Link to home
Start Free TrialLog in
Avatar of platdesign
platdesignFlag for United States of America

asked on

Customer Attrition: SQL or VBA?

I have a table which lists the activity of each of my customers. For example:

John Doe, 1/1/2009
Jane Doe, 1/2009
Eric Doe, 2/2009
John Doe, 2/2009
John Doe, 3/2009
Jane Doe, 3/2009

I need to pull out a variety of reports based on attrition and am not sure if I should use VBA or SQL to get it done. One specific report I need to pull is Customer Retention: what percentage of my customers have had consecutive months of activity for 1 month, 2 months, 3 months, etc (all the way to the maximum number of months - based on the first month of activity until the present - somewhere around five years and growing).

Another report is to show which customers are new and which left in any given month. This report will simply check if the customer has activity in the previous month - if yes, they are not new, if no, they are new. Then vice versa - for each customer that has activity in last month, do they also have activity in this month - if no, they are considered "dropped" for this month.

To answer this question, please provide a working solution for the first report (not as a report - it should be either SQL or VBA). It'd be nice to have a working solution for the second as well, but I can always split that into a second question.

Please accept my sincerest thanks in advance for considering this challenge and working with me on it. I truly appreciate the hard work, time and effort all the Experts on this site put in.
Avatar of Patrick Matthews
Patrick Matthews
Flag of United States of America image

Hello platdesign,

It would be useful to see some sample data and/or a sample file.  EE now allows you to directly upload files
to your question.

Please be advised that once you upload a file, it can be publicly accessed, and that it may not be possible
to fully and permanently delete it.  Therefore, be very careful about posting proprietary, confidential, or
other sensitive information.  If necessary, use "fake" and/or obfuscated data in your sample.

Please note that at present EE restricts uploads to certain file types.  If your file type does not match
those in the list, you can use http://www.ee-stuff.com instead, which is not officially an EE site, but is run
by people connected to EE.

Regards,

Patrick
For the first report, all you want is a percentage?

What if you have someone whose first entry is 2 years ago, and you have entries in your table for all but two months.  I assume that he/she should not be counted as continuous, is that correct?
platdesign,

In addition, you are asking for, in effect, three separate queries.  Please keep in mind that the Experts here are
all volunteers, and that the EE culture is one of "one question per question".  I suggest you revise the scope to
encompass one of those queries, and then open separate questions to cover the others.

:)

Patrick
<ot> patrick, didn't know, you can type that fast!!! ;-) </ot>
capricorn1 said:
>> patrick, didn't know, you can type that fast!!! ;-)

Well, that "please post sample file" comment is canned :)
<ot> I Know!! i've seen that before !!! ;-) </ot>
Create a query to summarize monthly activity by customer, call it CustActivity
- assuming a table Activities, with fileds CustID and ActivityDate

Select Distinct CustID, CLng(Format(ActivityDate,"yyyymm")) as ActivityMonth From Activities

Then to find customers that have continuous activity:

Select CustID from CustActivity
Where Count(ActivityMonth) = CLng(Format(Now(),"yyyymm")) - Min(ActivityMonth) + 1
Group By CustID

I haven't tested this, but I think I have the logic right
Avatar of platdesign

ASKER

Wow - I was busy working on putting together a sample, and there's already a possible solution.

First - attached is a sample of my table.

Second - Matthew: I specified in my question that only one report needs to be generated. I was just trying to give a feel for how I will be working with the data so that any solution will be dynamic or versatile enough that I can modify them to create some of the other reports.

"To answer this  question, please provide a working solution for the first report (not as  a report - it should be either SQL or VBA). It'd be nice to have a  working solution for the second as well, but I can always split that  into a second question."
Third - I'll test Mike's solution now and post results.

sampleAttritionData.xlsx
Mike: the first query worked fine. The second query produced this error:

Cannot have an aggregate function in WHERE clause (Count(ActivityMonth)=Clng(Format(Now(),"yyyymm"))-Min(ActivityMonth)+1)

fyed: I can calculate the percentage myself - I'd much rather produce a list of each customer and how many consecutive months they have activity. And yes, if they skip a month they are considered a new customer.
My mistake - Aggregate functions need to go in a HAVING clause instead of WHERE
Try this version:

Select CustID from CustActivity
Group By CustID
Having Count(ActivityMonth) = CLng(Format(Now(),"yyyymm")) - Min(ActivityMonth) + 1
Mike: it runs but doesn't produce any results...
Mike: If you subtract 200708 from 201006 you get 297 which is why it's not producing any results. Not sure what logic you are using.

Min([ActivityMonth])+1 = 200709
CLng(Format(Now(),"yyyymm")) = 201006
Maybe convert it to a dateserial first? I'll try it and see if I can get it to work...
responding to platdesign #32970596

So if I had entries for:

fyed  10/1/2009
fyed  11/1/2009
fyed   1/1/2010
fyed   2/1/2010
fyed   4/1/2010
fyed   5/1/2010
fyed   6/1/2010

That would be counted as three separate customers? with only one of them (the one from Apr-Jun) meeting your criteria of continuous?  Seems like a really bad statistic to me.  

I think I would prefer to know how frequent (1 record per month) they were a customer since they started.  In my example above this would result in 7/11.  
fyed: If there is no activity for a particular month that means they switched to another company. If they resume activity it means they've returned to us and they are then considered a new customer. So, while it may seem like a bad statistic, it's the one I need :)
That's what I get for not testing
The idea is to get a month number
It should have been
Select Distinct CustID, Year(ActivityDate) * 12 + Month(ActivityDate) as ActivityMonth From Activities
and
Having Count(ActivityMonth) = Year(Now()) * 12 + Month(Now())  - Min(ActivityMonth) + 1

The logic in my suggestion is that, for continuous activity, the count of the number of months with activity should be the same as 1 + the difference in months between the first month's activity and the current date.
The first attempt fell over because there are 12 months in a year, not 10.
Regarding the SQL query provided by MikeToole above.  As stated by platdesign, stated above, ActivityMonth will be a text value since it is a Formatted string.  

Maybe if you modified the right hand side of the WHERE clause so that it converted ActivityMonth back to a date using DateSerial, then you could use the DateDiff function to count the number of months, something like:

Where Count(ActivityMonth) = DateDiff("m", DateSerial(clng(left(min(ActivityMonth),4)),   clng(mid(min(ActivityMonth), 5)), 1), DateSerial(clng(left(max(ActivityMonth), 4)), clng(mid(max(ActivityMonth), 5)), 1))

to account for "current" accounts, you will need to add a HAVING clause:

HAVING Max(ActivityMonth) = Format(date(), "yyyymm")

I apologize for not entering the entire SQL string, but our system is blocking SQL statements (I think it is a guard against SQL injection attacks).
Mike: second attempt also produced no results. I'll try fyed's suggestion now...
Here is the modified SQL from fyed. Unfortunately it still produces no results.
SELECT CustID
FROM qryCustActivity
GROUP BY CustID
HAVING (((Count(qryCustActivity.[ActivityMonth]))=DateDiff("m",DateSerial(CLng(Left(Min([ActivityMonth]),4)),CLng(Mid(Min([ActivityMonth]),5)),1),DateSerial(CLng(Left(Max([ActivityMonth]),4)),CLng(Mid(Max([ActivityMonth]),5)),1))) AND ((Max(qryCustActivity.[ActivityMonth]))=Format(Date(),"yyyymm")));
Using your data and the attached query I got the results below.

Cname      CountOfActivityMonth      MinOfActivityMonth      FromStart
Acme Corp.      4      24123      4
John Doe      7      24120      7
SELECT CustActivity.Cname, Count(CustActivity.ActivityMonth) AS CountOfActivityMonth, Min(CustActivity.ActivityMonth) AS MinOfActivityMonth, 1+12*Year(Now())+Month(Now())-Min([ActivityMonth]) AS FromStart
FROM CustActivity
GROUP BY CustActivity.Cname
HAVING (((1+12*Year(Now())+Month(Now())-Min([ActivityMonth]))=Count([ActivityMonth])));

Open in new window

I get the same on my sample data, however, this doesn't produce the results requested. I need to see ALL customers with the count of consecutive months for each of them. See the original question above - I need to show a count of consecutive months for each customer.

In addition, as fyed pointed out, if they leave and return they are considered a new customer.
Simplifying by removing the debug columns - I left in a potentially informative MonthsActive column - this result is produced by the attached:

Cname      MonthsActive
Acme Corp.      4
John Doe                      7
SELECT CustActivity.Cname, Count(CustActivity.ActivityMonth) AS MonthsActive
FROM CustActivity
GROUP BY CustActivity.Cname
HAVING (((1+12*Year(Now())+Month(Now())-Min([ActivityMonth]))=Count([ActivityMonth])));

Open in new window

>> I need to see ALL customers with the count of consecutive months for each of them
All that have activity in the current month therefore?
Mike - no I manually summed up my sample data and re-attached as an example of what I'm looking for.

sampleAttritionData.xls
OK. I think that that would be a little complex in SQL.
Essentially you need to identify the gaps.
That would need a 'helper' table of integers to create a query that outputs all posible customer-month combinations, Left join that to the actual customer-month query and use a Where clause to pick the Customer-month combinations where there was no customer activity.
So, it should be possible going the SQL route - I set-up test queries that identify the gaps - but there's still an amount of work to do.
Possibly for this one the quicker route is via VBA.
Hope this helps,
Mike
Thanks Mike. I'm working on creating the logic in Excel, which I know better than Access. Not sure if that will help, but I figure if I can get the logic worked out in Excel, there should be a way to replicate it in Access. I'll post my Excel if and when I get it working.
Well, It's 6:25 PM here now, so I'm heading off. Good luck!
It's 13:30 here, but I'm going to have to wait until later this evening to take another look.
Okay, I got it working in Excel. The part I'm having trouble with in Access is determining if a particular record is consecutive or not. If I can test for that, I think I got the rest down. Any ideas?

sampleAttritionData.xls
I'm getting there. Here's what I came up with. It almost works but I must be going wrong somewhere. The idea is to return a TRUE value for each record which also has a corresponding record one month prior. It's giving some false "TRUE"s however. Maybe one of you can figure it out....

SELECT Activities.[CName], Activities.[period], Activities.[cname], IIf(EXISTS (SELECT [CName] FROM [Activities] As Tmp GROUP BY [CName],[period] HAVING [period] = dateadd("m",-1,[Activities].[period])), "True", "False") As Consecutive
FROM Activities
ORDER BY Activities.[CName], Activities.[period];

Open in new window

SOLUTION
Avatar of Dale Fye
Dale Fye
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Here's what I came up with. It joins each customer-activitymonth row with the row from the previous month, if it exists, and returns a tru/false column to flag the 'new' customer.

SELECT CA.Cname, CA.ActivityMonth, iif(PM.ActivityMonth is null, True,False) AS NewCustomer
FROM CustActivity AS CA LEFT JOIN CustActivity AS PM ON (CA.Cname = PM.Cname) AND (CA.ActivityMonth-1 = PM.ActivityMonth);
Mike: Your SQL worked. Not sure why yours did and mine didn't - it looks like you just used IsNull instead of my EXISTS. Either way, yours runs faster, so I'm going to try to build out the rest of my Excel file based on yours. I'll keep you posted.

Fyed: I tried the SQL first since it was simpler and a more elegant solution. If I can't get the rest of it to work, I'll try your VBA. I hope that's the right thing to do.

Thank you both for your help!
Mike: I'm kinda stuck here. The logic I'm using is as follows:
Make a new field called "GroupCount"
If Period = Min(Period) Then
GroupCount = 1
Else If NewCustomer is False Then
GroupCount = {same as previous month}
Else
GroupCount = {same as previous month}+1
Then I make another field called "GroupID" which is essentially equal to GroupCount & CName.

I'll run a totals query on this one to count records in each GroupID which will give me the total number of months each customer stayed consecutively. From there it's simple to get my results.

Anyway, I'm not sure how to calculate the GroupCount field and I'm thinking I might end up having to add a field to my table and use fyed's solution.

Thoughts?
I have the Min(Period) running in a separate query, so testing for the first month is easy. It's the GroupCount that I'm having trouble generating on the fly. If I add it as a field and run an update query with your code first, then it might work.
I much prefer an "elegant SQL solution", but sometimes they just don't jump out at you.  Like I said, I spent well over an hour looking for a SQL solution before I gave up ane spent 15 minutes writing the VBA code.  

If you would rather not have a "permanent" table to store these results, create a temp table that you create every time you need this procedure, and which gets dropped after each use.

You might want to check out Joe Celko's "Joe Celko's SQL for Smarties: Advanced SQL Programming Second Edition", he has some great examples of queries for stuff like this.  Unfortunately, I'm currently unable to locate my copy.

SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
fyed: Thanks for the advice! I'm working my way through Access 2007 VBA from Wrox - I'm not a programmer or Access guy by training, but my new job requires a lot of it. So far I pretty much just figure things out on the fly, with copious amounts of help from EE Experts like yourself, Mike and Capricorn. :)

That being said, I'm simply not sure what the "best" solution to this will be. The table I'm running off now has 250,000 records, and that will grow monthly. I'm leery of using VBA on such a large volume of records, but that may be my own ignorance. I was told VBA is much slower than SQL.

At this point, since even Mike's solution is utilizing VBA, I'll give your solution a try and see where it gets me.

Thank you both. I think at this point you should split the points regardless of which solution I choose. Do you both concur?
Happy for a split - though the VBA function call hardly counts since the equivalent could instead have been coded in-line in the SQL :-)

As I said in my last post, depending on what's in your data, the SQL approach can be simplified a lot.
If all the period dates are for the 1st of the month, and there's just one record per customer per period, the attached SQL does the whole job.

PS
You might wonder about the use of CDbl()
It works because dates are stored internally in Access as floating point values.
When the comparison was done without CDbl() it failed. Access was probably preforming a string conversion before concatenating the comparison date to the expression. Using CDbl() forced a numeric comparison.
Select Cname, StartPeriod, Count(Cname) From (
SELECT Attrition.Cname, Attrition.Period, DMax("Period","NewCustomer2","Cname='" & [Cname] & "' And Period <= " & CDbl([Period])) AS StartPeriod
FROM Attrition) As TagStartPeriod
Group By Cname, StartPeriod;


Query NewCustomer2:

SELECT Attrition.Cname, Attrition.Period
FROM Attrition LEFT JOIN Attrition AS PrevMonth ON (Attrition.Period = DateAdd("M",1,PrevMonth.Period)) AND (Attrition.Cname = PrevMonth.Cname)
Where PrevMonth.Period is null;

Open in new window

Mike: Your first query returned "The Microsoft Access database engine cannot find the input table or query 'NewCustomer2'. Make sure it exists and that its name is spelled correctly."

Every time I click ok, it just pops up again, so I have to close Access and restart.
Mike: nevermind. I found my error.
Mike: I created the Query NewCustomer2 and then tried running your other query. It's stuck - it's been running for half hour and no change.

fyed: I'm not comfortable with having a second table and using append. I can easily end up with hundreds of thousands of duplicates because every time I run the code, it'll append new records for everything in the Attrition table (which is constantly growing).

I think a combo of SQL and VBA is the only way to go. I'm gonna spend some time trying to figure it out and I'll post updates.

The steps I'm gonna take at this point:
1. Update the customer data table with the two fields from Mike's and my query: Is this record consecutive, and is this the first month this customer is active.
2. Calculate the incrementing GroupID and Update it to the table using a modified version of fyed's VBA code.
3. Create a SQL query to summarize the number of months in each combo of Customer & GroupID.

I think that should do it.
So add a line of code to my routine:

currentdb.execute "DELETE * FROM tbl_Attrition_Stats", dbfailOnError

right before:

strSQL = "SELECT CName, Period FROM AttritionData ORDER BY CName, Period"

If you would post copy your attrition table to a new mdb, and purge all of the unneccessary fields, compact it, then post it here, I'll take a look.
250,000 calls to DMax might slow it some.
Do you have any indexes on Attrition? I presume that Cname + Period is the Primary Key of the table
Ok, attached is a sample file. I also attached my "just-begun" vba code.

The logic I'm using is:

If {this is the first month the AcctNo appears in the Activity table} Then
GroupID = 1
ElseIf {it's not the first month, but it is consecutive (i.e. there is a record matching the Period minus one month)} Then
GroupID = {the previous month's GroupID}
Else {It's not the first month and it's not consecutive}
GroupID = {the most recent GroupID for this AcctNo}+2
End If


However, this will only work if I first sort the table by AcctNo and then Period. Otherwise, the check for {the most recetn GroupID for this AcctNo} may turn up a blank.

I recognize that this might be a really primitive way of running this, and it may also be very slow, but most importantly I need it to work. I am by all means open to alternatives!!

Public Sub AnalyzePoolReports()
    Dim db As DAO.Database, rs1 As DAO.Recordset, rs2 As DAO.Recordset
    Dim strAcctNo As String, strSQL As String
    
    Set db = CurrentDb
    Set rs1 = db.OpenRecordset("tblActivityReports")
    
    rs1.MoveLast
    rs1.MoveFirst
    
    strSQL = "SELECT "
                
    Do Until rs1.EOF
        strAcctNo = rs1("AcctNo")
        
        If rs1("Period") = DMin("Period", "tblActivityReports", "[AcctNo]='" & strAcctNo & "'") Then 'this is the first month of activity
            rs1.Edit
                rs1("FirstMonth") = True
                rs1("GroupID") = 1
            rs1.Update
        Else
            
        End If
        rs1.MoveNext
        
    Loop
End Sub

Open in new window

sampleAttrition.mdb
Followup: Just reminding you that the goal is to produce a list of customers with the number of consecutive months each stayed with the company. Each customer may appear multiple times in the list.
This table is just the 85 rows you supplied before. Is it possible to supply more data? Maybe 50K rows?
Just Cname (or account ID)  and Period will do. Any format, Excel, csv, mdb.
That way there would be a chance of addressing performance.
Not sure how to do that without disclosing confidential data...
AccNo and period shouldn't be too much of an exposure.
Add a fixed number to AccNo if you don't want the real values to go ouside.
just now getting a chance to look at your code.

The DMIN function is going to slow this down significantly.  That is why in my code, I Sorted by Cname and Period, and then tested to see whether the period is continuous.
Ok, here is a sample with 250,000 rows. I removed CName and changed the AcctNo enough that it can't hurt to release.

sampleAttrition.mdb
fyed: Any way to update the table instead of appending to a different table?
Solved!! I finished a VBA script which runs in under a minute and updates all the records. I left the office already, so I'll post it tomorrow.
Good news :-)
My original code only took about 30 seconds to run on my ancient Pentium 4.

Of course you could just update the original table, but your Yes/No fields for FirstMonth and Consecutive won't help you with the queries you wanted to run to identify the percentage of "customer/periods" are current.

You could change [FirstMonth] to [LastMonth] as a date field, or [SequentialMonths] as an integer.  Then, instead of writing a record to the new table, you could write a to those fields for the first record in each Account/Period grouping.

With that setup, you could then run your query for where [LastMonth] or [SeqMonths] is not NULL to identify all of the group start/stop sequences.  I did this, and am sending your sample mdb back to you with my modified code, your modified table, and a query that identifies all of the unique account/period groupings, the number of consecutive months in the grouping, and whether that particular account/period group is current.  Total run time was 28 seconds on my P4.


sampleAttrition.mdb
BTW, with qry_Attrition_Results, you could then determine the number of "current" accounts as simply as:

SELECT Sum(Abs([Current])) AS [Current Cust] FROM qry_Attrition_Results;

Or you could determine how many of your customers from December, 2009 have been concontinuous.

SELECT Sum(Abs([Current])) AS [Conseq Since 12-1-09]
FROM qry_Attrition_Results
WHERE (((qry_Attrition_Results.Period)<=#12/31/2009#) AND ((qry_Attrition_Results.LastMonth)=#6/1/2010#));

ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Also, here is my summary query which gives me a count of how many months each customer was consecutive for each iteration.

SELECT tblActivity.AcctNo, tblActivity.GroupID, Count(tblActivity.Period) AS CountOfPeriod
FROM tblActivity
GROUP BY tblActivity.AcctNo, tblActivity.GroupID;

Open in new window

Looks like that will work.  Don't forget to close out the question and assign points.

Dale