fb1990
asked on
Cohort Analysis
Hello EE,
Can someone please help me with a SQL code to generate a cohort analysis from the Oracle database? I have attached a sample data. I want to see calculate what % of the customers acquired in a specified month are still active in Month1, Month2, Month3, etc. The goal is to take the data to excel for final rollup and visualization, but I need the code to model the data
Cohort-Dataset.xlsx
Can someone please help me with a SQL code to generate a cohort analysis from the Oracle database? I have attached a sample data. I want to see calculate what % of the customers acquired in a specified month are still active in Month1, Month2, Month3, etc. The goal is to take the data to excel for final rollup and visualization, but I need the code to model the data
Cohort-Dataset.xlsx
Can you take the first 20 records for example and show the results you would expect?
I ran this with the first 19 records and tested for customers acquired in January, 2017 and only showed three months, but you can expand it to meet your requirements -
with cte as
(select trunc(registrationdt,'mm') registrationdt, lastactivedt, ceil(months_between(lastactivedt, last_day(registrationdt))) monthsactive,
count(*) over (order by 1) tot
from yourtable
where trunc(registrationdt,'mm') = to_date('201701','YYYYMM')),--Select the date of the month to check
cte2 as
(select trunc(registrationdt,'mm') month,
sum(case when monthsactive >= 1 then 1 else 0 end) as month1,
sum(case when monthsactive >= 2 then 1 else 0 end) as month2,
sum(case when monthsactive >= 3 then 1 else 0 end) as month3
from cte
group by trunc(registrationdt,'mm')
order by trunc(registrationdt,'mm'))
select distinct cte2.month,
to_char(round(cte2.month1/cte.tot,2)*100)||'%' month1pct,
to_char(round(cte2.month2/cte.tot,2)*100)||'%' month2pct,
to_char(round(cte2.month3/cte.tot,2)*100)||'%' month3pct
from cte inner join cte2
on trunc(cte.registrationdt,'mm') = cte2.month;
with cte as
(select trunc(registrationdt,'mm') registrationdt, lastactivedt, ceil(months_between(lastactivedt, last_day(registrationdt))) monthsactive,
count(*) over (order by 1) tot
from yourtable
where trunc(registrationdt,'mm') = to_date('201701','YYYYMM')),--Select the date of the month to check
cte2 as
(select trunc(registrationdt,'mm') month,
sum(case when monthsactive >= 1 then 1 else 0 end) as month1,
sum(case when monthsactive >= 2 then 1 else 0 end) as month2,
sum(case when monthsactive >= 3 then 1 else 0 end) as month3
from cte
group by trunc(registrationdt,'mm')
order by trunc(registrationdt,'mm'))
select distinct cte2.month,
to_char(round(cte2.month1/cte.tot,2)*100)||'%' month1pct,
to_char(round(cte2.month2/cte.tot,2)*100)||'%' month2pct,
to_char(round(cte2.month3/cte.tot,2)*100)||'%' month3pct
from cte inner join cte2
on trunc(cte.registrationdt,'mm') = cte2.month;
ASKER
Thanks, awking00. I really appreciate your help. Can you please help add record count to the resultset to show the number of records in the cohort?
My example produced the following result fro customers acquired in January, 2017:
MONTH MONTH1P MONTH2P MONTH3P
--------- ------- ------- -------
01-JAN-17 100% 100% 40%
There were five records that were acquired in January, 2017. So, would you like to see something like this?MONTH MONTH1P MONTH2P MONTH3P COHORTS
--------- ------- ------- ------- -------
01-JAN-17 100% 100% 40% 5
If that is what you want change the final query removing the distinct keyword, adding a count(*) and group by statements -
select cte2.month, -- Distinct keyword removed
to_char(round(cte2.month1/cte.tot,2)*100)||'%' month1pct,
to_char(round(cte2.month2/cte.tot,2)*100)||'%' month2pct,
to_char(round(cte2.month3/cte.tot,2)*100)||'%' month3pct,
count(*) cohorts
from cte inner join cte2
on trunc(cte.registrationdt,'mm') = cte2.month
group by cte2.month,
to_char(round(cte2.month1/cte.tot,2)*100)||'%',
to_char(round(cte2.month2/cte.tot,2)*100)||'%',
to_char(round(cte2.month3/cte.tot,2)*100)||'%';
select cte2.month, -- Distinct keyword removed
to_char(round(cte2.month1/cte.tot,2)*100)||'%' month1pct,
to_char(round(cte2.month2/cte.tot,2)*100)||'%' month2pct,
to_char(round(cte2.month3/cte.tot,2)*100)||'%' month3pct,
count(*) cohorts
from cte inner join cte2
on trunc(cte.registrationdt,'mm') = cte2.month
group by cte2.month,
to_char(round(cte2.month1/cte.tot,2)*100)||'%',
to_char(round(cte2.month2/cte.tot,2)*100)||'%',
to_char(round(cte2.month3/cte.tot,2)*100)||'%';
ASKER
Thanks again, awking00 . The result should look like this for 201701 and 201702 cohorts
and the raw numbers should look like this for the number of customers who are still active each month from the original cohort
Cohort | Count | Month1 | Month2 | Month3 | Month4 | Month5 | Month6 | Month7 | Month8 | Month9 | Month10 | Month11 |
201701 | 422 | 100% | 91% | 84% | 76% | 69% | 61% | 54% | 45% | 39% | 29% | 21% |
201702 | 342 | 100% | 94% | 88% | 80% | 74% | 65% | 57% | 49% | 40% | 31% | 23% |
and the raw numbers should look like this for the number of customers who are still active each month from the original cohort
Cohort | Count | Month1 | Month2 | Month3 | Month4 | Month5 | Month6 | Month7 | Month8 | Month9 | Month10 | Month11 |
201701 | 422 | 420 | 386 | 353 | 320 | 293 | 259 | 228 | 192 | 164 | 124 | 90 |
201702 | 342 | 342 | 322 | 301 | 274 | 252 | 224 | 196 | 166 | 138 | 105 | 78 |
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
To get headings exactly as you wanted -
to get raw numbers
to get raw numbers
with cte as
(select trunc(registrationdt,'mm') registrationdt, count(*) over (partition by trunc(registrationdt,'mm') order by 1) tot, lastactivedt,
ceil(months_between(lastactivedt, last_day(registrationdt))) monthsactive
from yourtable
where trunc(registrationdt,'mm') in (to_date('201701','YYYYMM'),to_date('201707','YYYYMM')))
select to_char(registrationdt,'YYYYMM') cohort, tot as count,
sum(case when monthsactive >= 1 then 1 else 0 end) as month1,
sum(case when monthsactive >= 2 then 1 else 0 end) as month2,
sum(case when monthsactive >= 3 then 1 else 0 end) as month3,
sum(case when monthsactive >= 4 then 1 else 0 end) as month4,
sum(case when monthsactive >= 5 then 1 else 0 end) as month5,
sum(case when monthsactive >= 6 then 1 else 0 end) as month6,
sum(case when monthsactive >= 7 then 1 else 0 end) as month7,
sum(case when monthsactive >= 8 then 1 else 0 end) as month8,
sum(case when monthsactive >= 9 then 1 else 0 end) as month9,
sum(case when monthsactive >= 10 then 1 else 0 end) as month10,
sum(case when monthsactive >= 11 then 1 else 0 end) as month11
from cte
group by to_char(registrationdt,'YYYYMM'),tot
order by to_char(registrationdt,'YYYYMM'),tot;
to get percentageswith cte as
(select trunc(registrationdt,'mm') registrationdt, lastactivedt, ceil(months_between(lastactivedt, last_day(registrationdt))) monthsactive,
count(*) over (order by 1) tot
from yourtable
where trunc(registrationdt,'mm') in (to_date('201701','YYYYMM'),to_date('201707','YYYYMM'))),
cte2 as
(select trunc(registrationdt,'mm') month,
sum(case when monthsactive >= 1 then 1 else 0 end) as month1,
sum(case when monthsactive >= 2 then 1 else 0 end) as month2,
sum(case when monthsactive >= 3 then 1 else 0 end) as month3,
sum(case when monthsactive >= 4 then 1 else 0 end) as month4,
sum(case when monthsactive >= 5 then 1 else 0 end) as month5,
sum(case when monthsactive >= 6 then 1 else 0 end) as month6,
sum(case when monthsactive >= 7 then 1 else 0 end) as month7,
sum(case when monthsactive >= 8 then 1 else 0 end) as month8,
sum(case when monthsactive >= 9 then 1 else 0 end) as month9,
sum(case when monthsactive >= 10 then 1 else 0 end) as month10,
sum(case when monthsactive >= 11 then 1 else 0 end) as month11
from cte
group by trunc(registrationdt,'mm')
order by trunc(registrationdt,'mm'))
select to_char(cte2.month,'YYYYMM') cohort,count(*) count,
to_char(round(cte2.month1/cte.tot,2)*100)||'%' month1pct,
to_char(round(cte2.month2/cte.tot,2)*100)||'%' month2pct,
to_char(round(cte2.month3/cte.tot,2)*100)||'%' month3pct,
to_char(round(cte2.month4/cte.tot,2)*100)||'%' month4pct,
to_char(round(cte2.month5/cte.tot,2)*100)||'%' month5pct,
to_char(round(cte2.month6/cte.tot,2)*100)||'%' month6pct,
to_char(round(cte2.month7/cte.tot,2)*100)||'%' month7pct,
to_char(round(cte2.month8/cte.tot,2)*100)||'%' month8pct,
to_char(round(cte2.month9/cte.tot,2)*100)||'%' month9pct,
to_char(round(cte2.month10/cte.tot,2)*100)||'%' month10pct,
to_char(round(cte2.month11/cte.tot,2)*100)||'%' month11pct
from cte inner join cte2
on trunc(cte.registrationdt,'mm') = cte2.month
group by cte2.month,
to_char(round(cte2.month1/cte.tot,2)*100)||'%',
to_char(round(cte2.month2/cte.tot,2)*100)||'%',
to_char(round(cte2.month3/cte.tot,2)*100)||'%',
to_char(round(cte2.month4/cte.tot,2)*100)||'%',
to_char(round(cte2.month5/cte.tot,2)*100)||'%',
to_char(round(cte2.month6/cte.tot,2)*100)||'%',
to_char(round(cte2.month7/cte.tot,2)*100)||'%',
to_char(round(cte2.month8/cte.tot,2)*100)||'%',
to_char(round(cte2.month9/cte.tot,2)*100)||'%',
to_char(round(cte2.month10/cte.tot,2)*100)||'%',
to_char(round(cte2.month11/cte.tot,2)*100)||'%';
ASKER
Hello awking00, Can check this part of the code for me:
count(*) over (partition by trunc(registrationdt,'mm') order by 1)
and
count(*) over (order by 1) tot
The numbers are higher than the ones displayed in the example cohort that I gave above.
Thanks for all your help
count(*) over (partition by trunc(registrationdt,'mm') order by 1)
and
count(*) over (order by 1) tot
The numbers are higher than the ones displayed in the example cohort that I gave above.
Thanks for all your help
Sorry about that. My first attempt took the position that you would only be entering one month to check. Try changing that to count(*) over (partition by trunc(registrationdt,'mm') order by 1) tot.
ASKER
Hello awking00,
Yes, I am checking one month at a time. But the numbers are still high thereby creating lower active % in outer periods.
Yes, I am checking one month at a time. But the numbers are still high thereby creating lower active % in outer periods.
I may be wrong about changing that count statement. Can you send me insert statements for the cohorts that I can use to test to see if we're coming up with the same results. I don't know why the active percentages would be lower and need to look at actual data.
I know that was a fair amount of effort, but what I would really like to see is the inserts in a text file that I could simply copy and paste in sqlplus (the code snippet shows 5:Insert ..., for example so I can't copy and paste from there). Also the inserts as you have created them show errors. There's no apostrophe after the registrationdt plus the inserts need to include the to_date function. The following format works for me:
insert into test_tab values (to_date('1/1/2017','mm/dd/yyyy'), to_date('3/22/2017','mm/dd/yyyy'));
Note - the attribute list after test_tab is optional since the table only has the two fields.
insert into test_tab values (to_date('1/1/2017','mm/dd/yyyy'), to_date('3/22/2017','mm/dd/yyyy'));
Note - the attribute list after test_tab is optional since the table only has the two fields.
ASKER
Ok, thank you. Do you still want me to send the inserts? Thanks again for your help.
Sorry I couldn't get back to you sooner. I've been out all afternoon for a medical appointment. I would still like to have those inserts to see if I get the expected results or if I also get higher numbers and, subsequently, lower percentages.
ASKER
Running the query with the data provided from your inserts returns the following results:
For the raw numbers -
For the raw numbers -
COHORT COUNT MONTH1 MONTH2 MONTH3 MONTH4 MONTH5 MONTH6 MONTH7 MONTH8 MONTH9 MONTH10 MONTH11
------ ---------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------- -------
201701 401 387 348 314 288 264 229 195 168 137 98 71
For the percentages -COHORT COUNT MONT MONT MONT MONT MONT MONT MONT MONT MONT MONT MONT
------ ---------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
201701 401 97% 87% 78% 72% 66% 57% 49% 42% 34% 24% 18%
It seems the only thing that differs from what you indicated earlier for results comes from the total number of records where you showed 422 but there were only 401 inserts.
ASKER
Yes, this is better. Can you please send me the code? WOW
Or did use the code you already posted.
Or did use the code you already posted.
ASKER
Thank you so much for all your help. I really appreciate you. Thanks.
This was an intriguing problem. Glad I could help :-)
ASKER
Indeed. Thank you.