Solved

# most valued customer ranking SQL

Posted on 2007-04-05
723 Views
I’m not a data warehouse guy and have an interesting problem that I think may be able to be answered by warehouse style SQL (or good old SQL that I can’t think of).  For obvious reasons, I’ve had to dumb it down some but I think it still makes sense.

I need to come up with top 20 ‘most valued’ customers on a weekly basis.  This ranking is based on 2 items:  amount and age of order.  One problem is that the groupings for age and amount aren’t linear.  For example:

Age ranges:
<30 days old
30-60
61-120
120-360
360-720
720-1080
> 1080

Amount ranges:
<1000
1000-5000
5001-10000
10000-50000
50000-200000
>200000

The more interesting problem is I also need to weight age and amount in some way.
Example:
orderA:  \$5000 40 days old may have a similar value of orderB: 360 days old \$20000

orderB is a lot more money but since it’s older, orderA is just as attractive.

I’ll be providing a sample dataset.  In the following, Barney is most valued since he has a recent BIG dollar order. He is followed by Mr. Slate.  Mr. Slate is next because:  Even though all his recent orders are small amounts, he has placed big orders in the past and that has some value.  Fred is last.  Even though he has decent orders and has the 2nd greatest total order amount, he is a new customer and Mr Slate is a more consistent customer over time.

Running Enterprise Edition 10.2.0.3
-------------------------------------------------------------------
drop table orders;
create table orders (
cust_id      varchar2(10),
order_date      timestamp,
order_amount      number
);

insert into orders values('Fred',systimestamp-1,20000);
insert into orders values('Fred',systimestamp-2,30000);

insert into orders values('Barney',systimestamp-3,100000);
insert into orders values('Barney',systimestamp-700,5000);

insert into orders values('Mr Slate',systimestamp-1,50);
insert into orders values('Mr Slate',systimestamp-2,50);
insert into orders values('Mr Slate',systimestamp-3,50);
insert into orders values('Mr Slate',systimestamp-4,50);
insert into orders values('Mr Slate',systimestamp-30,1000);
insert into orders values('Mr Slate',systimestamp-100,500);
insert into orders values('Mr Slate',systimestamp-100,1000);
insert into orders values('Mr Slate',systimestamp-600,3000);
insert into orders values('Mr Slate',systimestamp-700,10000);
insert into orders values('Mr Slate',systimestamp-800,50);
insert into orders values('Mr Slate',systimestamp-800,50);
insert into orders values('Mr Slate',systimestamp-800,50);
insert into orders values('Mr Slate',systimestamp-800,50);
insert into orders values('Mr Slate',systimestamp-800,50);
insert into orders values('Mr Slate',systimestamp-850,15000);
commit;

0
• 7
• 6

LVL 76

Author Comment

ID: 18861308
I always forget to add something.

I'd like to be able to see the 'rank' as part of the resultset.
0

LVL 23

Expert Comment

ID: 18862201
slightwv,

I'm trying to understand your logic. First, if I place Age and Amount side by side, it looks like a range is missing.

>> orderA:  \$5000 40 days old may have a similar value of orderB: 360 days old \$20000

Is \$20000 or \$200000?

Age ranges:            Amount ranges:
<30 days old           <1000
30-60                      1000-5000
61-120                    5001-10000
120-360                  10000-50000
360-720                  50000-200000
720-1080                >200000
> 1080

0

LVL 76

Author Comment

ID: 18863297
No range is missing.  There can be more age ranges than amount ranges and visa versa.

>>Is \$20000 or \$200000?

It really doesn't matter.  Just as long as I can set a weight that takes age and amount into account when doing the ranking.

It needs to be flexible enough that I can add/remove specific ranges (age or amount) and change weight factors.

For example, management may decide that they want to break out age 61-120 to 2 new ranges:  61-80 and 81-120 and maybe colapse amount ranges and go from 1000-10000.

They may also decide that they want to make orders in the 10000-50000 range a little more important so they should receive a higher weight.
0

LVL 76

Author Comment

ID: 18864749
Just letting the everyone know.  I'm not ignoring you.  I'm on the road for a week and won't have internet access.  Hope I have a nice solution waiting for me when I get back.
0

LVL 23

Expert Comment

ID: 18865079
slightwv,

The key here is to find a formula that can easily and precisely be changed.

Take a look at the following example (pay attention to these values LIMIT_AMT, LIMIT_AGE)

1  SELECT
2  DENSE_RANK() OVER (ORDER BY VALUE) RANKING,
3  CUST_ID,
4  ORDER_DATE,
5  ORDER_AMOUNT
6  FROM
7  (SELECT
8  MAX(ABS((ORDER_AMOUNT-LIMIT_AMT)/LIMIT_AMT)*ABS((MAX_AGE-LIMIT_AGE))) OVER (PARTITION BY CUST_ID) VALUE,
9  AGE,
10  MAX_AGE,
11  CUST_ID,
12  ORDER_DATE,
13  ORDER_AMOUNT
14  FROM
15  (SELECT
16  EXTRACT(DAY FROM SYSTIMESTAMP - ORDER_DATE) AGE,
17  MAX(EXTRACT(DAY FROM SYSTIMESTAMP - ORDER_DATE)) OVER (PARTITION BY CUST_ID) MAX_AGE,
18  90000 LIMIT_AMT,
19  720   LIMIT_AGE,
20  CUST_ID,
21  ORDER_DATE,
22  ORDER_AMOUNT
23  FROM ORDERS))
24* ORDER BY 1
PAQUI@PROD > /

RANKING CUST_ID    ORDER_DATE                     ORDER_AMOUNT
---------- ---------- ------------------------------ ------------
1 Barney     03-APR-07 09.40.03.000000 AM         100000
1 Barney     06-MAY-05 09.40.03.000000 AM           5000
2 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
2 Mr Slate   07-MAR-07 09.40.03.000000 AM           1000
2 Mr Slate   06-MAY-05 09.40.03.000000 AM          10000
2 Mr Slate   27-DEC-06 09.40.03.000000 AM            500
2 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
2 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
2 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
2 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
2 Mr Slate   02-APR-07 09.40.03.000000 AM             50
2 Mr Slate   03-APR-07 09.40.03.000000 AM             50
2 Mr Slate   04-APR-07 09.40.03.000000 AM             50
2 Mr Slate   05-APR-07 09.40.03.000000 AM             50
2 Mr Slate   14-AUG-05 09.40.03.000000 AM           3000
2 Mr Slate   27-DEC-06 09.40.03.000000 AM           1000
2 Mr Slate   07-DEC-04 09.40.03.000000 AM          15000
3 Fred       04-APR-07 09.40.03.000000 AM          30000
3 Fred       05-APR-07 09.40.03.000000 AM          20000

19 rows selected.

Elapsed: 00:00:00.01
PAQUI@PROD > EDIT
Wrote file afiedt.buf

1  SELECT
2  DENSE_RANK() OVER (ORDER BY VALUE) RANKING,
3  CUST_ID,
4  ORDER_DATE,
5  ORDER_AMOUNT
6  FROM
7  (SELECT
8  MAX(ABS((ORDER_AMOUNT-LIMIT_AMT)/LIMIT_AMT)*ABS((MAX_AGE-LIMIT_AGE))) OVER (PARTITION BY CUST_ID) VALUE,
9  AGE,
10  MAX_AGE,
11  CUST_ID,
12  ORDER_DATE,
13  ORDER_AMOUNT
14  FROM
15  (SELECT
16  EXTRACT(DAY FROM SYSTIMESTAMP - ORDER_DATE) AGE,
17  MAX(EXTRACT(DAY FROM SYSTIMESTAMP - ORDER_DATE)) OVER (PARTITION BY CUST_ID) MAX_AGE,
18  90000 LIMIT_AMT,
19  2   LIMIT_AGE,
20  CUST_ID,
21  ORDER_DATE,
22  ORDER_AMOUNT
23  FROM ORDERS))
24* ORDER BY 1
PAQUI@PROD > /

RANKING CUST_ID    ORDER_DATE                     ORDER_AMOUNT
---------- ---------- ------------------------------ ------------
1 Fred       04-APR-07 09.40.03.000000 AM          30000
1 Fred       05-APR-07 09.40.03.000000 AM          20000
2 Barney     06-MAY-05 09.40.03.000000 AM           5000
2 Barney     03-APR-07 09.40.03.000000 AM         100000
3 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
3 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
3 Mr Slate   02-APR-07 09.40.03.000000 AM             50
3 Mr Slate   03-APR-07 09.40.03.000000 AM             50
3 Mr Slate   04-APR-07 09.40.03.000000 AM             50
3 Mr Slate   05-APR-07 09.40.03.000000 AM             50
3 Mr Slate   14-AUG-05 09.40.03.000000 AM           3000
3 Mr Slate   27-DEC-06 09.40.03.000000 AM           1000
3 Mr Slate   07-DEC-04 09.40.03.000000 AM          15000
3 Mr Slate   07-MAR-07 09.40.03.000000 AM           1000
3 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
3 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
3 Mr Slate   27-DEC-06 09.40.03.000000 AM            500
3 Mr Slate   06-MAY-05 09.40.03.000000 AM          10000
3 Mr Slate   26-JAN-05 09.40.03.000000 AM             50

19 rows selected.

Elapsed: 00:00:00.01
PAQUI@PROD > EDIT
Wrote file afiedt.buf

1  SELECT
2  DENSE_RANK() OVER (ORDER BY VALUE) RANKING,
3  CUST_ID,
4  ORDER_DATE,
5  ORDER_AMOUNT
6  FROM
7  (SELECT
8  MAX(ABS((ORDER_AMOUNT-LIMIT_AMT)/LIMIT_AMT)*ABS((MAX_AGE-LIMIT_AGE))) OVER (PARTITION BY CUST_ID) VALUE,
9  AGE,
10  MAX_AGE,
11  CUST_ID,
12  ORDER_DATE,
13  ORDER_AMOUNT
14  FROM
15  (SELECT
16  EXTRACT(DAY FROM SYSTIMESTAMP - ORDER_DATE) AGE,
17  MAX(EXTRACT(DAY FROM SYSTIMESTAMP - ORDER_DATE)) OVER (PARTITION BY CUST_ID) MAX_AGE,
18  1000 LIMIT_AMT,
19  2   LIMIT_AGE,
20  CUST_ID,
21  ORDER_DATE,
22  ORDER_AMOUNT
23  FROM ORDERS))
24* ORDER BY 1
PAQUI@PROD > /

RANKING CUST_ID    ORDER_DATE                     ORDER_AMOUNT
---------- ---------- ------------------------------ ------------
1 Fred       04-APR-07 09.40.03.000000 AM          30000
1 Fred       05-APR-07 09.40.03.000000 AM          20000
2 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
2 Mr Slate   07-MAR-07 09.40.03.000000 AM           1000
2 Mr Slate   06-MAY-05 09.40.03.000000 AM          10000
2 Mr Slate   27-DEC-06 09.40.03.000000 AM            500
2 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
2 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
2 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
2 Mr Slate   26-JAN-05 09.40.03.000000 AM             50
2 Mr Slate   02-APR-07 09.40.03.000000 AM             50
2 Mr Slate   03-APR-07 09.40.03.000000 AM             50
2 Mr Slate   04-APR-07 09.40.03.000000 AM             50
2 Mr Slate   05-APR-07 09.40.03.000000 AM             50
2 Mr Slate   14-AUG-05 09.40.03.000000 AM           3000
2 Mr Slate   27-DEC-06 09.40.03.000000 AM           1000
2 Mr Slate   07-DEC-04 09.40.03.000000 AM          15000
3 Barney     03-APR-07 09.40.03.000000 AM         100000
3 Barney     06-MAY-05 09.40.03.000000 AM           5000

19 rows selected.

Elapsed: 00:00:00.01

This may not work as you expect, but at least is something to start with. Changing the limits will change the rankings.

SELECT
DENSE_RANK() OVER (ORDER BY VALUE) RANKING,
CUST_ID,
ORDER_DATE,
ORDER_AMOUNT
FROM
(SELECT
MAX(ABS((ORDER_AMOUNT-LIMIT_AMT)/LIMIT_AMT)*ABS((MAX_AGE-LIMIT_AGE))) OVER (PARTITION BY CUST_ID) VALUE,
AGE,
MAX_AGE,
CUST_ID,
ORDER_DATE,
ORDER_AMOUNT
FROM
(SELECT
EXTRACT(DAY FROM SYSTIMESTAMP - ORDER_DATE) AGE,
MAX(EXTRACT(DAY FROM SYSTIMESTAMP - ORDER_DATE)) OVER (PARTITION BY CUST_ID) MAX_AGE,
1000 LIMIT_AMT,
2   LIMIT_AGE,
CUST_ID,
ORDER_DATE,
ORDER_AMOUNT
FROM ORDERS))
ORDER BY 1
0

LVL 23

Expert Comment

ID: 18866346
Here is some kind of revised version:

SELECT
DENSE_RANK() OVER (ORDER BY VALUE) RANKING,
CUST_ID,
VALUE,
AGE,
MAX_AGE,
ORDER_DATE,
ORDER_AMOUNT
FROM
(SELECT
MIN((ABS((ORDER_AMOUNT-LIMIT_AMT)/LIMIT_AMT)+1)
*(ABS((MAX_AGE-LIMIT_AGE)/LIMIT_AGE)+1)) OVER (PARTITION BY CUST_ID) VALUE,
AGE,
MAX_AGE,
CUST_ID,
ORDER_DATE,
ORDER_AMOUNT
FROM
(SELECT
EXTRACT(DAY FROM SYSTIMESTAMP - ORDER_DATE) AGE,
MAX(EXTRACT(DAY FROM SYSTIMESTAMP - ORDER_DATE)) OVER (PARTITION BY CUST_ID) MAX_AGE,
1000 LIMIT_AMT,
600   LIMIT_AGE,
CUST_ID,
ORDER_DATE,
ORDER_AMOUNT
FROM ORDERS))
ORDER BY 1
0

LVL 23

Expert Comment

ID: 18868372
Hey buddy,
I'm gonna be out of town until next Thursday, I'll be driving to Miami, FL and back to Virgnia. So I may not be able to see any comments or new posting until then.

Good luck!
0

LVL 76

Author Comment

ID: 18934496
Virginia to Miami and back?  Hopefully you get a couple days rest in between....

Just took a look at your proposed suggestions and not sure they will work as-is and I'm still getting over the trip.

I'd like the ability to assign a different weight to each of the ranges and not treat them equally.  For example:  the first 2 age ranges may be twice as important as the last age range or each age range may have a unique weight.

I'm thinking I might be able to add a CASE statement to assign the different weights but I'm not 100% sure how to add it to your SQL.
0

LVL 23

Accepted Solution

paquicuba earned 500 total points
ID: 18939354
I'm taking two weeks off in June  :)

Look at this:

SELECT
DENSE_RANK() OVER (ORDER BY ABS(AVG_AMT-AVG_AGE)) "RANK",
ORDER_AMOUNT, AGE,
CUST_ID, ORDER_DATE
FROM
(SELECT ORDER_AMOUNT, AGE,
AVG(AMT_RANK) OVER (PARTITION BY CUST_ID) AVG_AMT,
AVG(AGE_RANK) OVER (PARTITION BY CUST_ID) AVG_AGE,
CUST_ID, ORDER_DATE
FROM
(SELECT
ORDER_AMOUNT,
CASE
WHEN ORDER_AMOUNT < 1000 THEN 1
WHEN ORDER_AMOUNT BETWEEN 1000 AND 5000 THEN 2
WHEN ORDER_AMOUNT BETWEEN 5001 AND 10000 THEN 3
WHEN ORDER_AMOUNT BETWEEN 10001 AND 50000 THEN 4
WHEN ORDER_AMOUNT BETWEEN 50001 AND 200000 THEN 5
WHEN ORDER_AMOUNT > 200000 THEN 6
END AMT_RANK,
AGE,
CASE
WHEN AGE < 30 THEN 1
WHEN AGE BETWEEN 30 AND 60 THEN 2
WHEN AGE BETWEEN 61 AND 120 THEN 3
WHEN AGE BETWEEN 121 AND 360 THEN 4
WHEN AGE BETWEEN 361 AND 720 THEN 5
WHEN AGE BETWEEN 721 AND 1080 THEN 6
WHEN AGE > 1080 THEN 7
END AGE_RANK ,
CUST_ID, ORDER_DATE
FROM
(SELECT
ORDER_AMOUNT,
EXTRACT(DAY FROM SYSTIMESTAMP - ORDER_DATE) AGE,
MAX(EXTRACT(DAY FROM SYSTIMESTAMP - ORDER_DATE)) OVER (PARTITION BY CUST_ID) MAX_AGE,
CUST_ID,
ORDER_DATE
FROM ORDERS)))

Your query is a tough one. You really need to play with some scenarios, like the one above where I used averages, see the result:

RANK ORDER_AMOUNT        AGE CUST_ID    ORDER_DATE
---------- ------------ ---------- ---------- ----------
1       100000         16 Barney     03-APR-07
1         5000        713 Barney     06-MAY-05
2           50        813 Mr Slate   26-JAN-05
2         1000         43 Mr Slate   07-MAR-07
2        10000        713 Mr Slate   06-MAY-05
2          500        113 Mr Slate   27-DEC-06
2           50        813 Mr Slate   26-JAN-05
2           50        813 Mr Slate   26-JAN-05
2           50        813 Mr Slate   26-JAN-05
2           50        813 Mr Slate   26-JAN-05
2           50         17 Mr Slate   02-APR-07
2           50         16 Mr Slate   03-APR-07
2           50         15 Mr Slate   04-APR-07
2           50         14 Mr Slate   05-APR-07
2         3000        613 Mr Slate   14-AUG-05
2         1000        113 Mr Slate   27-DEC-06
2        15000        863 Mr Slate   07-DEC-04
3        30000         15 Fred       04-APR-07
3        20000         14 Fred       05-APR-07

Once you're back, we can try modifying and playing with different logics...

Take care!
0

LVL 76

Author Comment

ID: 18941615
Thanks for all the help.  It is getting VERY close to what I already had.  I just figured that I may be missing some neat Warehouse trick using cubes and/or rollups.

If the 2 of us can't come up with it then:
1: We're both missing the obvious.
or
2: It can't be done all that easily.

Either way, I'll go ahead and close this out.
0

LVL 23

Expert Comment

ID: 18942071
Thanks slightwv!

Another cool feature of 10g is the MODEL clause where you can pretty much manipulate your data in the same fashion you work with Excel, but with a bunch of extra functions.

With the MODEL clause you can set up rules and even forecast unexisting values.

Here is an example:
http://www.experts-exchange.com/Database/Oracle/Q_22109698.html?sfQueryTermInfo=1+iter+join+model+refer
0

LVL 23

Expert Comment

ID: 18942108
>> It can't be done all that easily
I agree on this one

>> neat Warehouse trick using cubes and/or rollups
I use cube, rollup, grouping, grouping sets, grouping_id and group_id() a lot, and I don't think they're going to help in your case.

I'll go with the ANALYTICAL functions or MODEL clause.
0

LVL 76

Author Comment

ID: 18946265
Thanks.  I'll go read up on MODEL.
0

## Featured Post

### Suggested Solutions

Why doesn't the Oracle optimizer use my index? Querying too much data Most Oracle developers know that an index is useful when you can use it to restrict your result set to a small number of the total rows in a table. So, the obvious side…
Working with Network Access Control Lists in Oracle 11g (part 1) Part 2: http://www.e-e.com/A_9074.html So, you upgraded to a shiny new 11g database and all of a sudden every program that used UTL_MAIL, UTL_SMTP, UTL_TCP, UTL_HTTP or any oth…
This video explains at a high level about the four available data types in Oracle and how dates can be manipulated by the user to get data into and out of the database.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function