Link to home
Start Free TrialLog in
Avatar of TechNovation
TechNovationFlag for Netherlands

asked on

Pareto Analysis in SQL Server 2005

Hello,

How do I make Pareto Analysis in SQL server, say I have a table structure like:

Customer       Revenu
a                         4
b                         5
a                         6
a                         3
f                         6
b                         4
c                         4
a                         4
r                         4


After ading up (grouping by customer) The idea is to find out cumulatively which customers once added up are making 80% of the revenu, which needs to be later presented in a chart?

Output looks like:

Customer      Revenue     Accumulated Revenue      % (acc. / Total)
a                        16                      16                                    20
b                          4                      20                                   35
...                         ...                      ...                                   ...
f                         6                     56                                   81
...                          ...                   ...                                  ...              
Avatar of 8080_Diver
8080_Diver
Flag of United States of America image

!^%%$^@!  I hit the tab button a minute ago (out of habit) and then tried to back-tab and lost my response!  SHAZBAT!
Okay, re-typing now. ;-)
SQL_1 below gives the totals by Customer.  So, I'd use that in a Stored Procedure (see SQL_2) which I think kshould provide you with what you want.  

SQL_1:
 
SELECT Customer,
       SUM(Revenu)
FROM   YourTable
GROUP BY Customer;
 
SQL2:
 
CREATE PRCOEDURE usp_Pareto_Analysis 
   @Pct_Limit   AS   INT
AS
BEGIN
   
   SET NOCOUNT ON;
 
   DECLARE @TempTtls (
                       Customer AS varchar(XX),
                       LastName AS (appropriate datatype
                     );
   
   INSERT INTO @TempTtls 
   SELECT Customer,
          SUM(Revenu) AS Revenu
   FROM   YourTable
   GROUP BY Customer;
   
   SELECT Customer, 
          Revenu,
          AccumulatedRevenu,
          (Revenu / AccumulatedRevenu) AS PctgAccumulatedRevenu
   FROM
   (
    SELECT SUM(Revenu) AS TtlAccumRevenu
    FROM   @TempTtl
   )Y,
   ( 
    SELECT T.Customer,
           T.Revenu, 
           (
            SELECT SUM(Revenu) 
            FROM   @TempTtl
            WHERE Customer <= T.Customer 
           ) AS AccumulatedRevenu
    FROM   @TempTtl T
   ) Z
   ORDER BY Customer
   WHERE ((Revenu / AccumulatedRevenu) < @Pct_Limit);
    
END

Open in new window

Avatar of Mark Wills
might like to have a look at (starts with a test / dummy table representing the data source - it is NOT part of the solution - the view and the CTE query is)...



-- first up create a table with some test data...
 
select * into my_revenu_table from ( 
select 'a' as customer ,  4 as revenu union all
select 'b' as customer ,  5 as revenu union all
select 'a' as customer ,  6 as revenu union all
select 'a' as customer ,  3 as revenu union all
select 'f' as customer ,  6 as revenu union all
select 'b' as customer ,  4 as revenu union all
select 'c' as customer ,  4 as revenu union all
select 'a' as customer ,  4 as revenu union all
select 'r' as customer ,  4 as revenu 
) src
 
 
-- you might already have an identity or something like it, so might change the approach (ie might not need the view)
-- but in the abscense of more information....
 
-- now, because we want some kind of order, decided to create a view over that data so we can get a row_number or ordering
-- we can also do the aggregation by customer that way
 
create view vw_my_revenu_view as
select convert(int, (row_number() over (order by revenu desc, customer))) as rn, customer, revenu
from   (select customer, sum(revenu) as revenu from my_revenu_table group by customer) a
go
 
-- now we have a data source, with order, we can use that to create a recursive query
 
;with cte_revenu as
( select convert(int,0) as rn, '' as customer, convert(int,0) as revenu, convert(int,0) as accumulated_revenu, (select sum(revenu) from my_revenu_table) as total_revenu 
  union all
  select c.rn + 1, r.customer, r.revenu, r.revenu + c.accumulated_revenu, c.total_revenu
  from cte_revenu c
  inner join vw_my_revenu_view r on r.rn - 1 = c.rn
) 
select *, (accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution from cte_revenu
go

Open in new window

Also, I explicitly convert to INT and it is more likely a money or decimal value, but hopefully you get the idea...
Avatar of TechNovation

ASKER

8080 DIVER:

 I tried your code butit returns errors:
Affected code is:
  DECLARE @TempTtls (
                       Customer AS (nvarchar(35),null),
                       Revenu AS (decimal (29,2),null)
                     )
and related error is:
Incorrect Syntax near '('

And since the table variable isn't declared as consequence of error, the rest isn't working.
any idea?
Thanks


MARK WILLS:
This query will be run several times to populate a table, by creating a view, wouldn't the next query return an error as 'view already exists'. Do I really need to create a view? is there another way?

appreciate your help

Thanks
Only create the view once - it is a once-off step and you can then re-use it as much as you like. No need to create everytime.

And, if you already have an indentity, or time, or anything representing sequentially incrementing rows, you might not need the view. But with the view we have taken full control of the information we want to play with. The view does not contain data itself, it points to the datatables and is resolved at runtime / at time of use...

The advantage is you have a single CTE query (starts with the ";with" and ends with the "select ... from cte_revenu"  - yep, just a "single" command)

 DECLARE @TempTtls (
                       Customer AS (nvarchar(35),null),
                       Revenu AS (decimal (29,2),null)
                     )

Oops, forgot a piece.  Add the word TABLE as shown in SQL_1 below.
The SP can be used to populate a table by simply adding INSERT INTO thetablename at the top of the last SELECT statement.  If you are repopulating the table, (e.g. did this yesterday but want to get new results today), you can remove the appropriate rows from the table or update them.  (However, if you add an Datetime column that you feed the current date to, you can build up a historical set of data for trend analysis. ;-)
I have attached a revised version of the SP (commenting out the pieces needed for the historical/trending analysis table ;-):

SQL_1:
 
SELECT Customer,
       SUM(Revenu)
FROM   YourTable
GROUP BY Customer;
 
SQL2:
 
CREATE PRCOEDURE usp_Pareto_Analysis 
   @Pct_Limit   AS   INT
AS
BEGIN
   
   SET NOCOUNT ON;
 
--   DECLARE @Today DATETIME;
--
--   SET @Today = CAST(CONVERT(VARCHAR(10), GETDATE(), 101) AS DATETIME);
 
   DECLARE @TempTtls TABLE
           (
            Customer AS varchar(XX),
            LastName AS (appropriate datatype
           );
   
   INSERT INTO @TempTtls 
   SELECT Customer,
          SUM(Revenu) AS Revenu
   FROM   YourTable
   GROUP BY Customer;
 
		SELECT   
 
--   INSERT INTO YourHistoryTable
   SELECT --Today , 
          Customer, 
          Revenu,
          AccumulatedRevenu,
          (Revenu / AccumulatedRevenu) AS PctgAccumulatedRevenu
   FROM
   (
    SELECT SUM(Revenu) AS TtlAccumRevenu
    FROM   @TempTtl
   )Y,
   ( 
    SELECT T.Customer,
           T.Revenu, 
           (
            SELECT SUM(Revenu) 
            FROM   @TempTtl
            WHERE Customer <= T.Customer 
           ) AS AccumulatedRevenu
    FROM   @TempTtl T
   ) Z
   ORDER BY Customer
   WHERE ((Revenu / AccumulatedRevenu) < @Pct_Limit);
    
END

Open in new window

Hi TechNovation,
Was extremely busy these last couple days and just say this question.
The ultimate query would look something like snippet and produces this output.  I attached the script to set up and run this also.
Love these types of questions!
;With
tblItemsWithRow
as
( SELECT Row_Number() Over(Order by Customer) as RowNum, Customer, SUM(Revenue) Revenue
FROM PareTo PT
GROUP BY Customer
)
--Final Query to return the running total and pcts
Select Customer,Revenue,
( Select Sum(Revenue) from tblItemsWithRow m where m.RowNum<= s.RowNum) as [Accumulated Revenue],
(( Select Sum(CONVERT(FLOAT,Revenue)) from tblItemsWithRow m where m.RowNum<= s.RowNum)/SUM(Revenue) OVER ())*100.0  as [Acc Rev %]
from tblItemsWithRow s
ORDER BY Customer

Open in new window

PareToOutput.png
PareTo.txt
Actually the revenu column should be sorted descending before accumulating fo f should come before c in the order of row for customers,
this is because the purpose is to know which customers once aggregated make up 80% of the revenu.

how would the code look like?
Line 4 in CGLuttrell's code should be similar to the row_function() previously posted...  Just include REVENU DESC,  after the order by and before Customer...
sorry, that is SUM(REVENU)
CGluttrell, are you interested in discussing the differences in our CTE's ?
The query (of CGluttrell) has been running now for more than 11 hours, is it more the server that might be busy or the query is 'demanding'?
... and the table I'm testing in has only 20 million records, does is mean if I move to real capacity of hundreds of millions of records I will have to take something into account?
The query is a little demanding - it is doing an inline query back on the CTE query which does take some effort. 11 hours is a long time though...
wow,yeah we need to rethink this!
That many rows, this is where Analysis Server cubes would be real handy, but we aint got them so we will make do.
mark, sorry I have been away from the computer, but yes we can compare the cte statements if we need to, but we might need to look in another direction to avoid 11+ hours, I am thinking a aggregate temp table maybe like you were trying to do with the view?
guys, I've gotta go mow some grass so will be away for a bit again, sorry.
Did you have a quick look at creating the view ? that step (ie create the view) takes no time, then do a select * from the_view  can always can it and drop the view. The difference is that the view is doing a lot of the work and the CTE then just does the running total...
@CGluttrell, yep, think the view might be worth a shot. the aggregation is done there and the sort with row number so that the CTE "just" has to do the running total and calcs. Think it is worth trying to do a select * from the_view and see what kind of response. It wont be too dissimilar (in fact should be quicker) if there is a temp table.
Mark I will have a go at your query as well with the view, but as it is quite complicated for me I could replace column names in the right places: the first header isn't customer but [customer name] and the second column is [Amount in local Currency], if you could please give me the query with these headings Iwill be gratefull. In the meantime CGLuttrell's query is still running.
OK, just the view first. A "once-off" step to create the view, and once created can then select from it... Now, all we really have to do is change the subquery, because we can use column aliases to give them new names. In our case we will use the real names as part of the subquery. You will have to change the table name in that same subquery from "my_revenu_table"  to whatever it really is. So, should look something like :

create view vw_my_revenu_view
as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from my_revenu_table group by [customer name]) a
go
 
Thats done. That easy. Do not have to worry about creating the view again (might need to change / fix) and so it sits there waiting, wanting to be used, and not costing anything  (the luxury of being a view).

But that will change when we do go to use it. The subquery does the aggregations first, Then it will sort and calculcate a row_number(). So lets give it a try :

select * from vw_my_revenu_view

See how that goes, and we can move on from there.
Select * run ok, I get 3 columns with revenu descending. But when I run the rest I get error:

Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "customer" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".
create view Customer_Invoices_view 
as
select 
   convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, 
   customer, 
   revenu
from   
   (
    select 
      [customer name] as customer, 
      sum([Amount in local currency]) as revenu 
    from 
      dbo.SourceTable 
    group by 
      [customer name]
    ) a
go
 
----
select * 
from 
   Customer_Invoices_view
---- this select * returns 3 columns: rn,customer, and revenu descending
 
;with cte_revenu as
( select 
    convert(int,0) as rn, 
    '' as customer, 
    convert(int,0) as revenu, 
    convert(int,0) as accumulated_revenu, 
    (select sum([Amount in local currency]) from dbo.SourceTable) as total_revenu 
  union all
  select 
    c.rn + 1, 
    r.customer, 
    r.revenu, 
    r.revenu + c.accumulated_revenu, 
    c.total_revenu
  from 
    cte_revenu c
  inner join 
    Customer_Invoices_view r 
  on r.rn - 1 = c.rn
) 
select *, 
	(accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution 
from 
	cte_revenu
go

Open in new window

No, not running the CTE yet :)  Just getting the performance of the view and if it is reasonable enough.

But what you are now observing is one nuance of the CTE. The data types must be identical (including scope and scale, not just type - e.g. varchar(1) will error with a base of varchar(10)).

You will note that I did have a series of converts in the CTE. That was mainly due to using numbers as INT so had to explicitly beat it into submission. Those INT of course is not what you are using in your real table.

So, what datatypes and size is [Customer Name] and [Amount in local currency]  (we already know that rn is bigint, not int) ?

[Customer name] (nvarchar(35),null)
[Amount in local Currency] (decimal (29,2),null)
OK then let's try :


;with cte_revenu as
( select                                                                                    -- this first part is the "anchor" and we are really just making a starting point for the recursive part.
    convert(bigint,0) as rn,
    convert(nvarchar(35),'') as customer,
    convert(decimal(29,2),0) as revenu,
    convert(decimal(29,2),0) as accumulated_revenu,
    convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable ) as total_revenu           -- could maybe use the view here
  union all
  select
    c.rn + 1,
    r.customer,
    r.revenu,
    r.revenu + c.accumulated_revenu,
    c.total_revenu
  from
    cte_revenu c
  inner join
    Customer_Invoices_view r
  on r.rn - 1 = c.rn
)
select *,
      (accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution                  -- could possibly include that in the body above.
from
      cte_revenu

go
yes, you have to convert those to the exact sizes, odd, I just ran into that also.
I just got it to come back for my test data set of 1M records but the next problem is we will run into max recurrsion errors so it stopped at 100 records and took 4:46 to run
Msg 156, Level 15, State 1, Line 7
Incorrect syntax near the keyword 'as'.
--- error related to:
convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable ) as total_revenu          

Msg 102, Level 15, State 1, Line 20
Incorrect syntax near ')'.
--- error related to:
  on r.rn - 1 = c.rn
)
select *,
I did not have to put a convert on that value probably because it is calculated off a real column and gets the proper type from there.
if you want to let it recurse forever add OPTION (MaxRecursion 0) at the end
When I take away the convert from:
convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable ) as total_revenu

I get:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".

When I put the convert back, I get:
Msg 156, Level 15, State 1, Line 7
Incorrect syntax near the keyword 'as'.
Msg 102, Level 15, State 1, Line 20
Incorrect syntax near ')'.


So I'm stuck in between
oh, with it in there you are missing a closing ) before the AS
convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable )) as total_revenu
added the closing )
get:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".
Sorry guys, had some visitors - it is Mother's day today in downtown Australia...

Let me type it up for you.

We can use the real columns to get the format of the anchor but best if we simply treat the anchor almost like a table definition.

But let me type it all up for you... Back in a second - will get rid of the riff raff...

OK, they've gone...

2 things.

1) we will alter the view a bit
2) we will be more specific about datatypes in the CTE
3) also decided to do the percent calc in the recursive part of the query (and can be used in the select bit down the very bottom)


I am here and waiting now, so can give you some undivided attention.


ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable group by [customer name]) a
go
 
 
 
;with cte_revenu as
( select                                                                                    -- this first part is the "anchor" and we are really just making a starting point for the recursive part.
    convert(bigint,0) as rn, 
    convert(nvarchar(50),'') as customer, 
    convert(decimal(29,2),0) as revenu, 
    convert(decimal(29,2),0) as accumulated_revenu, 
    convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable )) as total_revenu,
    convert(decimal(18,3),0) as percent_contribution           
  union all
  select 
    c.rn + 1, 
    r.customer, 
    r.revenu, 
    convert(decimal(29,2),(r.revenu + c.accumulated_revenu)),
    c.total_revenu,
    convert(decimal(18,3),((r.revenu + c.accumulated_revenu) / total_revenu) * 100 )           -- could maybe use the view here
 
  from 
    cte_revenu c
  inner join 
    Customer_Invoices_view r 
  on r.rn - 1 = c.rn
) 
select * 
from cte_revenu
where rn > 0
 
go

Open in new window

@CGluttrell, you are correct, one BIG advantage of establising the initial data set from the actual table. Using data that is not explicitly typed, or conforms to a standard expression in a CTE is not handled particularly well. That can also include computed columns (like revenu + accumulated_revenu). Do not understand why it is so very very particular, but it is. Just on calculated columns, including aggregated functions, values are returned in the most precise expression data type - as determined by SQL. For that reason, we are beginning to see a re-emergance of the money datatype which has implicit scale for decimal operations in places where the value is not a monetry value. Ah well... lifes little mysteries.
Still waiting... Just letting you know...
Thanks,I run the code, the first part is ok. the second returned error:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "customer" of recursive query "cte_revenu".
Wow, I did not get that one...

And can see why - mine was done with nvarchar(50) and I think it is nvarchar(35)...

I really do apologise for this thread dragging on, but think we are so very close now. So long as that view is returning results fairly quickly the rest should be OK.


ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable group by [customer name]) a
go
 
 
 
;with cte_revenu as
( select                                                                                    -- this first part is the "anchor" and we are really just making a starting point for the recursive part.
    convert(bigint,0) as rn, 
    convert(nvarchar(35),'') as customer, 
    convert(decimal(29,2),0) as revenu, 
    convert(decimal(29,2),0) as accumulated_revenu, 
    convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable )) as total_revenu,
    convert(decimal(18,3),0) as percent_contribution           
  union all
  select 
    c.rn + 1, 
    r.customer, 
    r.revenu, 
    convert(decimal(29,2),(r.revenu + c.accumulated_revenu)),
    c.total_revenu,
    convert(decimal(18,3),((r.revenu + c.accumulated_revenu) / total_revenu) * 100 )           -- could maybe use the view here
 
  from 
    cte_revenu c
  inner join 
    Customer_Invoices_view r 
  on r.rn - 1 = c.rn
) 
select * 
from cte_revenu
where rn > 0
 
go

Open in new window

query running which I think is a good sign, I'm awaiting some results to appear
If you want to limit the amount of test data, can add into the view a few where clauses, or use the "top" qualifier e.g.

ALTER VIEW ...  AS
SELECT TOP 2000 ...

Another small advantage of using views...
good point i'm working on the top 2000, still running
Well, dont forget that it is aggregating 2000 customers...

How long does the query take to complete ?
didn't finish yet, I cancelled and changed to top 20 as I'm desparate to see some results. Actually the server is overloaded with other venry demanding queries from colleagues, so I do not worry about how much time it takes, I just want to see "something"
I checked with:
select count(*) from customer_invoices_view

and there is indeed only 20 rows now but waiting for the result of the real query
I don't know what's going on, I run other queries and they return some results. CGluttrell's query is still running now for 1 day and 8,5 hours.

The query with the view (with top 20) is now running for 45 minutes. Well when I get something back I will let you know
Must be something going on - can you see any lock contentions ?  (SSMS, management, activity monitor)

When you say "the query with the view" is that the CTE query ? or the select * from customer_invoices_view ?

You will need to kill those queries. We can get the results otherways.

If it is the CTE query that is taking all the time, and the select * from view works quickly, then try this:

select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c

if that is OK, (ie returns a result fairly quickly) then try this:

select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
      from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT


Now, you can also change the view to include a nolock option which might also help...


ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go
the first:
select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c

took 32 minutes.

I will run the second now:
select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
      from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT
32 minutes - ouch... will wait for next one, expect not much more...
OK 22 minutes but as is the case for the first part, it returns the accumulated revenu not descending (actually in no apparent order), and not adding up to the total revenu.

The percent contribution is not adding up to 100%

having the top nnn in the view will stuff it up a bit - it is more about getting something out and seeing how long.

you know the interesting thing is the the most efficient query takes longer and so far, the more innefficient query has worked the fastest.

think it is time to remove the top nnn statement from the view and run it again. do you know what percent the top nnn is of the total number of customers ?

ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go
Avatar of GregTSmith
GregTSmith

An alternative to using recursion:
--------------- START SAMPLE DATA SETUP --------------- 
DECLARE @sample TABLE (customer char(1), revenue money) 
 
INSERT @sample 
  (customer, revenue) 
SELECT 'a' [Customer], 4 [Revenue] 
UNION ALL SELECT 'b' [Customer], 5 [Revenue] 
UNION ALL SELECT 'a' [Customer], 6 [Revenue] 
UNION ALL SELECT 'a' [Customer], 3 [Revenue] 
UNION ALL SELECT 'f' [Customer], 6 [Revenue] 
UNION ALL SELECT 'b' [Customer], 4 [Revenue] 
UNION ALL SELECT 'c' [Customer], 4 [Revenue] 
UNION ALL SELECT 'a' [Customer], 4 [Revenue] 
UNION ALL SELECT 'r' [Customer], 4 [Revenue] 
--------------- FINISH SAMPLE DATA SETUP --------------
 
DECLARE @customers TABLE (
  priority int PRIMARY KEY NOT NULL, 
  customer char(1) NOT NULL, 
  revenue money NOT NULL, 
  accumulatedRevenue money NOT NULL DEFAULT(0) 
) 
 
-- aggregate to customer level 
INSERT @customers 
  (priority, customer, revenue) 
SELECT 
  ROW_NUMBER() OVER (ORDER BY X.revenue DESC, X.customer) [priority], 
  X.customer, 
  X.revenue 
FROM 
  (
    SELECT 
      customer, 
      SUM(revenue) [revenue] 
    FROM 
      @sample 
    GROUP BY 
      customer 
  ) X 
 
 
-- find total revenue
DECLARE @totalRevenue money 
SET @totalRevenue = (SELECT SUM(revenue) FROM @customers) 
 
 
-- calculate accumulated revenue based on priority
UPDATE 
  Y 
SET 
  accumulatedRevenue = X.accumulatedRevenue 
FROM 
  @customers Y 
  INNER JOIN (
    SELECT 
      A.priority, 
      SUM(B.revenue) [accumulatedRevenue] 
    FROM 
      @customers A 
      INNER JOIN @customers B ON B.priority <= A.priority 
    GROUP BY 
      A.priority 
  ) X ON X.priority = Y.priority 
 
 
-- get final results 
SELECT 
  customer, 
  revenue, 
  revenue / @totalRevenue [contribution] 
FROM 
  @customers 
WHERE 
  accumulatedRevenue <= (@totalRevenue * 0.8) 
ORDER BY 
  revenue DESC, 
  customer 

Open in new window

Mark,
after 7 hours of running, your query returned the error:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

GrepT Smith,
thanks I will try your query: could you please change it to reflect the real filed names:
[Customer name] (nvarchar(35),null)
[Amount in local Currency] (decimal (29,2),null)
Thank you

That's interesting... 7 hours ? Max recursion - that means you are running that CTE ?

I thought we were just selecting from that view for the moment, trying to get the correct rows...

ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go


--then

select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
         from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT


Have we tried that yet ?
ASKER CERTIFIED SOLUTION
Avatar of Mark Wills
Mark Wills
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Mark, you last code simply WORKS, I get exactly what I'm looking for, the only thing is that I do not understand why we are still using the view "customer_invoices_view" in the code. Do we need it? meaning do I have the future to
FIRST use the code for the view (by the way which one? there are many) and
THEN create the temporary table and
THEN declare the stored procedure?

If you were to summaries the whole code from the start to finish what would it look like? or is it simply the one above? Do I need to have both a view and a temp table?

Thanks
Thank you
Ok ignore my question about the view, as i just replaced it with the temp table and it worked as well. So it should be fine. MANY THANKS
Glad you found that, and you are correct, it is not needed. Just the temp table, which also means just that snippet.

Sure glad we got to the bottom of this and still a little confused as to why the CTE didn't happen... Ah well, we now have an answer, and very happy to have been of help...
congratulations mark, glad you were able to put it together.  I was thinking a temp table might be what we would have to do back in ID: 24345761 but never got to finish putting it together, got too busy.  The reason just the simple CTE and the solution Greg offered up with the table variable would have run into issues also it the amount of memory required to aggregate the millions of rows and both of those solutions were memory bound while the #tmpTable wrote the intermediate data to disk instead.  I believe we all learned something here, I know I did.
Thanks guys, till next time.