Solved

Pareto Analysis in SQL Server 2005

Posted on 2009-05-07
58
1,779 Views
Last Modified: 2012-06-27
Hello,

How do I make Pareto Analysis in SQL server, say I have a table structure like:

Customer       Revenu
a                         4
b                         5
a                         6
a                         3
f                         6
b                         4
c                         4
a                         4
r                         4


After ading up (grouping by customer) The idea is to find out cumulatively which customers once added up are making 80% of the revenu, which needs to be later presented in a chart?

Output looks like:

Customer      Revenue     Accumulated Revenue      % (acc. / Total)
a                        16                      16                                    20
b                          4                      20                                   35
...                         ...                      ...                                   ...
f                         6                     56                                   81
...                          ...                   ...                                  ...              
0
Comment
Question by:TechNovation
  • 24
  • 23
  • 8
  • +2
58 Comments
 
LVL 22

Expert Comment

by:8080_Diver
ID: 24332797
!^%%$^@!  I hit the tab button a minute ago (out of habit) and then tried to back-tab and lost my response!  SHAZBAT!
Okay, re-typing now. ;-)
SQL_1 below gives the totals by Customer.  So, I'd use that in a Stored Procedure (see SQL_2) which I think kshould provide you with what you want.  

SQL_1:
 

SELECT Customer,

       SUM(Revenu)

FROM   YourTable

GROUP BY Customer;
 

SQL2:
 

CREATE PRCOEDURE usp_Pareto_Analysis 

   @Pct_Limit   AS   INT

AS

BEGIN

   

   SET NOCOUNT ON;
 

   DECLARE @TempTtls (

                       Customer AS varchar(XX),

                       LastName AS (appropriate datatype

                     );

   

   INSERT INTO @TempTtls 

   SELECT Customer,

          SUM(Revenu) AS Revenu

   FROM   YourTable

   GROUP BY Customer;

   

   SELECT Customer, 

          Revenu,

          AccumulatedRevenu,

          (Revenu / AccumulatedRevenu) AS PctgAccumulatedRevenu

   FROM

   (

    SELECT SUM(Revenu) AS TtlAccumRevenu

    FROM   @TempTtl

   )Y,

   ( 

    SELECT T.Customer,

           T.Revenu, 

           (

            SELECT SUM(Revenu) 

            FROM   @TempTtl

            WHERE Customer <= T.Customer 

           ) AS AccumulatedRevenu

    FROM   @TempTtl T

   ) Z

   ORDER BY Customer

   WHERE ((Revenu / AccumulatedRevenu) < @Pct_Limit);

    

END

Open in new window

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24333879
might like to have a look at (starts with a test / dummy table representing the data source - it is NOT part of the solution - the view and the CTE query is)...




-- first up create a table with some test data...
 

select * into my_revenu_table from ( 

select 'a' as customer ,  4 as revenu union all

select 'b' as customer ,  5 as revenu union all

select 'a' as customer ,  6 as revenu union all

select 'a' as customer ,  3 as revenu union all

select 'f' as customer ,  6 as revenu union all

select 'b' as customer ,  4 as revenu union all

select 'c' as customer ,  4 as revenu union all

select 'a' as customer ,  4 as revenu union all

select 'r' as customer ,  4 as revenu 

) src
 
 

-- you might already have an identity or something like it, so might change the approach (ie might not need the view)

-- but in the abscense of more information....
 

-- now, because we want some kind of order, decided to create a view over that data so we can get a row_number or ordering

-- we can also do the aggregation by customer that way
 

create view vw_my_revenu_view as

select convert(int, (row_number() over (order by revenu desc, customer))) as rn, customer, revenu

from   (select customer, sum(revenu) as revenu from my_revenu_table group by customer) a

go
 

-- now we have a data source, with order, we can use that to create a recursive query
 

;with cte_revenu as

( select convert(int,0) as rn, '' as customer, convert(int,0) as revenu, convert(int,0) as accumulated_revenu, (select sum(revenu) from my_revenu_table) as total_revenu 

  union all

  select c.rn + 1, r.customer, r.revenu, r.revenu + c.accumulated_revenu, c.total_revenu

  from cte_revenu c

  inner join vw_my_revenu_view r on r.rn - 1 = c.rn

) 

select *, (accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution from cte_revenu

go

Open in new window

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24333896
Also, I explicitly convert to INT and it is more likely a money or decimal value, but hopefully you get the idea...
0
 

Author Comment

by:TechNovation
ID: 24334184
8080 DIVER:

 I tried your code butit returns errors:
Affected code is:
  DECLARE @TempTtls (
                       Customer AS (nvarchar(35),null),
                       Revenu AS (decimal (29,2),null)
                     )
and related error is:
Incorrect Syntax near '('

And since the table variable isn't declared as consequence of error, the rest isn't working.
any idea?
Thanks


0
 

Author Comment

by:TechNovation
ID: 24334187
MARK WILLS:
This query will be run several times to populate a table, by creating a view, wouldn't the next query return an error as 'view already exists'. Do I really need to create a view? is there another way?

appreciate your help

Thanks
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24335626
Only create the view once - it is a once-off step and you can then re-use it as much as you like. No need to create everytime.

And, if you already have an indentity, or time, or anything representing sequentially incrementing rows, you might not need the view. But with the view we have taken full control of the information we want to play with. The view does not contain data itself, it points to the datatables and is resolved at runtime / at time of use...

The advantage is you have a single CTE query (starts with the ";with" and ends with the "select ... from cte_revenu"  - yep, just a "single" command)

0
 
LVL 22

Expert Comment

by:8080_Diver
ID: 24336242
 DECLARE @TempTtls (
                       Customer AS (nvarchar(35),null),
                       Revenu AS (decimal (29,2),null)
                     )

Oops, forgot a piece.  Add the word TABLE as shown in SQL_1 below.
The SP can be used to populate a table by simply adding INSERT INTO thetablename at the top of the last SELECT statement.  If you are repopulating the table, (e.g. did this yesterday but want to get new results today), you can remove the appropriate rows from the table or update them.  (However, if you add an Datetime column that you feed the current date to, you can build up a historical set of data for trend analysis. ;-)
I have attached a revised version of the SP (commenting out the pieces needed for the historical/trending analysis table ;-):

SQL_1:

 

SELECT Customer,

       SUM(Revenu)

FROM   YourTable

GROUP BY Customer;

 

SQL2:

 

CREATE PRCOEDURE usp_Pareto_Analysis 

   @Pct_Limit   AS   INT

AS

BEGIN

   

   SET NOCOUNT ON;
 

--   DECLARE @Today DATETIME;

--

--   SET @Today = CAST(CONVERT(VARCHAR(10), GETDATE(), 101) AS DATETIME);

 

   DECLARE @TempTtls TABLE

           (

            Customer AS varchar(XX),

            LastName AS (appropriate datatype

           );

   

   INSERT INTO @TempTtls 

   SELECT Customer,

          SUM(Revenu) AS Revenu

   FROM   YourTable

   GROUP BY Customer;
 

		SELECT   
 

--   INSERT INTO YourHistoryTable

   SELECT --Today , 

          Customer, 

          Revenu,

          AccumulatedRevenu,

          (Revenu / AccumulatedRevenu) AS PctgAccumulatedRevenu

   FROM

   (

    SELECT SUM(Revenu) AS TtlAccumRevenu

    FROM   @TempTtl

   )Y,

   ( 

    SELECT T.Customer,

           T.Revenu, 

           (

            SELECT SUM(Revenu) 

            FROM   @TempTtl

            WHERE Customer <= T.Customer 

           ) AS AccumulatedRevenu

    FROM   @TempTtl T

   ) Z

   ORDER BY Customer

   WHERE ((Revenu / AccumulatedRevenu) < @Pct_Limit);

    

END

Open in new window

0
 
LVL 26

Expert Comment

by:Chris Luttrell
ID: 24341803
Hi TechNovation,
Was extremely busy these last couple days and just say this question.
The ultimate query would look something like snippet and produces this output.  I attached the script to set up and run this also.
Love these types of questions!
;With

tblItemsWithRow

as

( SELECT Row_Number() Over(Order by Customer) as RowNum, Customer, SUM(Revenue) Revenue

FROM PareTo PT

GROUP BY Customer

)

--Final Query to return the running total and pcts

Select Customer,Revenue,

( Select Sum(Revenue) from tblItemsWithRow m where m.RowNum<= s.RowNum) as [Accumulated Revenue],

(( Select Sum(CONVERT(FLOAT,Revenue)) from tblItemsWithRow m where m.RowNum<= s.RowNum)/SUM(Revenue) OVER ())*100.0  as [Acc Rev %]

from tblItemsWithRow s

ORDER BY Customer

Open in new window

PareToOutput.png
PareTo.txt
0
 

Author Comment

by:TechNovation
ID: 24344650
Actually the revenu column should be sorted descending before accumulating fo f should come before c in the order of row for customers,
this is because the purpose is to know which customers once aggregated make up 80% of the revenu.

how would the code look like?
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24344671
Line 4 in CGLuttrell's code should be similar to the row_function() previously posted...  Just include REVENU DESC,  after the order by and before Customer...
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24344673
sorry, that is SUM(REVENU)
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24344735
CGluttrell, are you interested in discussing the differences in our CTE's ?
0
 

Author Comment

by:TechNovation
ID: 24345718
The query (of CGluttrell) has been running now for more than 11 hours, is it more the server that might be busy or the query is 'demanding'?
0
 

Author Comment

by:TechNovation
ID: 24345729
... and the table I'm testing in has only 20 million records, does is mean if I move to real capacity of hundreds of millions of records I will have to take something into account?
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24345748
The query is a little demanding - it is doing an inline query back on the CTE query which does take some effort. 11 hours is a long time though...
0
 
LVL 26

Expert Comment

by:Chris Luttrell
ID: 24345755
wow,yeah we need to rethink this!
That many rows, this is where Analysis Server cubes would be real handy, but we aint got them so we will make do.
0
 
LVL 26

Expert Comment

by:Chris Luttrell
ID: 24345761
mark, sorry I have been away from the computer, but yes we can compare the cte statements if we need to, but we might need to look in another direction to avoid 11+ hours, I am thinking a aggregate temp table maybe like you were trying to do with the view?
guys, I've gotta go mow some grass so will be away for a bit again, sorry.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24345767
Did you have a quick look at creating the view ? that step (ie create the view) takes no time, then do a select * from the_view  can always can it and drop the view. The difference is that the view is doing a lot of the work and the CTE then just does the running total...
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24345776
@CGluttrell, yep, think the view might be worth a shot. the aggregation is done there and the sort with row number so that the CTE "just" has to do the running total and calcs. Think it is worth trying to do a select * from the_view and see what kind of response. It wont be too dissimilar (in fact should be quicker) if there is a temp table.
0
 

Author Comment

by:TechNovation
ID: 24345965
Mark I will have a go at your query as well with the view, but as it is quite complicated for me I could replace column names in the right places: the first header isn't customer but [customer name] and the second column is [Amount in local Currency], if you could please give me the query with these headings Iwill be gratefull. In the meantime CGLuttrell's query is still running.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24346148
OK, just the view first. A "once-off" step to create the view, and once created can then select from it... Now, all we really have to do is change the subquery, because we can use column aliases to give them new names. In our case we will use the real names as part of the subquery. You will have to change the table name in that same subquery from "my_revenu_table"  to whatever it really is. So, should look something like :

create view vw_my_revenu_view
as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from my_revenu_table group by [customer name]) a
go
 
Thats done. That easy. Do not have to worry about creating the view again (might need to change / fix) and so it sits there waiting, wanting to be used, and not costing anything  (the luxury of being a view).

But that will change when we do go to use it. The subquery does the aggregations first, Then it will sort and calculcate a row_number(). So lets give it a try :

select * from vw_my_revenu_view

See how that goes, and we can move on from there.
0
 

Author Comment

by:TechNovation
ID: 24346851
Select * run ok, I get 3 columns with revenu descending. But when I run the rest I get error:

Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "customer" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".
create view Customer_Invoices_view 

as

select 

   convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, 

   customer, 

   revenu

from   

   (

    select 

      [customer name] as customer, 

      sum([Amount in local currency]) as revenu 

    from 

      dbo.SourceTable 

    group by 

      [customer name]

    ) a

go
 

----

select * 

from 

   Customer_Invoices_view

---- this select * returns 3 columns: rn,customer, and revenu descending

 

;with cte_revenu as

( select 

    convert(int,0) as rn, 

    '' as customer, 

    convert(int,0) as revenu, 

    convert(int,0) as accumulated_revenu, 

    (select sum([Amount in local currency]) from dbo.SourceTable) as total_revenu 

  union all

  select 

    c.rn + 1, 

    r.customer, 

    r.revenu, 

    r.revenu + c.accumulated_revenu, 

    c.total_revenu

  from 

    cte_revenu c

  inner join 

    Customer_Invoices_view r 

  on r.rn - 1 = c.rn

) 

select *, 

	(accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution 

from 

	cte_revenu

go

Open in new window

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24346954
No, not running the CTE yet :)  Just getting the performance of the view and if it is reasonable enough.

But what you are now observing is one nuance of the CTE. The data types must be identical (including scope and scale, not just type - e.g. varchar(1) will error with a base of varchar(10)).

You will note that I did have a series of converts in the CTE. That was mainly due to using numbers as INT so had to explicitly beat it into submission. Those INT of course is not what you are using in your real table.

So, what datatypes and size is [Customer Name] and [Amount in local currency]  (we already know that rn is bigint, not int) ?

0
 

Author Comment

by:TechNovation
ID: 24346985
[Customer name] (nvarchar(35),null)
[Amount in local Currency] (decimal (29,2),null)
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24347012
OK then let's try :


;with cte_revenu as
( select                                                                                    -- this first part is the "anchor" and we are really just making a starting point for the recursive part.
    convert(bigint,0) as rn,
    convert(nvarchar(35),'') as customer,
    convert(decimal(29,2),0) as revenu,
    convert(decimal(29,2),0) as accumulated_revenu,
    convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable ) as total_revenu           -- could maybe use the view here
  union all
  select
    c.rn + 1,
    r.customer,
    r.revenu,
    r.revenu + c.accumulated_revenu,
    c.total_revenu
  from
    cte_revenu c
  inner join
    Customer_Invoices_view r
  on r.rn - 1 = c.rn
)
select *,
      (accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution                  -- could possibly include that in the body above.
from
      cte_revenu

go
0
 
LVL 26

Expert Comment

by:Chris Luttrell
ID: 24347017
yes, you have to convert those to the exact sizes, odd, I just ran into that also.
I just got it to come back for my test data set of 1M records but the next problem is we will run into max recurrsion errors so it stopped at 100 records and took 4:46 to run
0
 

Author Comment

by:TechNovation
ID: 24347026
Msg 156, Level 15, State 1, Line 7
Incorrect syntax near the keyword 'as'.
--- error related to:
convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable ) as total_revenu          

Msg 102, Level 15, State 1, Line 20
Incorrect syntax near ')'.
--- error related to:
  on r.rn - 1 = c.rn
)
select *,
0
 
LVL 26

Expert Comment

by:Chris Luttrell
ID: 24347029
I did not have to put a convert on that value probably because it is calculated off a real column and gets the proper type from there.
0
 
LVL 26

Expert Comment

by:Chris Luttrell
ID: 24347035
if you want to let it recurse forever add OPTION (MaxRecursion 0) at the end
0
Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

 

Author Comment

by:TechNovation
ID: 24347072
When I take away the convert from:
convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable ) as total_revenu

I get:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".

When I put the convert back, I get:
Msg 156, Level 15, State 1, Line 7
Incorrect syntax near the keyword 'as'.
Msg 102, Level 15, State 1, Line 20
Incorrect syntax near ')'.


So I'm stuck in between
0
 
LVL 26

Expert Comment

by:Chris Luttrell
ID: 24347081
oh, with it in there you are missing a closing ) before the AS
convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable )) as total_revenu
0
 

Author Comment

by:TechNovation
ID: 24347127
added the closing )
get:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24347261
Sorry guys, had some visitors - it is Mother's day today in downtown Australia...

Let me type it up for you.

We can use the real columns to get the format of the anchor but best if we simply treat the anchor almost like a table definition.

But let me type it all up for you... Back in a second - will get rid of the riff raff...

OK, they've gone...

2 things.

1) we will alter the view a bit
2) we will be more specific about datatypes in the CTE
3) also decided to do the percent calc in the recursive part of the query (and can be used in the select bit down the very bottom)


I am here and waiting now, so can give you some undivided attention.



ALTER VIEW customer_invoices_view as

select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu

from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable group by [customer name]) a

go
 
 
 

;with cte_revenu as

( select                                                                                    -- this first part is the "anchor" and we are really just making a starting point for the recursive part.

    convert(bigint,0) as rn, 

    convert(nvarchar(50),'') as customer, 

    convert(decimal(29,2),0) as revenu, 

    convert(decimal(29,2),0) as accumulated_revenu, 

    convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable )) as total_revenu,

    convert(decimal(18,3),0) as percent_contribution           

  union all

  select 

    c.rn + 1, 

    r.customer, 

    r.revenu, 

    convert(decimal(29,2),(r.revenu + c.accumulated_revenu)),

    c.total_revenu,

    convert(decimal(18,3),((r.revenu + c.accumulated_revenu) / total_revenu) * 100 )           -- could maybe use the view here
 

  from 

    cte_revenu c

  inner join 

    Customer_Invoices_view r 

  on r.rn - 1 = c.rn

) 

select * 

from cte_revenu

where rn > 0
 

go

Open in new window

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24347387
@CGluttrell, you are correct, one BIG advantage of establising the initial data set from the actual table. Using data that is not explicitly typed, or conforms to a standard expression in a CTE is not handled particularly well. That can also include computed columns (like revenu + accumulated_revenu). Do not understand why it is so very very particular, but it is. Just on calculated columns, including aggregated functions, values are returned in the most precise expression data type - as determined by SQL. For that reason, we are beginning to see a re-emergance of the money datatype which has implicit scale for decimal operations in places where the value is not a monetry value. Ah well... lifes little mysteries.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24348294
Still waiting... Just letting you know...
0
 

Author Comment

by:TechNovation
ID: 24348530
Thanks,I run the code, the first part is ok. the second returned error:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "customer" of recursive query "cte_revenu".
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24348548
Wow, I did not get that one...

And can see why - mine was done with nvarchar(50) and I think it is nvarchar(35)...

I really do apologise for this thread dragging on, but think we are so very close now. So long as that view is returning results fairly quickly the rest should be OK.


ALTER VIEW customer_invoices_view as

select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu

from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable group by [customer name]) a

go

 

 

 

;with cte_revenu as

( select                                                                                    -- this first part is the "anchor" and we are really just making a starting point for the recursive part.

    convert(bigint,0) as rn, 

    convert(nvarchar(35),'') as customer, 

    convert(decimal(29,2),0) as revenu, 

    convert(decimal(29,2),0) as accumulated_revenu, 

    convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable )) as total_revenu,

    convert(decimal(18,3),0) as percent_contribution           

  union all

  select 

    c.rn + 1, 

    r.customer, 

    r.revenu, 

    convert(decimal(29,2),(r.revenu + c.accumulated_revenu)),

    c.total_revenu,

    convert(decimal(18,3),((r.revenu + c.accumulated_revenu) / total_revenu) * 100 )           -- could maybe use the view here

 

  from 

    cte_revenu c

  inner join 

    Customer_Invoices_view r 

  on r.rn - 1 = c.rn

) 

select * 

from cte_revenu

where rn > 0

 

go

Open in new window

0
 

Author Comment

by:TechNovation
ID: 24348571
query running which I think is a good sign, I'm awaiting some results to appear
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24348596
If you want to limit the amount of test data, can add into the view a few where clauses, or use the "top" qualifier e.g.

ALTER VIEW ...  AS
SELECT TOP 2000 ...

Another small advantage of using views...
0
 

Author Comment

by:TechNovation
ID: 24348670
good point i'm working on the top 2000, still running
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24348981
Well, dont forget that it is aggregating 2000 customers...

How long does the query take to complete ?
0
 

Author Comment

by:TechNovation
ID: 24349147
didn't finish yet, I cancelled and changed to top 20 as I'm desparate to see some results. Actually the server is overloaded with other venry demanding queries from colleagues, so I do not worry about how much time it takes, I just want to see "something"
0
 

Author Comment

by:TechNovation
ID: 24349151
I checked with:
select count(*) from customer_invoices_view

and there is indeed only 20 rows now but waiting for the result of the real query
0
 

Author Comment

by:TechNovation
ID: 24349323
I don't know what's going on, I run other queries and they return some results. CGluttrell's query is still running now for 1 day and 8,5 hours.

The query with the view (with top 20) is now running for 45 minutes. Well when I get something back I will let you know
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24349893
Must be something going on - can you see any lock contentions ?  (SSMS, management, activity monitor)

When you say "the query with the view" is that the CTE query ? or the select * from customer_invoices_view ?

You will need to kill those queries. We can get the results otherways.

If it is the CTE query that is taking all the time, and the select * from view works quickly, then try this:

select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c

if that is OK, (ie returns a result fairly quickly) then try this:

select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
      from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT


Now, you can also change the view to include a nolock option which might also help...


ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go
0
 

Author Comment

by:TechNovation
ID: 24350124
the first:
select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c

took 32 minutes.

I will run the second now:
select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
      from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24350138
32 minutes - ouch... will wait for next one, expect not much more...
0
 

Author Comment

by:TechNovation
ID: 24350214
OK 22 minutes but as is the case for the first part, it returns the accumulated revenu not descending (actually in no apparent order), and not adding up to the total revenu.

The percent contribution is not adding up to 100%

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24350370
having the top nnn in the view will stuff it up a bit - it is more about getting something out and seeing how long.

you know the interesting thing is the the most efficient query takes longer and so far, the more innefficient query has worked the fastest.

think it is time to remove the top nnn statement from the view and run it again. do you know what percent the top nnn is of the total number of customers ?

ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go
0
 
LVL 3

Expert Comment

by:GregTSmith
ID: 24359378
An alternative to using recursion:
--------------- START SAMPLE DATA SETUP --------------- 

DECLARE @sample TABLE (customer char(1), revenue money) 
 

INSERT @sample 

  (customer, revenue) 

SELECT 'a' [Customer], 4 [Revenue] 

UNION ALL SELECT 'b' [Customer], 5 [Revenue] 

UNION ALL SELECT 'a' [Customer], 6 [Revenue] 

UNION ALL SELECT 'a' [Customer], 3 [Revenue] 

UNION ALL SELECT 'f' [Customer], 6 [Revenue] 

UNION ALL SELECT 'b' [Customer], 4 [Revenue] 

UNION ALL SELECT 'c' [Customer], 4 [Revenue] 

UNION ALL SELECT 'a' [Customer], 4 [Revenue] 

UNION ALL SELECT 'r' [Customer], 4 [Revenue] 

--------------- FINISH SAMPLE DATA SETUP --------------
 

DECLARE @customers TABLE (

  priority int PRIMARY KEY NOT NULL, 

  customer char(1) NOT NULL, 

  revenue money NOT NULL, 

  accumulatedRevenue money NOT NULL DEFAULT(0) 

) 
 

-- aggregate to customer level 

INSERT @customers 

  (priority, customer, revenue) 

SELECT 

  ROW_NUMBER() OVER (ORDER BY X.revenue DESC, X.customer) [priority], 

  X.customer, 

  X.revenue 

FROM 

  (

    SELECT 

      customer, 

      SUM(revenue) [revenue] 

    FROM 

      @sample 

    GROUP BY 

      customer 

  ) X 
 
 

-- find total revenue

DECLARE @totalRevenue money 

SET @totalRevenue = (SELECT SUM(revenue) FROM @customers) 
 
 

-- calculate accumulated revenue based on priority

UPDATE 

  Y 

SET 

  accumulatedRevenue = X.accumulatedRevenue 

FROM 

  @customers Y 

  INNER JOIN (

    SELECT 

      A.priority, 

      SUM(B.revenue) [accumulatedRevenue] 

    FROM 

      @customers A 

      INNER JOIN @customers B ON B.priority <= A.priority 

    GROUP BY 

      A.priority 

  ) X ON X.priority = Y.priority 
 
 

-- get final results 

SELECT 

  customer, 

  revenue, 

  revenue / @totalRevenue [contribution] 

FROM 

  @customers 

WHERE 

  accumulatedRevenue <= (@totalRevenue * 0.8) 

ORDER BY 

  revenue DESC, 

  customer 

Open in new window

0
 

Author Comment

by:TechNovation
ID: 24409154
Mark,
after 7 hours of running, your query returned the error:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

GrepT Smith,
thanks I will try your query: could you please change it to reflect the real filed names:
[Customer name] (nvarchar(35),null)
[Amount in local Currency] (decimal (29,2),null)
Thank you

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24409185
That's interesting... 7 hours ? Max recursion - that means you are running that CTE ?

I thought we were just selecting from that view for the moment, trying to get the correct rows...

ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go


--then

select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
         from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT


Have we tried that yet ?
0
 
LVL 51

Accepted Solution

by:
Mark Wills earned 500 total points
ID: 24409243
The other way we can do it is to create a temp table... We can encapsulate all that as a Stored Procedure so all you have to do is to exec that stored procedure...
create table #tmp_pareto(rn bigint primary key clustered, customer varchar(60), revenu decimal(29,2))
 

insert #tmp_pareto
 

select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu

from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
 

--select * from #tmp_pareto
 

declare @total decimal (29,2)

set @total = (select isnull(sum(revenu),1) total_revenu from customer_invoices_view)
 

select *, @total as total_revenue, accumulated_revenu / @total * 100 as percent_contribution

from (select rn, customer, revenu,(select sum(revenu) from #tmp_pareto t where t.rn <= p.rn) as accumulated_revenu

      from #tmp_pareto p) c

Open in new window

0
 

Author Comment

by:TechNovation
ID: 24426496
Mark, you last code simply WORKS, I get exactly what I'm looking for, the only thing is that I do not understand why we are still using the view "customer_invoices_view" in the code. Do we need it? meaning do I have the future to
FIRST use the code for the view (by the way which one? there are many) and
THEN create the temporary table and
THEN declare the stored procedure?

If you were to summaries the whole code from the start to finish what would it look like? or is it simply the one above? Do I need to have both a view and a temp table?

Thanks
0
 

Author Closing Comment

by:TechNovation
ID: 31579120
Thank you
0
 

Author Comment

by:TechNovation
ID: 24426726
Ok ignore my question about the view, as i just replaced it with the temp table and it worked as well. So it should be fine. MANY THANKS
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24427487
Glad you found that, and you are correct, it is not needed. Just the temp table, which also means just that snippet.

Sure glad we got to the bottom of this and still a little confused as to why the CTE didn't happen... Ah well, we now have an answer, and very happy to have been of help...
0
 
LVL 26

Expert Comment

by:Chris Luttrell
ID: 24427610
congratulations mark, glad you were able to put it together.  I was thinking a temp table might be what we would have to do back in ID: 24345761 but never got to finish putting it together, got too busy.  The reason just the simple CTE and the solution Greg offered up with the table variable would have run into issues also it the amount of memory required to aggregate the millions of rows and both of those solutions were memory bound while the #tmpTable wrote the intermediate data to disk instead.  I believe we all learned something here, I know I did.
Thanks guys, till next time.
0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

Hi all, It is important and often overlooked to understand “Database properties”. Often we see questions about "log files" or "where is the database" and one of the easiest ways to get general information about your database is to use “Database p…
If you have heard of RFC822 date formats, they can be quite a challenge in SQL Server. RFC822 is an Internet standard format for email message headers, including all dates within those headers. The RFC822 protocols are available in detail at:   ht…
Get a first impression of how PRTG looks and learn how it works.   This video is a short introduction to PRTG, as an initial overview or as a quick start for new PRTG users.
This video demonstrates how to create an example email signature rule for a department in a company using CodeTwo Exchange Rules. The signature will be inserted beneath users' latest emails in conversations and will be displayed in users' Sent Items…

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now