Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Pareto Analysis in SQL Server 2005

Posted on 2009-05-07
58
Medium Priority
?
1,902 Views
Last Modified: 2012-06-27
Hello,

How do I make Pareto Analysis in SQL server, say I have a table structure like:

Customer       Revenu
a                         4
b                         5
a                         6
a                         3
f                         6
b                         4
c                         4
a                         4
r                         4


After ading up (grouping by customer) The idea is to find out cumulatively which customers once added up are making 80% of the revenu, which needs to be later presented in a chart?

Output looks like:

Customer      Revenue     Accumulated Revenue      % (acc. / Total)
a                        16                      16                                    20
b                          4                      20                                   35
...                         ...                      ...                                   ...
f                         6                     56                                   81
...                          ...                   ...                                  ...              
0
Comment
Question by:TechNovation
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 24
  • 23
  • 8
  • +2
58 Comments
 
LVL 22

Expert Comment

by:8080_Diver
ID: 24332797
!^%%$^@!  I hit the tab button a minute ago (out of habit) and then tried to back-tab and lost my response!  SHAZBAT!
Okay, re-typing now. ;-)
SQL_1 below gives the totals by Customer.  So, I'd use that in a Stored Procedure (see SQL_2) which I think kshould provide you with what you want.  

SQL_1:
 
SELECT Customer,
       SUM(Revenu)
FROM   YourTable
GROUP BY Customer;
 
SQL2:
 
CREATE PRCOEDURE usp_Pareto_Analysis 
   @Pct_Limit   AS   INT
AS
BEGIN
   
   SET NOCOUNT ON;
 
   DECLARE @TempTtls (
                       Customer AS varchar(XX),
                       LastName AS (appropriate datatype
                     );
   
   INSERT INTO @TempTtls 
   SELECT Customer,
          SUM(Revenu) AS Revenu
   FROM   YourTable
   GROUP BY Customer;
   
   SELECT Customer, 
          Revenu,
          AccumulatedRevenu,
          (Revenu / AccumulatedRevenu) AS PctgAccumulatedRevenu
   FROM
   (
    SELECT SUM(Revenu) AS TtlAccumRevenu
    FROM   @TempTtl
   )Y,
   ( 
    SELECT T.Customer,
           T.Revenu, 
           (
            SELECT SUM(Revenu) 
            FROM   @TempTtl
            WHERE Customer <= T.Customer 
           ) AS AccumulatedRevenu
    FROM   @TempTtl T
   ) Z
   ORDER BY Customer
   WHERE ((Revenu / AccumulatedRevenu) < @Pct_Limit);
    
END

Open in new window

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24333879
might like to have a look at (starts with a test / dummy table representing the data source - it is NOT part of the solution - the view and the CTE query is)...



-- first up create a table with some test data...
 
select * into my_revenu_table from ( 
select 'a' as customer ,  4 as revenu union all
select 'b' as customer ,  5 as revenu union all
select 'a' as customer ,  6 as revenu union all
select 'a' as customer ,  3 as revenu union all
select 'f' as customer ,  6 as revenu union all
select 'b' as customer ,  4 as revenu union all
select 'c' as customer ,  4 as revenu union all
select 'a' as customer ,  4 as revenu union all
select 'r' as customer ,  4 as revenu 
) src
 
 
-- you might already have an identity or something like it, so might change the approach (ie might not need the view)
-- but in the abscense of more information....
 
-- now, because we want some kind of order, decided to create a view over that data so we can get a row_number or ordering
-- we can also do the aggregation by customer that way
 
create view vw_my_revenu_view as
select convert(int, (row_number() over (order by revenu desc, customer))) as rn, customer, revenu
from   (select customer, sum(revenu) as revenu from my_revenu_table group by customer) a
go
 
-- now we have a data source, with order, we can use that to create a recursive query
 
;with cte_revenu as
( select convert(int,0) as rn, '' as customer, convert(int,0) as revenu, convert(int,0) as accumulated_revenu, (select sum(revenu) from my_revenu_table) as total_revenu 
  union all
  select c.rn + 1, r.customer, r.revenu, r.revenu + c.accumulated_revenu, c.total_revenu
  from cte_revenu c
  inner join vw_my_revenu_view r on r.rn - 1 = c.rn
) 
select *, (accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution from cte_revenu
go

Open in new window

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24333896
Also, I explicitly convert to INT and it is more likely a money or decimal value, but hopefully you get the idea...
0
Get your Conversational Ransomware Defense e‑book

This e-book gives you an insight into the ransomware threat and reviews the fundamentals of top-notch ransomware preparedness and recovery. To help you protect yourself and your organization. The initial infection may be inevitable, so the best protection is to be fully prepared.

 

Author Comment

by:TechNovation
ID: 24334184
8080 DIVER:

 I tried your code butit returns errors:
Affected code is:
  DECLARE @TempTtls (
                       Customer AS (nvarchar(35),null),
                       Revenu AS (decimal (29,2),null)
                     )
and related error is:
Incorrect Syntax near '('

And since the table variable isn't declared as consequence of error, the rest isn't working.
any idea?
Thanks


0
 

Author Comment

by:TechNovation
ID: 24334187
MARK WILLS:
This query will be run several times to populate a table, by creating a view, wouldn't the next query return an error as 'view already exists'. Do I really need to create a view? is there another way?

appreciate your help

Thanks
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24335626
Only create the view once - it is a once-off step and you can then re-use it as much as you like. No need to create everytime.

And, if you already have an indentity, or time, or anything representing sequentially incrementing rows, you might not need the view. But with the view we have taken full control of the information we want to play with. The view does not contain data itself, it points to the datatables and is resolved at runtime / at time of use...

The advantage is you have a single CTE query (starts with the ";with" and ends with the "select ... from cte_revenu"  - yep, just a "single" command)

0
 
LVL 22

Expert Comment

by:8080_Diver
ID: 24336242
 DECLARE @TempTtls (
                       Customer AS (nvarchar(35),null),
                       Revenu AS (decimal (29,2),null)
                     )

Oops, forgot a piece.  Add the word TABLE as shown in SQL_1 below.
The SP can be used to populate a table by simply adding INSERT INTO thetablename at the top of the last SELECT statement.  If you are repopulating the table, (e.g. did this yesterday but want to get new results today), you can remove the appropriate rows from the table or update them.  (However, if you add an Datetime column that you feed the current date to, you can build up a historical set of data for trend analysis. ;-)
I have attached a revised version of the SP (commenting out the pieces needed for the historical/trending analysis table ;-):

SQL_1:
 
SELECT Customer,
       SUM(Revenu)
FROM   YourTable
GROUP BY Customer;
 
SQL2:
 
CREATE PRCOEDURE usp_Pareto_Analysis 
   @Pct_Limit   AS   INT
AS
BEGIN
   
   SET NOCOUNT ON;
 
--   DECLARE @Today DATETIME;
--
--   SET @Today = CAST(CONVERT(VARCHAR(10), GETDATE(), 101) AS DATETIME);
 
   DECLARE @TempTtls TABLE
           (
            Customer AS varchar(XX),
            LastName AS (appropriate datatype
           );
   
   INSERT INTO @TempTtls 
   SELECT Customer,
          SUM(Revenu) AS Revenu
   FROM   YourTable
   GROUP BY Customer;
 
		SELECT   
 
--   INSERT INTO YourHistoryTable
   SELECT --Today , 
          Customer, 
          Revenu,
          AccumulatedRevenu,
          (Revenu / AccumulatedRevenu) AS PctgAccumulatedRevenu
   FROM
   (
    SELECT SUM(Revenu) AS TtlAccumRevenu
    FROM   @TempTtl
   )Y,
   ( 
    SELECT T.Customer,
           T.Revenu, 
           (
            SELECT SUM(Revenu) 
            FROM   @TempTtl
            WHERE Customer <= T.Customer 
           ) AS AccumulatedRevenu
    FROM   @TempTtl T
   ) Z
   ORDER BY Customer
   WHERE ((Revenu / AccumulatedRevenu) < @Pct_Limit);
    
END

Open in new window

0
 
LVL 27

Expert Comment

by:Chris Luttrell
ID: 24341803
Hi TechNovation,
Was extremely busy these last couple days and just say this question.
The ultimate query would look something like snippet and produces this output.  I attached the script to set up and run this also.
Love these types of questions!
;With
tblItemsWithRow
as
( SELECT Row_Number() Over(Order by Customer) as RowNum, Customer, SUM(Revenue) Revenue
FROM PareTo PT
GROUP BY Customer
)
--Final Query to return the running total and pcts
Select Customer,Revenue,
( Select Sum(Revenue) from tblItemsWithRow m where m.RowNum<= s.RowNum) as [Accumulated Revenue],
(( Select Sum(CONVERT(FLOAT,Revenue)) from tblItemsWithRow m where m.RowNum<= s.RowNum)/SUM(Revenue) OVER ())*100.0  as [Acc Rev %]
from tblItemsWithRow s
ORDER BY Customer

Open in new window

PareToOutput.png
PareTo.txt
0
 

Author Comment

by:TechNovation
ID: 24344650
Actually the revenu column should be sorted descending before accumulating fo f should come before c in the order of row for customers,
this is because the purpose is to know which customers once aggregated make up 80% of the revenu.

how would the code look like?
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24344671
Line 4 in CGLuttrell's code should be similar to the row_function() previously posted...  Just include REVENU DESC,  after the order by and before Customer...
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24344673
sorry, that is SUM(REVENU)
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24344735
CGluttrell, are you interested in discussing the differences in our CTE's ?
0
 

Author Comment

by:TechNovation
ID: 24345718
The query (of CGluttrell) has been running now for more than 11 hours, is it more the server that might be busy or the query is 'demanding'?
0
 

Author Comment

by:TechNovation
ID: 24345729
... and the table I'm testing in has only 20 million records, does is mean if I move to real capacity of hundreds of millions of records I will have to take something into account?
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24345748
The query is a little demanding - it is doing an inline query back on the CTE query which does take some effort. 11 hours is a long time though...
0
 
LVL 27

Expert Comment

by:Chris Luttrell
ID: 24345755
wow,yeah we need to rethink this!
That many rows, this is where Analysis Server cubes would be real handy, but we aint got them so we will make do.
0
 
LVL 27

Expert Comment

by:Chris Luttrell
ID: 24345761
mark, sorry I have been away from the computer, but yes we can compare the cte statements if we need to, but we might need to look in another direction to avoid 11+ hours, I am thinking a aggregate temp table maybe like you were trying to do with the view?
guys, I've gotta go mow some grass so will be away for a bit again, sorry.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24345767
Did you have a quick look at creating the view ? that step (ie create the view) takes no time, then do a select * from the_view  can always can it and drop the view. The difference is that the view is doing a lot of the work and the CTE then just does the running total...
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24345776
@CGluttrell, yep, think the view might be worth a shot. the aggregation is done there and the sort with row number so that the CTE "just" has to do the running total and calcs. Think it is worth trying to do a select * from the_view and see what kind of response. It wont be too dissimilar (in fact should be quicker) if there is a temp table.
0
 

Author Comment

by:TechNovation
ID: 24345965
Mark I will have a go at your query as well with the view, but as it is quite complicated for me I could replace column names in the right places: the first header isn't customer but [customer name] and the second column is [Amount in local Currency], if you could please give me the query with these headings Iwill be gratefull. In the meantime CGLuttrell's query is still running.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24346148
OK, just the view first. A "once-off" step to create the view, and once created can then select from it... Now, all we really have to do is change the subquery, because we can use column aliases to give them new names. In our case we will use the real names as part of the subquery. You will have to change the table name in that same subquery from "my_revenu_table"  to whatever it really is. So, should look something like :

create view vw_my_revenu_view
as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from my_revenu_table group by [customer name]) a
go
 
Thats done. That easy. Do not have to worry about creating the view again (might need to change / fix) and so it sits there waiting, wanting to be used, and not costing anything  (the luxury of being a view).

But that will change when we do go to use it. The subquery does the aggregations first, Then it will sort and calculcate a row_number(). So lets give it a try :

select * from vw_my_revenu_view

See how that goes, and we can move on from there.
0
 

Author Comment

by:TechNovation
ID: 24346851
Select * run ok, I get 3 columns with revenu descending. But when I run the rest I get error:

Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "customer" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".
create view Customer_Invoices_view 
as
select 
   convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, 
   customer, 
   revenu
from   
   (
    select 
      [customer name] as customer, 
      sum([Amount in local currency]) as revenu 
    from 
      dbo.SourceTable 
    group by 
      [customer name]
    ) a
go
 
----
select * 
from 
   Customer_Invoices_view
---- this select * returns 3 columns: rn,customer, and revenu descending
 
;with cte_revenu as
( select 
    convert(int,0) as rn, 
    '' as customer, 
    convert(int,0) as revenu, 
    convert(int,0) as accumulated_revenu, 
    (select sum([Amount in local currency]) from dbo.SourceTable) as total_revenu 
  union all
  select 
    c.rn + 1, 
    r.customer, 
    r.revenu, 
    r.revenu + c.accumulated_revenu, 
    c.total_revenu
  from 
    cte_revenu c
  inner join 
    Customer_Invoices_view r 
  on r.rn - 1 = c.rn
) 
select *, 
	(accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution 
from 
	cte_revenu
go

Open in new window

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24346954
No, not running the CTE yet :)  Just getting the performance of the view and if it is reasonable enough.

But what you are now observing is one nuance of the CTE. The data types must be identical (including scope and scale, not just type - e.g. varchar(1) will error with a base of varchar(10)).

You will note that I did have a series of converts in the CTE. That was mainly due to using numbers as INT so had to explicitly beat it into submission. Those INT of course is not what you are using in your real table.

So, what datatypes and size is [Customer Name] and [Amount in local currency]  (we already know that rn is bigint, not int) ?

0
 

Author Comment

by:TechNovation
ID: 24346985
[Customer name] (nvarchar(35),null)
[Amount in local Currency] (decimal (29,2),null)
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24347012
OK then let's try :


;with cte_revenu as
( select                                                                                    -- this first part is the "anchor" and we are really just making a starting point for the recursive part.
    convert(bigint,0) as rn,
    convert(nvarchar(35),'') as customer,
    convert(decimal(29,2),0) as revenu,
    convert(decimal(29,2),0) as accumulated_revenu,
    convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable ) as total_revenu           -- could maybe use the view here
  union all
  select
    c.rn + 1,
    r.customer,
    r.revenu,
    r.revenu + c.accumulated_revenu,
    c.total_revenu
  from
    cte_revenu c
  inner join
    Customer_Invoices_view r
  on r.rn - 1 = c.rn
)
select *,
      (accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution                  -- could possibly include that in the body above.
from
      cte_revenu

go
0
 
LVL 27

Expert Comment

by:Chris Luttrell
ID: 24347017
yes, you have to convert those to the exact sizes, odd, I just ran into that also.
I just got it to come back for my test data set of 1M records but the next problem is we will run into max recurrsion errors so it stopped at 100 records and took 4:46 to run
0
 

Author Comment

by:TechNovation
ID: 24347026
Msg 156, Level 15, State 1, Line 7
Incorrect syntax near the keyword 'as'.
--- error related to:
convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable ) as total_revenu          

Msg 102, Level 15, State 1, Line 20
Incorrect syntax near ')'.
--- error related to:
  on r.rn - 1 = c.rn
)
select *,
0
 
LVL 27

Expert Comment

by:Chris Luttrell
ID: 24347029
I did not have to put a convert on that value probably because it is calculated off a real column and gets the proper type from there.
0
 
LVL 27

Expert Comment

by:Chris Luttrell
ID: 24347035
if you want to let it recurse forever add OPTION (MaxRecursion 0) at the end
0
 

Author Comment

by:TechNovation
ID: 24347072
When I take away the convert from:
convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable ) as total_revenu

I get:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".

When I put the convert back, I get:
Msg 156, Level 15, State 1, Line 7
Incorrect syntax near the keyword 'as'.
Msg 102, Level 15, State 1, Line 20
Incorrect syntax near ')'.


So I'm stuck in between
0
 
LVL 27

Expert Comment

by:Chris Luttrell
ID: 24347081
oh, with it in there you are missing a closing ) before the AS
convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable )) as total_revenu
0
 

Author Comment

by:TechNovation
ID: 24347127
added the closing )
get:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24347261
Sorry guys, had some visitors - it is Mother's day today in downtown Australia...

Let me type it up for you.

We can use the real columns to get the format of the anchor but best if we simply treat the anchor almost like a table definition.

But let me type it all up for you... Back in a second - will get rid of the riff raff...

OK, they've gone...

2 things.

1) we will alter the view a bit
2) we will be more specific about datatypes in the CTE
3) also decided to do the percent calc in the recursive part of the query (and can be used in the select bit down the very bottom)


I am here and waiting now, so can give you some undivided attention.


ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable group by [customer name]) a
go
 
 
 
;with cte_revenu as
( select                                                                                    -- this first part is the "anchor" and we are really just making a starting point for the recursive part.
    convert(bigint,0) as rn, 
    convert(nvarchar(50),'') as customer, 
    convert(decimal(29,2),0) as revenu, 
    convert(decimal(29,2),0) as accumulated_revenu, 
    convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable )) as total_revenu,
    convert(decimal(18,3),0) as percent_contribution           
  union all
  select 
    c.rn + 1, 
    r.customer, 
    r.revenu, 
    convert(decimal(29,2),(r.revenu + c.accumulated_revenu)),
    c.total_revenu,
    convert(decimal(18,3),((r.revenu + c.accumulated_revenu) / total_revenu) * 100 )           -- could maybe use the view here
 
  from 
    cte_revenu c
  inner join 
    Customer_Invoices_view r 
  on r.rn - 1 = c.rn
) 
select * 
from cte_revenu
where rn > 0
 
go

Open in new window

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24347387
@CGluttrell, you are correct, one BIG advantage of establising the initial data set from the actual table. Using data that is not explicitly typed, or conforms to a standard expression in a CTE is not handled particularly well. That can also include computed columns (like revenu + accumulated_revenu). Do not understand why it is so very very particular, but it is. Just on calculated columns, including aggregated functions, values are returned in the most precise expression data type - as determined by SQL. For that reason, we are beginning to see a re-emergance of the money datatype which has implicit scale for decimal operations in places where the value is not a monetry value. Ah well... lifes little mysteries.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24348294
Still waiting... Just letting you know...
0
 

Author Comment

by:TechNovation
ID: 24348530
Thanks,I run the code, the first part is ok. the second returned error:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "customer" of recursive query "cte_revenu".
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24348548
Wow, I did not get that one...

And can see why - mine was done with nvarchar(50) and I think it is nvarchar(35)...

I really do apologise for this thread dragging on, but think we are so very close now. So long as that view is returning results fairly quickly the rest should be OK.


ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable group by [customer name]) a
go
 
 
 
;with cte_revenu as
( select                                                                                    -- this first part is the "anchor" and we are really just making a starting point for the recursive part.
    convert(bigint,0) as rn, 
    convert(nvarchar(35),'') as customer, 
    convert(decimal(29,2),0) as revenu, 
    convert(decimal(29,2),0) as accumulated_revenu, 
    convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable )) as total_revenu,
    convert(decimal(18,3),0) as percent_contribution           
  union all
  select 
    c.rn + 1, 
    r.customer, 
    r.revenu, 
    convert(decimal(29,2),(r.revenu + c.accumulated_revenu)),
    c.total_revenu,
    convert(decimal(18,3),((r.revenu + c.accumulated_revenu) / total_revenu) * 100 )           -- could maybe use the view here
 
  from 
    cte_revenu c
  inner join 
    Customer_Invoices_view r 
  on r.rn - 1 = c.rn
) 
select * 
from cte_revenu
where rn > 0
 
go

Open in new window

0
 

Author Comment

by:TechNovation
ID: 24348571
query running which I think is a good sign, I'm awaiting some results to appear
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24348596
If you want to limit the amount of test data, can add into the view a few where clauses, or use the "top" qualifier e.g.

ALTER VIEW ...  AS
SELECT TOP 2000 ...

Another small advantage of using views...
0
 

Author Comment

by:TechNovation
ID: 24348670
good point i'm working on the top 2000, still running
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24348981
Well, dont forget that it is aggregating 2000 customers...

How long does the query take to complete ?
0
 

Author Comment

by:TechNovation
ID: 24349147
didn't finish yet, I cancelled and changed to top 20 as I'm desparate to see some results. Actually the server is overloaded with other venry demanding queries from colleagues, so I do not worry about how much time it takes, I just want to see "something"
0
 

Author Comment

by:TechNovation
ID: 24349151
I checked with:
select count(*) from customer_invoices_view

and there is indeed only 20 rows now but waiting for the result of the real query
0
 

Author Comment

by:TechNovation
ID: 24349323
I don't know what's going on, I run other queries and they return some results. CGluttrell's query is still running now for 1 day and 8,5 hours.

The query with the view (with top 20) is now running for 45 minutes. Well when I get something back I will let you know
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24349893
Must be something going on - can you see any lock contentions ?  (SSMS, management, activity monitor)

When you say "the query with the view" is that the CTE query ? or the select * from customer_invoices_view ?

You will need to kill those queries. We can get the results otherways.

If it is the CTE query that is taking all the time, and the select * from view works quickly, then try this:

select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c

if that is OK, (ie returns a result fairly quickly) then try this:

select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
      from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT


Now, you can also change the view to include a nolock option which might also help...


ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go
0
 

Author Comment

by:TechNovation
ID: 24350124
the first:
select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c

took 32 minutes.

I will run the second now:
select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
      from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24350138
32 minutes - ouch... will wait for next one, expect not much more...
0
 

Author Comment

by:TechNovation
ID: 24350214
OK 22 minutes but as is the case for the first part, it returns the accumulated revenu not descending (actually in no apparent order), and not adding up to the total revenu.

The percent contribution is not adding up to 100%

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24350370
having the top nnn in the view will stuff it up a bit - it is more about getting something out and seeing how long.

you know the interesting thing is the the most efficient query takes longer and so far, the more innefficient query has worked the fastest.

think it is time to remove the top nnn statement from the view and run it again. do you know what percent the top nnn is of the total number of customers ?

ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go
0
 
LVL 3

Expert Comment

by:GregTSmith
ID: 24359378
An alternative to using recursion:
--------------- START SAMPLE DATA SETUP --------------- 
DECLARE @sample TABLE (customer char(1), revenue money) 
 
INSERT @sample 
  (customer, revenue) 
SELECT 'a' [Customer], 4 [Revenue] 
UNION ALL SELECT 'b' [Customer], 5 [Revenue] 
UNION ALL SELECT 'a' [Customer], 6 [Revenue] 
UNION ALL SELECT 'a' [Customer], 3 [Revenue] 
UNION ALL SELECT 'f' [Customer], 6 [Revenue] 
UNION ALL SELECT 'b' [Customer], 4 [Revenue] 
UNION ALL SELECT 'c' [Customer], 4 [Revenue] 
UNION ALL SELECT 'a' [Customer], 4 [Revenue] 
UNION ALL SELECT 'r' [Customer], 4 [Revenue] 
--------------- FINISH SAMPLE DATA SETUP --------------
 
DECLARE @customers TABLE (
  priority int PRIMARY KEY NOT NULL, 
  customer char(1) NOT NULL, 
  revenue money NOT NULL, 
  accumulatedRevenue money NOT NULL DEFAULT(0) 
) 
 
-- aggregate to customer level 
INSERT @customers 
  (priority, customer, revenue) 
SELECT 
  ROW_NUMBER() OVER (ORDER BY X.revenue DESC, X.customer) [priority], 
  X.customer, 
  X.revenue 
FROM 
  (
    SELECT 
      customer, 
      SUM(revenue) [revenue] 
    FROM 
      @sample 
    GROUP BY 
      customer 
  ) X 
 
 
-- find total revenue
DECLARE @totalRevenue money 
SET @totalRevenue = (SELECT SUM(revenue) FROM @customers) 
 
 
-- calculate accumulated revenue based on priority
UPDATE 
  Y 
SET 
  accumulatedRevenue = X.accumulatedRevenue 
FROM 
  @customers Y 
  INNER JOIN (
    SELECT 
      A.priority, 
      SUM(B.revenue) [accumulatedRevenue] 
    FROM 
      @customers A 
      INNER JOIN @customers B ON B.priority <= A.priority 
    GROUP BY 
      A.priority 
  ) X ON X.priority = Y.priority 
 
 
-- get final results 
SELECT 
  customer, 
  revenue, 
  revenue / @totalRevenue [contribution] 
FROM 
  @customers 
WHERE 
  accumulatedRevenue <= (@totalRevenue * 0.8) 
ORDER BY 
  revenue DESC, 
  customer 

Open in new window

0
 

Author Comment

by:TechNovation
ID: 24409154
Mark,
after 7 hours of running, your query returned the error:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

GrepT Smith,
thanks I will try your query: could you please change it to reflect the real filed names:
[Customer name] (nvarchar(35),null)
[Amount in local Currency] (decimal (29,2),null)
Thank you

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24409185
That's interesting... 7 hours ? Max recursion - that means you are running that CTE ?

I thought we were just selecting from that view for the moment, trying to get the correct rows...

ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go


--then

select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
         from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT


Have we tried that yet ?
0
 
LVL 51

Accepted Solution

by:
Mark Wills earned 2000 total points
ID: 24409243
The other way we can do it is to create a temp table... We can encapsulate all that as a Stored Procedure so all you have to do is to exec that stored procedure...
create table #tmp_pareto(rn bigint primary key clustered, customer varchar(60), revenu decimal(29,2))
 
insert #tmp_pareto
 
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from   (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
 
--select * from #tmp_pareto
 
declare @total decimal (29,2)
set @total = (select isnull(sum(revenu),1) total_revenu from customer_invoices_view)
 
select *, @total as total_revenue, accumulated_revenu / @total * 100 as percent_contribution
from (select rn, customer, revenu,(select sum(revenu) from #tmp_pareto t where t.rn <= p.rn) as accumulated_revenu
      from #tmp_pareto p) c

Open in new window

0
 

Author Comment

by:TechNovation
ID: 24426496
Mark, you last code simply WORKS, I get exactly what I'm looking for, the only thing is that I do not understand why we are still using the view "customer_invoices_view" in the code. Do we need it? meaning do I have the future to
FIRST use the code for the view (by the way which one? there are many) and
THEN create the temporary table and
THEN declare the stored procedure?

If you were to summaries the whole code from the start to finish what would it look like? or is it simply the one above? Do I need to have both a view and a temp table?

Thanks
0
 

Author Closing Comment

by:TechNovation
ID: 31579120
Thank you
0
 

Author Comment

by:TechNovation
ID: 24426726
Ok ignore my question about the view, as i just replaced it with the temp table and it worked as well. So it should be fine. MANY THANKS
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24427487
Glad you found that, and you are correct, it is not needed. Just the temp table, which also means just that snippet.

Sure glad we got to the bottom of this and still a little confused as to why the CTE didn't happen... Ah well, we now have an answer, and very happy to have been of help...
0
 
LVL 27

Expert Comment

by:Chris Luttrell
ID: 24427610
congratulations mark, glad you were able to put it together.  I was thinking a temp table might be what we would have to do back in ID: 24345761 but never got to finish putting it together, got too busy.  The reason just the simple CTE and the solution Greg offered up with the table variable would have run into issues also it the amount of memory required to aggregate the millions of rows and both of those solutions were memory bound while the #tmpTable wrote the intermediate data to disk instead.  I believe we all learned something here, I know I did.
Thanks guys, till next time.
0

Featured Post

Important Lessons on Recovering from Petya

In their most recent webinar, Skyport Systems explores ways to isolate and protect critical databases to keep the core of your company safe from harm.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In SQL Server, when rows are selected from a table, does it retrieve data in the order in which it is inserted?  Many believe this is the case. Let us try to examine for ourselves with an example. To get started, use the following script, wh…
Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
In this brief tutorial Pawel from AdRem Software explains how you can quickly find out which services are running on your network, or what are the IP addresses of servers responsible for each service. Software used is freeware NetCrunch Tools (https…
Add bar graphs to Access queries using Unicode block characters. Graphs appear on every record in the color you want. Give life to numbers. Hopes this gives you ideas on visualizing your data in new ways ~ Create a calculated field in a query: …

688 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question