TechNovation
asked on
Pareto Analysis in SQL Server 2005
Hello,
How do I make Pareto Analysis in SQL server, say I have a table structure like:
Customer Revenu
a 4
b 5
a 6
a 3
f 6
b 4
c 4
a 4
r 4
After ading up (grouping by customer) The idea is to find out cumulatively which customers once added up are making 80% of the revenu, which needs to be later presented in a chart?
Output looks like:
Customer Revenue Accumulated Revenue % (acc. / Total)
a 16 16 20
b 4 20 35
... ... ... ...
f 6 56 81
... ... ... ...
How do I make Pareto Analysis in SQL server, say I have a table structure like:
Customer Revenu
a 4
b 5
a 6
a 3
f 6
b 4
c 4
a 4
r 4
After ading up (grouping by customer) The idea is to find out cumulatively which customers once added up are making 80% of the revenu, which needs to be later presented in a chart?
Output looks like:
Customer Revenue Accumulated Revenue % (acc. / Total)
a 16 16 20
b 4 20 35
... ... ... ...
f 6 56 81
... ... ... ...
might like to have a look at (starts with a test / dummy table representing the data source - it is NOT part of the solution - the view and the CTE query is)...
-- first up create a table with some test data...
select * into my_revenu_table from (
select 'a' as customer , 4 as revenu union all
select 'b' as customer , 5 as revenu union all
select 'a' as customer , 6 as revenu union all
select 'a' as customer , 3 as revenu union all
select 'f' as customer , 6 as revenu union all
select 'b' as customer , 4 as revenu union all
select 'c' as customer , 4 as revenu union all
select 'a' as customer , 4 as revenu union all
select 'r' as customer , 4 as revenu
) src
-- you might already have an identity or something like it, so might change the approach (ie might not need the view)
-- but in the abscense of more information....
-- now, because we want some kind of order, decided to create a view over that data so we can get a row_number or ordering
-- we can also do the aggregation by customer that way
create view vw_my_revenu_view as
select convert(int, (row_number() over (order by revenu desc, customer))) as rn, customer, revenu
from (select customer, sum(revenu) as revenu from my_revenu_table group by customer) a
go
-- now we have a data source, with order, we can use that to create a recursive query
;with cte_revenu as
( select convert(int,0) as rn, '' as customer, convert(int,0) as revenu, convert(int,0) as accumulated_revenu, (select sum(revenu) from my_revenu_table) as total_revenu
union all
select c.rn + 1, r.customer, r.revenu, r.revenu + c.accumulated_revenu, c.total_revenu
from cte_revenu c
inner join vw_my_revenu_view r on r.rn - 1 = c.rn
)
select *, (accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution from cte_revenu
go
Also, I explicitly convert to INT and it is more likely a money or decimal value, but hopefully you get the idea...
ASKER
8080 DIVER:
I tried your code butit returns errors:
Affected code is:
DECLARE @TempTtls (
Customer AS (nvarchar(35),null),
Revenu AS (decimal (29,2),null)
)
and related error is:
Incorrect Syntax near '('
And since the table variable isn't declared as consequence of error, the rest isn't working.
any idea?
Thanks
I tried your code butit returns errors:
Affected code is:
DECLARE @TempTtls (
Customer AS (nvarchar(35),null),
Revenu AS (decimal (29,2),null)
)
and related error is:
Incorrect Syntax near '('
And since the table variable isn't declared as consequence of error, the rest isn't working.
any idea?
Thanks
ASKER
MARK WILLS:
This query will be run several times to populate a table, by creating a view, wouldn't the next query return an error as 'view already exists'. Do I really need to create a view? is there another way?
appreciate your help
Thanks
This query will be run several times to populate a table, by creating a view, wouldn't the next query return an error as 'view already exists'. Do I really need to create a view? is there another way?
appreciate your help
Thanks
Only create the view once - it is a once-off step and you can then re-use it as much as you like. No need to create everytime.
And, if you already have an indentity, or time, or anything representing sequentially incrementing rows, you might not need the view. But with the view we have taken full control of the information we want to play with. The view does not contain data itself, it points to the datatables and is resolved at runtime / at time of use...
The advantage is you have a single CTE query (starts with the ";with" and ends with the "select ... from cte_revenu" - yep, just a "single" command)
And, if you already have an indentity, or time, or anything representing sequentially incrementing rows, you might not need the view. But with the view we have taken full control of the information we want to play with. The view does not contain data itself, it points to the datatables and is resolved at runtime / at time of use...
The advantage is you have a single CTE query (starts with the ";with" and ends with the "select ... from cte_revenu" - yep, just a "single" command)
DECLARE @TempTtls (
Customer AS (nvarchar(35),null),
Revenu AS (decimal (29,2),null)
)
Oops, forgot a piece. Add the word TABLE as shown in SQL_1 below.
The SP can be used to populate a table by simply adding INSERT INTO thetablename at the top of the last SELECT statement. If you are repopulating the table, (e.g. did this yesterday but want to get new results today), you can remove the appropriate rows from the table or update them. (However, if you add an Datetime column that you feed the current date to, you can build up a historical set of data for trend analysis. ;-)
I have attached a revised version of the SP (commenting out the pieces needed for the historical/trending analysis table ;-):
Customer AS (nvarchar(35),null),
Revenu AS (decimal (29,2),null)
)
Oops, forgot a piece. Add the word TABLE as shown in SQL_1 below.
The SP can be used to populate a table by simply adding INSERT INTO thetablename at the top of the last SELECT statement. If you are repopulating the table, (e.g. did this yesterday but want to get new results today), you can remove the appropriate rows from the table or update them. (However, if you add an Datetime column that you feed the current date to, you can build up a historical set of data for trend analysis. ;-)
I have attached a revised version of the SP (commenting out the pieces needed for the historical/trending analysis table ;-):
SQL_1:
SELECT Customer,
SUM(Revenu)
FROM YourTable
GROUP BY Customer;
SQL2:
CREATE PRCOEDURE usp_Pareto_Analysis
@Pct_Limit AS INT
AS
BEGIN
SET NOCOUNT ON;
-- DECLARE @Today DATETIME;
--
-- SET @Today = CAST(CONVERT(VARCHAR(10), GETDATE(), 101) AS DATETIME);
DECLARE @TempTtls TABLE
(
Customer AS varchar(XX),
LastName AS (appropriate datatype
);
INSERT INTO @TempTtls
SELECT Customer,
SUM(Revenu) AS Revenu
FROM YourTable
GROUP BY Customer;
SELECT
-- INSERT INTO YourHistoryTable
SELECT --Today ,
Customer,
Revenu,
AccumulatedRevenu,
(Revenu / AccumulatedRevenu) AS PctgAccumulatedRevenu
FROM
(
SELECT SUM(Revenu) AS TtlAccumRevenu
FROM @TempTtl
)Y,
(
SELECT T.Customer,
T.Revenu,
(
SELECT SUM(Revenu)
FROM @TempTtl
WHERE Customer <= T.Customer
) AS AccumulatedRevenu
FROM @TempTtl T
) Z
ORDER BY Customer
WHERE ((Revenu / AccumulatedRevenu) < @Pct_Limit);
END
Hi TechNovation,
Was extremely busy these last couple days and just say this question.
The ultimate query would look something like snippet and produces this output. I attached the script to set up and run this also.
Love these types of questions!
PareTo.txt
Was extremely busy these last couple days and just say this question.
The ultimate query would look something like snippet and produces this output. I attached the script to set up and run this also.
Love these types of questions!
;With
tblItemsWithRow
as
( SELECT Row_Number() Over(Order by Customer) as RowNum, Customer, SUM(Revenue) Revenue
FROM PareTo PT
GROUP BY Customer
)
--Final Query to return the running total and pcts
Select Customer,Revenue,
( Select Sum(Revenue) from tblItemsWithRow m where m.RowNum<= s.RowNum) as [Accumulated Revenue],
(( Select Sum(CONVERT(FLOAT,Revenue)) from tblItemsWithRow m where m.RowNum<= s.RowNum)/SUM(Revenue) OVER ())*100.0 as [Acc Rev %]
from tblItemsWithRow s
ORDER BY Customer
PareToOutput.pngPareTo.txt
ASKER
Actually the revenu column should be sorted descending before accumulating fo f should come before c in the order of row for customers,
this is because the purpose is to know which customers once aggregated make up 80% of the revenu.
how would the code look like?
this is because the purpose is to know which customers once aggregated make up 80% of the revenu.
how would the code look like?
Line 4 in CGLuttrell's code should be similar to the row_function() previously posted... Just include REVENU DESC, after the order by and before Customer...
sorry, that is SUM(REVENU)
CGluttrell, are you interested in discussing the differences in our CTE's ?
ASKER
The query (of CGluttrell) has been running now for more than 11 hours, is it more the server that might be busy or the query is 'demanding'?
ASKER
... and the table I'm testing in has only 20 million records, does is mean if I move to real capacity of hundreds of millions of records I will have to take something into account?
The query is a little demanding - it is doing an inline query back on the CTE query which does take some effort. 11 hours is a long time though...
wow,yeah we need to rethink this!
That many rows, this is where Analysis Server cubes would be real handy, but we aint got them so we will make do.
That many rows, this is where Analysis Server cubes would be real handy, but we aint got them so we will make do.
mark, sorry I have been away from the computer, but yes we can compare the cte statements if we need to, but we might need to look in another direction to avoid 11+ hours, I am thinking a aggregate temp table maybe like you were trying to do with the view?
guys, I've gotta go mow some grass so will be away for a bit again, sorry.
guys, I've gotta go mow some grass so will be away for a bit again, sorry.
Did you have a quick look at creating the view ? that step (ie create the view) takes no time, then do a select * from the_view can always can it and drop the view. The difference is that the view is doing a lot of the work and the CTE then just does the running total...
@CGluttrell, yep, think the view might be worth a shot. the aggregation is done there and the sort with row number so that the CTE "just" has to do the running total and calcs. Think it is worth trying to do a select * from the_view and see what kind of response. It wont be too dissimilar (in fact should be quicker) if there is a temp table.
ASKER
Mark I will have a go at your query as well with the view, but as it is quite complicated for me I could replace column names in the right places: the first header isn't customer but [customer name] and the second column is [Amount in local Currency], if you could please give me the query with these headings Iwill be gratefull. In the meantime CGLuttrell's query is still running.
OK, just the view first. A "once-off" step to create the view, and once created can then select from it... Now, all we really have to do is change the subquery, because we can use column aliases to give them new names. In our case we will use the real names as part of the subquery. You will have to change the table name in that same subquery from "my_revenu_table" to whatever it really is. So, should look something like :
create view vw_my_revenu_view
as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, revenu
from (select [customer name] as customer, sum([Amount in local currency]) as revenu from my_revenu_table group by [customer name]) a
go
Thats done. That easy. Do not have to worry about creating the view again (might need to change / fix) and so it sits there waiting, wanting to be used, and not costing anything (the luxury of being a view).
But that will change when we do go to use it. The subquery does the aggregations first, Then it will sort and calculcate a row_number(). So lets give it a try :
select * from vw_my_revenu_view
See how that goes, and we can move on from there.
create view vw_my_revenu_view
as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, revenu
from (select [customer name] as customer, sum([Amount in local currency]) as revenu from my_revenu_table group by [customer name]) a
go
Thats done. That easy. Do not have to worry about creating the view again (might need to change / fix) and so it sits there waiting, wanting to be used, and not costing anything (the luxury of being a view).
But that will change when we do go to use it. The subquery does the aggregations first, Then it will sort and calculcate a row_number(). So lets give it a try :
select * from vw_my_revenu_view
See how that goes, and we can move on from there.
ASKER
Select * run ok, I get 3 columns with revenu descending. But when I run the rest I get error:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "customer" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "customer" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".
create view Customer_Invoices_view
as
select
convert(bigint, (row_number() over (order by revenu desc, customer))) as rn,
customer,
revenu
from
(
select
[customer name] as customer,
sum([Amount in local currency]) as revenu
from
dbo.SourceTable
group by
[customer name]
) a
go
----
select *
from
Customer_Invoices_view
---- this select * returns 3 columns: rn,customer, and revenu descending
;with cte_revenu as
( select
convert(int,0) as rn,
'' as customer,
convert(int,0) as revenu,
convert(int,0) as accumulated_revenu,
(select sum([Amount in local currency]) from dbo.SourceTable) as total_revenu
union all
select
c.rn + 1,
r.customer,
r.revenu,
r.revenu + c.accumulated_revenu,
c.total_revenu
from
cte_revenu c
inner join
Customer_Invoices_view r
on r.rn - 1 = c.rn
)
select *,
(accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution
from
cte_revenu
go
No, not running the CTE yet :) Just getting the performance of the view and if it is reasonable enough.
But what you are now observing is one nuance of the CTE. The data types must be identical (including scope and scale, not just type - e.g. varchar(1) will error with a base of varchar(10)).
You will note that I did have a series of converts in the CTE. That was mainly due to using numbers as INT so had to explicitly beat it into submission. Those INT of course is not what you are using in your real table.
So, what datatypes and size is [Customer Name] and [Amount in local currency] (we already know that rn is bigint, not int) ?
But what you are now observing is one nuance of the CTE. The data types must be identical (including scope and scale, not just type - e.g. varchar(1) will error with a base of varchar(10)).
You will note that I did have a series of converts in the CTE. That was mainly due to using numbers as INT so had to explicitly beat it into submission. Those INT of course is not what you are using in your real table.
So, what datatypes and size is [Customer Name] and [Amount in local currency] (we already know that rn is bigint, not int) ?
ASKER
[Customer name] (nvarchar(35),null)
[Amount in local Currency] (decimal (29,2),null)
[Amount in local Currency] (decimal (29,2),null)
OK then let's try :
;with cte_revenu as
( select -- this first part is the "anchor" and we are really just making a starting point for the recursive part.
convert(bigint,0) as rn,
convert(nvarchar(35),'') as customer,
convert(decimal(29,2),0) as revenu,
convert(decimal(29,2),0) as accumulated_revenu,
convert(decimal(29,2),(sel ect sum([Amount in local currency]) from dbo.SourceTable ) as total_revenu -- could maybe use the view here
union all
select
c.rn + 1,
r.customer,
r.revenu,
r.revenu + c.accumulated_revenu,
c.total_revenu
from
cte_revenu c
inner join
Customer_Invoices_view r
on r.rn - 1 = c.rn
)
select *,
(accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution -- could possibly include that in the body above.
from
cte_revenu
go
;with cte_revenu as
( select -- this first part is the "anchor" and we are really just making a starting point for the recursive part.
convert(bigint,0) as rn,
convert(nvarchar(35),'') as customer,
convert(decimal(29,2),0) as revenu,
convert(decimal(29,2),0) as accumulated_revenu,
convert(decimal(29,2),(sel
union all
select
c.rn + 1,
r.customer,
r.revenu,
r.revenu + c.accumulated_revenu,
c.total_revenu
from
cte_revenu c
inner join
Customer_Invoices_view r
on r.rn - 1 = c.rn
)
select *,
(accumulated_revenu * 1.0 / total_revenu) * 100.0 as precent_contribution -- could possibly include that in the body above.
from
cte_revenu
go
yes, you have to convert those to the exact sizes, odd, I just ran into that also.
I just got it to come back for my test data set of 1M records but the next problem is we will run into max recurrsion errors so it stopped at 100 records and took 4:46 to run
I just got it to come back for my test data set of 1M records but the next problem is we will run into max recurrsion errors so it stopped at 100 records and took 4:46 to run
ASKER
Msg 156, Level 15, State 1, Line 7
Incorrect syntax near the keyword 'as'.
--- error related to:
convert(decimal(29,2),(sel ect sum([Amount in local currency]) from dbo.SourceTable ) as total_revenu
Msg 102, Level 15, State 1, Line 20
Incorrect syntax near ')'.
--- error related to:
on r.rn - 1 = c.rn
)
select *,
Incorrect syntax near the keyword 'as'.
--- error related to:
convert(decimal(29,2),(sel
Msg 102, Level 15, State 1, Line 20
Incorrect syntax near ')'.
--- error related to:
on r.rn - 1 = c.rn
)
select *,
I did not have to put a convert on that value probably because it is calculated off a real column and gets the proper type from there.
if you want to let it recurse forever add OPTION (MaxRecursion 0) at the end
ASKER
When I take away the convert from:
convert(decimal(29,2),(sel ect sum([Amount in local currency]) from dbo.SourceTable ) as total_revenu
I get:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".
When I put the convert back, I get:
Msg 156, Level 15, State 1, Line 7
Incorrect syntax near the keyword 'as'.
Msg 102, Level 15, State 1, Line 20
Incorrect syntax near ')'.
So I'm stuck in between
convert(decimal(29,2),(sel
I get:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".
When I put the convert back, I get:
Msg 156, Level 15, State 1, Line 7
Incorrect syntax near the keyword 'as'.
Msg 102, Level 15, State 1, Line 20
Incorrect syntax near ')'.
So I'm stuck in between
oh, with it in there you are missing a closing ) before the AS
convert(decimal(29,2),(sel ect sum([Amount in local currency]) from dbo.SourceTable )) as total_revenu
convert(decimal(29,2),(sel
ASKER
added the closing )
get:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".
get:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "revenu" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "accumulated_revenu" of recursive query "cte_revenu".
Sorry guys, had some visitors - it is Mother's day today in downtown Australia...
Let me type it up for you.
We can use the real columns to get the format of the anchor but best if we simply treat the anchor almost like a table definition.
But let me type it all up for you... Back in a second - will get rid of the riff raff...
OK, they've gone...
2 things.
1) we will alter the view a bit
2) we will be more specific about datatypes in the CTE
3) also decided to do the percent calc in the recursive part of the query (and can be used in the select bit down the very bottom)
I am here and waiting now, so can give you some undivided attention.
Let me type it up for you.
We can use the real columns to get the format of the anchor but best if we simply treat the anchor almost like a table definition.
But let me type it all up for you... Back in a second - will get rid of the riff raff...
OK, they've gone...
2 things.
1) we will alter the view a bit
2) we will be more specific about datatypes in the CTE
3) also decided to do the percent calc in the recursive part of the query (and can be used in the select bit down the very bottom)
I am here and waiting now, so can give you some undivided attention.
ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable group by [customer name]) a
go
;with cte_revenu as
( select -- this first part is the "anchor" and we are really just making a starting point for the recursive part.
convert(bigint,0) as rn,
convert(nvarchar(50),'') as customer,
convert(decimal(29,2),0) as revenu,
convert(decimal(29,2),0) as accumulated_revenu,
convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable )) as total_revenu,
convert(decimal(18,3),0) as percent_contribution
union all
select
c.rn + 1,
r.customer,
r.revenu,
convert(decimal(29,2),(r.revenu + c.accumulated_revenu)),
c.total_revenu,
convert(decimal(18,3),((r.revenu + c.accumulated_revenu) / total_revenu) * 100 ) -- could maybe use the view here
from
cte_revenu c
inner join
Customer_Invoices_view r
on r.rn - 1 = c.rn
)
select *
from cte_revenu
where rn > 0
go
@CGluttrell, you are correct, one BIG advantage of establising the initial data set from the actual table. Using data that is not explicitly typed, or conforms to a standard expression in a CTE is not handled particularly well. That can also include computed columns (like revenu + accumulated_revenu). Do not understand why it is so very very particular, but it is. Just on calculated columns, including aggregated functions, values are returned in the most precise expression data type - as determined by SQL. For that reason, we are beginning to see a re-emergance of the money datatype which has implicit scale for decimal operations in places where the value is not a monetry value. Ah well... lifes little mysteries.
Still waiting... Just letting you know...
ASKER
Thanks,I run the code, the first part is ok. the second returned error:
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "customer" of recursive query "cte_revenu".
Msg 240, Level 16, State 1, Line 1
Types don't match between the anchor and the recursive part in column "customer" of recursive query "cte_revenu".
Wow, I did not get that one...
And can see why - mine was done with nvarchar(50) and I think it is nvarchar(35)...
I really do apologise for this thread dragging on, but think we are so very close now. So long as that view is returning results fairly quickly the rest should be OK.
And can see why - mine was done with nvarchar(50) and I think it is nvarchar(35)...
I really do apologise for this thread dragging on, but think we are so very close now. So long as that view is returning results fairly quickly the rest should be OK.
ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),revenu) as revenu
from (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable group by [customer name]) a
go
;with cte_revenu as
( select -- this first part is the "anchor" and we are really just making a starting point for the recursive part.
convert(bigint,0) as rn,
convert(nvarchar(35),'') as customer,
convert(decimal(29,2),0) as revenu,
convert(decimal(29,2),0) as accumulated_revenu,
convert(decimal(29,2),(select sum([Amount in local currency]) from dbo.SourceTable )) as total_revenu,
convert(decimal(18,3),0) as percent_contribution
union all
select
c.rn + 1,
r.customer,
r.revenu,
convert(decimal(29,2),(r.revenu + c.accumulated_revenu)),
c.total_revenu,
convert(decimal(18,3),((r.revenu + c.accumulated_revenu) / total_revenu) * 100 ) -- could maybe use the view here
from
cte_revenu c
inner join
Customer_Invoices_view r
on r.rn - 1 = c.rn
)
select *
from cte_revenu
where rn > 0
go
ASKER
query running which I think is a good sign, I'm awaiting some results to appear
If you want to limit the amount of test data, can add into the view a few where clauses, or use the "top" qualifier e.g.
ALTER VIEW ... AS
SELECT TOP 2000 ...
Another small advantage of using views...
ALTER VIEW ... AS
SELECT TOP 2000 ...
Another small advantage of using views...
ASKER
good point i'm working on the top 2000, still running
Well, dont forget that it is aggregating 2000 customers...
How long does the query take to complete ?
How long does the query take to complete ?
ASKER
didn't finish yet, I cancelled and changed to top 20 as I'm desparate to see some results. Actually the server is overloaded with other venry demanding queries from colleagues, so I do not worry about how much time it takes, I just want to see "something"
ASKER
I checked with:
select count(*) from customer_invoices_view
and there is indeed only 20 rows now but waiting for the result of the real query
select count(*) from customer_invoices_view
and there is indeed only 20 rows now but waiting for the result of the real query
ASKER
I don't know what's going on, I run other queries and they return some results. CGluttrell's query is still running now for 1 day and 8,5 hours.
The query with the view (with top 20) is now running for 45 minutes. Well when I get something back I will let you know
The query with the view (with top 20) is now running for 45 minutes. Well when I get something back I will let you know
Must be something going on - can you see any lock contentions ? (SSMS, management, activity monitor)
When you say "the query with the view" is that the CTE query ? or the select * from customer_invoices_view ?
You will need to kill those queries. We can get the results otherways.
If it is the CTE query that is taking all the time, and the select * from view works quickly, then try this:
select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c
if that is OK, (ie returns a result fairly quickly) then try this:
select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT
Now, you can also change the view to include a nolock option which might also help...
ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),reve nu) as revenu
from (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go
When you say "the query with the view" is that the CTE query ? or the select * from customer_invoices_view ?
You will need to kill those queries. We can get the results otherways.
If it is the CTE query that is taking all the time, and the select * from view works quickly, then try this:
select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c
if that is OK, (ie returns a result fairly quickly) then try this:
select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT
Now, you can also change the view to include a nolock option which might also help...
ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),reve
from (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go
ASKER
the first:
select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c
took 32 minutes.
I will run the second now:
select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT
select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c
took 32 minutes.
I will run the second now:
select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT
32 minutes - ouch... will wait for next one, expect not much more...
ASKER
OK 22 minutes but as is the case for the first part, it returns the accumulated revenu not descending (actually in no apparent order), and not adding up to the total revenu.
The percent contribution is not adding up to 100%
The percent contribution is not adding up to 100%
having the top nnn in the view will stuff it up a bit - it is more about getting something out and seeing how long.
you know the interesting thing is the the most efficient query takes longer and so far, the more innefficient query has worked the fastest.
think it is time to remove the top nnn statement from the view and run it again. do you know what percent the top nnn is of the total number of customers ?
ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),reve nu) as revenu
from (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go
you know the interesting thing is the the most efficient query takes longer and so far, the more innefficient query has worked the fastest.
think it is time to remove the top nnn statement from the view and run it again. do you know what percent the top nnn is of the total number of customers ?
ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),reve
from (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go
An alternative to using recursion:
--------------- START SAMPLE DATA SETUP ---------------
DECLARE @sample TABLE (customer char(1), revenue money)
INSERT @sample
(customer, revenue)
SELECT 'a' [Customer], 4 [Revenue]
UNION ALL SELECT 'b' [Customer], 5 [Revenue]
UNION ALL SELECT 'a' [Customer], 6 [Revenue]
UNION ALL SELECT 'a' [Customer], 3 [Revenue]
UNION ALL SELECT 'f' [Customer], 6 [Revenue]
UNION ALL SELECT 'b' [Customer], 4 [Revenue]
UNION ALL SELECT 'c' [Customer], 4 [Revenue]
UNION ALL SELECT 'a' [Customer], 4 [Revenue]
UNION ALL SELECT 'r' [Customer], 4 [Revenue]
--------------- FINISH SAMPLE DATA SETUP --------------
DECLARE @customers TABLE (
priority int PRIMARY KEY NOT NULL,
customer char(1) NOT NULL,
revenue money NOT NULL,
accumulatedRevenue money NOT NULL DEFAULT(0)
)
-- aggregate to customer level
INSERT @customers
(priority, customer, revenue)
SELECT
ROW_NUMBER() OVER (ORDER BY X.revenue DESC, X.customer) [priority],
X.customer,
X.revenue
FROM
(
SELECT
customer,
SUM(revenue) [revenue]
FROM
@sample
GROUP BY
customer
) X
-- find total revenue
DECLARE @totalRevenue money
SET @totalRevenue = (SELECT SUM(revenue) FROM @customers)
-- calculate accumulated revenue based on priority
UPDATE
Y
SET
accumulatedRevenue = X.accumulatedRevenue
FROM
@customers Y
INNER JOIN (
SELECT
A.priority,
SUM(B.revenue) [accumulatedRevenue]
FROM
@customers A
INNER JOIN @customers B ON B.priority <= A.priority
GROUP BY
A.priority
) X ON X.priority = Y.priority
-- get final results
SELECT
customer,
revenue,
revenue / @totalRevenue [contribution]
FROM
@customers
WHERE
accumulatedRevenue <= (@totalRevenue * 0.8)
ORDER BY
revenue DESC,
customer
ASKER
Mark,
after 7 hours of running, your query returned the error:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
GrepT Smith,
thanks I will try your query: could you please change it to reflect the real filed names:
[Customer name] (nvarchar(35),null)
[Amount in local Currency] (decimal (29,2),null)
Thank you
after 7 hours of running, your query returned the error:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
GrepT Smith,
thanks I will try your query: could you please change it to reflect the real filed names:
[Customer name] (nvarchar(35),null)
[Amount in local Currency] (decimal (29,2),null)
Thank you
That's interesting... 7 hours ? Max recursion - that means you are running that CTE ?
I thought we were just selecting from that view for the moment, trying to get the correct rows...
ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),reve nu) as revenu
from (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go
--then
select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT
Have we tried that yet ?
I thought we were just selecting from that view for the moment, trying to get the correct rows...
ALTER VIEW customer_invoices_view as
select convert(bigint, (row_number() over (order by revenu desc, customer))) as rn, customer, convert(decimal(29,2),reve
from (select [customer name] as customer, sum([Amount in local currency]) as revenu from dbo.sourcetable with (nolock) group by [customer name]) a
go
--then
select rn, customer, revenu, accumulated_revenu, total_revenu, accumulated_revenu / total_revenu * 100 as percent_contribution
from (select c.*, (select sum(revenu) from customer_invoices_view v where v.rn <= c.rn) as accumulated_revenu
from customer_invoices_view c ) s , (select sum(revenu) total_revenu from customer_invoices_view) as GT
Have we tried that yet ?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Mark, you last code simply WORKS, I get exactly what I'm looking for, the only thing is that I do not understand why we are still using the view "customer_invoices_view" in the code. Do we need it? meaning do I have the future to
FIRST use the code for the view (by the way which one? there are many) and
THEN create the temporary table and
THEN declare the stored procedure?
If you were to summaries the whole code from the start to finish what would it look like? or is it simply the one above? Do I need to have both a view and a temp table?
Thanks
FIRST use the code for the view (by the way which one? there are many) and
THEN create the temporary table and
THEN declare the stored procedure?
If you were to summaries the whole code from the start to finish what would it look like? or is it simply the one above? Do I need to have both a view and a temp table?
Thanks
ASKER
Thank you
ASKER
Ok ignore my question about the view, as i just replaced it with the temp table and it worked as well. So it should be fine. MANY THANKS
Glad you found that, and you are correct, it is not needed. Just the temp table, which also means just that snippet.
Sure glad we got to the bottom of this and still a little confused as to why the CTE didn't happen... Ah well, we now have an answer, and very happy to have been of help...
Sure glad we got to the bottom of this and still a little confused as to why the CTE didn't happen... Ah well, we now have an answer, and very happy to have been of help...
congratulations mark, glad you were able to put it together. I was thinking a temp table might be what we would have to do back in ID: 24345761 but never got to finish putting it together, got too busy. The reason just the simple CTE and the solution Greg offered up with the table variable would have run into issues also it the amount of memory required to aggregate the millions of rows and both of those solutions were memory bound while the #tmpTable wrote the intermediate data to disk instead. I believe we all learned something here, I know I did.
Thanks guys, till next time.
Thanks guys, till next time.
Okay, re-typing now. ;-)
SQL_1 below gives the totals by Customer. So, I'd use that in a Stored Procedure (see SQL_2) which I think kshould provide you with what you want.
Open in new window