edwinson
asked on
Cummulative counts etc in SQL query
I apologize if this question has been asked before, but I get a timed out response when searching the database so this is my last resort.
I am trying to build an SQL query that will return data to be displayed/accessed via ADO in an application. Basically, the result should show cummulative counts and percentages. Here is the basic SQL where I am starting.
SELECT distinct var1,
count(var1) as [Count],
CONVERT(decimal,COUNT(var1 )) / 690 as [PCT]
FROM dbo.BinData
GROUP BY var1
My thought is to modify this statement to use a local declare variable. I can't seem to figure out how to:
1.) set the value of the local variable say @cumpct = @cumpct + [pct]
2.) load the value of @cumpct into a new field in the query such as @cumpct as [Cum Pct]
If I can figure out this problem, I will have the answer to multiple other hacked workarounds I have used in the past.
I am trying to build an SQL query that will return data to be displayed/accessed via ADO in an application. Basically, the result should show cummulative counts and percentages. Here is the basic SQL where I am starting.
SELECT distinct var1,
count(var1) as [Count],
CONVERT(decimal,COUNT(var1
FROM dbo.BinData
GROUP BY var1
My thought is to modify this statement to use a local declare variable. I can't seem to figure out how to:
1.) set the value of the local variable say @cumpct = @cumpct + [pct]
2.) load the value of @cumpct into a new field in the query such as @cumpct as [Cum Pct]
If I can figure out this problem, I will have the answer to multiple other hacked workarounds I have used in the past.
ASKER
I'll try to be a little more clear.
Basically I want to produce a table that looks something like this. Although this is very simplified for the purposes of illustration.
Var1 Quantity Pct Cummulative Pct
0 222 32.2 32.2
1 468 67.8 100
The cummulative pct is simply the pct of the previous record plus the value of the pct from the current record.
I want a deduped table with the values of var1 listed so I build the first part of the sql as follows:
Select distinct var1,
Since I want a count of the number of records for each value of var1 I use:
count(var1) as Quantity,
I happen to know the table I am working with has 690 records so the calculation of the percentage of data is a bit easier.
convert(decimal,count(var1 )/690 as Pct
The data is coming from a table called BinData so we add:
from BinData
The aggregate function Count requires a group by clause so:
group by var1
Thus the entire sql so far is
Select Distinct var1,
count(var1) as Quantity,
convert(decimal,count(var1 ))/690 as PCT
from BinData
Group by var1
This gets everything EXCEPT the Cummulative percentage. HOW can I get to that without screwing around in the application code to generate the cummulatives as I do now. Imagine an instance where there are millions of unique values of Var1. It would be terribly ineffective to process the cummulatives with application code if I can pass the data in an SQL statement to do the same thing.
By the way this sql is dynamic in the sense that the application formulates the query based on user selection of a particular variable. In this example the user identified var1, however it can be virtually any variable in the BinData table.
The results are accessed via an ADO recordset object that uses the SQL string as the source.
Basically I want to produce a table that looks something like this. Although this is very simplified for the purposes of illustration.
Var1 Quantity Pct Cummulative Pct
0 222 32.2 32.2
1 468 67.8 100
The cummulative pct is simply the pct of the previous record plus the value of the pct from the current record.
I want a deduped table with the values of var1 listed so I build the first part of the sql as follows:
Select distinct var1,
Since I want a count of the number of records for each value of var1 I use:
count(var1) as Quantity,
I happen to know the table I am working with has 690 records so the calculation of the percentage of data is a bit easier.
convert(decimal,count(var1
The data is coming from a table called BinData so we add:
from BinData
The aggregate function Count requires a group by clause so:
group by var1
Thus the entire sql so far is
Select Distinct var1,
count(var1) as Quantity,
convert(decimal,count(var1
from BinData
Group by var1
This gets everything EXCEPT the Cummulative percentage. HOW can I get to that without screwing around in the application code to generate the cummulatives as I do now. Imagine an instance where there are millions of unique values of Var1. It would be terribly ineffective to process the cummulatives with application code if I can pass the data in an SQL statement to do the same thing.
By the way this sql is dynamic in the sense that the application formulates the query based on user selection of a particular variable. In this example the user identified var1, however it can be virtually any variable in the BinData table.
The results are accessed via an ADO recordset object that uses the SQL string as the source.
ASKER
By the way this code does not work:
SELECT distinct var1
SUM( CONVERT( DECIMAL, COUNT( var1 ) ) / 690 )
AS 'the_sum'
FROM
dbo.BinData
The aggregate functions require a group by clause.
The aggregate functions cannot be nested as they are in this code.
Query Analyzer had a coronary attack with this code.
SELECT distinct var1
SUM( CONVERT( DECIMAL, COUNT( var1 ) ) / 690 )
AS 'the_sum'
FROM
dbo.BinData
The aggregate functions require a group by clause.
The aggregate functions cannot be nested as they are in this code.
Query Analyzer had a coronary attack with this code.
Few comments.
Dont need the distinct - group by already does that.
Prefer count(*) to count(Var1) - sql server is optimised for it.
I put an identity on #a but it's not necessary as you can use var1 for the inequality - that might be a bit faster but the id might be useful for other things. Could also do without the update just return the result.
(I've given the code below for that)
Not sure if this is what you want.
Haven't tried it but should be close to what I am intending to do
SELECT identity(int,1,1) as id
var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
convert(decimal,0) as [PCT_cum]
into #a
FROM dbo.BinData
GROUP BY var1
order by var1
update #a
set PCT_CUM = (select sum(PCT) from #a a2 where a2.id <= #a.id)
select var1, Count, PCT, PCT_cum from #a
order by id
drop table #a
Shorter code
SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
into #a
FROM dbo.BinData
GROUP BY var1
select var1, Count, PCT,
PCT_cum = (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1
drop table #a
Dont need the distinct - group by already does that.
Prefer count(*) to count(Var1) - sql server is optimised for it.
I put an identity on #a but it's not necessary as you can use var1 for the inequality - that might be a bit faster but the id might be useful for other things. Could also do without the update just return the result.
(I've given the code below for that)
Not sure if this is what you want.
Haven't tried it but should be close to what I am intending to do
SELECT identity(int,1,1) as id
var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
convert(decimal,0) as [PCT_cum]
into #a
FROM dbo.BinData
GROUP BY var1
order by var1
update #a
set PCT_CUM = (select sum(PCT) from #a a2 where a2.id <= #a.id)
select var1, Count, PCT, PCT_cum from #a
order by id
drop table #a
Shorter code
SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
into #a
FROM dbo.BinData
GROUP BY var1
select var1, Count, PCT,
PCT_cum = (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1
drop table #a
Lucky guess - looks like that is what you want.
I think I see what you are trying... basically, you are looking for a running total to appear in the query, something like this for this data:
MyTable: Id, Sequence, Charge
1 1 40
1 2 37.5
1 3 100
1 4 54
you want this result:
1 1 40 40
1 2 37.5 77.5
1 3 100 177.5
1 4 54 231.5
If this is what you are looking for, you'll probably need to use a temp table and a cursor.
Using your sample query, here's how I would probably end up doing it:
CREATE TABLE #Working (var1 varchar(10), [Count] int, Pct decimal(12,4), Cumulative decimal(12,4))
INSERT INTO #Working (var1, [Count], Pct)
SELECT distinct var1,
count(var1) as [Count],
CONVERT(decimal,COUNT(var1 )) / 690 as [PCT]
FROM dbo.BinData
GROUP BY var1
Declare c_Working CURSOR for
SELECT Pct From #Working Order By var1
Declare @Running decimal(12,4)
Declare @Pct decimal(12,4)
OPEN c_Working
Fetch Next From c_Working INTO @pct
While @@FETCH_STATUS = 0
BEGIN
Set @Running = @Running + @Pct
Update #Working Set Cumulative = @Running WHERE CURRENT OF c_Working
Fetch Next From c_Working
END
Close c_Working
Deallocate c_Working
MyTable: Id, Sequence, Charge
1 1 40
1 2 37.5
1 3 100
1 4 54
you want this result:
1 1 40 40
1 2 37.5 77.5
1 3 100 177.5
1 4 54 231.5
If this is what you are looking for, you'll probably need to use a temp table and a cursor.
Using your sample query, here's how I would probably end up doing it:
CREATE TABLE #Working (var1 varchar(10), [Count] int, Pct decimal(12,4), Cumulative decimal(12,4))
INSERT INTO #Working (var1, [Count], Pct)
SELECT distinct var1,
count(var1) as [Count],
CONVERT(decimal,COUNT(var1
FROM dbo.BinData
GROUP BY var1
Declare c_Working CURSOR for
SELECT Pct From #Working Order By var1
Declare @Running decimal(12,4)
Declare @Pct decimal(12,4)
OPEN c_Working
Fetch Next From c_Working INTO @pct
While @@FETCH_STATUS = 0
BEGIN
Set @Running = @Running + @Pct
Update #Working Set Cumulative = @Running WHERE CURRENT OF c_Working
Fetch Next From c_Working
END
Close c_Working
Deallocate c_Working
Only 690 recs could
declare @count int
select @count= count(*) from BinData
SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / @count as [PCT]
into #a
FROM BinData
GROUP BY var1
select var1, Count, PCT,
PCT_cum = (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1
drop table #a
or this might even work without a temp table (could even do it in a single query - don't recommend either though).
declare @count int
select @count= count(*) from BinData
select b.var1, min(b.Count), min(b.PCT),
PCT_cum = sum(a.PCT)
from
(SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / @count as [PCT]
FROM BinData
GROUP BY var1
) as a
join
(SELECT var1,
CONVERT(decimal,COUNT(*)) / @count as [PCT]
FROM BinData
GROUP BY var1
) as b
on a.var1 <= b.var1
group by b.var1
order by b.var1
declare @count int
select @count= count(*) from BinData
SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / @count as [PCT]
into #a
FROM BinData
GROUP BY var1
select var1, Count, PCT,
PCT_cum = (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1
drop table #a
or this might even work without a temp table (could even do it in a single query - don't recommend either though).
declare @count int
select @count= count(*) from BinData
select b.var1, min(b.Count), min(b.PCT),
PCT_cum = sum(a.PCT)
from
(SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / @count as [PCT]
FROM BinData
GROUP BY var1
) as a
join
(SELECT var1,
CONVERT(decimal,COUNT(*)) / @count as [PCT]
FROM BinData
GROUP BY var1
) as b
on a.var1 <= b.var1
group by b.var1
order by b.var1
My bad... I spaced the fact that the aggregate function would cause a puking without a 'group by' clause. :)
My bad... I spaced the fact that the aggregate function would cause a puking without a 'group by' clause. :)
My bad... I spaced the fact that the aggregate function would cause a puking without a 'group by' clause. :)
rayb64
You are probably hitting the refresh button - that unfortunately posts the comment again. There is a reload question at the top.
Everyone does it :-).
More than once.
You are probably hitting the refresh button - that unfortunately posts the comment again. There is a reload question at the top.
Everyone does it :-).
More than once.
ASKER
Seems to me this is more of a pain in the arse than what I was doing before. I would use a relatively simple SQL query to extract all the NON cummulative data.
Then using an ADO recordset build a new recordset with the non cummulative data columns and creating the cummulative columns.
Then I build another ADO recordset using the constructed SQL as the source for the ADO recordset:
Select Distinct var1,
count(var1) as Quantity,
convert(decimal,count(var1 ))/690 as PCT
from BinData
Group by var1
VB code
Sub BuildReport()
dim rstSQL as new adodb.recordset
dim rstRpt as new adodb.recordset
dim dblTmp as double
rstsql.open "....SQL....",Connection,. ..
rstrpt.fields.append "Var1",advwchar,255
rstrpt.fields.append "Count",advarwchar,255
rstrpt.fields.append "PCT", advarwchar,255
rstrpt.fields.append "Cum Pct",advarwchar,255
rstrpt.open
dim mfield as adodb.field
do until rstsql.eof
'adds a new record to the report recordset
rstrpt.addnew
'take each field from the sql data and copy the data to
'the report recordset
for each mfield in rstsql.fields
rstrpt.fields(mfield.name) .value=val (mfield.va lue)
next mfield
'use a temporary variable to hold cummulative pct
dbltmp=dbltmp+rstrpt.field s("PCT").v alue
'copy the cummulative pct to the report field
rstrpt.fields("Cum Pct").value=dbltmp
'move to the next sql record
rstsql.movenext
'update the report recordset
rstrpt.update
loop
'show the data in the grid display
set mshflexgrid1.datasource=rs trpt
end sub
so you can see it isn't easy either way, but I thought that just maybe other people might have seen this problem and want to do something similar. Hey maybe this should be part of a suite of Statistical functions for sale?
Assuming there are no other suggestions from others, I will pick the best comment from above and award the full amount of the points.
Then using an ADO recordset build a new recordset with the non cummulative data columns and creating the cummulative columns.
Then I build another ADO recordset using the constructed SQL as the source for the ADO recordset:
Select Distinct var1,
count(var1) as Quantity,
convert(decimal,count(var1
from BinData
Group by var1
VB code
Sub BuildReport()
dim rstSQL as new adodb.recordset
dim rstRpt as new adodb.recordset
dim dblTmp as double
rstsql.open "....SQL....",Connection,.
rstrpt.fields.append "Var1",advwchar,255
rstrpt.fields.append "Count",advarwchar,255
rstrpt.fields.append "PCT", advarwchar,255
rstrpt.fields.append "Cum Pct",advarwchar,255
rstrpt.open
dim mfield as adodb.field
do until rstsql.eof
'adds a new record to the report recordset
rstrpt.addnew
'take each field from the sql data and copy the data to
'the report recordset
for each mfield in rstsql.fields
rstrpt.fields(mfield.name)
next mfield
'use a temporary variable to hold cummulative pct
dbltmp=dbltmp+rstrpt.field
'copy the cummulative pct to the report field
rstrpt.fields("Cum Pct").value=dbltmp
'move to the next sql record
rstsql.movenext
'update the report recordset
rstrpt.update
loop
'show the data in the grid display
set mshflexgrid1.datasource=rs
end sub
so you can see it isn't easy either way, but I thought that just maybe other people might have seen this problem and want to do something similar. Hey maybe this should be part of a suite of Statistical functions for sale?
Assuming there are no other suggestions from others, I will pick the best comment from above and award the full amount of the points.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
nigelrivett,
I agree it is less complicated and more efficient.
Here is the slightly revised version of the SQL statement.
The second select statement originally was in error since the field I named "Count" also happens to be a function, thus if referring to a field by this name it needs to be referenced [Count].
What is throwing me just a bit is the line:
pct_cum= (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
I am assuming the a2 reference in the select statement is a system assigned temporary number. Is it a2 because it is an index of the #a reference earlier in the stream?
SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
into #a
FROM dbo.BinData
GROUP BY var1
select var1, [Count], PCT,
pct_cum= (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1
drop table #a
I agree it is less complicated and more efficient.
Here is the slightly revised version of the SQL statement.
The second select statement originally was in error since the field I named "Count" also happens to be a function, thus if referring to a field by this name it needs to be referenced [Count].
What is throwing me just a bit is the line:
pct_cum= (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
I am assuming the a2 reference in the select statement is a system assigned temporary number. Is it a2 because it is an index of the #a reference earlier in the stream?
SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
into #a
FROM dbo.BinData
GROUP BY var1
select var1, [Count], PCT,
pct_cum= (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1
drop table #a
Are you trying to do this as a SQL string passed from the application?
In the first part of your question, a better solution than using a local variable might be to do something like this:
SELECT distinct var1
SUM( CONVERT( DECIMAL, COUNT( var1 ) ) / 690 )
AS 'the_sum'
FROM
dbo.BinData
This is a potential solution, but, what are you really after? Are you looking to receive the results of the query as is and then to have summary information available as well?
A temp table might be a pretty good compromise in this situation. Consider this solution (but do it as a stored proc):
SELECT
DISTINCT var1 AS 'var1'
,COUNT( var1 ) AS 'quantity'
,CONVERT( DECIMAL, COUNT( var1 ) ) / 690 AS 'PCT'
INTO #tabular_data
FROM
dbo.BinData
GROUP BY
var1
SELECT
var1
,quantity
,PCT
FROM
#tabular_data
SELECT
SUM( ISNULL( PCT, 0 ) ) AS 'total_amount'
FROM
#tabular_data
DROP TABLE #tabular_data
If you incorporate the code above into a stored proc, you'll need to use ADO to navigate two recordsets. If you aren't certain how to do this, you'll need to check the MS documentation. At one point in time, there was no way to navigate from the first recordset to the second and then back to the first again. I don't know if it is now possible... Again, check the ADO documentation.
Without knowing exactly the data you are after and the formats you require, I can't really be more helpful than this.
Ray