Link to home
Start Free TrialLog in
Avatar of edwinson
edwinsonFlag for United States of America

asked on

Cummulative counts etc in SQL query

I apologize if this question has been asked before, but I get a timed out response when searching the database so this is my last resort.

I am trying to build an SQL query that will return data to be displayed/accessed via ADO in an application.  Basically, the result should show cummulative counts and percentages.  Here is the basic SQL where I am starting.

SELECT distinct var1,
       count(var1) as [Count],
       CONVERT(decimal,COUNT(var1)) / 690 as [PCT]
       FROM dbo.BinData
       GROUP BY var1

My thought is to modify this statement to use a local declare variable. I can't seem to figure out how to:
1.) set the value of the local variable say @cumpct = @cumpct + [pct]
2.) load the value of @cumpct into a new field in the query such as @cumpct as [Cum Pct]

If I can figure out this problem, I will have the answer to multiple other hacked workarounds I have used in the past.
Avatar of rayb64
rayb64


Are you trying to do this as a SQL string passed from the application?

In the first part of your question, a better solution than using a local variable might be to do something like this:

SELECT distinct var1
  SUM( CONVERT( DECIMAL, COUNT( var1 ) ) / 690 )
    AS 'the_sum'
FROM
  dbo.BinData

This is a potential solution, but, what are you really after?  Are you looking to receive the results of the query as is and then to have summary information available as well?

A temp table might be a pretty good compromise in this situation.  Consider this solution (but do it as a stored proc):

SELECT
  DISTINCT var1 AS 'var1'
  ,COUNT( var1 ) AS 'quantity'
  ,CONVERT( DECIMAL, COUNT( var1 ) ) / 690 AS 'PCT'
INTO #tabular_data
FROM
  dbo.BinData
GROUP BY
  var1

SELECT
  var1
  ,quantity
  ,PCT
FROM
  #tabular_data

SELECT
  SUM( ISNULL( PCT, 0 ) ) AS 'total_amount'
FROM
  #tabular_data

DROP TABLE #tabular_data

If you incorporate the code above into a stored proc, you'll need to use ADO to navigate two recordsets.  If you aren't certain how to do this, you'll need to check the MS documentation.  At one point in time, there was no way to navigate from the first recordset to the second and then back to the first again.  I don't know if it is now possible...  Again, check the ADO documentation.

Without knowing exactly the data you are after and the formats you require, I can't really be more helpful than this.

Ray
Avatar of edwinson

ASKER

I'll try to be a little more clear.

Basically I want to produce a table that looks something like this. Although this is very simplified for the purposes of illustration.

Var1     Quantity      Pct      Cummulative Pct
0          222         32.2          32.2
1          468         67.8         100

The cummulative pct is simply the pct of the previous record plus the value of the pct from the current record.

I want a deduped table with the values of var1 listed so I build the first part of the sql as follows:

Select distinct var1,

Since I want a count of the number of records for each value of var1 I use:
       
      count(var1) as Quantity,

I happen to know the table I am working with has 690 records so the calculation of the percentage of data is a bit easier.

      convert(decimal,count(var1)/690 as Pct

The data is coming from a table called BinData so we add:
     
      from BinData

The aggregate function Count requires a group by clause so:
 
     group by var1


Thus the entire sql so far is

Select Distinct var1,
       count(var1) as Quantity,
       convert(decimal,count(var1))/690 as PCT
from BinData
Group by var1


This gets everything EXCEPT the Cummulative percentage.  HOW can I get to that without screwing around in the application code to generate the cummulatives as I do now.  Imagine an instance where there are millions of unique values of Var1.  It would be terribly ineffective to process the cummulatives with application code if I can pass the data in an SQL statement to do the same thing.

By the way this sql is dynamic in the sense that the application formulates the query based on user selection of a particular variable.  In this example the user identified var1, however it can be virtually any variable in the BinData table.

The results are accessed via an ADO recordset object that uses the SQL string as the source.          
By the way this code does not work:

SELECT distinct var1
 SUM( CONVERT( DECIMAL, COUNT( var1 ) ) / 690 )
   AS 'the_sum'
FROM
 dbo.BinData

The aggregate functions require a group by clause.
The aggregate functions cannot be nested as they are in this code.

Query Analyzer had a coronary attack with this code.
Few comments.
Dont need the distinct - group by already does that.
Prefer count(*) to count(Var1) - sql server is optimised for it.
I put an identity on #a but it's not necessary as you can use var1 for the inequality - that might be a bit faster but the id might be useful for other things. Could also do without the update just return the result.
(I've given the code below for that)
Not sure if this is what you want.
Haven't tried it but should be close to what I am intending to do

SELECT identity(int,1,1) as id
var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
convert(decimal,0) as [PCT_cum]
into #a
FROM dbo.BinData
GROUP BY var1
order by var1

update #a
set PCT_CUM = (select sum(PCT) from #a a2 where a2.id <= #a.id)

select var1, Count, PCT, PCT_cum from #a
order by id

drop table #a


Shorter code

SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
into #a
FROM dbo.BinData
GROUP BY var1

select var1, Count, PCT,
PCT_cum  = (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1

drop table #a



Lucky guess - looks like that is what you want.
Avatar of Brendt Hess
I think I see what you are trying... basically, you are looking for a running total to appear in the query, something like this for this data:

MyTable:  Id, Sequence, Charge

1   1    40
1   2    37.5
1   3    100
1   4    54

you want this result:

1   1    40     40
1   2    37.5   77.5
1   3    100    177.5
1   4    54     231.5

If this is what you are looking for, you'll probably need to use a temp table and a cursor.

Using your sample query, here's how I would probably end up doing it:

CREATE TABLE #Working (var1 varchar(10), [Count] int, Pct decimal(12,4), Cumulative decimal(12,4))
INSERT INTO #Working (var1, [Count], Pct)
SELECT distinct var1,
      count(var1) as [Count],
      CONVERT(decimal,COUNT(var1)) / 690 as [PCT]
      FROM dbo.BinData
      GROUP BY var1

Declare c_Working CURSOR for
SELECT Pct From #Working Order By var1
Declare @Running decimal(12,4)
Declare @Pct decimal(12,4)
OPEN c_Working
Fetch Next From c_Working INTO @pct
While @@FETCH_STATUS = 0
BEGIN
   Set @Running = @Running + @Pct
   Update #Working Set Cumulative = @Running WHERE CURRENT OF c_Working
   Fetch Next From c_Working
END
Close c_Working
Deallocate c_Working


Only 690 recs could

declare @count int
select @count= count(*) from BinData

SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / @count as [PCT]
into #a
FROM BinData
GROUP BY var1

select var1, Count, PCT,
PCT_cum  = (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1

drop table #a

or this might even work without a temp table (could even do it in a single query - don't recommend either though).

declare @count int
select @count= count(*) from BinData

select b.var1, min(b.Count), min(b.PCT),
PCT_cum = sum(a.PCT)
from
(SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / @count as [PCT]
FROM BinData
GROUP BY var1
) as a
join
(SELECT var1,
CONVERT(decimal,COUNT(*)) / @count as [PCT]
FROM BinData
GROUP BY var1
) as b
on a.var1 <= b.var1
group by b.var1
order by b.var1



My bad...  I spaced the fact that the aggregate function would cause a puking without a 'group by' clause.  :)

My bad...  I spaced the fact that the aggregate function would cause a puking without a 'group by' clause.  :)

My bad...  I spaced the fact that the aggregate function would cause a puking without a 'group by' clause.  :)
rayb64
You are probably hitting the refresh button - that unfortunately posts the comment again. There is a reload question at the top.

Everyone does it :-).
More than once.

Seems to me this is more of a pain in the arse than what I was doing before.  I would use a relatively simple SQL query to extract all the NON cummulative data.

Then using an ADO recordset build a new recordset with the non cummulative data columns and creating the cummulative columns.

Then I build another ADO recordset using the constructed SQL as the source for the ADO recordset:
Select Distinct var1,
      count(var1) as Quantity,
      convert(decimal,count(var1))/690 as PCT
from BinData
Group by var1


VB code

Sub BuildReport()
  dim rstSQL as new adodb.recordset
  dim rstRpt as new adodb.recordset
  dim dblTmp as double

  rstsql.open "....SQL....",Connection,...

  rstrpt.fields.append "Var1",advwchar,255
  rstrpt.fields.append "Count",advarwchar,255
  rstrpt.fields.append "PCT", advarwchar,255
  rstrpt.fields.append "Cum Pct",advarwchar,255
  rstrpt.open

  dim mfield as adodb.field
  do until rstsql.eof
   'adds a new record to the report recordset
   rstrpt.addnew
   'take each field from the sql data and copy the data to
   'the report recordset
   for each mfield in rstsql.fields
     rstrpt.fields(mfield.name).value=val(mfield.value)
   next mfield
   'use a temporary variable to hold cummulative pct
   dbltmp=dbltmp+rstrpt.fields("PCT").value
   'copy the cummulative pct to the report field
   rstrpt.fields("Cum Pct").value=dbltmp
   'move to the next sql record
   rstsql.movenext
   'update the report recordset
   rstrpt.update
  loop
 
  'show the data in the grid display
  set mshflexgrid1.datasource=rstrpt

end sub

so you can see it isn't easy either way, but I thought that just maybe other people might have seen this problem and want to do something similar.  Hey maybe this should be part of a suite of Statistical functions for sale?

Assuming there are no other suggestions from others, I will pick the best comment from above and award the full amount of the points.
ASKER CERTIFIED SOLUTION
Avatar of nigelrivett
nigelrivett

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
nigelrivett,

I agree it is less complicated and more efficient.

Here is the slightly revised version of the SQL statement.
The second select statement originally was in error since the field I named "Count" also happens to be a function, thus if referring to a field by this name it needs to be referenced [Count].

What is throwing me just a bit is the line:
pct_cum= (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)

I am assuming the a2 reference in the select statement is a system assigned temporary number.  Is it a2 because it is an index of the #a reference earlier in the stream?


SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
into #a
FROM dbo.BinData
GROUP BY var1

select var1, [Count], PCT,
  pct_cum= (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1

drop table #a