Solved

Cummulative counts etc in SQL query

Posted on 2001-08-30
14
525 Views
Last Modified: 2011-10-03
I apologize if this question has been asked before, but I get a timed out response when searching the database so this is my last resort.

I am trying to build an SQL query that will return data to be displayed/accessed via ADO in an application.  Basically, the result should show cummulative counts and percentages.  Here is the basic SQL where I am starting.

SELECT distinct var1,
       count(var1) as [Count],
       CONVERT(decimal,COUNT(var1)) / 690 as [PCT]
       FROM dbo.BinData
       GROUP BY var1

My thought is to modify this statement to use a local declare variable. I can't seem to figure out how to:
1.) set the value of the local variable say @cumpct = @cumpct + [pct]
2.) load the value of @cumpct into a new field in the query such as @cumpct as [Cum Pct]

If I can figure out this problem, I will have the answer to multiple other hacked workarounds I have used in the past.
0
Comment
Question by:edwinson
  • 5
  • 4
  • 4
  • +1
14 Comments
 

Expert Comment

by:rayb64
ID: 6442025

Are you trying to do this as a SQL string passed from the application?

In the first part of your question, a better solution than using a local variable might be to do something like this:

SELECT distinct var1
  SUM( CONVERT( DECIMAL, COUNT( var1 ) ) / 690 )
    AS 'the_sum'
FROM
  dbo.BinData

This is a potential solution, but, what are you really after?  Are you looking to receive the results of the query as is and then to have summary information available as well?

A temp table might be a pretty good compromise in this situation.  Consider this solution (but do it as a stored proc):

SELECT
  DISTINCT var1 AS 'var1'
  ,COUNT( var1 ) AS 'quantity'
  ,CONVERT( DECIMAL, COUNT( var1 ) ) / 690 AS 'PCT'
INTO #tabular_data
FROM
  dbo.BinData
GROUP BY
  var1

SELECT
  var1
  ,quantity
  ,PCT
FROM
  #tabular_data

SELECT
  SUM( ISNULL( PCT, 0 ) ) AS 'total_amount'
FROM
  #tabular_data

DROP TABLE #tabular_data

If you incorporate the code above into a stored proc, you'll need to use ADO to navigate two recordsets.  If you aren't certain how to do this, you'll need to check the MS documentation.  At one point in time, there was no way to navigate from the first recordset to the second and then back to the first again.  I don't know if it is now possible...  Again, check the ADO documentation.

Without knowing exactly the data you are after and the formats you require, I can't really be more helpful than this.

Ray
0
 
LVL 1

Author Comment

by:edwinson
ID: 6442311
I'll try to be a little more clear.

Basically I want to produce a table that looks something like this. Although this is very simplified for the purposes of illustration.

Var1     Quantity      Pct      Cummulative Pct
0          222         32.2          32.2
1          468         67.8         100

The cummulative pct is simply the pct of the previous record plus the value of the pct from the current record.

I want a deduped table with the values of var1 listed so I build the first part of the sql as follows:

Select distinct var1,

Since I want a count of the number of records for each value of var1 I use:
       
      count(var1) as Quantity,

I happen to know the table I am working with has 690 records so the calculation of the percentage of data is a bit easier.

      convert(decimal,count(var1)/690 as Pct

The data is coming from a table called BinData so we add:
     
      from BinData

The aggregate function Count requires a group by clause so:
 
     group by var1


Thus the entire sql so far is

Select Distinct var1,
       count(var1) as Quantity,
       convert(decimal,count(var1))/690 as PCT
from BinData
Group by var1


This gets everything EXCEPT the Cummulative percentage.  HOW can I get to that without screwing around in the application code to generate the cummulatives as I do now.  Imagine an instance where there are millions of unique values of Var1.  It would be terribly ineffective to process the cummulatives with application code if I can pass the data in an SQL statement to do the same thing.

By the way this sql is dynamic in the sense that the application formulates the query based on user selection of a particular variable.  In this example the user identified var1, however it can be virtually any variable in the BinData table.

The results are accessed via an ADO recordset object that uses the SQL string as the source.          
0
 
LVL 1

Author Comment

by:edwinson
ID: 6442326
By the way this code does not work:

SELECT distinct var1
 SUM( CONVERT( DECIMAL, COUNT( var1 ) ) / 690 )
   AS 'the_sum'
FROM
 dbo.BinData

The aggregate functions require a group by clause.
The aggregate functions cannot be nested as they are in this code.

Query Analyzer had a coronary attack with this code.
0
Three Reasons Why Backup is Strategic

Backup is strategic to your business because your data is strategic to your business. Without backup, your business will fail. This white paper explains why it is vital for you to design and immediately execute a backup strategy to protect 100 percent of your data.

 
LVL 18

Expert Comment

by:nigelrivett
ID: 6442333
Few comments.
Dont need the distinct - group by already does that.
Prefer count(*) to count(Var1) - sql server is optimised for it.
I put an identity on #a but it's not necessary as you can use var1 for the inequality - that might be a bit faster but the id might be useful for other things. Could also do without the update just return the result.
(I've given the code below for that)
Not sure if this is what you want.
Haven't tried it but should be close to what I am intending to do

SELECT identity(int,1,1) as id
var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
convert(decimal,0) as [PCT_cum]
into #a
FROM dbo.BinData
GROUP BY var1
order by var1

update #a
set PCT_CUM = (select sum(PCT) from #a a2 where a2.id <= #a.id)

select var1, Count, PCT, PCT_cum from #a
order by id

drop table #a


Shorter code

SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
into #a
FROM dbo.BinData
GROUP BY var1

select var1, Count, PCT,
PCT_cum  = (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1

drop table #a



0
 
LVL 18

Expert Comment

by:nigelrivett
ID: 6442337
Lucky guess - looks like that is what you want.
0
 
LVL 32

Expert Comment

by:bhess1
ID: 6442357
I think I see what you are trying... basically, you are looking for a running total to appear in the query, something like this for this data:

MyTable:  Id, Sequence, Charge

1   1    40
1   2    37.5
1   3    100
1   4    54

you want this result:

1   1    40     40
1   2    37.5   77.5
1   3    100    177.5
1   4    54     231.5

If this is what you are looking for, you'll probably need to use a temp table and a cursor.

Using your sample query, here's how I would probably end up doing it:

CREATE TABLE #Working (var1 varchar(10), [Count] int, Pct decimal(12,4), Cumulative decimal(12,4))
INSERT INTO #Working (var1, [Count], Pct)
SELECT distinct var1,
      count(var1) as [Count],
      CONVERT(decimal,COUNT(var1)) / 690 as [PCT]
      FROM dbo.BinData
      GROUP BY var1

Declare c_Working CURSOR for
SELECT Pct From #Working Order By var1
Declare @Running decimal(12,4)
Declare @Pct decimal(12,4)
OPEN c_Working
Fetch Next From c_Working INTO @pct
While @@FETCH_STATUS = 0
BEGIN
   Set @Running = @Running + @Pct
   Update #Working Set Cumulative = @Running WHERE CURRENT OF c_Working
   Fetch Next From c_Working
END
Close c_Working
Deallocate c_Working


0
 
LVL 18

Expert Comment

by:nigelrivett
ID: 6442382
Only 690 recs could

declare @count int
select @count= count(*) from BinData

SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / @count as [PCT]
into #a
FROM BinData
GROUP BY var1

select var1, Count, PCT,
PCT_cum  = (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1

drop table #a

or this might even work without a temp table (could even do it in a single query - don't recommend either though).

declare @count int
select @count= count(*) from BinData

select b.var1, min(b.Count), min(b.PCT),
PCT_cum = sum(a.PCT)
from
(SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / @count as [PCT]
FROM BinData
GROUP BY var1
) as a
join
(SELECT var1,
CONVERT(decimal,COUNT(*)) / @count as [PCT]
FROM BinData
GROUP BY var1
) as b
on a.var1 <= b.var1
group by b.var1
order by b.var1


0
 

Expert Comment

by:rayb64
ID: 6442504

My bad...  I spaced the fact that the aggregate function would cause a puking without a 'group by' clause.  :)
0
 

Expert Comment

by:rayb64
ID: 6442554

My bad...  I spaced the fact that the aggregate function would cause a puking without a 'group by' clause.  :)
0
 

Expert Comment

by:rayb64
ID: 6442596

My bad...  I spaced the fact that the aggregate function would cause a puking without a 'group by' clause.  :)
0
 
LVL 18

Expert Comment

by:nigelrivett
ID: 6442659
rayb64
You are probably hitting the refresh button - that unfortunately posts the comment again. There is a reload question at the top.

Everyone does it :-).
More than once.

0
 
LVL 1

Author Comment

by:edwinson
ID: 6443478
Seems to me this is more of a pain in the arse than what I was doing before.  I would use a relatively simple SQL query to extract all the NON cummulative data.

Then using an ADO recordset build a new recordset with the non cummulative data columns and creating the cummulative columns.

Then I build another ADO recordset using the constructed SQL as the source for the ADO recordset:
Select Distinct var1,
      count(var1) as Quantity,
      convert(decimal,count(var1))/690 as PCT
from BinData
Group by var1


VB code

Sub BuildReport()
  dim rstSQL as new adodb.recordset
  dim rstRpt as new adodb.recordset
  dim dblTmp as double

  rstsql.open "....SQL....",Connection,...

  rstrpt.fields.append "Var1",advwchar,255
  rstrpt.fields.append "Count",advarwchar,255
  rstrpt.fields.append "PCT", advarwchar,255
  rstrpt.fields.append "Cum Pct",advarwchar,255
  rstrpt.open

  dim mfield as adodb.field
  do until rstsql.eof
   'adds a new record to the report recordset
   rstrpt.addnew
   'take each field from the sql data and copy the data to
   'the report recordset
   for each mfield in rstsql.fields
     rstrpt.fields(mfield.name).value=val(mfield.value)
   next mfield
   'use a temporary variable to hold cummulative pct
   dbltmp=dbltmp+rstrpt.fields("PCT").value
   'copy the cummulative pct to the report field
   rstrpt.fields("Cum Pct").value=dbltmp
   'move to the next sql record
   rstsql.movenext
   'update the report recordset
   rstrpt.update
  loop
 
  'show the data in the grid display
  set mshflexgrid1.datasource=rstrpt

end sub

so you can see it isn't easy either way, but I thought that just maybe other people might have seen this problem and want to do something similar.  Hey maybe this should be part of a suite of Statistical functions for sale?

Assuming there are no other suggestions from others, I will pick the best comment from above and award the full amount of the points.
0
 
LVL 18

Accepted Solution

by:
nigelrivett earned 300 total points
ID: 6444987
I suspect this would be a lot more efficient.
And possibly simpler.

SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
into #a
FROM dbo.BinData
GROUP BY var1

select var1, Count, PCT,
PCT_cum  = (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1

drop table #a



0
 
LVL 1

Author Comment

by:edwinson
ID: 6445172
nigelrivett,

I agree it is less complicated and more efficient.

Here is the slightly revised version of the SQL statement.
The second select statement originally was in error since the field I named "Count" also happens to be a function, thus if referring to a field by this name it needs to be referenced [Count].

What is throwing me just a bit is the line:
pct_cum= (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)

I am assuming the a2 reference in the select statement is a system assigned temporary number.  Is it a2 because it is an index of the #a reference earlier in the stream?


SELECT var1,
count(*) as [Count],
CONVERT(decimal,COUNT(*)) / 690 as [PCT]
into #a
FROM dbo.BinData
GROUP BY var1

select var1, [Count], PCT,
  pct_cum= (select sum(PCT) from #a a2 where a2.var1 <= #a.var1)
from #a
order by var1

drop table #a
0

Featured Post

Best Practices: Disaster Recovery Testing

Besides backup, any IT division should have a disaster recovery plan. You will find a few tips below relating to the development of such a plan and to what issues one should pay special attention in the course of backup planning.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
convert in derived column 7 30
T-SQL: Creating Records Where There Are None2 - The Sequel 6 27
SQL Server 2012 r2 - Varible Table 3 24
Increment column based of a FK 8 22
Everyone has problem when going to load data into Data warehouse (EDW). They all need to confirm that data quality is good but they don't no how to proceed. Microsoft has provided new task within SSIS 2008 called "Data Profiler Task". It solve th…
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
Via a live example, show how to extract insert data into a SQL Server database table using the Import/Export option and Bulk Insert.
Via a live example, show how to shrink a transaction log file down to a reasonable size.

777 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question