Solved

SQL 2008 calculate AVG on computed column

Posted on 2012-12-30
20
473 Views
Last Modified: 2012-12-31
I have a computed column (MyColumn) defined as decimal (4,2) which calls a user-defined function. The function calculates a score (think of it as a credit score) based on transactional activity in two other tables.

The computed column is non deterministic and so the data is non persistent. I have 200,000+ rows (users who's credit score is calculated by the computed column and which changes daily). I want to calculate AVG(MyColumn) however the processing time would likely take several hours. What is the best way to calculate AVG(MyColumn) given that the data upon which it needs to calculate is not persistent?
0
Comment
Question by:drl1
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 10
  • 7
  • 2
  • +1
20 Comments
 
LVL 12

Expert Comment

by:Saurabh Bhadauria
ID: 38732029
Better would be to use actual tables while calculating the average...  

Actual tables means tables  used in UDF..
0
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 38732391
you could dump the computed columns' value into some temp table...  asnyc (for example, early in the morning), so you could use that table for your reporting ...
at least that is the way I work most often if such "computed columns" change daily once.
means also that I would not have the computed column in the table directly, but be computed once per day anyhow.
0
 

Author Comment

by:drl1
ID: 38732516
Thanks for the feedback.

@Saurv...Can you elaborate on how a table in UDF might work in this example? Currently I pass an ID (@id) in to the UDF and it conducts the following (simplified) process:

set @count1 = (select count(id) from table A where id=@id)
set @count2 = (select count(id) from table B where id=@id)
set @result = @count1/@count2
return @result

@angelIII... That's the line of thought I was heading down, so compute the value in the early hours of the morning so that I at least have persistent values from which to report in aggregate but it doesn't give me the real-time value per ID that I was looking for. I guess I could complete the early hours conputation and also retain the computed column for ad-hoc calls for each ID thus capturing the real-time value (which may have changed since that morning).
0
The Eight Noble Truths of Backup and Recovery

How can IT departments tackle the challenges of a Big Data world? This white paper provides a roadmap to success and helps companies ensure that all their data is safe and secure, no matter if it resides on-premise with physical or virtual machines or in the cloud.

 
LVL 12

Expert Comment

by:Saurabh Bhadauria
ID: 38732573
What i meant.... to put your logic inside the SP and do the calculation at once

I have written simple code just give you an idea... Wrap you code inside a Sp .
Since you have around 200,000 record you can also put looping with indexes on temp tables..

And you can call that Sp when ever you are running report..    let me know if you have any concern..

I am assuming the @id is  some column value of tablec    ........

 Create table #tempa 
 (
   id_A int,
   Cnt_A bigint
 )
 
 
 Create table #tempb 
 (
   id_B int,
   Cnt_b bigint
 )
 
 
 insert into #tempa
select A.id,count(A.id)
 from tableA A  join tablec C on A.id=c.id
group by A.id




 insert into #tempb
select b.id,count(b.id)
 from tableb b  join tablec C on b.id=c.id
group by b.id


--Return Table   

select  A.ID , cnt_a/cnt_b from   
#tempa a join #tempb b on a.id_A=b.id_B 

Open in new window



Thanks,
Saurabh
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 38732769
Create an in-line-table function that does the same computation.  In-line table functions will be much faster than row-by-row.  The computed column r-b-r makes sense for a few rows here and there, but to do a lot of them an in-line table function will be much more efficient.
0
 

Author Comment

by:drl1
ID: 38733038
Thanks for the guidance. So this is where I'm up to...
****************************************************************
CREATE FUNCTION [dbo].[TEST] (@id int)
 
RETURNS @TableA TABLE
 (
   id bigint,
   FieldA bigint,
   FieldB bigint,
   Score (5,2)
 )
AS
BEGIN
DECLARE @CountA int = (Select count(id) from tableA where id = @id)
DECLARE @CountB int = (Select count(id) from tableB where id = @id and status = 'Whatever')
DECLARE @CScore decimal(5,2) = (CAST((@CountB) as float))/(CAST((@CountA) as float))
      INSERT @TableA
            SELECT @id, @CountA, @CountB, @CScore
      RETURN
END
****************************************************************

So I can pass the ID for a user in without problem and it returns a single row table containing the id, two counts and calculated score. How would I now convert this to work out the scores for every ID (from 1 - 200,000) and then conduct an AVG function on the 'Score' field?
0
 

Author Comment

by:drl1
ID: 38733161
Think I've got it now...
****************************************************************
CREATE FUNCTION [dbo].[TEST] (@id int)
 
RETURNS @TableA TABLE
 (
   id bigint,
   FieldA bigint,
   FieldB bigint,
   Score (5,2)
 )
AS
BEGIN
DECLARE @ProcID int = 1
DECLARE @MaxID int = (SELECT MAX(id) from tableA)

WHILE @ProcID <= @MaxID
BEGIN

DECLARE @CountA int = (Select count(id) from tableA where id = @ProcID)
DECLARE @CountB int = (Select count(id) from tableB where id = @ProcID and status = 'Whatever')
DECLARE @CScore decimal(5,2) = (CAST((@CountB) as float))/(CAST((@CountA) as float))
      INSERT @TableA
            SELECT @ProcID, @CountA, @CountB, @CScore
            SELECT @ProcID = (Select top 1 ID from tableA where ID > @ProcID order by ID)
END
      RETURN
END
****************************************************************

Takes 35 secs to return 1000 rows. @ID as an import value is now redundant as I start the loop from 1 (set @MaxID to 1000 for testing). Reason I'm incrementing the @ProcID using a select statement is because not all IDs exist in tableA due to some accounts being physically removed and IDs deleted. Any ideas on further streamlining this?
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 38733206
That's not an inline table function, that's a multiline table function.

An inline table function has just a RETURN statement.
0
 

Author Comment

by:drl1
ID: 38733354
Thanks. Would you be able to re-code the above as an inline function, just so I'm clear on how to achieve that implementation? When I tried declaring the temp tables within the function I got some compile errors so must have been doing it incorrectly.
0
 
LVL 69

Accepted Solution

by:
Scott Pletcher earned 500 total points
ID: 38733384
CREATE FUNCTION [dbo].[TEST] (
    @id int
)
RETURNS TABLE
AS
RETURN (
    SELECT id, FieldA, FieldB, CAST( CAST(FieldB AS decimal(19, 2)) / CAST(FieldA AS decimal(19, 2)) AS decimal(4, 2)) AS Score
    FROM (
            SELECT a.id, COUNT(a.id) AS FieldA, (SELECT COUNT(id) FROM dbo.tableB b WHERE b.id = a.id AND status = 'Whatever') AS FieldB
            FROM dbo.tableA a
            WHERE
                a.id = @id
            GROUP BY a.id
      ) AS derived
);
0
 

Author Comment

by:drl1
ID: 38733436
Wow...now that is pretty slick. 211,379 rows calculated and returned in 85 secs. Thanks a million ;o)
0
 

Author Closing Comment

by:drl1
ID: 38733440
Perfect, thanks!
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 38733521
Glad it worked out!

Inline functions are literally "compiled" right into the statement being run, so they are much more efficient.  Multiline functions are called like a regular scalar function, and so have much more overhead.
0
 

Author Comment

by:drl1
ID: 38733565
Yeah, it makes sense now I can see good working examples of each method in front of me. Could I pester you for another solution? I could start a new Q but it is related to the above....

I need to calculate a similar process to the one above however this time I'm pulling values from one table only instead of two tables.

Essentially...

select count(id) as Count1 from tableA where status = 'X' or status = 'Y'
select count(id) as Count2 from tableA
NewScore = Count1/Count2

Each row in the result set would show: id, Count1, Count2, NewScore

Again, I need to calculate this on-the-fly for each ID and be able to run AVG(NewScore) without waiting an inordinate amount of time. Maybe i've been looking at the screen too long to see the obvious answer to this one, which is definitely more straight forward.
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 38733597
SELECT
    id,
    SUM(CASE WHEN status = 'X' OR status = 'Y' THEN 1 ELSE 0 END) AS Count1,
    SUM(1) AS Count2,
    (SUM(CASE WHEN status = 'X' OR status = 'Y' THEN 1 ELSE 0 END) * 1.0) /
    SUM(1) AS NewScore    
FROM dbo.tableA
GROUP BY
    id
ORDER BY
    id


If you need to see NewScore as a %, change the "* 1.0" to "* 100.0".
0
 

Author Comment

by:drl1
ID: 38733618
That's great for getting all of the rows, thanks. Calculating AVG(NewScore) is my final requirement. Would I simply select all of the above to a temp table then run AVG(NewScore) on the temp table to get the most effective solution to that requirement?
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 38733628
Do you need to see the detail at all?  If not, we can compute the avg in a single SELECT with just that as the result, we don't need a temp table.

If you need the detail and the final avg, we can add a WITH ROLLUP to the group by and calc it from that :-) .
0
 

Author Comment

by:drl1
ID: 38733631
Having the detail would be good ;o)
0
 
LVL 69

Expert Comment

by:Scott Pletcher
ID: 38733669
Try this and see if it gives you what you need:



SELECT
    CASE WHEN id IS NULL THEN 'Summary' ELSE CAST(id AS varchar(10)) END AS id,
    Count1, Count2,
    CASE WHEN id IS NULL THEN (Count1 * 1.0 / Count2) ELSE NewScore END AS NewScore
FROM (
      SELECT
            id,
            SUM(CASE WHEN status = 'X' OR status = 'Y' THEN 1 ELSE 0 END) AS Count1,
            SUM(1) AS Count2,
            (SUM(CASE WHEN status = 'X' OR status = 'Y' THEN 1 ELSE 0 END) * 1.0) /
            SUM(1) AS NewScore    
      FROM dbo.tableA
      GROUP BY
            id WITH ROLLUP
) AS derived
ORDER BY
    id


If you want to leave id as numeric, you can use 0 or 2000000000 for the summary row, depending on whether you want it to be first or last.
0
 

Author Comment

by:drl1
ID: 38733689
Perfect, thanks again, that has been very helpful. Happy New Year!
0

Featured Post

Revamp Your Training Process

Drastically shorten your training time with WalkMe's advanced online training solution that Guides your trainees to action.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

I have written a PowerShell script to "walk" the security structure of each SQL instance to find:         Each Login (Windows or SQL)             * Its Server Roles             * Every database to which the login is mapped             * The associated "Database User" for this …
This is basically a blog post I wrote recently. I've found that SARGability is poorly understood, and since many people don't read blogs, I figured I'd post it here as an article. SARGable is an adjective in SQL that means that an item can be fou…
Monitoring a network: why having a policy is the best policy? Michael Kulchisky, MCSE, MCSA, MCP, VTSP, VSP, CCSP outlines the enormous benefits of having a policy-based approach when monitoring medium and large networks. Software utilized in this v…
Add bar graphs to Access queries using Unicode block characters. Graphs appear on every record in the color you want. Give life to numbers. Hopes this gives you ideas on visualizing your data in new ways ~ Create a calculated field in a query: …

631 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question