Link to home
Start Free TrialLog in
Avatar of Henrik Klindt
Henrik Klindt

asked on

Trend line problem, divide by zero

Hi
I Attached a query in which I need to calculate a linear trend line.
I found this solution, "Calculate linear trend line in TSQL",  and hoped that I could use the very fine solution in my query.
Unfortunatly I get a "Divide by zero" error. See lines 83 and 84 i the attached query
Can anyone see what is wrong. And is this the way to make a trend line.
SQLQueryTrendLine.sql
Avatar of Vikas Garg
Vikas Garg
Flag of India image

Hi,

You can use CASE statement to avoid Divide by zero

,Case when Iweek  > 0 then (Avg(Iweek * 1.0000 * opened) - Avg(Iweek * 1.0000) * Avg(opened)) / VarP(Iweek * 1.0000) else 0 end AS RegressCoeff
,Case when Iweek  > 0 then  Avg(opened) - ((Avg(Iweek * 1.0000 * opened) - Avg(Iweek * 1.0000) * Avg(opened)) / VarP(Iweek * 1.0000)) * Avg(Iweek * 1.0000) 
else 0 end AS Intercept
FROM tickets2

Open in new window


It may be the case then if iweek value = 0 then this error can come
Avatar of Henrik Klindt
Henrik Klindt

ASKER

I still get the divide by zero error.

Msg 8134, Level 16, State 1, Line 26
Divide by zero error encountered.

This is my result without the two trendline lines:
Iyear        Imonth     Iweek       Iday          opened    closed
2014	7	29	18	21	24
2014	7	29	14	8	23
2014	7	29	16	22	39
2014	7	29	15	26	19
2014	7	29	17	28	29
2014	7	30	23	25	40
2014	7	30	25	24	31
2014	7	30	24	17	22
2014	7	30	21	24	30
2014	7	30	22	23	22
2014	7	31	28	17	22
2014	7	31	31	21	25
2014	7	31	29	12	16
2014	7	31	30	24	22
2014	8	31	1	28	23
2014	8	32	4	19	24
2014	8	32	6	21	30
2014	8	32	8	15	12
2014	8	32	5	16	23
2014	8	32	7	21	26
2014	8	33	12	14	24
2014	8	33	11	14	17
2014	8	33	14	8	25
2014	8	33	15	10	15
2014	8	33	13	19	19

Open in new window

Try this:
SELECT
      Iyear
    , Imonth
    , Iweek
    , Iday
    , opened
    , closed
    , VARP(Iweek * 1.0000) varp_iweek
    --, (AVG(Iweek * 1.0000 * opened) - AVG(Iweek * 1.0000) * AVG(opened)) / VARP(Iweek * 1.0000)                                       AS RegressCoeff
    --, AVG(opened) - ((AVG(Iweek * 1.0000 * opened) - AVG(Iweek * 1.0000) * AVG(opened)) / VARP(Iweek * 1.0000)) * AVG(Iweek * 1.0000) AS Intercept
FROM tickets2
GROUP BY
      Iyear
    , Imonth
    , Iweek
    , Iday
    , opened
    , closed

Open in new window

It produces this from your sample:
| IYEAR | IMONTH | IWEEK | IDAY | OPENED | CLOSED | VARP_IWEEK |
|-------|--------|-------|------|--------|--------|------------|
|  2014 |      7 |    29 |   14 |      8 |     23 |          0 |
|  2014 |      7 |    29 |   15 |     26 |     19 |          0 |
|  2014 |      7 |    29 |   16 |     22 |     39 |          0 |
|  2014 |      7 |    29 |   17 |     28 |     29 |          0 |
|  2014 |      7 |    29 |   18 |     21 |     24 |          0 |
|  2014 |      7 |    30 |   21 |     24 |     30 |          0 |
|  2014 |      7 |    30 |   22 |     23 |     22 |          0 |
|  2014 |      7 |    30 |   23 |     25 |     40 |          0 |
|  2014 |      7 |    30 |   24 |     17 |     22 |          0 |
|  2014 |      7 |    30 |   25 |     24 |     31 |          0 |
|  2014 |      7 |    31 |   28 |     17 |     22 |          0 |
|  2014 |      7 |    31 |   29 |     12 |     16 |          0 |
|  2014 |      7 |    31 |   30 |     24 |     22 |          0 |
|  2014 |      7 |    31 |   31 |     21 |     25 |          0 |
|  2014 |      8 |    31 |    1 |     28 |     23 |          0 |
|  2014 |      8 |    32 |    4 |     19 |     24 |          0 |
|  2014 |      8 |    32 |    5 |     16 |     23 |          0 |
|  2014 |      8 |    32 |    6 |     21 |     30 |          0 |
|  2014 |      8 |    32 |    7 |     21 |     26 |          0 |
|  2014 |      8 |    32 |    8 |     15 |     12 |          0 |
|  2014 |      8 |    33 |   11 |     14 |     17 |          0 |
|  2014 |      8 |    33 |   12 |     14 |     24 |          0 |
|  2014 |      8 |    33 |   13 |     19 |     19 |          0 |
|  2014 |      8 |    33 |   14 |      8 |     25 |          0 |
|  2014 |      8 |    33 |   15 |     10 |     15 |          0 |

Open in new window

Hi

The issue is with the VAR / VARP function which is returning 0

Try this one

,
Case when VarP(Iweek * 1.0000) > 0 then (Avg(Iweek * 1.0000 * opened) - Avg(Iweek * 1.0000) * Avg(opened)) / VarP(Iweek * 1.0000) else 0 end AS RegressCoeff
,Case when VarP(Iweek * 1.0000) > 0 then  Avg(opened) - ((Avg(Iweek * 1.0000 * opened) - Avg(Iweek * 1.0000) * Avg(opened)) / VarP(Iweek * 1.0000)) * Avg(Iweek * 1.0000) 
else 0 end AS Intercept
FROM tickets2

Open in new window

I got the same result. But why is VARP(Iweek * 1.000) always 0.
Hi Vikas
With your solution the "divide by zero" error is gone.
But the regress and intercept is always 0.

I need to find out why VarP(Iweek * 1.000) is 0

The two trendline lines is taken from another solution. Maybe another way to calculate trendline is the solution?
This is how it looks from where I took it.
I cannot see what is wrong with my query. Can anyone help?


CREATE TABLE afr (Month int, Value numeric(8, 2))

INSERT INTO afr (Month, Value) VALUES (1, 16.89)
INSERT INTO afr (Month, Value) VALUES (2, 15.87)
INSERT INTO afr (Month, Value) VALUES (3, 11.23)
INSERT INTO afr (Month, Value) VALUES (4, 14.82)
INSERT INTO afr (Month, Value) VALUES (5, 13.74)
INSERT INTO afr (Month, Value) VALUES (6, 7.62)
INSERT INTO afr (Month, Value) VALUES (7, 17.51)
INSERT INTO afr (Month, Value) VALUES (8, 13.96)
INSERT INTO afr (Month, Value) VALUES (9, 20.00)
INSERT INTO afr (Month, Value) VALUES (10, 11.38)
INSERT INTO afr (Month, Value) VALUES (11, 10.67)
INSERT INTO afr (Month, Value) VALUES (12, 11.76)

SELECT (Avg(Month * Value) - Avg(Month) * Avg(Value)) / VarP(Month) AS RegressCoeff,
    Avg(Value) - ((Avg(Month * Value) - Avg(Month) * Avg(Value)) / VarP(Month)) * Avg(Month) AS Intercept
FROM afr
there isn't sufficient statistical variation in Iweek

Returns the statistical variance for the population for all values in the specified expression.
http://msdn.microsoft.com/en-us/library/ms188735.aspx

Let's face it Iweek isn't very interesting (except as an X axis perhaps)
What is sufficient statistical variation? I tried with other fields, always with the same result.

Have I missed something in this article ?
https://www.experts-exchange.com/questions/26713647/Calculate-linear-trend-line-in-TSQL.html
sorry, it was meant as a question "there isn't sufficient statistical variation in Iweek ?"

If you use VARP(Iweek) OVER(partition by Imonth)   you get results, e.g.:
SELECT
      Iyear
    , Imonth
    , Iweek
    , Iday
    , opened
    , closed
    , VARP(Iweek * 1.0) over(partition by Imonth)
    , VARP(Iday * 1.0) over(partition by Iyear)

    --, VARP(Iweek * 1.0000)

    --, (AVG(Iweek * 1.0000 * opened) - AVG(Iweek * 1.0000) * AVG(opened)) / VARP(Iweek * 1.0000)                                       AS RegressCoeff
    --, AVG(opened) - ((AVG(Iweek * 1.0000 * opened) - AVG(Iweek * 1.0000) * AVG(opened)) / VARP(Iweek * 1.0000)) * AVG(Iweek * 1.0000) AS Intercept
FROM tickets2
;


| IYEAR | IMONTH | IWEEK | IDAY | OPENED | CLOSED |       COLUMN_6 | COLUMN_7 |
|-------|--------|-------|------|--------|--------|----------------|----------|
|  2014 |      7 |    29 |   18 |     21 |     24 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    29 |   14 |      8 |     23 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    29 |   16 |     22 |     39 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    29 |   15 |     26 |     19 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    29 |   17 |     28 |     29 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    30 |   23 |     25 |     40 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    30 |   25 |     24 |     31 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    30 |   24 |     17 |     22 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    30 |   21 |     24 |     30 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    30 |   22 |     23 |     22 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    31 |   28 |     17 |     22 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    31 |   31 |     21 |     25 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    31 |   29 |     12 |     16 | 0.637755102041 |  71.4304 |
|  2014 |      7 |    31 |   30 |     24 |     22 | 0.637755102041 |  71.4304 |
|  2014 |      8 |    31 |    1 |     28 |     23 | 0.413223140496 |  71.4304 |
|  2014 |      8 |    32 |    4 |     19 |     24 | 0.413223140496 |  71.4304 |
|  2014 |      8 |    32 |    6 |     21 |     30 | 0.413223140496 |  71.4304 |
|  2014 |      8 |    32 |    8 |     15 |     12 | 0.413223140496 |  71.4304 |
|  2014 |      8 |    32 |    5 |     16 |     23 | 0.413223140496 |  71.4304 |
|  2014 |      8 |    32 |    7 |     21 |     26 | 0.413223140496 |  71.4304 |
|  2014 |      8 |    33 |   12 |     14 |     24 | 0.413223140496 |  71.4304 |
|  2014 |      8 |    33 |   11 |     14 |     17 | 0.413223140496 |  71.4304 |
|  2014 |      8 |    33 |   14 |      8 |     25 | 0.413223140496 |  71.4304 |
|  2014 |      8 |    33 |   15 |     10 |     15 | 0.413223140496 |  71.4304 |
|  2014 |      8 |    33 |   13 |     19 |     19 | 0.413223140496 |  71.4304 |

http://sqlfiddle.com/#!3/eb0ff/10		

Open in new window

Okay now we are getting somewhere :-)

But what does "over(partition by Imonth)" do? I dont know how it has to look like in my two trendline lines.
Can you build the two lines?
well the evidence is in front of you isn't it?

column_6 and column_7 are BOTH calculated by an aggregate function VARP() but also using OVER()
( and there is no GROUP BY clause ! )

so the addition OVER() produces (and/or, repeats)  an aggregating value for each row without needing a group by clause.

the PARTITION BY is similar to a group by, it controls the clusters of records that are included in the calculations.
Yes im sure the evidence is right in front of me :-) I just can't see it.


I tried with this:

,tickets3 AS
(
SELECT
Iyear
,Imonth
--,Iweek
,Iday
,opened
,closed
, VARP(Iweek * 1.0) over(partition by Imonth) as Iweek
--, VARP(Iday * 1.0) over(partition by Iyear) as testday

,(Avg(Iweek * 1.0000 * opened) - Avg(Iweek * 1.0000) * Avg(opened)) / VarP(Iweek * 1.0000) AS RegressCoeff
,Avg(opened) - ((Avg(Iweek * 1.0000 * opened) - Avg(Iweek * 1.0000) * Avg(opened)) / VarP(Iweek * 1.0000)) * Avg(Iweek * 1.0000) AS Intercept
FROM tickets2
Group BY
Iyear,Imonth,Iweek,Iday,opened,closed
)


select *
from tickets3
order by Iyear,Imonth

I still get the divide by zero.

Then I tried this version:

,tickets3 AS
(
SELECT
Iyear
,Imonth
--,Iweek
,Iday
,opened
,closed
--, VARP(Iweek * 1.0) over(partition by Imonth) as Iweek
--, VARP(Iday * 1.0) over(partition by Iyear) as testday

,(Avg(Iweek * 1.0000 * opened) - Avg(Iweek * 1.0000) * Avg(opened)) / VarP(Iweek * 1.0000) over(partition by Imonth) AS RegressCoeff
,Avg(opened) - ((Avg(Iweek * 1.0000 * opened) - Avg(Iweek * 1.0000) * Avg(opened)) / VarP(Iweek * 1.0000) over(partition by Imonth)) * Avg(Iweek * 1.0000) AS Intercept
FROM tickets2
Group BY
Iyear,Imonth,Iweek,Iday,opened,closed
)


select *
from tickets3
order by Iyear,Imonth

Which returns this. Is this the wrong way to use over(partition by Imonth)


Iyear      Imonth      Iday      opened      closed      RegressCoeff      Intercept
2014      7      18      21      24      0      21
2014      7      24      17      22      0      17
2014      7      25      24      31      0      24
2014      7      30      24      22      0      24
2014      7      16      22      39      0      22
2014      7      21      24      30      0      24
2014      7      29      12      16      0      12
2014      7      31      21      25      0      21
2014      7      23      25      40      0      25
2014      7      17      28      29      0      28
2014      7      22      23      22      0      23
2014      7      15      22      13      0      22
2014      7      28      17      22      0      17
2014      8      4      19      24      0      19
2014      8      18      10      30      0      10
2014      8      1      28      23      0      28
2014      8      5      16      23      0      16
2014      8      7      23      26      0      23
2014      8      8      16      12      0      16
2014      8      12      14      24      0      14
2014      8      14      10      25      0      10
2014      8      13      21      19      0      21
2014      8      6      22      30      0      22
2014      8      11      15      17      0      15
2014      8      15      16      14      0      16
SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
It was the group by that caused the divide by zero error.