?
Solved

Auto Correlation Function, Output Range and Regression Analysis data to Excel

Posted on 2009-02-13
75
Medium Priority
?
2,093 Views
Last Modified: 2012-05-06
Hi,

I am working on an Auto-correlation function, and need help finishing it off.

This function loops through the flex grid, and finds the R-squared value closest to 1. As soon as this value is reached, the function will give the Range (the Start date, where the function started this calculation, and the End Date, where it finished this calculation) and then it continues this until the end.

I also need some other functionality, such as the slope, Y-intercept, Standard errors, and the regression analysis functions, but I guess those are built in excel so it wont be a problem.

I have attached a sample output and input file.

A logical explanation could be, and to make the problem more simpler, the loop should look up to the first 6-8 values from the point it begins the loop, as every month has 4 data values, so we can look at 2 months data.

A temp variable can be created that checks how the R-squared value is affected by adding the first 6 values, the first 7 values, or the first 8 values, if the R-squared values keeps increasing as n increases, then we should keep increasing the loop, but if the R-squared value decreases as n increases, then we should keep the value of n that resulted in the highest R-squared value.

This is just a logical explanation to solve this problem,
now I need to put it into code.

So I will need a loop, a temp variable to keep track of n, and the R-squared value.
Might need If case.

Your help and suggestions are much awaited and appreciated
Public Function Corell()
   Dim objApp As Excel.Application
   Dim objBook As Workbook
   Dim objSheet As Worksheet
   On Error Resume Next
   Set objApp = GetObject("excel.application")
   If Err.Number <> 0 Then
      Set objApp = CreateObject("excel.application")
   End If
   Set objBook = objApp.Workbooks.Add
   Set objSheet = objApp.Sheets(1)
 
   Dim j As Integer, high As Integer
      Dim current_max, max As Double ' This compares the result with the max
   Dim n As Integer
   high = grd_result.Rows - 1
   For j = 1 To high
   n = j
           objSheet.Cells(j, 1).value = grd_result.TextMatrix(j, 2)
           objSheet.Cells(j, 2).value = grd_result.TextMatrix(j, 3)
            objSheet.Range("a" & n + 1 & ":a" & n + 1).formula = "=Correl(a1:a" & n & ", b1:b" & n & ")^2"
            result = FormatNumber(CDbl(objSheet.Range("a" & n + 1 & ":A" & n + 1).value), 5, vbTrue)
            
            If result > current_max Then
            max = result
            ElseIf current_max > result Then
            max = current_max
            End If
        Next
   End If
   
   objSheet.Range("a" & high + 1 & ":a" & high + 1).formula = "=Correl(a1:a" & high & ", b1:b" & high & ")^2"
   result = FormatNumber(CDbl(objSheet.Range("a" & high + 1 & ":A" & high + 1).value), 5, vbTrue)
   
  
   objApp.DisplayAlerts = False
   Set objSheet = Nothing
   Set objBook = Nothing
   objApp.Quit
   Set objApp = Nothing
End Function
 
 
'''''
 
Private Sub cmd_OutputKeelingPlot_Click()
 
Call Corell
 
   Dim objApp As Excel.Application
   Dim objBook As Workbook
   Dim objSheet As Worksheet
   Dim objExcelCI As Excel.Chart
   Dim high As Integer
   
   On Error Resume Next
   Set objApp = GetObject("excel.application")
   If Err.Number <> 0 Then
      Set objApp = CreateObject("excel.application")
   End If
   Set objBook = objApp.Workbooks.Add
   Set objSheet = objApp.Sheets(1)
   
   objApp.Visible = True
   objApp.UserControl = True
   objApp.WindowState = xlMaximized
   objApp.DisplayAlerts = False
   
   With objSheet
   .Cells(1, 2).value = " Starting Date "
   .Cells(1, 3).value = " Ending Date "
   .Cells(1, 4).value = " Plot Date "
   .Cells(1, 5).value = " n "
   .Cells(1, 6).value = " R-squared "
   .Cells(1, 7).value = " Standard Error "
   .Cells(1, 8).value = " Slope "
   .Cells(1, 9).value = " Standard Error "
   .Cells(1, 10).value = " Y-intercept "
   .Cells(1, 11).value = " Standard Error "
   
   ''' Have the code for the loop  that gives all the values to these columns and defiens the range
   
   ''' High = end row from Autocorrellation
   high = 10
   
   .Range("B1:K" & high).Select
   
   With Selection
   .HorizontalAlignment = xlCenter
   .VerticalAlignment = xlBottom
   .WrapText = False
   .Orientation = 0
   .AddIndent = False
   .IndentLevel = 0
   .ShrinkToFit = True
   .ReadingOrder = xlContext
   .MergeCells = False
   .Borders(xlDiagonalDown).LineStyle = xlNone
   .Borders(xlDiagonalUp).LineStyle = xlNone
   
   With Selection.Borders(xlEdgeLeft)
        .LineStyle = xlContinuous
        .Weight = xlMedium
        .ColorIndex = xlAutomatic
    End With
    
    With Selection.Borders(xlEdgeTop)
        .LineStyle = xlContinuous
        .Weight = xlMedium
        .ColorIndex = xlAutomatic
    End With
    
    With Selection.Borders(xlEdgeBottom)
        .LineStyle = xlContinuous
        .Weight = xlMedium
        .ColorIndex = xlAutomatic
    End With
    
    With Selection.Borders(xlEdgeRight)
        .LineStyle = xlContinuous
        .Weight = xlMedium
        .ColorIndex = xlAutomatic
    End With
    
    End With
    
    .Range("B1:K1").Select
    
    .Columns.AutoFit
    
    With Selection
    .Font.Bold = True
    .Borders(xlDiagonalDown).LineStyle = xlNone
    .Borders(xlDiagonalUp).LineStyle = xlNone
    
    With Selection.Borders(xlEdgeLeft)
        .LineStyle = xlContinuous
        .Weight = xlMedium
        .ColorIndex = xlAutomatic
    End With
    
    With Selection.Borders(xlEdgeTop)
        .LineStyle = xlContinuous
        .Weight = xlMedium
        .ColorIndex = xlAutomatic
    End With
    
    With Selection.Borders(xlEdgeBottom)
        .LineStyle = xlContinuous
        .Weight = xlMedium
        .ColorIndex = xlAutomatic
    End With
    
    With Selection.Borders(xlEdgeRight)
        .LineStyle = xlContinuous
        .Weight = xlMedium
        .ColorIndex = xlAutomatic
    End With
    
   End With
   End With
 
   Set objExcelCI = objExcelW.Charts.Add
   With objExcelCI
          
    .ChartType = xlBarStacked
    .SetSourceData Source:=Sheets("Sheet1").Range("E1:E" & high & "J1:J" & high), PlotBy:=xlColumns
        
    .Location Where:=xlLocationAsNewSheet, Name:="Keeling Plot"
    
    .HasTitle = True
    .ChartTitle.Characters.Text = "Keeling Plot"
    .Axes(xlCategory, xlPrimary).HasTitle = True
    .Axes(xlCategory, xlPrimary).AxisTitle.Characters.Text = "Y-Intercept"
    .Axes(xlValue, xlPrimary).HasTitle = True
    .Axes(xlValue, xlPrimary).AxisTitle.Characters.Text = "Plot Date"
    .HasLegend = False
    
    With .PlotArea.Border
        .ColorIndex = 16
        .Weight = xlThin
        .LineStyle = xlContinuous
    End With
    
    Selection.Fill.OneColorGradient Style:=msoGradientDiagonalUp, Variant:=3, _
        Degree:=0.231372549019608
        
    With Selection
        .Fill.Visible = True
        .Fill.ForeColor.SchemeColor = 43
    End With
   
    With .SeriesCollection(1).Border
        .ColorIndex = 1
        .Weight = xlHairline
        .LineStyle = xlDot
    End With
    
    With .SeriesCollection(1)
        .MarkerBackgroundColorIndex = 2
        .MarkerForegroundColorIndex = 1
        .MarkerStyle = xlCircle
        .Smooth = False
        .MarkerSize = 5
        .Shadow = False
    End With
    
    End With
       
       
   Set objSheet = Nothing
   Set objBook = Nothing
   Set objApp = Nothing
   Set objExcelCI = Nothing
   
End Sub

Open in new window

Sample-Data-for-keeling-Plot.xls
Sample-Output-For-keeling-Plot.xls
0
Comment
Question by:Student_101
  • 46
  • 21
  • 5
  • +1
75 Comments
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 23633445
Hello Student_101,

I urge you not to rely on Excel for regression calculations.  Most of the time Excel will generate the right
answer, but under certain conditions Excel produces absurd output, such as negative R^2, which of course
is mathematically impossible.

For all of my regression work I use Minitab, which is very, very robust, albeit pricey.  There are several
commercial add-ins available for Excel that are worthwhile, but while they are much less expensive than
Minitab, they are not free.

Regards,

Patrick
0
 

Author Comment

by:Student_101
ID: 23633485
Hi Matthewspatrick,

Thanks for the prompt feedback, but this is just a function that I need in my application.

I need to get the R-squared value, the Y-intercept, and the Slope. Other regression calculations are not so important for me.

I have been manually doing this for a long time, and I haven't come across such problems yet.

Thanks for your feedback :)
0
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 23633591
Student_101,

Well, if you need the R^2, the bad news is that Excel will not always calculate it correctly.  The fact that
it has not happened to you yet does not mean that it never will happen to you.

Regards,

Patrick
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:Student_101
ID: 23633641
matthewspatrick,

I am designing this application for my supervisor, and they are well-acquainted with Excel, and have asked me to do this in Excel.

I understand that your comment is based on your knowledge and experience, but lets put it this way,

I have no other choice, and I have to use excel for this, as buying another software will not be allowed.

Can you please kindly help me with my function?

Regards
0
 
LVL 93

Expert Comment

by:Patrick Matthews
ID: 23633789
Student_101 said:
>>Can you please kindly help me with my function?

I will not help you with this current approach--nothing personal, and I wish you the best, but in my opinion your
supervisor is pushing you down the wrong path.

If no other Expert joins in with a few hours, I suggest using the 'request attention' to ask the Mods for help in
bringing in some other Experts.

Cheers,

Patrick
0
 

Author Comment

by:Student_101
ID: 23633809
Thanks matthewspatrick :)

Regards
0
 
LVL 6

Expert Comment

by:VK
ID: 23633929
Hello Student 101,

i'm sure that i will find a solution for you.
Since i have no flex grid i will import your excel data into an access table.
Then i will change your code so that it reads the data from the access table.
The bad excel calculation is not a problem for me because we can do it in vb/vba.

I'm now finished with my job at my office and will go home.
We can continue solving your problem at the weekend.
I hope you can spend more than 500 points if the solution will be very extensive.
0
 

Author Comment

by:Student_101
ID: 23633971
Hi VK,

I actually get this data from an SQL QUERY, using the ADODB recordset from my Access Database.

That's how I get the the data to my flexgrid.

I really hope we can solve this, as I have been looking for a solution for almost 2 weeks now.

I am also new at Expert Exchange, and I was under the impression that the max points you could award is 500?

can you please inform me if there is a limit to the points awarded, and do I have a limit per month as to how many points I use? or does it cost me anything to award more points??
0
 

Author Comment

by:Student_101
ID: 23633995
This is how I get the info to my flex grid
OpenConnection '' connection string to database
Dim cmd_Results As New ADODB.Command
  Dim RS_Results As New ADODB.Recordset
  Dim Row As Long
  Dim newcell As String
  
 On Error Resume Next
 
grd_result.Clear
 
  With cmd_Results
    .ActiveConnection = m_Conn
    .CommandText = "Select * From qry_distinctValues;" ''' This is the query that gets the values to the flexgrid
    .CommandType = adCmdText
  End With
 
  With RS_Results
    .CursorType = adOpenStatic
    .CursorLocation = adUseClient
    .LockType = adLockOptimistic
    .Open cmd_Results
  End With
 
  Set grd_result.DataSource = RS_Results
 
 
        grd_result.ColAlignmentFixed(-1) = flexAlignCenterBottom
        grd_result.TextMatrix(0, 2) = " 1/Original "
        grd_result.TextMatrix(0, 3) = " Isotope "
        
        
        For i = 1 To grd_result.Cols - 1
          grd_result.ColAlignment(i) = flexAlignCenterBottom
        Next
        
          Call SizeColumns
          
 
           For i = 1 To grd_result.Rows - 1
                grd_result.TextMatrix(i, 2) = 1 / grd_result.TextMatrix(i, 2)
                grd_result.TextMatrix(i, 2) = FormatNumber(grd_result.TextMatrix(i, 2), 6, vbTrue, vbUseDefault, vbFalse)
           Next i
                     
 
Dim cmd_graph As New ADODB.Connection
Dim rs_graph As New ADODB.Recordset
 
With cmd_graph
.ActiveConnection = m_Conn
.CommandText = "SELECT tbl_Operation.Value, tbl_keelingPlot.Value2 FROM tbl_keelingPlot INNER JOIN tbl_Operation ON tbl_keelingPlot.Row = tbl_Operation.Row ORDER BY tbl_Operation.Date;"
.CommandType = adCmdText
End With
    
  With rs_graph
    .CursorType = adOpenStatic
    .CursorLocation = adUseClient
    .LockType = adLockOptimistic
    .Open cmd_graph
  End With
  
  With Chart_1
        .HasChartSpaceTitle = True
        .ChartSpaceTitle.Caption = cbo_measurementParam.List(0) & " VS. " & cbo_measurementParam.List(1)
        
        If keeling_Select = True Then
            .Refresh
            high = therowsel
            For j = therow To high
                .SetData chDimXValues, grd_result.TextMatrix(j, 2), rs_graph!value
                .SetData chDimYValues, grd_result.TextMatrix(j, 3), rs_graph!value2
            Next j
            
        Else
            
        .SetData chDimXValues, 2, expr1000  'Categories are first field
        .SetData chDimYValues, 3, rs_graph!value2     'Values are second field
        End If
    End With
 
 
  Set grd_graph.DataSource = RS_Results
              
    cbo_RowSelected.Clear
    cbo_RowSelected.Refresh
End Sub

Open in new window

0
 
LVL 6

Expert Comment

by:VK
ID: 23634044
Simply accept my answers in your open questions.
0
 

Author Comment

by:Student_101
ID: 23634100
Will do :)

This is my first post:
http://www.experts-exchange.com/Programming/Languages/Visual_Basic/Q_24109514.html

This is my 2nd post
http://www.experts-exchange.com/Programming/Languages/Visual_Basic/Q_24138898.html

This is my 3rd post
http://www.experts-exchange.com/Programming/Languages/Visual_Basic/Q_24141217.html

You can have all the 1500 points if I get my solution :)

Looking forward to hearing from you.
Regards.
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23634269
Student_101,

=INDEX(LINEST(kown_ys,known_xs,const,1) for the gradient m

=INDEX(LINEST(kown_ys,known_xs,const,1) for the constant c

where in the general formula y = mx + c

Have a look at LINEST() 'help' to get the additional stats results you want.

Patrick (A-B)
0
 
LVL 45

Assisted Solution

by:patrickab
patrickab earned 1320 total points
ID: 23634304
Oops should have been - for the second formula:

Student_101,

=INDEX(LINEST(kown_ys,known_xs,const,1) for the gradient m

=INDEX(LINEST(kown_ys,known_xs,const,2) for the constant c

where in the general formula y = mx + c

Have a look at LINEST() 'help' to get the additional stats results you want.

Patrick (A-B)
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23634376

m = 0.000157 =INDEX(LINEST($C$2:$C$398,$D$2:$D$398),1)
c = 0.003958 =INDEX(LINEST($C$2:$C$398,$D$2:$D$398),2)




0
 

Author Comment

by:Student_101
ID: 23634424
Hi patrickab,

I was just trying out the formulas in excel.

Yes, the slope and Y-intercept works fine. But I need this for every range outputted with the R-squared value.

For example, every time the function gets a R-squared value closest to 1, it should give the relevant slope, Y-intercept, and other functions such as the standard error etc.

Your solution is correct, and I will award points to you for that part after I have received the complete function from VK so I can incorporate this for the slope and Y-intercept.

Thanks for your help :)

btw, can i show the standard error to for the m and c ?
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23634594
Student_101,

What makes you think VK is going to return here?

Generally speaking 'fishing' for points here in the Excel forum is discouraged as it's close to passing points to people. We all have the choice as to whether or not we contribute for whatever points are on offer. I have already 'voted' to contribute whatever I can. If you ask a new question then I will of course look at that as well.

Patrick
0
 

Author Comment

by:Student_101
ID: 23634657
Hi Patrickab,

My question still remains the same, it was not just to find the slope and the Y-intercept, but to complete a function that gives me all the ranges of the values in closest R-squared to 1.

The other parts such as Slope, Y-intercept, will all follow once the R-squared values have been obtained, because on the basis of those values I will calculate the slope and Y-intercept.

I have no problem in accepting your solution right now, but I am new to EE, so can I just award you the points that you deserve for the slope and Y-intercept, and yet keep the remaining points for VK.

And I really do think that VK will return here :)

And I DO appreciate your help as well :)
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23634712
      m              c                 R^2
0.000157    0.003958     0.734419

I think I've missed the point because I can't see what the problem is in calculating those basic stats. - even if they have to be done without using the built-in Excel functions.

Patrick
0
 

Author Comment

by:Student_101
ID: 23634777
Oh, Ok.

The point is not to just calculate it once for the whole range patrickab, i can do that very easily.

Lets give an example, in the file I have uplodaded, I can MANUALLY get a R-squared value of 0.984269 (=CORREL(C2:C7,D2:D7)^2) for the ranges 2-7, now the function will get this BEST RANGE automatically, and the continue from cell 8 downwards and give me the next BEST corresponding R-squared value, its Range(start/end date) the slope, the Y-intercept, and other regression analysis values :)

Is my question clear now?

I apologize for the misunderstanding.



=CORREL(C2:C7,D2:D7)^2

Open in new window

0
 
LVL 6

Expert Comment

by:VK
ID: 23635641
Hi student 101,

Is Sample-Data-for-keeling-Plot.xls
the output of Public Function Corell()
?
0
 

Author Comment

by:Student_101
ID: 23636175
Hi VK,

no, that is just dummy data that I made up to illustrate the output format.
0
 

Author Comment

by:Student_101
ID: 23636257
The correl function still needs a lot more work,

because It doesn't know what current is, so I think we will need a subroutine to define and keep a temp variable. If you analyze the data pattern, usually one or two months of data gives the best R-squared value, and then the pattern changes.

You will be able to better see this is make a scatter plot for the whole range, and it will show you the different patterns. Usually consecutive months follow the same pattern. So our R-squared value will be closest to 1 in consecutive months. i.e, withing 6-12 values.
0
 

Author Comment

by:Student_101
ID: 23636281
** Current --> current_max**
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23636540
I still don't see the problem. Surely all that's needed is to use MAX() in Conditional Formatting for the R^2 column of results or to use INDEX(MATCH) to get the best result. I think I must still be missing something. There is only one problem in doing a 'rolling' set of figures - as more data is added to the calculation the R^2 figure drops so the best figure of 1 will always be at the start of the range of data - see attached file.

Apart from that it is very 'dubious' statistically to be selective over what data to include and what not to include - all depending on how 'good' the results are.
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23636555
0
 

Author Comment

by:Student_101
ID: 23636696
Hi Patrick,

nice to hear back from you.

I think the point you are missing is this.

Once the max R-squared value has been reached, which after manual extraction was 0.984269 for cells 2-8, then the function should begin this calculation again WITHOUT ADDING TO THE PREVIOUS R-SQUARED VALUE

So when the loop starts the 2nd time, from cell 9, again after manual analysis I got rows 9-13 to give me a value of 0.09844
RSQ($C$8:$C13,$D$8:$D13)

and so on, the function should give the max at every range it finds it, and then output the Start Date, the End Date, and the other things I requsted for in the sample output format.

I have attached the 2 trials that I talked about in this comment.

Thank you for helping :)

Let me know if there's still some clarification required.
temp.xls
0
 

Author Comment

by:Student_101
ID: 23636705
***So when the loop starts the 2nd time, from cell 9, again after manual analysis I got rows 9-13 to give me a value of 0.9844***
RSQ($C$8:$C13,$D$8:$D13)
0
 

Author Comment

by:Student_101
ID: 23636738
and Patrick, this is just one file, I don't need the info for just one file, I have several different data sources that I get data from, and I want to build this function in my application so I can get the ranges every time, as this helps my supervisor (chemical Scientist) in her scientific research.
0
 

Author Comment

by:Student_101
ID: 23636763
>>  If you analyze the data pattern, usually one or two months of data gives the best R-squared value, and then the pattern changes.

You will be able to better see this is make a scatter plot for the whole range, and it will show you the different patterns. Usually consecutive months follow the same pattern. So our R-squared value will be closest to 1 in consecutive months. i.e, withing 6-12 values. <<
0
 

Author Comment

by:Student_101
ID: 23636826
So lets put it this way, we are looking for the best range of values that produce a R-squared value closest to 1, once that range is obtained, we start looking for the next BEST range after that date, and the process goes on, Outputting the ranges and other regression calculations.

The function should take a minimum of 6-8 points, and maximum of 15 points.
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23637000
OK, so if I've got it this time, what you are wanting to do is make the calculations on a 5 row window of data - starting on row 8.

But you then lose me completely with this statement:

>The function should take a minimum of 6-8 points, and maximum of 15 points.
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23637037
>OK, so if I've got it this time, what you are wanting to do is make the calculations on a 5 row window of data - starting on row 8.

Nope, that doesn't look right either. I just don't see what is the correct number of rows to include in the calculation. Do please explain.

0
 

Author Comment

by:Student_101
ID: 23637056
The calculation is not fixed to 5 rows, it should be flexible based on either 6,7,8,9,10,11,12,13,14 or 15 row which gives the R-squared value closest to one.

So for example, in the first try, we got the closest RSQ to be 0.984 in 7 rows
The one after that was in 5 rows
Then lets say we get the next RSQ to be .0977 after 10 rows.

So the function should flexible determine how many rows, (more then 5 and less then 15) result in the best RSQ.

because if we just use 2 rows lets say, then of course the RSQ will be very close to 1.
So that's why I said minimum or 6 rows.
0
 

Author Comment

by:Student_101
ID: 23637111
May be if you check my output file you will get a better idea.

In which I have the different Ranges that give me the best RSQ values.

But that is just dummy data that I created to describe the output format. It will still help I think.

:)
0
 

Author Comment

by:Student_101
ID: 23637123
Output file format....
Sample-Output-For-keeling-Plot.xls
0
 

Author Comment

by:Student_101
ID: 23637175
Does this make it easier to understand??

>>A logical explanation could be, and to make the problem more simpler, the loop should look up to the first 6-8 values from the point it begins the loop, as every month has 4 data values, so we can look at 2 months data.

A temp variable can be created that checks how the R-squared value is affected by adding the first 6 values, the first 7 values, or the first 8 values, if the R-squared values keeps increasing as n increases, then we should keep increasing the loop, but if the R-squared value decreases as n increases, then we should keep the value of n that resulted in the highest R-squared value.<<
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23640372
>The function should take a minimum of 6-8 points, and maximum of 15 points.

I believe I now understand that comment.
0
 

Author Comment

by:Student_101
ID: 23640992
So is this possible in just excel formula, or should I do it in VBA?
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23641737
It is in my view only possible to achieve what's wanted by using VBA. I will have a go at it, but I'm sure you appreciate that, like everyone all contributors here, I am a volunteer, so I will tackle it when I am free.
0
 

Author Comment

by:Student_101
ID: 23641799
Yes Patrick,

I appreciate all your help :)

When ever you can, thanks a lot !
0
 

Author Comment

by:Student_101
ID: 23642781
VK,

have you tried working on the function still?

Any luck??
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23644383
Student_101,

I have been considering your whole approach to the analysis of the data and find that I cannot agree with it at all. You should not be attempting to extract 'a good fit' from a random number of rows of data - the number of rows depending on whether the fit gets worse. That is data manipulation to fit a theory of the worst kind. I mentioned that much earlier and I still hold to that view. If you chart the whole data set and can then see some sort of datal-based pattern then it is fair to extract that pattern. However that is not how you are going about it. Instead your supervisor is attempting to manipulate the figures to fit his/her theory.

If I had attempted to analyse my data when I was a researcher, in the same way that your supervisor is proposing, I would have been severely criticised by my Professor and his Senior Lecturers.

Do please contradict me if you have an explanation that refutes my belief in what I see as a flawed analysis. Meantime I will await your response.

Patrick


0
 

Author Comment

by:Student_101
ID: 23645337
Hi Patrick,

I understand your explanation, but the explanation behind this data analysis is that a pattern does exist.

When the RSQ reaches closest to 1, it just tells us the range of that pattern. My supervisor has been doing this manually, by hand, to find out which months illustrate that pattern, from which we can conclude that during these months, the data follows a certain pattern.

This is not about manipulating the data to get the answer we are looking for, the data has a pattern, and I mentioned that earlier as well,

>>  If you analyze the data pattern, usually one or two months of data gives the best R-squared value, and then the pattern changes.

You will be able to better see this is make a scatter plot for the whole range, and it will show you the different patterns. Usually consecutive months follow the same pattern. So our R-squared value will be closest to 1 in consecutive months. i.e, withing 6-12 values. <<

Just try making a scatter plot for the whole data set, and you will see many patterns, and all we are doing through the concept of RSQ is getting the ranges of those patterns.

0
 
LVL 45

Expert Comment

by:patrickab
ID: 23645511
Student_101,

I understand the idea but I still strongly disagree with approach. It is manipulating the data in an attempt to find some patterns. For a moment consider what your likely solution might be. Let's say you believe you have found a pattern. As the pattern will have been found over an inconsistent number of rows you cannot then claim that the pattern is valid for anything other than for that specific set of circumstances. In other words it will be invalid for general application and as such will have little or no value.

If you believe there is a datal (monthly) pattern then, in my opinion, you should be attempting to prove that by charting many corresponding months on the same scale. If there is a pattern then it will show up.

I have looked at the total plot and there are some interesting trends but I have no idea of their significance (if any). For a linear regression line the R^2 figure is not good - but then that shouldn't surprise as the data is pretty well spread..

Patrick
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23645589
Student_101,

If you, or your supervisor, believe there is some sort of  monthly trend you could plot the results as in the attached file - even so the results are not obvious as they are datally bases and that can only be deciphered by refence to the series keys - see attached file.

Patrick

Sample-Data-for-keeling-Plot-02.xls
0
 

Author Comment

by:Student_101
ID: 23645595
Patrick,

This data represents the concentration of a molecule and an Isotope of its kind.

I didn't want to give you these details as they were not necessary for solving the problem, but the scientists have been using this approach for several years, and again as i said before, it is not manipulation of data to get the results WE DESIRE.

The months follow a pattern, and the Y-intercept value gives a significance meaning which cannot be described due to company policy, but it gives the relationship between the environment and the changes that occur in the molecular state of these isotopes and the weather.

Through this concept, the RSQ tells which months have a similar pattern, and when these isotopes give those relationships.

I appreciate that you are trying to make a logical sense of it, but the details in terms of the true relationships go far beyond my understanding as I am just a student, and the scientists that have been working on this procedure know what they are doing.

And again I assure you that this is not just false manipulation of data to fit our desired conclusions/research.

I have asked this question about 2 weeks from now, and to be very frank with you, this was the reason I joined EE, to get the answer to this question, but I sill await my solution.

I would really appreciate it if you could help me with the function now, as we have spent a lot of time understanding the details, but no advancements have been done on my part in completing this task, I hope you understand.

Kind Regards,
A student.

0
 

Author Comment

by:Student_101
ID: 23645609
Just to reconfirm, the relationship occurs keeping the X-axis as the original data, and the Y-axis as the Isotope data.
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23645633
Student_101,

OK, I do hope that although they are scientists, they actually do their analysis correctly. I will not question any more but I will say that it is never wise to assume that because someone has been doing something for a while, or because they label themselves as scientists, that they have much of an idea about analysing their data. Still I won't go down that path any more.

I'll see what I can do.

Patrick

0
 

Author Comment

by:Student_101
ID: 23645652
I appreciate your help,

but findings they report are relevant to the ones that are found by several ministry of environments around the world, and I believe that assuming all of them are doing the analysis in a manner they don't know of will be incorrect.

I really hope we can get this completed soon.

Thank you for all your help.

Regards

0
 
LVL 6

Expert Comment

by:VK
ID: 23647717
good morning student,

I'm so sorry, that i couldn't deal with your problem at the weekend.

Fortunately there are more samples now,
and i hope that your conversation with patrickab
will give me more details about the problem.

I will be back after work at the evening.

0
 
LVL 45

Expert Comment

by:patrickab
ID: 23648893
Student_101,

As there is a linear relationship between the two variables it is valid to plot either of them against time to see how it changes. Doing this shows there to be a sinusoidal variation with a secondary sinusoidal overlay variation of a slightly longer frequency. See attached file.

I still stand by what I have said. I do not believe the data is being analysed correctly. There is an obvious long term downwards trend in the sinusoidal variations. With some effort a decent mathematician/statistician would be able to construct a formula that would have a good fit with the data. To do it on the current basis of 'cheese paring' is dubious to say the least.

I understand you are tasked with automating the analysis process but I believe the data analysis as currently carried out to be deeply flawed.

Patrick

Sample-Data-for-keeling-Plot-05.xls
0
 

Author Comment

by:Student_101
ID: 23649444
hmmmmm..

Ok Patrick, I thought we weren't going down that road again,

but I will speak to my supervisor today and get all the details, so I can explain this to you in a manner that you are satisfied that the analysis is correct, and what this analysis is done for.

Regards.
0
 

Author Comment

by:Student_101
ID: 23649739
Ok Patrick,

I got some info over the web to.

Here is a detailed explanation of the concept of the Keeling Plot, and again as I told you before, this is not a NEW concept, or a new way on analyzing data, but it has been practiced for several years around the globe.

In front of us however, are only 3 columns, and with the lack of detailed atmospheric scientific knowledge, as well as proper explanation NOT given to all the details (not needed here), we are assuming that the analysis being done world-wide is incorrect.

However, for your personal satisfaction, I am trying to get as much details so that you feel satisfied that this analysis being practiced around the world by MANY scientists, organizations, and ministry of environments is not "dubious"

Please see the attached link, and I really hope that instead of verifying something that has already been verifying, we actually get to the solution of this problem.

I do appreciate all your help and concern in this matter.

Kind Regards.

Please see the links below for a detailed explanation of the Keeling Plot

http://www.sciencedirect.com/science?_ob=MiamiCaptionURL&_method=retrieve&_udi=B6V66-4HG67CS-3&_image=fig1&_ba=1&_user=3297716&_coverDate=01%2F01%2F2006&_rdoc=1&_fmt=full&_orig=search&_cdi=5806&view=c&_acct=C000058537&_version=1&_urlVersion=0&_userid=3297716&md5=18f8b123f53dde04df79e52a25021144

http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V66-4HG67CS-3&_user=3297716&_coverDate=01%2F01%2F2006&_rdoc=1&_fmt=full&_orig=search&_cdi=5806&_sort=d&_docanchor=&view=c&_acct=C000058537&_version=1&_urlVersion=0&_userid=3297716&md5=c36731a0e346594877262db92d979ffa#sec1

http://ecophys.biology.utah.edu/Research/OTTER/Keeling%20plot%20explained.html

http://www.biogeosciences.net/3/539/2006/bg-3-539-2006.html

sercnigec.ua.edu/presentations/Fumigation20.ppt
(pg 7, 12)

P.S

As said earlier, the ranges for which the RSQ is closest to one, gives us meaningful insight that helps these scientists study the composition of the isotopes, and how they are affected by the weather at those stages.
0
 

Author Comment

by:Student_101
ID: 23649989
another point to keep in mind is that we are using ALL THE DATA

so it's not that we just use the data that gives us a RSQ close to one, but we use the whole data set, and flexibly report the ranges that give us an RSQ close to 1.
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23650104
Student_101,

Thank you for going to so much trouble to explain the background and purpose of the analysis. I have no doubt that it is very frustrating for you have to go to such lengths, however I do appreciate the effort you have made to answer my criticisms of the method of analysis.

Let me see what I can do on the VBA front. I am busy this afternoon. However if VK comes up with a solution earlier than I can get to it, do please go with that.

Patrick
0
 

Author Comment

by:Student_101
ID: 23650124
Hi Patrick,

I understand that part of you providing me the correct solution is that you ensure that what ever you are doing from your part is correct, and that's why you had all those concerns.

I will appreciate any help from your part, and your comments have also helped both us get a better understanding of this question.

Lets hope that we can get to a solution soon.

Take your time, and I really appreciate all your help :)

Kind Regards.
A Student
0
 

Author Comment

by:Student_101
ID: 23653579
VK,

ALL the details that any expert needs to know about this question are present here, as we have 56 comments on this question so far.

I really hope that either you or Patrick can help me get to the solution of this.

Waiting...
0
 

Author Comment

by:Student_101
ID: 23653852
I have manually analyzed the data and attached a sample output file.

This is read data for 2 years.

Please see file attached.
temp.xls
0
 

Author Comment

by:Student_101
ID: 23653857
>> This is REAL data for 2 years<<
0
 
LVL 6

Assisted Solution

by:VK
VK earned 180 total points
ID: 23657192
Hello student.

It emerged that there was a misapprehension by me in the terms you use.
Because my native language is german i investigated at wikipedia.

R squared value:

http://en.wikipedia.org/wiki/Coefficient_of_determination
http://de.wikipedia.org/wiki/Bestimmtheitsma%C3%9F


Correlation Function:

http://en.wikipedia.org/wiki/Correlation
http://de.wikipedia.org/wiki/Korrelation


That implicates the next question to you:

If the problem is the implementation of an adequate algorithm basing upon the theory and the definition of your task then i believe that we will get ahead with it.

Im experienced in vb and could also help in excel topics. Furthermore im a passonate mathematician. But my knowledge of stochastics doesnt include correlation theory.

Could we narrow the whole stuff to the problem or am i forced to deal with the theory?
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23658411
Student_101,

Thanks for your test.xls file.

I am having problems figuring out the rules the macro needs to follow. The problems is illustrated by your second group of R^2 figures - see below:

1
0.997710869
0.994729162
0.984402516
0.9847637850.928672566
0.896878219
0.940202453
0.939311191
0.941852034
0.944222987
0.976414163
0.976628252
0.976170064
0.976261497
0.976293635
0.97583348
0.975805569
0.980417425
0.983299124
0.984463541
0.987249236

The problem arises because I cannot see a logical reason why you have not made a break at the highlighted figures. The R^2 figure drops to 0.928672566 from 0.984763785 so it seems the most logical place to start again. Do you not start again because it is before 6 rows of data have been included? Do please explain.

Patrick



0
 

Author Comment

by:Student_101
ID: 23658526
Hi Patrick,

Yes I believe you are right, it is only logical to break at that point, and start the function again.

I was manually doing this, so I guess it skipped my understanding.
0
 

Author Comment

by:Student_101
ID: 23658538
VK,

You absolutely don't need to even think about the theory part.

All the formula's needed are present, and all we need to do is the make the loop such that every time we get a RSQ close to 1, we report the range, and the other things I requested for, and then move along with it.

Patrick has already provided all the info in terms of the formulas we need. We just need to work on the VBA front to get this done soon now.

I have a deadline before the end of this week to get this function completed, and I really hope that you experts can help me reach that deadline.

All your help is much appreciated.
Regards,

Student
0
 

Author Comment

by:Student_101
ID: 23658556
Sorry Patrick,

to get back to that question again, we need to analyze at least 6 rows to data before we report the Range.

So that's why we didn't choose the 0.984763785 value.

I think we can overcome that barrier by using a minimum of 6 rows to data.

0
 

Author Comment

by:Student_101
ID: 23658566
VK,

we don't need any theory from the web.

The algorithm has been extensively defined in the comments above.

We just need to get it done.
0
 

Author Comment

by:Student_101
ID: 23658582
VK,

Just to add to my previous comment, I think you would be able to perfectly understand how this function needs to be if you read the comments between me and Patrick.

The function needs to look at a minimum of 6-15 points before it reports the range of the best RSQ value.
0
 
LVL 45

Accepted Solution

by:
patrickab earned 1320 total points
ID: 23659462
>So that's why we didn't choose the 0.984763785 value.
>I think we can overcome that barrier by using a minimum of 6 rows to data.

I'm sorry but that is too vague. The 6 row minimum is OK but the R^2 value threshhold is crucial. It can't be on the bsis of "Oh well, I decided it wasn't good enough at that stage". It needs to follow strict rules - and that's what I need from you.

The macro so far is really quite simple to implement (if I've got it right) - it's something like that below...

But please come back with concrete rules for R^2 values thrshholds.

Patrick

Sub ranger()
Dim startrow As Long
Dim i As Long
Dim n As Long
 
i = 1
startrow = 2
For n = 1 To 400
    Cells(startrow + i, 7).FormulaR1C1 = _
"=RSQ(R" & startrow & "C3:R" & startrow + i & "C3" & ",R" & startrow & "C4:R" & startrow + i & "C4)"
    If Cells(startrow + i, 7) < Cells(startrow + i - 1, 7) And i >= 6 And i <= 15 And Cells(startrow + i - 1, 7) > 0.984 Then
'        Cells(startrow + i - 1, 7).Font.Bold = True
        Cells(startrow + i, 7).ClearContents
        startrow = startrow + i
        i = 0
    End If
    i = i + 1
'MsgBox "i = " & i
Next n
 
End Sub

Open in new window

0
 
LVL 45

Expert Comment

by:patrickab
ID: 23659478
ps. I'm out for the next 8hrs

Patrick
0
 

Author Comment

by:Student_101
ID: 23659481
Hi Patrick,

I am here right now,

and I will give you very concrete examples. I apologize for the confusion above.

Let me verify this macro and see the results and get back to you right away.

Thanks
0
 

Author Comment

by:Student_101
ID: 23659585
In that case,

Before you come back I will keep a very specific and concrete example ready for you.

Thank you very much Patrick !
0
 

Author Comment

by:Student_101
ID: 23660154
Latest update,

I showed the detailed Excel file which I posted earlier, that had the RSQ values for the first 2 years, and we came to the conclusion that automatic the process will be very difficult, because we can't define concrete ranges for the function to look it, in terms of specifiying how many minimum/maximum points it should read.

What I am trying to do now, is from my form, make it manually as user-friendly as I can, to assist the user in finding the best RSQ ranges.

When the user selects any rows from the grid, it will automatically calculate the RSQ value and the result will be displayed in the text box, or the user can select the start/ending dates from the combo boxes and get the RSQ values that way.

I asked a new question as to what Charting option will best display this in VB6, that when the user selects these rows, it is as user friendly for him as possible.

http://www.experts-exchange.com/Programming/Languages/Visual_Basic/Q_24150183.html

I would like to thank Patrick specially for all your help. It really increased my understanding on this topic.

VK, thank you for your support as well.

I will be closing this question now, and more points go 2 Patrick because he has remained very active throughout the process of leading me to this conclusion.

Regards,

A humble student.
0
 

Author Comment

by:Student_101
ID: 23660472
Patrick,

were to able to get any sort of output from the macro that you gave me?

If yes, can you please send me that code as well the text file so I can see the results.

Regards.
0
 

Author Comment

by:Student_101
ID: 23663723
Patrick, can we please continue what I asked you after accepting the solution in this question

http://www.experts-exchange.com/Programming/Languages/Visual_Basic/Q_24138898.html

Regards.
0
 
LVL 45

Expert Comment

by:patrickab
ID: 23664993
Student_101,

You have rated my contribution worthy of a B grade, which is fair enough if you consider it of that level of value. I understand that decision, so I must conclude that my input is of limited value. I am therefore somewhat disinclined to continue with this project.

I also understand your frustration with the slow and tortuous progress that I made of the whole project. Perhaps that is why you have rated my input worthy of a B grade. It's rare that I make such a meal of it, but it is equally rare for me to contribute to a project that at first sight made me wonder whether I was doing the right thing in automating something that with my limited knowledge on the subject made me doubt the wisdom of the approach. You were very persuasive on the subject - so I have persevered.

Patrick
0
 

Author Comment

by:Student_101
ID: 23670314
>>So far I have yet to receive a response to my comments in your last question. Perhaps you would attend to that first.

Let me be blunt. I provided you with the formulae and a macro for your last question and you closed it early and awarded a B grade. Why should I be bothered to assist here if you award a low grade, few points and even worse points for a contribution made by VK that didn't even contribute to the solution. In fact you even said "You absolutely don't need to even think about the theory part." Despite that you assigned an 'Assist' to his contribution. I cannot see the logic in your decision. If VK had actually contributed to the solution you would not have heard a comment from me about him receiving an 'Assist'.<<

Hi Patrick,

The B grade issue really surprises me. The formula's that you sent me were fine, but the whole point of this question was to be able to get that function working, which did not happen. The macro had no results or output, and even the file you sent me yesterday just gave all the corresponding values from the dates, and did not satisfy the purpose of the function.

Yes, after all that we went through in terms of verification of this process, and after much feed back from you, the B grade wasn't enough, but that grade was not a representation of your efforts, rather a representation of what we were left with after a week of discussions.

I closed this question early so that I could give you the points for my 2nd open question as well, as that question would've been deleted yesterday if I didn't do so. And because you had already given me the macro, which was incomplete but I know that it could've almost been finished by you without much effort, therefore I decided to accept your solution with 440 points, and give you the remaining 500 points from the other question as well.

I thank you for all your help, and I apologize for the B grade.

You did more then any else in helping me through this, and I will close with this now.

Take care.
Thanks for your help.

A Student

P.S.
I will be deleting that question, and I will accept your solution as well, even though the macro was not complete, as I will be leaving from EE.

Regards.



0

Featured Post

Vote for the Most Valuable Expert

It’s time to recognize experts that go above and beyond with helpful solutions and engagement on site. Choose from the top experts in the Hall of Fame or on the right rail of your favorite topic page. Look for the blue “Nominate” button on their profile to vote.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

You need to know the location of the Office templates folder, so that when you create new templates, they are saved to that location, and thus are available for selection when creating new documents.  The steps to find the Templates folder path are …
This article describes a serious pitfall that can happen when deleting shapes using VBA.
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.
This Micro Tutorial will demonstrate how to use longer labels with horizontal bar charts instead of the vertical column chart.

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question