Auto Correlation Function, Output Range and Regression Analysis data to Excel

Hi,

I am working on an Auto-correlation function, and need help finishing it off.

This function loops through the flex grid, and finds the R-squared value closest to 1. As soon as this value is reached, the function will give the Range (the Start date, where the function started this calculation, and the End Date, where it finished this calculation) and then it continues this until the end.

I also need some other functionality, such as the slope, Y-intercept, Standard errors, and the regression analysis functions, but I guess those are built in excel so it wont be a problem.

I have attached a sample output and input file.

A logical explanation could be, and to make the problem more simpler, the loop should look up to the first 6-8 values from the point it begins the loop, as every month has 4 data values, so we can look at 2 months data.

A temp variable can be created that checks how the R-squared value is affected by adding the first 6 values, the first 7 values, or the first 8 values, if the R-squared values keeps increasing as n increases, then we should keep increasing the loop, but if the R-squared value decreases as n increases, then we should keep the value of n that resulted in the highest R-squared value.

This is just a logical explanation to solve this problem,
now I need to put it into code.

So I will need a loop, a temp variable to keep track of n, and the R-squared value.
Might need If case.

Your help and suggestions are much awaited and appreciated

Public Function Corell() Dim objApp As Excel.Application Dim objBook As Workbook Dim objSheet As Worksheet On Error Resume Next Set objApp = GetObject("excel.application") If Err.Number <> 0 Then Set objApp = CreateObject("excel.application") End If Set objBook = objApp.Workbooks.Add Set objSheet = objApp.Sheets(1) Dim j As Integer, high As Integer Dim current_max, max As Double ' This compares the result with the max Dim n As Integer high = grd_result.Rows - 1 For j = 1 To high n = j objSheet.Cells(j, 1).value = grd_result.TextMatrix(j, 2) objSheet.Cells(j, 2).value = grd_result.TextMatrix(j, 3) objSheet.Range("a" & n + 1 & ":a" & n + 1).formula = "=Correl(a1:a" & n & ", b1:b" & n & ")^2" result = FormatNumber(CDbl(objSheet.Range("a" & n + 1 & ":A" & n + 1).value), 5, vbTrue) If result > current_max Then max = result ElseIf current_max > result Then max = current_max End If Next End If objSheet.Range("a" & high + 1 & ":a" & high + 1).formula = "=Correl(a1:a" & high & ", b1:b" & high & ")^2" result = FormatNumber(CDbl(objSheet.Range("a" & high + 1 & ":A" & high + 1).value), 5, vbTrue) objApp.DisplayAlerts = False Set objSheet = Nothing Set objBook = Nothing objApp.Quit Set objApp = NothingEnd Function'''''Private Sub cmd_OutputKeelingPlot_Click()Call Corell Dim objApp As Excel.Application Dim objBook As Workbook Dim objSheet As Worksheet Dim objExcelCI As Excel.Chart Dim high As Integer On Error Resume Next Set objApp = GetObject("excel.application") If Err.Number <> 0 Then Set objApp = CreateObject("excel.application") End If Set objBook = objApp.Workbooks.Add Set objSheet = objApp.Sheets(1) objApp.Visible = True objApp.UserControl = True objApp.WindowState = xlMaximized objApp.DisplayAlerts = False With objSheet .Cells(1, 2).value = " Starting Date " .Cells(1, 3).value = " Ending Date " .Cells(1, 4).value = " Plot Date " .Cells(1, 5).value = " n " .Cells(1, 6).value = " R-squared " .Cells(1, 7).value = " Standard Error " .Cells(1, 8).value = " Slope " .Cells(1, 9).value = " Standard Error " .Cells(1, 10).value = " Y-intercept " .Cells(1, 11).value = " Standard Error " ''' Have the code for the loop that gives all the values to these columns and defiens the range ''' High = end row from Autocorrellation high = 10 .Range("B1:K" & high).Select With Selection .HorizontalAlignment = xlCenter .VerticalAlignment = xlBottom .WrapText = False .Orientation = 0 .AddIndent = False .IndentLevel = 0 .ShrinkToFit = True .ReadingOrder = xlContext .MergeCells = False .Borders(xlDiagonalDown).LineStyle = xlNone .Borders(xlDiagonalUp).LineStyle = xlNone With Selection.Borders(xlEdgeLeft) .LineStyle = xlContinuous .Weight = xlMedium .ColorIndex = xlAutomatic End With With Selection.Borders(xlEdgeTop) .LineStyle = xlContinuous .Weight = xlMedium .ColorIndex = xlAutomatic End With With Selection.Borders(xlEdgeBottom) .LineStyle = xlContinuous .Weight = xlMedium .ColorIndex = xlAutomatic End With With Selection.Borders(xlEdgeRight) .LineStyle = xlContinuous .Weight = xlMedium .ColorIndex = xlAutomatic End With End With .Range("B1:K1").Select .Columns.AutoFit With Selection .Font.Bold = True .Borders(xlDiagonalDown).LineStyle = xlNone .Borders(xlDiagonalUp).LineStyle = xlNone With Selection.Borders(xlEdgeLeft) .LineStyle = xlContinuous .Weight = xlMedium .ColorIndex = xlAutomatic End With With Selection.Borders(xlEdgeTop) .LineStyle = xlContinuous .Weight = xlMedium .ColorIndex = xlAutomatic End With With Selection.Borders(xlEdgeBottom) .LineStyle = xlContinuous .Weight = xlMedium .ColorIndex = xlAutomatic End With With Selection.Borders(xlEdgeRight) .LineStyle = xlContinuous .Weight = xlMedium .ColorIndex = xlAutomatic End With End With End With Set objExcelCI = objExcelW.Charts.Add With objExcelCI .ChartType = xlBarStacked .SetSourceData Source:=Sheets("Sheet1").Range("E1:E" & high & "J1:J" & high), PlotBy:=xlColumns .Location Where:=xlLocationAsNewSheet, Name:="Keeling Plot" .HasTitle = True .ChartTitle.Characters.Text = "Keeling Plot" .Axes(xlCategory, xlPrimary).HasTitle = True .Axes(xlCategory, xlPrimary).AxisTitle.Characters.Text = "Y-Intercept" .Axes(xlValue, xlPrimary).HasTitle = True .Axes(xlValue, xlPrimary).AxisTitle.Characters.Text = "Plot Date" .HasLegend = False With .PlotArea.Border .ColorIndex = 16 .Weight = xlThin .LineStyle = xlContinuous End With Selection.Fill.OneColorGradient Style:=msoGradientDiagonalUp, Variant:=3, _ Degree:=0.231372549019608 With Selection .Fill.Visible = True .Fill.ForeColor.SchemeColor = 43 End With With .SeriesCollection(1).Border .ColorIndex = 1 .Weight = xlHairline .LineStyle = xlDot End With With .SeriesCollection(1) .MarkerBackgroundColorIndex = 2 .MarkerForegroundColorIndex = 1 .MarkerStyle = xlCircle .Smooth = False .MarkerSize = 5 .Shadow = False End With End With Set objSheet = Nothing Set objBook = Nothing Set objApp = Nothing Set objExcelCI = NothingEnd Sub

>So that's why we didn't choose the 0.984763785 value.
>I think we can overcome that barrier by using a minimum of 6 rows to data.

I'm sorry but that is too vague. The 6 row minimum is OK but the R^2 value threshhold is crucial. It can't be on the bsis of "Oh well, I decided it wasn't good enough at that stage". It needs to follow strict rules - and that's what I need from you.

The macro so far is really quite simple to implement (if I've got it right) - it's something like that below...

But please come back with concrete rules for R^2 values thrshholds.

Patrick

Sub ranger()Dim startrow As LongDim i As LongDim n As Longi = 1startrow = 2For n = 1 To 400 Cells(startrow + i, 7).FormulaR1C1 = _"=RSQ(R" & startrow & "C3:R" & startrow + i & "C3" & ",R" & startrow & "C4:R" & startrow + i & "C4)" If Cells(startrow + i, 7) < Cells(startrow + i - 1, 7) And i >= 6 And i <= 15 And Cells(startrow + i - 1, 7) > 0.984 Then' Cells(startrow + i - 1, 7).Font.Bold = True Cells(startrow + i, 7).ClearContents startrow = startrow + i i = 0 End If i = i + 1'MsgBox "i = " & iNext nEnd Sub

I urge you not to rely on Excel for regression calculations. Most of the time Excel will generate the right
answer, but under certain conditions Excel produces absurd output, such as negative R^2, which of course
is mathematically impossible.

For all of my regression work I use Minitab, which is very, very robust, albeit pricey. There are several
commercial add-ins available for Excel that are worthwhile, but while they are much less expensive than
Minitab, they are not free.

Regards,

Patrick

0

Student_101Author Commented:

Hi Matthewspatrick,

Thanks for the prompt feedback, but this is just a function that I need in my application.

I need to get the R-squared value, the Y-intercept, and the Slope. Other regression calculations are not so important for me.

I have been manually doing this for a long time, and I haven't come across such problems yet.

Thanks for your feedback :)

0

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Well, if you need the R^2, the bad news is that Excel will not always calculate it correctly. The fact that
it has not happened to you yet does not mean that it never will happen to you.

Regards,

Patrick

0

Student_101Author Commented:

matthewspatrick,

I am designing this application for my supervisor, and they are well-acquainted with Excel, and have asked me to do this in Excel.

I understand that your comment is based on your knowledge and experience, but lets put it this way,

I have no other choice, and I have to use excel for this, as buying another software will not be allowed.

Student_101 said:
>>Can you please kindly help me with my function?

I will not help you with this current approach--nothing personal, and I wish you the best, but in my opinion your
supervisor is pushing you down the wrong path.

If no other Expert joins in with a few hours, I suggest using the 'request attention' to ask the Mods for help in
bringing in some other Experts.

i'm sure that i will find a solution for you.
Since i have no flex grid i will import your excel data into an access table.
Then i will change your code so that it reads the data from the access table.
The bad excel calculation is not a problem for me because we can do it in vb/vba.

I'm now finished with my job at my office and will go home.
We can continue solving your problem at the weekend.
I hope you can spend more than 500 points if the solution will be very extensive.

0

Student_101Author Commented:

Hi VK,

I actually get this data from an SQL QUERY, using the ADODB recordset from my Access Database.

That's how I get the the data to my flexgrid.

I really hope we can solve this, as I have been looking for a solution for almost 2 weeks now.

I am also new at Expert Exchange, and I was under the impression that the max points you could award is 500?

can you please inform me if there is a limit to the points awarded, and do I have a limit per month as to how many points I use? or does it cost me anything to award more points??

0

Student_101Author Commented:

This is how I get the info to my flex grid

OpenConnection '' connection string to databaseDim cmd_Results As New ADODB.Command Dim RS_Results As New ADODB.Recordset Dim Row As Long Dim newcell As String On Error Resume Nextgrd_result.Clear With cmd_Results .ActiveConnection = m_Conn .CommandText = "Select * From qry_distinctValues;" ''' This is the query that gets the values to the flexgrid .CommandType = adCmdText End With With RS_Results .CursorType = adOpenStatic .CursorLocation = adUseClient .LockType = adLockOptimistic .Open cmd_Results End With Set grd_result.DataSource = RS_Results grd_result.ColAlignmentFixed(-1) = flexAlignCenterBottom grd_result.TextMatrix(0, 2) = " 1/Original " grd_result.TextMatrix(0, 3) = " Isotope " For i = 1 To grd_result.Cols - 1 grd_result.ColAlignment(i) = flexAlignCenterBottom Next Call SizeColumns For i = 1 To grd_result.Rows - 1 grd_result.TextMatrix(i, 2) = 1 / grd_result.TextMatrix(i, 2) grd_result.TextMatrix(i, 2) = FormatNumber(grd_result.TextMatrix(i, 2), 6, vbTrue, vbUseDefault, vbFalse) Next iDim cmd_graph As New ADODB.ConnectionDim rs_graph As New ADODB.RecordsetWith cmd_graph.ActiveConnection = m_Conn.CommandText = "SELECT tbl_Operation.Value, tbl_keelingPlot.Value2 FROM tbl_keelingPlot INNER JOIN tbl_Operation ON tbl_keelingPlot.Row = tbl_Operation.Row ORDER BY tbl_Operation.Date;".CommandType = adCmdTextEnd With With rs_graph .CursorType = adOpenStatic .CursorLocation = adUseClient .LockType = adLockOptimistic .Open cmd_graph End With With Chart_1 .HasChartSpaceTitle = True .ChartSpaceTitle.Caption = cbo_measurementParam.List(0) & " VS. " & cbo_measurementParam.List(1) If keeling_Select = True Then .Refresh high = therowsel For j = therow To high .SetData chDimXValues, grd_result.TextMatrix(j, 2), rs_graph!value .SetData chDimYValues, grd_result.TextMatrix(j, 3), rs_graph!value2 Next j Else .SetData chDimXValues, 2, expr1000 'Categories are first field .SetData chDimYValues, 3, rs_graph!value2 'Values are second field End If End With Set grd_graph.DataSource = RS_Results cbo_RowSelected.Clear cbo_RowSelected.RefreshEnd Sub

m = 0.000157 =INDEX(LINEST($C$2:$C$398,$D$2:$D$398),1)
c = 0.003958 =INDEX(LINEST($C$2:$C$398,$D$2:$D$398),2)

0

Student_101Author Commented:

Hi patrickab,

I was just trying out the formulas in excel.

Yes, the slope and Y-intercept works fine. But I need this for every range outputted with the R-squared value.

For example, every time the function gets a R-squared value closest to 1, it should give the relevant slope, Y-intercept, and other functions such as the standard error etc.

Your solution is correct, and I will award points to you for that part after I have received the complete function from VK so I can incorporate this for the slope and Y-intercept.

Thanks for your help :)

btw, can i show the standard error to for the m and c ?

Generally speaking 'fishing' for points here in the Excel forum is discouraged as it's close to passing points to people. We all have the choice as to whether or not we contribute for whatever points are on offer. I have already 'voted' to contribute whatever I can. If you ask a new question then I will of course look at that as well.

Patrick

0

Student_101Author Commented:

Hi Patrickab,

My question still remains the same, it was not just to find the slope and the Y-intercept, but to complete a function that gives me all the ranges of the values in closest R-squared to 1.

The other parts such as Slope, Y-intercept, will all follow once the R-squared values have been obtained, because on the basis of those values I will calculate the slope and Y-intercept.

I have no problem in accepting your solution right now, but I am new to EE, so can I just award you the points that you deserve for the slope and Y-intercept, and yet keep the remaining points for VK.

I think I've missed the point because I can't see what the problem is in calculating those basic stats. - even if they have to be done without using the built-in Excel functions.

Patrick

0

Student_101Author Commented:

Oh, Ok.

The point is not to just calculate it once for the whole range patrickab, i can do that very easily.

Lets give an example, in the file I have uplodaded, I can MANUALLY get a R-squared value of 0.984269 (=CORREL(C2:C7,D2:D7)^2) for the ranges 2-7, now the function will get this BEST RANGE automatically, and the continue from cell 8 downwards and give me the next BEST corresponding R-squared value, its Range(start/end date) the slope, the Y-intercept, and other regression analysis values :)

Is Sample-Data-for-keeling-Plot.xls
the output of Public Function Corell()
?

0

Student_101Author Commented:

Hi VK,

no, that is just dummy data that I made up to illustrate the output format.

0

Student_101Author Commented:

The correl function still needs a lot more work,

because It doesn't know what current is, so I think we will need a subroutine to define and keep a temp variable. If you analyze the data pattern, usually one or two months of data gives the best R-squared value, and then the pattern changes.

You will be able to better see this is make a scatter plot for the whole range, and it will show you the different patterns. Usually consecutive months follow the same pattern. So our R-squared value will be closest to 1 in consecutive months. i.e, withing 6-12 values.

I still don't see the problem. Surely all that's needed is to use MAX() in Conditional Formatting for the R^2 column of results or to use INDEX(MATCH) to get the best result. I think I must still be missing something. There is only one problem in doing a 'rolling' set of figures - as more data is added to the calculation the R^2 figure drops so the best figure of 1 will always be at the start of the range of data - see attached file.

Apart from that it is very 'dubious' statistically to be selective over what data to include and what not to include - all depending on how 'good' the results are.

Once the max R-squared value has been reached, which after manual extraction was 0.984269 for cells 2-8, then the function should begin this calculation again WITHOUT ADDING TO THE PREVIOUS R-SQUARED VALUE

So when the loop starts the 2nd time, from cell 9, again after manual analysis I got rows 9-13 to give me a value of 0.09844
RSQ($C$8:$C13,$D$8:$D13)

and so on, the function should give the max at every range it finds it, and then output the Start Date, the End Date, and the other things I requsted for in the sample output format.

I have attached the 2 trials that I talked about in this comment.

Thank you for helping :)

Let me know if there's still some clarification required. temp.xls

0

Student_101Author Commented:

***So when the loop starts the 2nd time, from cell 9, again after manual analysis I got rows 9-13 to give me a value of 0.9844***
RSQ($C$8:$C13,$D$8:$D13)

0

Student_101Author Commented:

and Patrick, this is just one file, I don't need the info for just one file, I have several different data sources that I get data from, and I want to build this function in my application so I can get the ranges every time, as this helps my supervisor (chemical Scientist) in her scientific research.

0

Student_101Author Commented:

>> If you analyze the data pattern, usually one or two months of data gives the best R-squared value, and then the pattern changes.

You will be able to better see this is make a scatter plot for the whole range, and it will show you the different patterns. Usually consecutive months follow the same pattern. So our R-squared value will be closest to 1 in consecutive months. i.e, withing 6-12 values. <<

0

Student_101Author Commented:

So lets put it this way, we are looking for the best range of values that produce a R-squared value closest to 1, once that range is obtained, we start looking for the next BEST range after that date, and the process goes on, Outputting the ranges and other regression calculations.

The function should take a minimum of 6-8 points, and maximum of 15 points.

>OK, so if I've got it this time, what you are wanting to do is make the calculations on a 5 row window of data - starting on row 8.

Nope, that doesn't look right either. I just don't see what is the correct number of rows to include in the calculation. Do please explain.

0

Student_101Author Commented:

The calculation is not fixed to 5 rows, it should be flexible based on either 6,7,8,9,10,11,12,13,14 or 15 row which gives the R-squared value closest to one.

So for example, in the first try, we got the closest RSQ to be 0.984 in 7 rows
The one after that was in 5 rows
Then lets say we get the next RSQ to be .0977 after 10 rows.

So the function should flexible determine how many rows, (more then 5 and less then 15) result in the best RSQ.

because if we just use 2 rows lets say, then of course the RSQ will be very close to 1.
So that's why I said minimum or 6 rows.

0

Student_101Author Commented:

May be if you check my output file you will get a better idea.

In which I have the different Ranges that give me the best RSQ values.

But that is just dummy data that I created to describe the output format. It will still help I think.

>>A logical explanation could be, and to make the problem more simpler, the loop should look up to the first 6-8 values from the point it begins the loop, as every month has 4 data values, so we can look at 2 months data.

A temp variable can be created that checks how the R-squared value is affected by adding the first 6 values, the first 7 values, or the first 8 values, if the R-squared values keeps increasing as n increases, then we should keep increasing the loop, but if the R-squared value decreases as n increases, then we should keep the value of n that resulted in the highest R-squared value.<<

It is in my view only possible to achieve what's wanted by using VBA. I will have a go at it, but I'm sure you appreciate that, like everyone all contributors here, I am a volunteer, so I will tackle it when I am free.

I have been considering your whole approach to the analysis of the data and find that I cannot agree with it at all. You should not be attempting to extract 'a good fit' from a random number of rows of data - the number of rows depending on whether the fit gets worse. That is data manipulation to fit a theory of the worst kind. I mentioned that much earlier and I still hold to that view. If you chart the whole data set and can then see some sort of datal-based pattern then it is fair to extract that pattern. However that is not how you are going about it. Instead your supervisor is attempting to manipulate the figures to fit his/her theory.

If I had attempted to analyse my data when I was a researcher, in the same way that your supervisor is proposing, I would have been severely criticised by my Professor and his Senior Lecturers.

Do please contradict me if you have an explanation that refutes my belief in what I see as a flawed analysis. Meantime I will await your response.

Patrick

0

Student_101Author Commented:

Hi Patrick,

I understand your explanation, but the explanation behind this data analysis is that a pattern does exist.

When the RSQ reaches closest to 1, it just tells us the range of that pattern. My supervisor has been doing this manually, by hand, to find out which months illustrate that pattern, from which we can conclude that during these months, the data follows a certain pattern.

This is not about manipulating the data to get the answer we are looking for, the data has a pattern, and I mentioned that earlier as well,

>> If you analyze the data pattern, usually one or two months of data gives the best R-squared value, and then the pattern changes.

You will be able to better see this is make a scatter plot for the whole range, and it will show you the different patterns. Usually consecutive months follow the same pattern. So our R-squared value will be closest to 1 in consecutive months. i.e, withing 6-12 values. <<

Just try making a scatter plot for the whole data set, and you will see many patterns, and all we are doing through the concept of RSQ is getting the ranges of those patterns.

I understand the idea but I still strongly disagree with approach. It is manipulating the data in an attempt to find some patterns. For a moment consider what your likely solution might be. Let's say you believe you have found a pattern. As the pattern will have been found over an inconsistent number of rows you cannot then claim that the pattern is valid for anything other than for that specific set of circumstances. In other words it will be invalid for general application and as such will have little or no value.

If you believe there is a datal (monthly) pattern then, in my opinion, you should be attempting to prove that by charting many corresponding months on the same scale. If there is a pattern then it will show up.

I have looked at the total plot and there are some interesting trends but I have no idea of their significance (if any). For a linear regression line the R^2 figure is not good - but then that shouldn't surprise as the data is pretty well spread..

If you, or your supervisor, believe there is some sort of monthly trend you could plot the results as in the attached file - even so the results are not obvious as they are datally bases and that can only be deciphered by refence to the series keys - see attached file.

This data represents the concentration of a molecule and an Isotope of its kind.

I didn't want to give you these details as they were not necessary for solving the problem, but the scientists have been using this approach for several years, and again as i said before, it is not manipulation of data to get the results WE DESIRE.

The months follow a pattern, and the Y-intercept value gives a significance meaning which cannot be described due to company policy, but it gives the relationship between the environment and the changes that occur in the molecular state of these isotopes and the weather.

Through this concept, the RSQ tells which months have a similar pattern, and when these isotopes give those relationships.

I appreciate that you are trying to make a logical sense of it, but the details in terms of the true relationships go far beyond my understanding as I am just a student, and the scientists that have been working on this procedure know what they are doing.

And again I assure you that this is not just false manipulation of data to fit our desired conclusions/research.

I have asked this question about 2 weeks from now, and to be very frank with you, this was the reason I joined EE, to get the answer to this question, but I sill await my solution.

I would really appreciate it if you could help me with the function now, as we have spent a lot of time understanding the details, but no advancements have been done on my part in completing this task, I hope you understand.

Kind Regards,
A student.

0

Student_101Author Commented:

Just to reconfirm, the relationship occurs keeping the X-axis as the original data, and the Y-axis as the Isotope data.

OK, I do hope that although they are scientists, they actually do their analysis correctly. I will not question any more but I will say that it is never wise to assume that because someone has been doing something for a while, or because they label themselves as scientists, that they have much of an idea about analysing their data. Still I won't go down that path any more.

I'll see what I can do.

Patrick

0

Student_101Author Commented:

I appreciate your help,

but findings they report are relevant to the ones that are found by several ministry of environments around the world, and I believe that assuming all of them are doing the analysis in a manner they don't know of will be incorrect.

As there is a linear relationship between the two variables it is valid to plot either of them against time to see how it changes. Doing this shows there to be a sinusoidal variation with a secondary sinusoidal overlay variation of a slightly longer frequency. See attached file.

I still stand by what I have said. I do not believe the data is being analysed correctly. There is an obvious long term downwards trend in the sinusoidal variations. With some effort a decent mathematician/statistician would be able to construct a formula that would have a good fit with the data. To do it on the current basis of 'cheese paring' is dubious to say the least.

I understand you are tasked with automating the analysis process but I believe the data analysis as currently carried out to be deeply flawed.

Ok Patrick, I thought we weren't going down that road again,

but I will speak to my supervisor today and get all the details, so I can explain this to you in a manner that you are satisfied that the analysis is correct, and what this analysis is done for.

Regards.

0

Student_101Author Commented:

Ok Patrick,

I got some info over the web to.

Here is a detailed explanation of the concept of the Keeling Plot, and again as I told you before, this is not a NEW concept, or a new way on analyzing data, but it has been practiced for several years around the globe.

In front of us however, are only 3 columns, and with the lack of detailed atmospheric scientific knowledge, as well as proper explanation NOT given to all the details (not needed here), we are assuming that the analysis being done world-wide is incorrect.

However, for your personal satisfaction, I am trying to get as much details so that you feel satisfied that this analysis being practiced around the world by MANY scientists, organizations, and ministry of environments is not "dubious"

Please see the attached link, and I really hope that instead of verifying something that has already been verifying, we actually get to the solution of this problem.

I do appreciate all your help and concern in this matter.

Kind Regards.

Please see the links below for a detailed explanation of the Keeling Plot

As said earlier, the ranges for which the RSQ is closest to one, gives us meaningful insight that helps these scientists study the composition of the isotopes, and how they are affected by the weather at those stages.

0

Student_101Author Commented:

another point to keep in mind is that we are using ALL THE DATA

so it's not that we just use the data that gives us a RSQ close to one, but we use the whole data set, and flexibly report the ranges that give us an RSQ close to 1.

Thank you for going to so much trouble to explain the background and purpose of the analysis. I have no doubt that it is very frustrating for you have to go to such lengths, however I do appreciate the effort you have made to answer my criticisms of the method of analysis.

Let me see what I can do on the VBA front. I am busy this afternoon. However if VK comes up with a solution earlier than I can get to it, do please go with that.

Patrick

0

Student_101Author Commented:

Hi Patrick,

I understand that part of you providing me the correct solution is that you ensure that what ever you are doing from your part is correct, and that's why you had all those concerns.

I will appreciate any help from your part, and your comments have also helped both us get a better understanding of this question.

Lets hope that we can get to a solution soon.

Take your time, and I really appreciate all your help :)

Kind Regards.
A Student

0

Student_101Author Commented:

VK,

ALL the details that any expert needs to know about this question are present here, as we have 56 comments on this question so far.

I really hope that either you or Patrick can help me get to the solution of this.

Waiting...

0

Student_101Author Commented:

I have manually analyzed the data and attached a sample output file.

If the problem is the implementation of an adequate algorithm basing upon the theory and the definition of your task then i believe that we will get ahead with it.

Im experienced in vb and could also help in excel topics. Furthermore im a passonate mathematician. But my knowledge of stochastics doesnt include correlation theory.

Could we narrow the whole stuff to the problem or am i forced to deal with the theory?

The problem arises because I cannot see a logical reason why you have not made a break at the highlighted figures. The R^2 figure drops to 0.928672566 from 0.984763785 so it seems the most logical place to start again. Do you not start again because it is before 6 rows of data have been included? Do please explain.

Patrick

0

Student_101Author Commented:

Hi Patrick,

Yes I believe you are right, it is only logical to break at that point, and start the function again.

I was manually doing this, so I guess it skipped my understanding.

0

Student_101Author Commented:

VK,

You absolutely don't need to even think about the theory part.

All the formula's needed are present, and all we need to do is the make the loop such that every time we get a RSQ close to 1, we report the range, and the other things I requested for, and then move along with it.

Patrick has already provided all the info in terms of the formulas we need. We just need to work on the VBA front to get this done soon now.

I have a deadline before the end of this week to get this function completed, and I really hope that you experts can help me reach that deadline.

All your help is much appreciated.
Regards,

Student

0

Student_101Author Commented:

Sorry Patrick,

to get back to that question again, we need to analyze at least 6 rows to data before we report the Range.

So that's why we didn't choose the 0.984763785 value.

I think we can overcome that barrier by using a minimum of 6 rows to data.

0

Student_101Author Commented:

VK,

we don't need any theory from the web.

The algorithm has been extensively defined in the comments above.

We just need to get it done.

0

Student_101Author Commented:

VK,

Just to add to my previous comment, I think you would be able to perfectly understand how this function needs to be if you read the comments between me and Patrick.

The function needs to look at a minimum of 6-15 points before it reports the range of the best RSQ value.

and I will give you very concrete examples. I apologize for the confusion above.

Let me verify this macro and see the results and get back to you right away.

Thanks

0

Student_101Author Commented:

In that case,

Before you come back I will keep a very specific and concrete example ready for you.

Thank you very much Patrick !

0

Student_101Author Commented:

Latest update,

I showed the detailed Excel file which I posted earlier, that had the RSQ values for the first 2 years, and we came to the conclusion that automatic the process will be very difficult, because we can't define concrete ranges for the function to look it, in terms of specifiying how many minimum/maximum points it should read.

What I am trying to do now, is from my form, make it manually as user-friendly as I can, to assist the user in finding the best RSQ ranges.

When the user selects any rows from the grid, it will automatically calculate the RSQ value and the result will be displayed in the text box, or the user can select the start/ending dates from the combo boxes and get the RSQ values that way.

I asked a new question as to what Charting option will best display this in VB6, that when the user selects these rows, it is as user friendly for him as possible.

I would like to thank Patrick specially for all your help. It really increased my understanding on this topic.

VK, thank you for your support as well.

I will be closing this question now, and more points go 2 Patrick because he has remained very active throughout the process of leading me to this conclusion.

Regards,

A humble student.

0

Student_101Author Commented:

Patrick,

were to able to get any sort of output from the macro that you gave me?

If yes, can you please send me that code as well the text file so I can see the results.

Regards.

0

Student_101Author Commented:

Patrick, can we please continue what I asked you after accepting the solution in this question

You have rated my contribution worthy of a B grade, which is fair enough if you consider it of that level of value. I understand that decision, so I must conclude that my input is of limited value. I am therefore somewhat disinclined to continue with this project.

I also understand your frustration with the slow and tortuous progress that I made of the whole project. Perhaps that is why you have rated my input worthy of a B grade. It's rare that I make such a meal of it, but it is equally rare for me to contribute to a project that at first sight made me wonder whether I was doing the right thing in automating something that with my limited knowledge on the subject made me doubt the wisdom of the approach. You were very persuasive on the subject - so I have persevered.

Patrick

0

Student_101Author Commented:

>>So far I have yet to receive a response to my comments in your last question. Perhaps you would attend to that first.

Let me be blunt. I provided you with the formulae and a macro for your last question and you closed it early and awarded a B grade. Why should I be bothered to assist here if you award a low grade, few points and even worse points for a contribution made by VK that didn't even contribute to the solution. In fact you even said "You absolutely don't need to even think about the theory part." Despite that you assigned an 'Assist' to his contribution. I cannot see the logic in your decision. If VK had actually contributed to the solution you would not have heard a comment from me about him receiving an 'Assist'.<<

Hi Patrick,

The B grade issue really surprises me. The formula's that you sent me were fine, but the whole point of this question was to be able to get that function working, which did not happen. The macro had no results or output, and even the file you sent me yesterday just gave all the corresponding values from the dates, and did not satisfy the purpose of the function.

Yes, after all that we went through in terms of verification of this process, and after much feed back from you, the B grade wasn't enough, but that grade was not a representation of your efforts, rather a representation of what we were left with after a week of discussions.

I closed this question early so that I could give you the points for my 2nd open question as well, as that question would've been deleted yesterday if I didn't do so. And because you had already given me the macro, which was incomplete but I know that it could've almost been finished by you without much effort, therefore I decided to accept your solution with 440 points, and give you the remaining 500 points from the other question as well.

I thank you for all your help, and I apologize for the B grade.

You did more then any else in helping me through this, and I will close with this now.

Take care.
Thanks for your help.

A Student

P.S.
I will be deleting that question, and I will accept your solution as well, even though the macro was not complete, as I will be leaving from EE.

Regards.

0

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

>I think we can overcome that barrier by using a minimum of 6 rows to data.

I'm sorry but that is too vague. The 6 row minimum is OK but the R^2 value threshhold is crucial. It can't be on the bsis of "Oh well, I decided it wasn't good enough at that stage". It needs to follow strict rules - and that's what I need from you.

The macro so far is really quite simple to implement (if I've got it right) - it's something like that below...

But please come back with concrete rules for R^2 values thrshholds.

Patrick

Open in new window