Weighted least Squares Excel

This question is a follow-up question on my earlier one regarding LSq optimization in Excel.

Essentially, I have set up the following LSq in excel (thanks again to fmarshal for his insightful help on this)
 - Given is a vector of original exposure across a range of seven nodes (B8:H9)
 - The aim is to replicate this exposure at each point as close as possible from a set of 5 instruments. The contribution of each instrument on any of the 7 nodes is represented in the matrix B11:HI15
 - I then solve for the number of units of each instrument (I11:I15) such that the sum of the squared differences (cell C24) is minimized , using solver (GRG Nonlinear, using constraints of =<1,000,000,000 & > -1,000,000,0000 for the decision variable cells)

Now this is essentially a set-up that assigns equal weightings (importance) to each node.

I now would like to assign relative weightings to the respective nodes, with the aim that the optimization assigns greater importance to find a good fit to say one specific node relative to others (in effect create sort of a 'hierarchy' among the nodes)

In the attached spreadsheet, I have added such exemplary node-weights in row 9; however, I don't quite know where/how to integrate them into the optimization. Any feedback would be most appreciated.
Michael HamacherAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Fred MarshallPrincipalCommented:
It really helps to understand the physics of the situation.  If we knew what the "instruments" are, if we knew what a "node" was, if we knew what "exposure" is, then it might be easier to suggest possible approaches to the mathematics.
Fred MarshallPrincipalCommented:
I think the quick answer is that the node weights should be variables for Solver and you see what you get.  I suggest they be constrained to be >=0.
As "node weights" they would apply individually to the node to which they are assigned.

As "weights", they would multiply the measurements and/or the instrument weights (same thing).

Does that make sense?

Here's the deal:

Each measurement will be multiplied by:
- the Instrument weight (which are equal across nodes)
- the node weight (which is unique to each node but constant across the Instruments).

The end result is a set of Instrument/Node weights that are Instrument and Node dependent and, yet, can all be different.

With 5 Instruments and 7 Nodes, you could have this:

Instrument Weights:
1: 1
2: 2
3: 3
4: 4
5: 5

Node Weights:
1: 1
2: 2
3: 3
4: 4
5: 5
6: 6
7: 7

So, the overall Instrument/Node weights for this case would be:
1.1 = 1
1.2 = 2
1.3 = 3
1.4 = 4
1.5 = 5
1.6 = 6
1.7 = 7
2.1 = 2
2.2 = 4
5.7 = 35
etc.  each of these resulting weights is the product of the Instrument weight and the Node weight.  
The result is that the Instrument/Node weights are possibly different but not independent because the Instrument parts are all the same across the nodes.
Michael HamacherAuthor Commented:

In the present instance solving for the node weights is not really viable for the following reason: in the actual, real-life setting the only decision variable (which I have control over & need to solver for) are the instrument weights = units of instruments (cells I11:I15).
The node weights by contrast are variables outside of my control, they are to reflect probabilities assigned to each node (instead of 'node', one could also use the term 'state'). E.g. in real life, the 0% node represents the most probable state, the +/-0.5% nodes as 2nd most likely etc.
As such, I would like to use the node weights to 'steer' the optimization to yield better fits on the more center nodes (relative to the fits at the peripheral nodes, e.g. +/-2%). Correspondingly, these node weights are to be externally specified; and I wonder how this is best to be done (i.e. in what form - I read somewhere that weightings in LSq take the form of 1/w^2 but am not 100pc sure -, and where/how they are to be integrated into the optimization.

PMI ACP® Project Management

Prepare for the PMI Agile Certified Practitioner (PMI-ACP)® exam, which formally recognizes your knowledge of agile principles and your skill with agile techniques.

Fred MarshallPrincipalCommented:
OK.  It was worth a try....
I never did understand the purpose of those % figures....  What are they?

If you really want to bias the result towards the center of the nodes then why not just do that?
I would then likely weigh the node contributions to the error accordingly so that the middle nodes have more influence on the outcome.

Is that what you need?

In that case, when you compute the error, weigh the node errors by (for example) 1.0 in the middle and with smaller numbers at the edges and tapered from middle to edge.  Then the errors in the middle will affect the outcome more.

Here's another with the 4th case weighting the errors.  There are lots of ways to do this (linear, squared, to higher powers, etc.)  it's up to you.
Michael HamacherAuthor Commented:
Thank you very much for supplying yet another example. To address your question regarding the % numbers: they represent effectively different states of an underlying asset (the idea of the optimization is to neutralize the risk of this asset under different state scenarios with a given portfolio of hedging instruments).

Just one question regarding your solution: I noticed that you included the node weights as decision variable cells. How does this ensure that the optimization retains the intended hierarchy of weightings (e.g. largest weight to 0% node etc), and doesn't just overwrite them?

Also, when I load the solver case from  R77:R96, the resulting optimization sets several node weights to zero...
Fred MarshallPrincipalCommented:
What I'd intended to do was this (I only changed the 4th case):
- node *error* weights are not supposed to be variables.  They show up in the calculation of the error for that node only.

It appears the file hadn't been updated completely.  Here's' another one.
Case 4 is the only change and the Load should work now.

Thanks for helping me understand the application a bit better.
It occurs to me that the least squares problem in a case like this is subject to a direct solution.
I'm a bit concerned that the matrix of Intrument/Node values could be ill-conditioned.  So, looking at it from a more classical perspective may lend some insight.  I'll take a look and respond later.
Michael HamacherAuthor Commented:
Thank you very much, the updated file seems to work. And yes, would be very grateful if you have any further insights as to how to improve the optimization/identify potential problems.
Fred MarshallPrincipalCommented:
From what I read at http://en.wikipedia.org/wiki/Least_squares#Weighted_least_squares, under Weighted Least Squares, which is what I understand you want, the idea is to weight each of the squared errors.
And, the weight to be used is:
"the reciprocal of the variance of the measurement".

The idea is that measurements with high variance should not affect the fit as much as measurements with low variance.
I'm just not sure at the moment how to apply this to your situation.
If you knew the squared error at each point (which I'm not sure you do) then you would divide the squared error by the variance at that point.  This then would bias the fit to be more responsive to the apparently more accurate numbers and less responsive to the apparently less accuract numbers.

I'm just not sure how to apply this notion to your problem set.

You should ask yourself whether this mathematical approach is particularly meaningful in your case.  
As long as you have the idea, perhaps you can do something similar or ad hoc or just use the math.
Fred MarshallPrincipalCommented:
I'm still pondering this.  Slow because I've not done the analytical approach for some long time.
I wonder about the coefficients you provide in the first place.
I don't get the % part.
The % parts don't add to 100% in either the rows or the columns. ???
This means that no instrument contributes much to any node?
This means that any node is not fully done with all the instruments?
And, the coefficients are antisymmetric 1 and 7 are the same but opposite and so forth.
Michael HamacherAuthor Commented:
Thanks. Regarding the %age numbers:
- the node weights were intended to add up to 100%, so that is just a user-error (thanks for pointing that out)
- the per unit exposure of any instrument is typically between 0.05%-50% (the effective exposure will be of course be this per-unit exposure X number of units)
- within a given node, the sum of the per-unit exposures of all 5 instrument indeed may not add up to 100% (or 1%, to stay within the order of magnitude). But I am wondering whether this really is relevant; as ultimately what matters is that the total exposure (sumproduct of exposure/unit x number units across all instruments within a node) to matches the original exposure at any given node (and these numbers are in the range of 50k - 1m). If it was to streamline the problem, one could scale up the original per-unit weights by a constant factor of course.
- by antisymmetric co-efficients, I believe you refer to the exposure of instrument 1 being the mirror-image of the exposure of instrument 7? This may be true but not necessarily, as it depends on the instruments chosen (so these profiles are somewhat discretionary)...

Just for reference: this sheet is to an except a conceptual one to get see if the optimization is a viable approach to the problem. In reality, the number of nodes may vary (include more nodes), and so may the number of instruments (albeit less, typically there will be 3-6 instruments)
Fred MarshallPrincipalCommented:
I don't know if this will help at all but I'm still trying to understand the underlying structure and choices.
Here is an Excel spreadsheet that goes as far as calculating the Variance-Covariance matrix and it's inverse using very simple Excel steps.  The only advanced tools here are "minverse" and the use of Excel arrays.

What I'm trying to understand better is the implication of doing these calculations in getting to a weighted least squares.  I usually approach these things from the bigger picture so that the mathematics isn't applied incorrectly.

I don't understand where the matric "A" comes from.  Normally this would be a set of select values from a set of basis functions.  For example, the basis functions could be x^0, x^1, x^2, .... let's say x^0:x^(m-1) so there are m variables.  Then, these are evaluated at select values of x: x1 ... xn so there are n points in x.

Then, if one knows the value of some independent variable y1 ... yn at each of those values of x ( commonly that's how those values of x were selected) we can write a system of linear equations:
with dimensions  nxm mx1  nx1

In this case we have A and y and want to find b' such that y' is close to y.
In other words, minimize:
 sum (y-y')^2

From this determination of the b', a new value of x can be used to estimate a new value of y.

I figure you know all this but I state it to help confirm clear communication.

Translating this into the present case,  
We have the A matrix of [I do not know what]
It appears that the 5 "instruments" are the basis set.
It appears that the 7 "nodes" are the selected points in x.
The 7 "known" node values are the values in y at those same points in x.

What I'm trying to do now is relate the weights calculated in Case 3 with the weights one might calculate use from the Variance-Covariance matrix.

One revealing perspective would be: "what's next?"

For example, once the values of b' are calculated then it would be typical to choose a new value of x and calculate y'.  And this would entail having a vector of new values such as are in A.  So this could be used for interpolation or extrapolation.  But here we don't have other values of x to introduce nor rules for how to generate that new vector.
Perhaps I'm confusing things more than helping.....

What I believe is correct to say is that the weights are used in calculating b'.  Once the b' are established then there's no more least squares involved - just linear equations.
What I did in Case 3 was to optimize the outcome using optimized weights for each node (each value of x).  This seems to be the same motivation in using the reciprocal of the variance as a weight.
Michael HamacherAuthor Commented:
Thanks for the detailed reply...for some reason, it appears that the spreadsheet didn't come through, would it be possible to post it?

Just to give you a better idea of the underlying issue at hand: the objective is to determine the portfolio of underlying assets that best hedge the risk exposure (more explicitly: vega exposure) of an underlying asset at different FX spot levels.
The FX spot levels are represented by the 7 nodes; the FX vega (= volatility risk) exposure of the underlying asset is given by the values in row B8:H8 (= original exposure).
To hedge this exposure of the underlying asset, I choose n (in this case 5) different hedging instruments, the vega exposure of which at a given node = spot step (per unit of instrument) is given in the matrix B11:H15. The hedging instruments differ in their vega profile across spot (hence more hedging instruments can be expected to improve the fit, although empirically the incremental benefit tends to decrease beyond 5 instruments....
It is generally more important to hedge the original exposure at 0% = current FX spot level (i.e. the 0% node), whereas the further away spot levels (nodes) are more unlikely to manifest. Hence the need for a weighted optimization, that allows for greater weights to be assigned to the centre nodes relative to the peripheral ones.
The objective is to find the combination of units/hedging instrument that gives the best fit for the original exposure...
Fred MarshallPrincipalCommented:
Ah!  OK, I should be able to understand that once I've paid some attention to it.
Working on a project right now....
Michael HamacherAuthor Commented:
Thanks...much appreciate any further insights on this...cheers
Fred MarshallPrincipalCommented:
Here is that spreadsheet.
The Matrix LSE sheet calculates the least squares solution getting the same results you had.
The Variance-Covariance sheet calculates just that (if it's all correct that is...)

I still need to spend some time with the description you sent.
Michael HamacherAuthor Commented:
Thank you very much, once again well appreciated.

Just to clarify: the calculations in tab 'Matrix LSE' contain the regular least square solution? I ask because it appears pretty much an analytical solution, without any optimization process.

Regarding the Variance-Covariance calculations: may I ask how I am to interpret the results/which results I am to look at?

If its ok, I shall leave this question open for now until you have had the time to go through my explanations...

Thank you!
Fred MarshallPrincipalCommented:
Yes, please leave it open as I intend to continue working on it.  But work keeps me on other things.

Well, I'm rather pioneering this myself.  I've seen it in the past and can read and do .. but I'm also dealing with interpretation and implementation.
As I get it, one would use the diagonal on the Variance-Covariance matrix as the weights.
What I don't get is that it seems to be independent of the known values.

More to come ....
Michael HamacherAuthor Commented:
Just mean to clarify: I understand that may still be working on this thread (and I would indeed appreciate any more solid approach to the problem), so I've still left it open...in case you wont get to it any more just let me know & I shall close it

Fred MarshallPrincipalCommented:
This has been a busy week... sorry for the delay
Fred MarshallPrincipalCommented:
Fred MarshallPrincipalCommented:
I think it would be easier for us to come together on this if the problem to be solved was put into linear algebra terms.  
While I sort of understand options trading, this approach is beyond what I've done.
So, what I'd be looking for would be cast in the following manner:

There is a relationship:
Ax=b where A is a matrix of coefficients, x is a set of variables and b is a set of constants.
Normally the issue is to find the values of x that best match the values of b.
AND the values of A are deterministic values, generally of a set of basis functions like x^n n=0...M.

Here is an example where the objective is to find an optimum set of values for the variables in the case of an approximation problem:

Desired vector =D
The approximant to D is F
The error in matching D is D-F
So, the objective is to minimize the error in some sense (one of which could be least squares or the L2 norm).
We have:
Ax=F and the objective is to find that vector x which minimizes D-F using some measure of the difference such as the sum of the differences squared.
error = Ax-D=F -D which ideally would be zero everywhere.
Minimize sum(F-D)^2
or more generally:
Minimize sum|(F-D)|^N where N can be 1, 2, .... infinity.
If it's 1 then it's the sum of the absolute value of the errors
If it's 2 the it's the sum of the squares of the absolute value of the errors
If it's infinite it's the sum of the peaks of the error in effect and the result is minimax: the peaks of the errors are minimize and the result has many outcomes of equal error value.
The higher the value of N, the more emphasis on peak errors.

The selection of what to do has much to do with the underlying problem and objectives.
That's what I don't understand very well.....
Might you be able to cast this problem in this context?

You already have a matrix A in B11:H15
You already have a vector b in B8:H8 I believe.
You have calculated a vector x'  which gives a least squares fit F to B8:H8.
That result is vector F.
And vector D, it appears, are the original exposure values b B8:H8.

Now you want to weight the values in F such that you weigh the results more heavily at E8, is that right?

So, the way to do that is to weigh the errors in some fashion.  I'm not sure how to advise on selecting the weights because you have already decide that E8 should have the heavier weight and, thus, the smaller relative error.  Right?
I don't think this is the same thing as if one were to use the variance values.

I would suggest just picking some reasonable weights and recalculating the values of x'.
That's because you already know what the weights might be within reason and can play around with a variety of schemes.

So, with no weights:

And with weights:
and the optimization criterion might be:
minimize sum(|F-D|)
minimize sum(|(F-D)|^2
minimize max(|F-D|) which is the same as minimize max (|F-D|)^infinity

Does this make any sense for you?
Michael HamacherAuthor Commented:
Thanks for the link. I am familiar with the page but haven't actually seen this specific function. It seems to have a different scope if I understand it correctly, in that it optimizes portfolio sensitivities v.s. hedging cost at a single spot level & in a non-weighted fashion - whereas I am looking to minimize the vega sensitivity across different spot levels (in a weighted fashion and irrespective of hedging cost)....Also (sadly), Matlab is not really an option...
Thanks for continuing to help really; there is no rush at all just meant to see whether I shall still leave it open.
Fred MarshallPrincipalCommented:
I think I'm starting to zero in on things here:
I found this: https://fp7.portals.mbs.ac.uk/Portals/59/docs/MC%20deliverables/WP22%20O%20Mahmoud%20research%20report3_VegaRisk.pdf

I also found

It occurs to me that there are two particular situations where least-squares or other optimization criteria are used:
- where the underlying information is perfect
- where measured information is noisy.
In both cases, one selects a *model* to represent the data and computes the optimum model *parameters*.

In the "perfect information" case, the objective is to optimize the selection of parameters to minimize some error measure between the perfect information and the perfectly-known model.

In the "noisy information" case, the objective is to optimize the selection of parameters to minimize some error measure between the noisy information and the perfectly-known model.

In the "perfect information" case, once the optimization is done, there's not much else to do.  
But, I hasten to add that even in these cases there are often application of weights or weight functions to the error.
For example, if I am designing an audio filter, I know what I want for filter frequency response perfectly.
If I select a model which is a sum of harmonically related sines and cosines well, that's just typical.
Then I need to optimize the model parameters, the coefficients, to approximate the perfect response.
Often, filters are designed in the minimax sense (minimize the maximum error).
If I were to use no weights then the specification would be that the stop band of the filter would have the same deviations as in the passband.  Yet, the stop band criteria are usually much more stringent.
So, in this case, I weigh the errors in the stop band much more heavily so that the deviations that result there are much smaller than I'm going to allow in the  pass band.

In the "noisy information" case, only the model is perfect .. well, perfect in the sense that I know what it is.
None of the models may be necessarily a good choice for the situation at hand.
A common model is a polynomial model:
ao*x^o + a1*x^1+ a2*x^2 + .........an*x^n
where I have to at least select the number of terms in the model and often fewer is better than more depending on the data.  For example, it makes little sense to model data lying on a straight line with a 4th order polynomial.
Anyway, in this case there is the notion of the deviation between the "perfect" model and the noisy data.  In that case, one is motivated to weigh the errors at each point in order to reduce the influence of data that's "worse" or noisier while increasing the influence of data that's less noisy or "better".
That's where the weights based on deviations is introduced.

My problem is that I don't know whether you are dealing with the noisy data case or with the perfect information case.
So, the notion of setting weights is a problem for me still.

I've attached a spreadsheet that assumes perfect information and introduces manually-determined weights under the assumption that I know something about how the errors might be weighted as a function of the independent variable (in this case, the nodes).  And, as you have done, I've optimized on the model coefficients.

If the situation is one of noisy data then there is some notion that the coefficients might be fixed and the optimization done on the weights according to deviation.  I've not done one of those yet really, although I started on it earlier.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Michael HamacherAuthor Commented:
Thanks once more for the elaborate answers. To address the following question of yours (as it strikes me as fundamental)

The selection of what to do has much to do with the underlying problem and objectives.
That's what I don't understand very well.....
Might you be able to cast this problem in this context?

The objective is to hedge the vega profile of a structured FX product with vanilla option hedges as closely as possible. Structured products cannot be cleared in the market, only so called  ‘vanilla’ options can, therefore the need to construct a combination of vanilla hedges that matches the vega profile of the structured product as closely as possible.
The vega profile is defined as the vega exposure at different FX spot levels (in this case: vector b).
Each row of matrix A represents the vega profile of each of the vanilla hedging instruments. These are knowns, and typically the vega of an option at a given spot level is expressed in terms of bps (% age fraction).  
For instance, instrument 3 may represent an ATM option, which has a vega of 0.40% per unit at the current (0%) spot step, but loses its vega exposure either side from current spot (i.e. the further away we move from 0%, the lesser the vega exposure of an ATM option)

Instrument 2 may represent a 25delta call option, which gives greater vega exposure at the nodes right of the center (the FX spot steps > 0%) compared to the ATM option, but lesser exposure at the nodes left of the center. It has its maximum vega of 0.40% at the +1.0% spot step, before it begins to taper off; on the other.

Instrument 4 may represent a 25delta put option, which confers a greater vega exposure to the left of the center (FX spot steps < 0%) compared to the ATM option.

Instruments 1 & 5 may represent lower delta calls & puts, respectively (although the sample numbers I provided are not entirely representative here for these two instances).

It is noteworthy that
-      I can buy or sell any one instrument/option (i.e. allowing for positive or negative notionals)
-      The vega exposure of each instrument is 100% independent of the vega exposure of any other instruments.

The objective is to determine the amount that we have to hold in each of the vanilla options (=hedging instruments), in order to best match the original structured vega profile. These ‘amounts’ (or notional amounts) represent the decision variables, represented by vector X.
The motivation for using weights arises from the fact that I generally care less about hedging the exposure at distant spot steps, compared to the more closely located ones.

The constraint is the hedge portfolio is supposed to be static; i.e. the notional amounts of the vanilla option hedges are to remain constant across each spot step (node)
In terms of perfect v.s. noisy data: for the purpose of this analysis, it should be fair to assume that the original input data (i.e. the vega exposure across different nodes of the structured payoff) has negligible noise, i.e. are non-random.

I hope that helps clarify the problem at hand...

Fred MarshallPrincipalCommented:
OK.  So then I would suggest you use Case B and put the weights in that you think best represent your notion that
" I generally care less about hedging the exposure at distant spot steps, compared to the more closely located ones."
I did that with my own notion of perhaps what that might mean.

Then, just using the optimized variables and *not the weights* will give you the results having used the weights.  So, I would expect those errors at distant spot steps to be larger in general since you didn't put so much weight on their errors in the optimization.
It's the unweighted result that you want to use in the end.  Well, at least in the types of applications that I have, that's how it's done.
Michael HamacherAuthor Commented:
Well noted, thank you for the dedicated help.

I would have a few follow-up questions

-  could you pls elaborate the comment you added to Case D of your last spreadsheet:
Case D appears to show that there is only two degrees of freedom because there are only three maximum errors that are equal.  With 5 variables there would be 6 of them.
- you tried to tackle the problem via a variance-covariance matrix...would you possibly have any further insights into this approach (I do not necessarily need to use LSq; it was the one approach I was reasonably familiar with but I am happy to try out other optimization approaches)
- Regarding the idea of optimizing the (node) weights): I am not per se against this (apologies if it came across as such), I just have problems understanding the practical implications of adjusting the node weights...if I can translate this into an improved hedge portfolio I am fully for it. Would you have an example of using the variance-rules to guide the specification of the weights?
- In the spreadsheet attached to your poste on the 28Oct (LeastSquares-Matrices) you calculate the least squares solution directly...would there be any benefits/advantages over the solver-generated solution?

Fred MarshallPrincipalCommented:
-  could you pls elaborate the comment you added to Case D of your last spreadsheet:
Case D appears to show that there is only two degrees of freedom because there are only three maximum errors that are equal.  With 5 variables there would be 6 of them.

If you do a minimax optimization / solution as I did in Case D then it works like this:
Let's say we want to match using a constant a0*x^0 model to some data in a minimax sense:
If you start with a0 equal to maxError - minError then the peak errors will be +/- (maxError-minError)/2 and of opposite sign.
There is one degree of freedom .. you are free to modify a0 .. and you get two peaks.

If you do a minimax optimization / solution using a0*x^0 + a1*x^1 then you are free to modify a0 and a1 so apparently 2 degrees of freedom and, similarly there will be three points where the magnitude of the error is maximum.
And so forth ....
It just happens that the signs of the maximum errors generally alternate.
There are some cases where there may be n+2 instead of n+1 peaks but that's simply an unusual case.

Also, let's suppose you're using a basis set that's not x^0, x^1, .... x^(n-1) but something else like cos(0*x) + cos(x) + cos(2x) + ... cos((n-1)x) which we'll call z0, z1, z2 .... z(n-1),  just as an example of something different, then, as always, there's a coefficient matrix which is structured according to the values of "x" that you've chosen and the values of cos(kx).
Now, if any of those columns in that matrix can be generated as a weighted sum of any of the other columns then you lose a degree of freedom.
For example:
1   2   3   3   5
2   3   4   6   6
3   7   3   9   6
4  9  2   12   7

1   2   3   3   5
2   3   4   6   7
3   7   3   9  10
4  9  2   12  11

In the first case, the 4th column is 3X the 2ndt column.  So they aren't independent and one can generate a family of solutions with identical results by weighting those two columns in a variety of ways as long as the sum of their values in x are the same.  Zero and "K" would be one of those.  "3K" and zero would be another.  and so on....  So this removes a degree of freedom because the coefficient of one column may as well be zero.

In the second case, not only is the 4th column 3X the 2ndt column, but the 5th column is the sum of the 2nd and 3rd columns.  So, another degree of freedom is lost.  The coefficient of the 5th column may as well be zero.

- you tried to tackle the problem via a variance-covariance matrix...would you possibly have any further insights into this approach (I do not necessarily need to use LSq; it was the one approach I was reasonably familiar with but I am happy to try out other optimization approaches)
This seems appropriate for the "noisy data" case where the matrix is the noisy data.  Wikipedia treats this case pretty well at:
I guess they just assume that the data points are equally spaced in x.  This would be the case if x is time and the data is regularly sampled in time - a common thing to do.  But I see no requirement for this in the context of this discussion.

- Regarding the idea of optimizing the (node) weights): I am not per se against this (apologies if it came across as such), I just have problems understanding the practical implications of adjusting the node weights...if I can translate this into an improved hedge portfolio I am fully for it. Would you have an example of using the variance-rules to guide the specification of the weights?
See: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0CB4QFjAA&url=http%3A%2F%2Fstat.psu.edu%2F~rho%2Fstat462%2FApril16.doc&ei=U9ZXVL6iMIb1iQKNloC4Ag&usg=AFQjCNFdlyVByUy-jTL812mjZciCKLXYQQ&bvm=bv.78677474,d.cGE
I really get hung up on the idea of a variance at a single point in x.  That implies that there are multiple data points for each value of x.  IF that's the case then it makes sense.  If that's not the case then it bothers me.
It appears that using weighted least squares is as much an art as a science except where 1/sigma^2 is the optimum weight... and that appears to require multiple points.

Applied in your case, as I understand it, you would compute the variance of the values in the matrix at each node - that is, across the instruments at each node.  I've done that in the attached spreadsheet in Sheet:Optimizations Case B: Weighted Solver Solution with manually-entered weights.  
And the "manually entered weights" come from Sheet:Cases in Q65:Q71.
I arrived at these weights taking
1/sigma^2 = 1/var
Then I normalized them to add up to 1.0 in order to compare with the cases with no weights (or all weights=1.0).
It should not matter how the weights are scaled as they should be relative.
In this case, the unweighted sum squared errors is higher than the unweighted case.
I have *no idea* if this makes sense but it seems to be mathematically in line with what others describe.  I'm pretty sure I'd *not* do it if I had confidence in the matrix values.
As I said, some of these weighting approaches are as much art as science in some sense....

- In the spreadsheet attached to your poste on the 28Oct (LeastSquares-Matrices) you calculate the least squares solution directly...would there be any benefits/advantages over the solver-generated solution?
Perhaps.  I rather like the Solver and I'd guess it's a lot easier to set up.  But, if compute time is critical then you could try both and see.

I hope this helps.
Fred MarshallPrincipalCommented:
It may be that the weights should be applied as W*E^2 instead of (W*E)^2 to be mathematically correct on some sense.
But, above all, whatever weighting method is used, it should be well-justified and thought through.
I see nothing wrong with manually setting weights based on something you know or want.
Michael HamacherAuthor Commented:
Thanks for reverting. To recap your explanation regarding the relationship between number of max errors and degree of freedom: in the present example you would have expected 6 identical max errors (as number of max errors = n +1, and in the present case we have 5 coefficients/decision variables). If I understand you correctly, this implies that the information content of the instruments chosen is in some sense 'overlapping'/redundant/non-orthogonal, in the sense that one or more may be represented as linear combinations of others....
Accordingly, would it be fair to assume that in this case we can expect to achieve the same fit with lesser instruments?

Also: I have just posted the last & final chapter in this optimization thread...if I could ask your expertise on this one last time that would be great....
Fred MarshallPrincipalCommented:
Yes.  It would appear that this could be the case.  But I had the impression that the numbers in the matrix weren't real.  ?
Michael HamacherAuthor Commented:
They are pretty close to real ;-)

btw my last question on this topic is labelled 'LeastSquaresOptimization-ImprovingEfficiency'...just in case Sir, would be great to have your input.

Fred MarshallPrincipalCommented:
Michael:  Thanks for the points and thanks for the opportunity to learn a few new things in the process.  This application is just a little outside my own application space.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Math / Science

From novice to tech pro — start learning today.