Solved

# Least Squares optimization in excel

Posted on 2014-10-20
211 Views
I would like to perform a least squares optimization in excel using solver.
I tried the following set-up:
- Given is a vector of original exposure across a range of seven nodes (C8:I2)
- The aim is to replicate this exposure at each point as close as possible from a set of 5 instruments. The contribution of each instrument on any of the 7 nodes is respresented in the matrix C11:I15
- I try to solve for the number of units of each instrument (K11:K15) such that the sum of the squared differences (cell C24) is minimized , using solver

The issue is that the solution returned by solver is fairly poor in terms of fit/node (C20:I20). As a check, I have used Excel's linest functions, and the fit appears much better (S9:S14, which indicate a much better fit per node than the solver-based solution as displayed in row B20-H20).

Any idea as to how I may improve the set-up of the Lsq-optimization using excel would be appreciated
0
Question by:Michael Hamacher
• 7
• 6

LVL 25

Expert Comment

ID: 40393528
How about an Excel file of the data?  There is much needed to understand.

C8:I2 is a matrix, not a vector .... ????

There are "nodes", "points", "instruments", etc.

I've built a trial spreadsheet but don't understand what the values in rows 2-8 represent.  Measurements over time? or?......
0

Author Comment

ID: 40393543
Hi

pls find a sample spreadsheet attached

Thank you
Lsq-concept-v2.xlsx
0

LVL 25

Expert Comment

ID: 40395524
OK.  Thank you! I'm looking at it now....
0

LVL 25

Expert Comment

ID: 40395895
OK.  I worked on it some and got what appear to be good results.

I couldn't tell what constraints you put on the variables so I bounded them between -1,000,000,000 and 1,000,000,000.
This is necessary for Solver to run.

I ran one case as you did with the same weight applied for each Instrument.  This gives the same results as the straight line fit.

I ran another case using different weights for each instrument for each node.  It's overdetermined so it gets zero error at each node and the weights can have any number of outcomes.  So perhaps not too useful.

I ran another case which may be realistic for you by using the first case and applying a node weight.  That looks pretty good.  So, the instruments are all weighted the same but then they are weighted according to the node - so they are *relatively* weighed the same.
0

Author Comment

ID: 40396082
Thanks vm, well appreciated. Can you pls share you spreadsheet solution?

Cheers!
0

LVL 25

Expert Comment

ID: 40397628
Sorry, I'm sure that I had included it.  I don't know what happened.
0

Author Comment

ID: 40398422
First of all, thank you very much indeed for undertaking the effort to construct and post your own spreadsheet solution, appreciated really.

I went through the sheet and several questions/issues emerged (it almost appears that some information was 'lost' in the process of attaching the sheet/re-opening it on my end)

(1)      I am unable to replicate your results even just for the base case (equal weights for nodes and instruments). It appears that my original formulae have been retained, therefore it would be very helpful to see what caused the difference. Would you be able to  share how you set the solver up?

(2)      I am looking at variant 3, i.e. any one instrument retains the same weight across nodes (as this is a necessary condition), but each node may be assigned a different weighting.
-      Where exactly are the node weights specified? I do noticed you have a row (64) ‘Node Weight’, but these numbers are in the order of magnitude of millions? (for weighted least squares, my understanding is that the weights are specified in the format of 1/(y^2). Moreover, the instrument weights appear to differ across each node, at least when I open the file (e.g. C59:C63 = weights for 1st node shows different weights for Instruments 1-5 than E59:E63 = weights for 2nd node, and the same applies for all consecutive nodes)
-      How do you set up the solver in this case please? When I open your spreadsheet & the solver set up, the solver input mask (I have attached the screenshot for your reference) shows as objective to minimize cell O59 (which appears to ), and by minimizing any of the instrument weights in each node (plus the node weights), i.e. C59:C64, E59:E64 etc), in contrast to the idea that the instrument weights are to be kept constant
Would it be possible to maybe highlight the decision variable cells, and the optimization target, and explain which solver parameters you use?

3)      It appears that you use the ‘Evolutionary Solver’ Method. Is there any particular reason why? It seems that this particular method takes longer than the other two (GRG nonlinear & simplex LP)

Thank you very much!
SolverSet-Up.png
0

LVL 25

Expert Comment

ID: 40398582
(1)      I am unable to replicate your results even just for the base case (equal weights for nodes and instruments). It appears that my original formulae have been retained, therefore it would be very helpful to see what caused the difference. Would you be able to  share how you set the solver up?
The first case used I11:I15 as the variable cells.
Then each of those cells was set with constraints:
>=-1,000,000,000 and <=1,000,000,000
0

LVL 25

Expert Comment

ID: 40398594
(2)
I am looking at variant 3, i.e. any one instrument retains the same weight across nodes (as this is a necessary condition), but each node may be assigned a different weighting.
Correct.

Where exactly are the node weights specified? I do noticed you have a row (64) ‘Node Weight’, but these numbers are in the order of magnitude of millions? (for weighted least squares, my understanding is that the weights are specified in the format of 1/(y^2).
They are specified in Row 64.  This really has nothing to do with least squares, it's just a scaling that goes along with each node.  Then these node weights are also "variables" in the Solver and are given a wide range (actually the same range as the other variables just because I didn't know any better).

I believe that the variables were the parameters in P59:P63 and the node weights in row 64.  This would keep the relative instrument weights the same as scaled by the node weights.  Somehow the cell expressions have been lost so I'll have to redo it.

So, each Instrument/Node weight was the (node weight) X P59:P63 respectively.

More to come....
0

Author Comment

ID: 40399115
Thanks a lot! and yes, if you can repost it with the re-instated cell expressions that would be much appreciated indeed (I tried fooling around with it myself, but don't quite trust the results).
One more q: I noticed you added Sqrt(Sum(Sq)): do you minimize this expression (instead of Sum(SqD)?
Lastly, It seems that GLG nonlinear seems to be sufficient (and faster than the evolutionary approach)....
0

LVL 25

Accepted Solution

Fred Marshall earned 500 total points
ID: 40400048
OK.  Now that I've had some coffee...
I Saved each Solver case in column R.  This is better and less work than pictures.  All you have to do is open Solver and Load from the cells below the "Solver Case X" title and down until it's empty.  e.g. R6:R11 and so on....  Then you can see and examine how Solver was set up and you can run it.

Case 1 is the original case and the results are the same as those you had originally using the straight line fit.

Case 2 uses separate weights for each instrument/node pair.  As I mentioned before, this is an overdetermined case and there will be many solutions.  You can always get zero error doing it this way but it's not very satisfying or valid.  For example, you could use all zero instrument weights except for one at each node at least for five of the instruments.

Case 3 uses the same weights for each instrument (shown in column P).  And, it adds a weight for each node (shown in row 64). This time, I limited the node weights to be positive so the instrument weights would always "look the same" (sign) relative to the column P numbers.
Case 3 could be solved in many ways also but not pathologically as in Case 2.  For example, divide the node weights by 2 and multiply the instrument weights in column P by 2 and the node/instrument weights will remain the same.
The node instrument weights are all:

[instrument weight Px]*[node weight x64]*1,000,000.

One could just as easily multiply the values in col P by 1,000,000 and leave them at that.

The limits for the variables in Solver are simply big numbers in order to make sure they aren't actually limiting the solution.  They are there just to satisfy the rules for Solver variables.  Well, except for the node weights which I did want to keep >=0 as I mention above.

Case 3 gives you the opportunity to play around with the values.
For example, you can take the weights from Case 1 and put them in column P59:P63.
Then set the node weights all to 1.0.
The result will be the same as Case 1.  A good check.
Then you can run the Case 3 Solver to get the result that's shown here.

I also set the node weight in column C to be 1.0 and not a variable.  So, you might say it's the "reference" node.  The other node weights and all of the instrument weights then change to accommodate this.

Yes, you can use GRG Nonlinear and it is faster.  My Solver was already set to Evolutionary because I've been running some large, nonlinear solutions lately.
0

Author Comment

ID: 40400824
Thanks a lot, that's perfect really.

Fyi: I do have one follow up question which I however decided to post as a separate question as I feel you have already dedicated so much effort to this one (ID: 28543341, Weighted least Squares Excel). Feel free to have a look (after another coffee that is)....
0

Author Closing Comment

ID: 40400826
Very comprehensive solution; re-designed the spreadsheet and included additional case which was very insightful
0

## Featured Post

A theme is a collection of property settings that allow you to define the look of pages and controls, and then apply the look consistently across pages in an application. Themes can be made up of a set of elements: skins, style sheets, images, and o…
How to Win a Jar of Candy Corn: A Scientific Approach! I love mathematics. If you love mathematics also, you may enjoy this tip on how to use math to win your own jar of candy corn and to impress your friends. As I said, I love math, but I gu…
The viewer will learn how to create a normally distributed random variable in Excel, use a normal distribution to simulate the return on an investment over a period of years, Create a Monte Carlo simulation using a normal random variable, and calcul…
Excel styles will make formatting consistent and let you apply and change formatting faster. In this tutorial, you'll learn how to use Excel's built-in styles, how to modify styles, and how to create your own. You'll also learn how to use your custo…