# Least Squares optimization in excel

I would like to perform a least squares optimization in excel using solver.
I tried the following set-up:
- Given is a vector of original exposure across a range of seven nodes (C8:I2)
- The aim is to replicate this exposure at each point as close as possible from a set of 5 instruments. The contribution of each instrument on any of the 7 nodes is respresented in the matrix C11:I15
- I try to solve for the number of units of each instrument (K11:K15) such that the sum of the squared differences (cell C24) is minimized , using solver

The issue is that the solution returned by solver is fairly poor in terms of fit/node (C20:I20). As a check, I have used Excel's linest functions, and the fit appears much better (S9:S14, which indicate a much better fit per node than the solver-based solution as displayed in row B20-H20).

Any idea as to how I may improve the set-up of the Lsq-optimization using excel would be appreciated
###### Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

PrincipalCommented:
How about an Excel file of the data?  There is much needed to understand.

C8:I2 is a matrix, not a vector .... ????

There are "nodes", "points", "instruments", etc.

I've built a trial spreadsheet but don't understand what the values in rows 2-8 represent.  Measurements over time? or?......
0
Author Commented:
Hi

pls find a sample spreadsheet attached

Thank you
Lsq-concept-v2.xlsx
0
PrincipalCommented:
OK.  Thank you! I'm looking at it now....
0
PrincipalCommented:
OK.  I worked on it some and got what appear to be good results.

I couldn't tell what constraints you put on the variables so I bounded them between -1,000,000,000 and 1,000,000,000.
This is necessary for Solver to run.

I ran one case as you did with the same weight applied for each Instrument.  This gives the same results as the straight line fit.

I ran another case using different weights for each instrument for each node.  It's overdetermined so it gets zero error at each node and the weights can have any number of outcomes.  So perhaps not too useful.

I ran another case which may be realistic for you by using the first case and applying a node weight.  That looks pretty good.  So, the instruments are all weighted the same but then they are weighted according to the node - so they are *relatively* weighed the same.
0
Author Commented:
Thanks vm, well appreciated. Can you pls share you spreadsheet solution?

Cheers!
0
PrincipalCommented:
Sorry, I'm sure that I had included it.  I don't know what happened.
0
Author Commented:
First of all, thank you very much indeed for undertaking the effort to construct and post your own spreadsheet solution, appreciated really.

I went through the sheet and several questions/issues emerged (it almost appears that some information was 'lost' in the process of attaching the sheet/re-opening it on my end)

(1)      I am unable to replicate your results even just for the base case (equal weights for nodes and instruments). It appears that my original formulae have been retained, therefore it would be very helpful to see what caused the difference. Would you be able to  share how you set the solver up?

(2)      I am looking at variant 3, i.e. any one instrument retains the same weight across nodes (as this is a necessary condition), but each node may be assigned a different weighting.
-      Where exactly are the node weights specified? I do noticed you have a row (64) ‘Node Weight’, but these numbers are in the order of magnitude of millions? (for weighted least squares, my understanding is that the weights are specified in the format of 1/(y^2). Moreover, the instrument weights appear to differ across each node, at least when I open the file (e.g. C59:C63 = weights for 1st node shows different weights for Instruments 1-5 than E59:E63 = weights for 2nd node, and the same applies for all consecutive nodes)
-      How do you set up the solver in this case please? When I open your spreadsheet & the solver set up, the solver input mask (I have attached the screenshot for your reference) shows as objective to minimize cell O59 (which appears to ), and by minimizing any of the instrument weights in each node (plus the node weights), i.e. C59:C64, E59:E64 etc), in contrast to the idea that the instrument weights are to be kept constant
Would it be possible to maybe highlight the decision variable cells, and the optimization target, and explain which solver parameters you use?

3)      It appears that you use the ‘Evolutionary Solver’ Method. Is there any particular reason why? It seems that this particular method takes longer than the other two (GRG nonlinear & simplex LP)

Thank you very much!
SolverSet-Up.png
0
PrincipalCommented:
(1)      I am unable to replicate your results even just for the base case (equal weights for nodes and instruments). It appears that my original formulae have been retained, therefore it would be very helpful to see what caused the difference. Would you be able to  share how you set the solver up?
The first case used I11:I15 as the variable cells.
Then each of those cells was set with constraints:
>=-1,000,000,000 and <=1,000,000,000
0
PrincipalCommented:
(2)
I am looking at variant 3, i.e. any one instrument retains the same weight across nodes (as this is a necessary condition), but each node may be assigned a different weighting.
Correct.

Where exactly are the node weights specified? I do noticed you have a row (64) ‘Node Weight’, but these numbers are in the order of magnitude of millions? (for weighted least squares, my understanding is that the weights are specified in the format of 1/(y^2).
They are specified in Row 64.  This really has nothing to do with least squares, it's just a scaling that goes along with each node.  Then these node weights are also "variables" in the Solver and are given a wide range (actually the same range as the other variables just because I didn't know any better).

I believe that the variables were the parameters in P59:P63 and the node weights in row 64.  This would keep the relative instrument weights the same as scaled by the node weights.  Somehow the cell expressions have been lost so I'll have to redo it.

So, each Instrument/Node weight was the (node weight) X P59:P63 respectively.

More to come....
0
Author Commented:
Thanks a lot! and yes, if you can repost it with the re-instated cell expressions that would be much appreciated indeed (I tried fooling around with it myself, but don't quite trust the results).
One more q: I noticed you added Sqrt(Sum(Sq)): do you minimize this expression (instead of Sum(SqD)?
Lastly, It seems that GLG nonlinear seems to be sufficient (and faster than the evolutionary approach)....
0
PrincipalCommented:
OK.  Now that I've had some coffee...
I Saved each Solver case in column R.  This is better and less work than pictures.  All you have to do is open Solver and Load from the cells below the "Solver Case X" title and down until it's empty.  e.g. R6:R11 and so on....  Then you can see and examine how Solver was set up and you can run it.

Case 1 is the original case and the results are the same as those you had originally using the straight line fit.

Case 2 uses separate weights for each instrument/node pair.  As I mentioned before, this is an overdetermined case and there will be many solutions.  You can always get zero error doing it this way but it's not very satisfying or valid.  For example, you could use all zero instrument weights except for one at each node at least for five of the instruments.

Case 3 uses the same weights for each instrument (shown in column P).  And, it adds a weight for each node (shown in row 64). This time, I limited the node weights to be positive so the instrument weights would always "look the same" (sign) relative to the column P numbers.
Case 3 could be solved in many ways also but not pathologically as in Case 2.  For example, divide the node weights by 2 and multiply the instrument weights in column P by 2 and the node/instrument weights will remain the same.
The node instrument weights are all:

[instrument weight Px]*[node weight x64]*1,000,000.

One could just as easily multiply the values in col P by 1,000,000 and leave them at that.

The limits for the variables in Solver are simply big numbers in order to make sure they aren't actually limiting the solution.  They are there just to satisfy the rules for Solver variables.  Well, except for the node weights which I did want to keep >=0 as I mention above.

Case 3 gives you the opportunity to play around with the values.
For example, you can take the weights from Case 1 and put them in column P59:P63.
Then set the node weights all to 1.0.
The result will be the same as Case 1.  A good check.
Then you can run the Case 3 Solver to get the result that's shown here.

I also set the node weight in column C to be 1.0 and not a variable.  So, you might say it's the "reference" node.  The other node weights and all of the instrument weights then change to accommodate this.

Yes, you can use GRG Nonlinear and it is faster.  My Solver was already set to Evolutionary because I've been running some large, nonlinear solutions lately.
0

Experts Exchange Solution brought to you by

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Author Commented:
Thanks a lot, that's perfect really.

Fyi: I do have one follow up question which I however decided to post as a separate question as I feel you have already dedicated so much effort to this one (ID: 28543341, Weighted least Squares Excel). Feel free to have a look (after another coffee that is)....
0
Author Commented:
Very comprehensive solution; re-designed the spreadsheet and included additional case which was very insightful
0
###### It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Development

From novice to tech pro — start learning today.