Michael Hamacher

asked on

Least Squares Optimization - Improving Efficiency

This question is the third (and final) installment following my two earlier questions on least squares optimization, which centered on constructing an optimized vega hedge portfolio. In this last step, I am trying to introduce another risk sensitivity, so-called rega, in addition to vega, which effectively means that I am adding further quantities to the optimization.

One of the challenges is that rega is of a different order of magnitude (vega risk is typically reported as 5-10x as large as rega risk). I have tried to address this by scaling up the weights for the rega risk so that it approximately matches the vega risk in order of magnitude (specifically, I scale it by the square root of the maximum vega divided by the maximum rega).
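To spell out what I mean in rough Python terms (the arrays and numbers below are made up purely for illustration, not the spreadsheet values):

import numpy as np

# Hypothetical target exposures of the structured product across the spot nodes
vega_target = np.array([850e3, 950e3, 1050e3, 930e3, 820e3])
rega_target = np.array([110e3, 140e3, 170e3, 150e3, 120e3])

# Scale factor = sqrt(max vega / max rega), applied to the rega side
scale = np.sqrt(np.abs(vega_target).max() / np.abs(rega_target).max())
rega_scaled = scale * rega_target
# (the same factor would also be applied to the rega rows of the hedge instrument matrix)

print(scale)        # roughly 2.5 with these made-up numbers
print(rega_scaled)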
The attached spreadsheet illustrates the set-up; I have tried a variety of different set-ups in order to tackle the problem, but none seems to deliver a satisfactory fit really:

Case 1: just optimizes for vega only; as the example shows the resulting hedge portfolio would imply a substantial net rega exposure

Case 2: optimizes for vega & rega, with both exposures equally weighted. There is still a significant net rega exposure (215k – 324k) across all spot steps, and the fit for vega is relatively poor locally (e.g. 323k at +2%)

Case 3: weighted fit, i.e. scales up the rega weights. The net rega exposure seems to be better now, yet the vega fit is extremely poor (difference of 800k at the +1.0% step).

Case 4: same set-up as Case 3, but optimized for max error; fit is equally poor.

Case 5: I added a 6th hedging instrument & optimized for vega & rega in a weighted fashion. Rega fit looks better now but vega still relatively poor (particularly at +1.0% & +2.0% nodes)

Case 6: as above but using the max error as objective function. Vega fit at +1.0% & +2.0% nodes has been improved but still rather poor.

Case 7: here, I used a different approach; instead of minimizing the exposure at the respective nodes, I tried to minimize the vega & rega exposure at the 0% nodes only, as well as the changes in the exposure between any two nodes. The vega fit is very poor here (especially at the +1.0% node, which shows a deviation of 599k).

Overall, Cases 5 & 6 (i.e. those which include a 6th instrument) seem to yield the most promising results, but they still seem far from satisfactory. I am therefore wondering whether there are any suggestions as to how to further improve the fit; either by
(1)      Calibrating the weightings
(2)      Changing the objective function
(3)      Changing the optimization routine (fmarshal hinted in another post that extending the optimization to the weights may improve the fit).

The main additional challenge seems to be that the risk units for rega & sega would differ from the vega units (e.g. rega & sega numbers would typically be in the order of 1,000-50,000), so some sort of normalization would have to be applied. I figured this may be done by using multipliers to scale the numbers up to the same order of magnitude as the vega numbers; alternatively, I could weight the errors for rega/sega much more heavily.
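To make the weighting and objective-function options above concrete, here is a rough Python sketch of the two objectives I have been comparing (sum of squared errors vs. max error), with the rega block re-scaled as described; all arrays are placeholders rather than the actual spreadsheet data, and the sign convention for the net exposure is an assumption:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A_vega = rng.normal(size=(5, 7)) * 0.004    # hypothetical vega profiles: 5 instruments x 7 nodes
A_rega = rng.normal(size=(5, 5)) * 0.0005   # hypothetical rega profiles: 5 instruments x 5 nodes
b_vega = rng.normal(size=7) * 1e6           # hypothetical product vega exposure per node
b_rega = rng.normal(size=5) * 1e5           # hypothetical product rega exposure per node

w_rega = np.sqrt(np.abs(b_vega).max() / np.abs(b_rega).max())   # rega weight as described above

def residuals(x):
    # Net exposure per node; the rega block is re-weighted to a comparable magnitude
    e_vega = b_vega + x @ A_vega
    e_rega = w_rega * (b_rega + x @ A_rega)
    return np.concatenate([e_vega, e_rega])

def sum_of_squares(x):
    return np.sum(residuals(x) ** 2)

def max_error(x):
    return np.max(np.abs(residuals(x)))

x0 = np.zeros(5)                             # x = notional per hedge instrument
fit_lsq = minimize(sum_of_squares, x0, method="Nelder-Mead")
fit_max = minimize(max_error, x0, method="Nelder-Mead")
print(fit_lsq.x, fit_max.x)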

Cheers
hypercube

Yes, I did (foolishly?) try optimizing the weights but quickly realized that wasn't a useful approach, as the Solver would set all the weights to zero in order to minimize the error - no matter the objective.  :-)

We're going to want to see those cases......

That is, the file isn't attached
Michael Hamacher

ASKER

Apologies. Pls find attached the spreadsheet....


the Solver would set all the weights to zero in order to minimize the error - no matter the objective.  :-)
Well - shall keep that one up my sleeve as a last resort ;-)
Lsq-concept-v3.xlsx
OK.  Thanks.
I'm getting the sense that the variables are mix numbers relative to each instrument?
So, suppose I'm going to invest 100% of some hedge fund and the relative values of the variables represent the percentages to be used???
If that's the case then I'd rather use those percentages.  Numbers in the 100's of millions don't resonate with me.
The more understanding I get, the easier to suggest things.

Nonetheless, I think I understand what you've done.  Here are some observations:

You initially introduced an additional set of measures but no new degrees of freedom (or at least no new variables, which might normally be the same thing).  This means that the optimization "energy" now has to be split between the two sets of measures: Vega and Rega.  This is easy to see if you look at the errors in the minimax solutions, where there are two error peaks for each.

In some respects this structure is a bit different than what I'm used to.  Normally I'd have *lots* of "nodes".  I rather suspect that what's happening is that because there are so few nodes, the peak error seeking gets "stuck" and unable to go any further.  I do see some error peaks that are sorta like the max values so they may be an example of what I'm talking about.

Perhaps this will help:
If there were 1,000 nodes then the peak errors would occur somewhere, in the appropriate number and with alternating sign (ignoring many technical details around the edges).
If we found an optimum solution over those 1,000 nodes and then threw out all but 10 of them, keeping the variable values, then the "error surface" would be very "pixelated", so to speak, and we'd not see the peaks any more - rather some interim values.  So, I think it must be something like that.

I found that running the Evolutionary solutions did improve the results.  So I'd not trust any of the other Solver methods in view of that.

It appears that the Vega matrix has 5 and 6 linearly independent rows so all of the variables should play in either case.

I didn't understand why you would want to weigh the Rega errors heavily at the expense of the Vega errors.  So I tried cases with those weights equal.

You should check the formulas carefully, particularly in cases 6-8.

I don't know what is "good" and what is "bad" in terms of magnitude of the differences.....

This thing right now is aborting uploads..... so I'll just send the comments along first.
I posted the file to this question using EEStuff.
Thank you very much for reverting, comments & sheet well received

A few points in this context
I found that the solution is not very reproducible. For instance, when trying to reproduce your Case 8 (optimizing the max error using the Evolutionary solver, based on equal weights), I tend to get different solutions if I vary the original target vector x (i.e. the initial guess). See e.g. Case 8' in the attached file, which merely replicates your Case 8 but has arrived at a different solution (max error of 150k, vs. 185k for Case 8). So based on my layman's understanding, it would seem that the optimization is prone to getting stuck in local optima... As such, is there a way to prevent or at least mitigate that?
Moreover, would it make sense to look at other optimization approaches? I have heard e.g. of Newton-Raphson, although I have literally zero experience….


Just to give you a bit more colour on the background: it is not quite as fanciful as investing in hedge funds ;-), the objective is a bit more profane, namely to find the most effective hedge for the risk profile of a structured FX product, using a portfolio of FX vanilla options. Essentially, in the context of the basic example, I assume three basic types of vanilla options: ATM, Call, Put. ATM options have peak vega at the current spot level (0% node); Call options have their peak vega at nodes to the right of the center, whereas Put options peak to the left of the center node. The exact location of the peak vega and the shape of the vega profile depend on the so-called delta of the option: a 25delta call option has its peak vega closer to the center than a 10delta call option (correspondingly, instrument 1 has its peak vega at +2.0%, whereas instrument 2 has it at +1.0%, so they could be regarded as a 10delta and a 25delta Call, respectively).
Specifically, instrument 3 is a so-called ATM option (= at-the-money option, even though the nomenclature isn't really relevant). At the 0% node (= the 0% FX spot level), one unit of this option has a vega of 0.40%, which then tapers off symmetrically on either side of the ATM level (see e.g. http://www.optiontradingtips.com/greeks/vega.html).
On the other hand, Instrument 2 may represent a 25delta call option, which gives greater vega exposure at the nodes right of the center (the FX spot steps > 0%) compared to the ATM option, but lesser exposure at the nodes left of the center. It has its maximum vega of 0.40% at the +1.0% spot step, before it begins to taper off.
Copy-of-Lsq-concept-v4-mod.xlsx
Oh.  So the % values are the option price distance from ATM?  I've been wondering.  
This implies that there could be more "nodes", does it not?  Or, are the price distances more or less fixed?  If I had more points across the "nodes" then I would use them.  It doesn't change the degrees of freedom, I don't think, but it may allow one to find optimum solutions with less granularity.

Optimization works in wondrous ways...  You might imagine that you are in mountainous terrain with no ability to see more than a few feet in any direction.  Your objective is to get to the lowest altitude point within a 100x100 mile patch.

If the topology is such that you are in a smooth bowl then all you have to do is follow the slope to the bottom.  Linear methods may well find the location in a single step.  And, I believe this is the essence of Newton-Raphson which is a slope following method.
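For reference, a minimal statement of the Newton-Raphson update (root finding follows the slope of f; minimization follows the slope of the derivative):

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \ \text{(root finding)}, \qquad
x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)} \ \text{(1-D minimization)}, \qquad
x_{n+1} = x_n - H(x_n)^{-1}\,\nabla f(x_n) \ \text{(several dimensions)}.$$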

If the topology is such that you are on a bumpy surface with multiple local minima then linear methods won't suffice unless you are going to be happy with a non-optimum minimum.  And, of course, this simple example is in two dimensions x,y while you have a 5 or 6-dimensional space.  Well, the surface of the earth is a 3D space and easy to think of in polar coordinates where you look for the optimum angles: phi, theta and zeta .. or whatever to give you a max or a min.

If you have the latter condition then you want your optimization method to have ways to (what I call) "jump around".
And I often give it help by doing what I call "giving it a kick".  Sometimes this is as simple as running the Evolutionary solver method multiple times, with one run starting where the last one left off.  And, setting the solver parameters could be important as well; I'm no expert on that.  Another approach is to change the starting variable values to something new but, shall we say, "likely" in your own view - maybe just change one of them drastically.

Another approach is to do an exhaustive search over the variable space.  In your case you might try values that span the various outcomes as they all seem to be rather close together.  They all have the same sign.  They seem to be within a distance of 20%-50% of one another (variable for variable).  Even if you do this on a coarse grid of values you could use the best case as a starting point thereafter.
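A minimal sketch of such a coarse exhaustive search in Python (the objective, bounds and grid density here are arbitrary placeholders; the best grid point would then seed a Solver/Evolutionary run):

import itertools
import numpy as np

def coarse_grid_search(objective, bounds, points_per_dim=5):
    """Evaluate the objective on a coarse grid and return the best point found."""
    axes = [np.linspace(lo, hi, points_per_dim) for lo, hi in bounds]
    best_x, best_f = None, np.inf
    for candidate in itertools.product(*axes):
        x = np.array(candidate)
        f = objective(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f

# Example usage with a made-up objective and bounds:
obj = lambda x: np.max(np.abs(x - np.array([1.0, -2.0, 0.5])))
x_start, f_start = coarse_grid_search(obj, bounds=[(-5, 5)] * 3)
print(x_start, f_start)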

You might find an optimum and then use the result to optimize further on just a single variable - set that one to start far away, e.g. at 1.0 or 0.0 or -1,000,000,000.  Either the result will be better or it will stay the same; it likely won't be worse.  If it's better then you have a new starting point.

Yes, these are manual tricks and maybe a bit extreme but if you want to understand the sort of space you're working with then it may lend insight.

FrontlineSolvers (the makers of Solver) say:
http://www.solver.com/standard-excel-solver-grg-nonlinear-solver-stopping-conditions
Bear in mind that all of these messages assume that the model is reasonably well-scaled, and/or that the Use Automatic Scaling box has been checked in the Solver Options dialog (it is unchecked by default).  A poorly scaled model can "fool" all of the Solver's algorithmic methods, as discussed in Problems with Poorly Scaled Models.

And, also:
The best that current nonlinear optimization methods can guarantee is to find a locally optimal solution. We recommend that you run the GRG Solver starting from several different sets of initial values for the decision variables -- ideally chosen based on your own knowledge of the problem. In this way you can increase the chances that you have found the best possible "optimal solution."
So, the result of an Evolutionary run can depend on the starting point and there will be multiple "solutions" - even if there is only one global optimum.  
While I'm not an expert on the inner workings of Solver really, some of the methods use large steps initially and then shorten the steps as the process proceeds.  So, starting over may help IF the initial steps are large enough.

I ran a number of your cases with the Evolutionary method using multiple consecutive runs.  They improved on the second run and a tiny bit on the 3rd run.  Case 5 seems to converge to the same result after 2 runs and only slightly improves on the 3rd run - but it does "improve" slightly.  I also ran a number of Evolutionary runs using "far off" variable values to start.  Each case converged to the same result for Case 5 at least and I suspect all of them would behave that way .. but can't prove it.

It may be instructive to ponder the coefficients that are generated.  Introduction of Rega changes them rather drastically.  In a problem space like this, I wonder if that's at all realistic.  I would think that the mix of instruments would be stable to some degree and "improvements" to the approach would do some fine tuning.  So, when I see some coefficients as large numbers AND that they change sign from one formulation to another, that's worrisome.  At that, what does a negative coefficient mean in your application?

The second file I added has a "Population Report".  I'm not sure how to interpret it entirely but I have to wonder about the "standard deviations".
Lsq-concept-v4-mod-2014-11-05-.xlsx
Lsq-concept-v4-mod-2014-11-05-A-.xlsx
Just to clarify:

 So the % values are the option price distance from ATM?  I've been wondering.

I just mean to confirm: are you referring to the %age values that denote the 'nodes' (e.g. B7:M7)? These represent the distance from the current spot level in %age terms. So if the product were in EURUSD, 0% would correspond to 1.25 (= the current EURUSD exchange rate), -0.5% & +0.5% would correspond to 1.24375 and 1.25625 respectively; -1.0% and +1.0% to 1.2375 and 1.2625...
The ranges B8:H8 & I8:M8 represent the vega & rega exposure of the structured product at these different spot levels, respectively (in absolute terms). Likewise, the matrices B11:H15 & I11:M11 represent the vega & rega exposure of the different option hedges at these spot levels (in %age terms - we are then solving for the notional amount per hedge instrument).
And yes, you are entirely correct, we could add more nodes.

what does a negative coefficient mean in your application?
Since the coefficients denote the notional amount for a given hedging instrument, a negative one simply means that we sell this option.

 I ran a number of your cases with the Evolutionary method using multiple consecutive runs.  They improved on the second run and a tiny bit on the 3rd run.  Case 5 seems to converge to the same result after 2 runs and only slightly improves on the 3rd run - but it does "improve" slightly.  I also ran a number of Evolutionary runs using "far off" variable values to start.  Each case converged to the same result for Case 5 at least and I suspect all of them would behave that way

As mentioned, I tried this for case 8, but I am unable to get the solution to converge (as e.g. exemplified by case 8 and 8')

So, the result of an Evolutionary run can depend on the starting point and there will be multiple "solutions" - even if there is only one global optimum.  While I'm not an expert on the inner workings of Solver really, some of the methods use large steps initially and then shorten the steps as the process proceeds.  So, starting over may help IF the initial steps are large enough.


I did think along similar lines, and I was wondering whether it may be appropriate to consider alternatives to Solver that allow for more control? It appears that differential evolution is a popular approach in such instances... was wondering if you may have experience with that?

Cheers
Thanks for helping me understand.  If you have the luxury and the information then I'd add nodes.

As far as the last two Case 8 situations, I don't know quite what to make of this.  I ran both of them over and over and over alternating between Evolutionary and GRG Nonlinear.  And, I tweaked the convergence, seed, population, etc.
I did check that the formulas are the same and I checked that the same variable values give the same results in both.  So that leaves the optimization.  I now have them "close" as well as a bit improved.  

I have relabeled the last two cases as Case 9 and Case 10 and added a Case 11.
I added a "Save buffer" in Cases 9,10,11 where I could conveniently save old sets of variable values.  This helped compare variable sets between Cases.

Here are some observations:

- The Evolutionary method seems to not change the outcome or the variables at all at times.  At other times it makes a bit of progress.  Usually the GRG Nonlinear does nothing .. but not always.  FrontlineSolvers suggests that one might follow an Evolutionary run with a GRG Nonlinear run - it may show improvement that way.

- I "kicked" one of the variables to pull it closer to the other Case 8 and didn't seem to get much agreement in the result thereafter.

- Something is causing Cases 9 and 10 to be different.  One hypothesis is that the convergence criterion is such that they stop before matching.

- Another hypothesis is that the surface is rough.  I have to wonder a bit if the size of the numbers we're dealing with is just too much?  That motivated Case 11.  But it appears to not have helped.

- Doing minimax is probably "nonlinear" to the Solver - even though there are iterative linear methods to solve it.  So, one might do GRG Nonlinear in the least squares sense and then do Evolutionary in the minimax sense.  The idea is to get close and go from there.
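A rough sketch of that two-stage idea outside Excel (closed-form least-squares warm start, then a minimax refinement), with placeholder data standing in for the stacked vega/rega system:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
A = rng.normal(size=(12, 6))        # hypothetical stacked (weighted) exposure matrix: 12 nodes x 6 instruments
b = rng.normal(size=12) * 1e6       # hypothetical target exposures per node

# Stage 1: closed-form least-squares (L2) solution
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Stage 2: minimax refinement, starting from the least-squares solution
res = minimize(lambda x: np.max(np.abs(A @ x - b)), x_ls, method="Nelder-Mead")
print(res.x, res.fun)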
Lsq-concept-v4-mod-2014-11-05-C-.xlsx
I wanted to see if we could envision what's happening so I added some plots.  I don't know that they help all that much but there was some progress re: improving the outcome with this particular matrix and set of objectives.

Each plot shows the contribution of each row in the matrix multiplied by its corresponding variable value.
So, these are the "constituent parts" of the estimate.
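(In numpy terms, just to illustrate what the plots show - the profiles and notionals below are made up:)

import numpy as np

A = np.array([[0.40, 0.30, 0.15, 0.05],     # hypothetical vega profile of instrument 1 across 4 nodes
              [0.10, 0.35, 0.40, 0.20]])    # hypothetical vega profile of instrument 2
x = np.array([2.0e6, -1.5e6])               # hypothetical notionals (the Solver variables)

contributions = A * x[:, None]              # one curve per instrument, as plotted
estimate = contributions.sum(axis=0)        # equals x @ A, the combined hedge profile
print(contributions)
print(estimate)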

In this case, it looks like Instrument 5 is doing nothing really.
Otherwise, overall it looks like the other Instruments contributions are sometimes a bit "bumpy" which doesn't help in doing optimization.
I even tried switching the order of Vega and Rega in the matrix just to see what the Solver would do with it.  Well, the result was that the outcome improved just a bit.
From what I see, the outcomes are quite sensitive to the variable values and that doesn't seem like a good thing.
I still wonder how this might behave if there were more nodes.  And, by "more" I mean "in between".
I would also look at why they look the way they do.  Could there be any errors?  The plots should help because they allow you to see the profile of each Instrument vs. node.  The weighting by the variable value is just a scale factor.  Some of them look a bit strange to me but I wouldn't know how to fill that matrix in the first place.
Lsq-concept-v4-mod-2014-11-07-D-.xlsx
Thank you for your comments... I have added some extra nodes, but this doesn't seem to improve the fit unfortunately (see attached workbook).

I don't know that they help all that much but there was some progress re: improving the outcome with this particular matrix and set of objectives.
Can you please elaborate this a bit more, i.e. where you see this progress?

Also, on a more general level, I was wondering whether there is any specific reason why in this case we optimize for the max error (as opposed to e.g. the sum of the squared differences)...
Copy-of-Lsq-concept-v4-mod-2014-11-07-E.
ASKER CERTIFIED SOLUTION
hypercube
Thanks. Could you please explain the comment below a bit more

In fact, if you see errors in the same sequence Vega or Rega, and since they are ordered in some sense then the peaks should alternate in sign.  If they don't (and they don't) then there is likely some constraint in between two peaks of the same sign

Where exactly do you look for the error peaks? Row 24?

Lastly, do you think it may be worth programming a different algorithm, e.g. differential evolution?
Hmmm... I thought I had answered this one earlier.  Anyway:

Yes, row 24.

I think that Solver's Evolutionary is Differential Evolution by another name.  But, I'm sure there are varieties of such things of the same name.  
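If you do want to experiment outside Excel, scipy ships a differential evolution routine; here is a minimal sketch (the matrix, targets and bounds below are placeholders, not the spreadsheet's):

import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
A = rng.normal(size=(12, 6))            # hypothetical stacked vega/rega matrix: nodes x instruments
b = rng.normal(size=12) * 1e6           # hypothetical target exposures

def max_abs_error(x):
    return np.max(np.abs(A @ x - b))

bounds = [(-5e8, 5e8)] * 6              # guessed bounds on the notionals
result = differential_evolution(max_abs_error, bounds, seed=1, maxiter=2000,
                                popsize=30, tol=1e-8, polish=True)
print(result.x, result.fun)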

I'm attaching my latest spreadsheet.  I ran "Vega only" and got some pretty good results.  
Then I applied those results to a Rega/Vega version and the Rega errors (unweighted) were really high.
Then I ran the Rega/Vega version starting at those variable values and it seemed like the Rega peaks were much lower.
Then I ran the Rega/Vega using different weights on Rega that were <100% in order to bring the Rega peaks up.

I don't know if this is the direction you'd take it but I thought the results were interesting and the Vega numbers were better that way.

I hope this helps.
Lsq-concept-v4-mod-2014-11-07-G.xlsx
Thanks for your help Sir. Just in case, it would be great if you could help elaborate on this one point regarding L2 you made earlier:

Note: LMS is also known as the L2 norm because the error is raised to the second power before summing them all up.
 And it has a lot of nice physical meaning and nice mathematical features.


Cheers!
Well, there are L1, L2, L3, L4 .... "norms" or forms of measure.  The numbers refer to the value of the exponent that's used.
In L1, the sum of the magnitudes (absolute values) of the errors is minimized.
In L2, the sum of the squares of the errors is minimized - or, equivalently, the square root of that sum.
In L3, the sum of the magnitudes of the cubes of the errors is minimized.
In L4, the sum of the 4th powers of the errors is minimized.
...
In L-infinity, if you imagine the errors being raised to a very large power (conceptually an infinite power), then the sum of those results is going to be driven by the peak error.  So the peak error is minimized.  So, minimax.  This has the benefit that the largest error at any chosen point is as small as possible.
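In symbols, for an error vector $e = (e_1, \dots, e_n)$:

$$\|e\|_p = \Big(\sum_{i=1}^{n} |e_i|^p\Big)^{1/p}, \qquad
\|e\|_\infty = \lim_{p \to \infty} \|e\|_p = \max_i |e_i|.$$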

In L2, if you minimize the square root of the sum of the squares, you are minimizing the root mean square (rms) value.  That value is analogous to an rms voltage, and power is V^2/R in a simple electrical circuit.  So, if you minimize the rms value, you are minimizing the "power" in the error.  That's why I said it's an aggregate sort of measure.
And, in L2, there are analytical approaches to solving for the optimum as I did with the matrix algebra.
I'm sure there are many more "nice" things about L2.
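For completeness, the analytical L2 (least-squares) solution of an overdetermined system $Ax \approx b$ - the "matrix algebra" referred to above - is the normal-equations solution:

$$x^{*} = \arg\min_x \|Ax - b\|_2 = (A^{\top}A)^{-1}A^{\top}b, \qquad \text{provided } A^{\top}A \text{ is invertible.}$$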

Thanks for the points and the opportunity to dig into something interesting!