# Predict expected value from past and current data

Fred Marshall used Ask the Experts™
on
I have a cash projection model in Excel.
I have around 10 years of monthly data.
There are around 10 sources of cash and around 10 uses of cash.
I have determined the type of distribution which best fits each of the 20 series.
I have calculated the pertinent statistical parameters of each of the 20 series (e.g. mean or mode, standard deviation, etc.)
I have fitted the selected distributions with their respective parameters.
Some of the series have a normal distribution around a relatively high mean value.
Some of the series have a more Rayleigh or Weibull distribution - always positive but not symmetrical.

The purpose of the model is to predict the cash position over each of the next 12 months.
In some cases, a series has a growth factor to be included.

Each month, for each series, the current actual value is appended resulting in a new series of actual values.

Then an expected value is calculated for each of the next 12 months.

The question is:
Given a random series with known distribution parameters, what is a good way of generating expected value predictions?

In some cases, it appears that the best prediction is the current value.
However, if there's an outlier in the current value, this doesn't work well.
Comment
Watch Question

Do more with

EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Partner

Commented:
By the look at your question I see that it is a bit hard to visualize unless with an attached workbook. I understand though by your description that the figures may be very sensitive and confidential reason why I suggest you mockup a file and change sensitive data and post same and then by indicating specifically on the data posted what you want I guess we can take it from there on.

Gowflow
Principal

Commented:
gowflow:  Thank you!
Well, as usual, I likely gave too much information.  The Excel aspect isn't all that important.  It boils down to:

The question is:
Given a random series with known distribution parameters, what is a good way of generating expected value predictions?

In some cases, it appears that the best prediction is the current value.
However, if there's an outlier in the current value, this doesn't work well.

Said another way, given a random series with rather strong medium-length trends, what is a good way of generating expected value predictions?
Stock prices are be a good example of this.  Some time ago, I had figured the best expected value prediction was the current value - considering relatively short intervals of time.

Said another way, given a random series with small medium-length trends and with possible large deviations from the "normal" expected value, what is a good way of generating expected value predictions?
Monthly mortgage loans made are an example of where the standard deviation isn't so large but any monthly value can be an outlier.
Partner

Commented:
OMG ! you threw out of the league !!! If you have something specific in Excel I can help you with to the max extent but this is like trying to see the best way to gamble to win the lottery !!!.

Sorry I withdraw
Gowflow
Principal

Commented:
Yes, I have the feeling that the "problem" isn't well-posed.  That's why I'm asking for inputs.
Some thoughts:
It's certainly not gambling - it's analysis even though Monte-Carlo does come up.  :-)

Why bother to calculate future expected values when one can generate a distribution using different methods which, unto itself, will have an expected value?
One answer might be "as a check".  Another answer might be: "because it's more efficient" or "it takes less code".
So, if one just uses Excel, the latter will likely require using VBA in order to generate many solutions based on the distributions of the inputs.

But, if one uses Excel with Frontline Solvers' Analytic Solver and Monte-Carlo approach then it's easy.

The basic question remains:
Given a random series with known distribution parameters, what is a good way of generating expected value predictions?
Most Valuable Expert 2013
Commented:
"However, if there's an outlier in the current value, this doesn't work well. "

That's always the problem when real life and mathematical modelling collide :)

So you have monthly and annual trends for about 10y worth of real world data and a separate trend for growth in the system (inflation/productivity increases)

It sounds like you have annual modelling already in place provided your monthly data fits witihn a previous pattern so what you could look at is highlighting these outliers and their variance from expectation .  You've probably enough data to have a mean and SD for historic monthly totals that could do this.

You can then produce two predictions for the year; one that shows real data but highlights those months that are outliers and variance from expected mean (+ growth) - Range bars are useful for this; a second graph could use the outlier monthy figure as a mean of the outlier and all previous months - again showing variance from expected.

This kind of data profiling is useful where there are external influencers of productivity but are also useful in identifying external pressures but also damp out unexpected variation that can have an undue effect on predicting the rest of the year.
Principal

Commented:
MASQ:  You've certainly given me some ideas here.
One is that we can use the past data to bound the outliers in throwing outliers out in making a prediction.  Otherwise I probably would have used a guesstimate.  I think the limit needs to be fairly tight but not very accurate.

I'm now wondering if my notion of projecting expected value is a flawed idea from the outset.
I have the statistics and can simulate a distribution of predicted outcomes.  If I do that, the expected value should be the expected value of the historical population, right?
What I was trying to do was pick up any current trend and used the one current value for the expected value.

I can imagine a number of similar approaches:
- throw out any outliers to begin with.
1) then continue to use the current value as the expected value.
or
2) do a fit of recent values (could be a straight line)
or
3) do a fit of recent values most heavily weighted toward the current value(s).

In past simulations I found that #1 seemed to be best because one uses an actual value as the starting point.  That the projection is a flat line (less any desired trends), may seem a bit strange but then it *is* an expected value and not a simulation of outcomes with a variance.
So, perhaps throwing out the outliers first is all that's needed?

Thanks for the response!

Perhaps I should explain further:
With the 10 constituents adding and 10 constituents subtracting, if we generate an expected value for each then we can take the sum for an overall expected value.  And that number is useful for display and discussion purposes.
So that's a motivation.

We aren't actually doing a simulation at this point, although I'd like to.  We are doing a "manual simulation" by recalculating the sum of constituents based on their underlying statistics - and, presumably, their current values - and simply observing the range of outcomes out of 50 or 100 trials.
So, this situation makes the one-time expected value calculation rather helpful.
Principal

Commented:
Thanks!