Solved

find distribution for strange data

Posted on 2014-02-05
5
327 Views
Last Modified: 2014-02-11
Hi

I have data which represent the time for process. I have spent more than two weeks to find a good distribution for it, but I coudln't. Actually I have sued exponentiol with Lambda 142.43 but when I insert that in my simulation I faced a problem due to the varionce between sample huge and that value for Lambda would genertate wrong values. So, please I need any hints or help in this point

The data attached to this question
Send-it-to-EE.xlsx
0
Comment
Question by:obad62
5 Comments
 
LVL 19

Expert Comment

by:Ken Butters
ID: 39836327
Not sure if this is exactly what you are looking for or not.  

What I did..

insert a chart to map the points, then inserted a trendline using the option for automatic polynomial.  

selected to show the polynomial in the chart.   It seems to capture the distribution of points rather well.

Attaching updated spreadsheet with chart.
Send-it-to-EE.xlsx
0
 
LVL 27

Expert Comment

by:aburr
ID: 39836791
You can fit any set of data points to a polynomial curve, but the real question is , Does the curve mean anything?"

What is the precision of your data?  (It looks pretty good)
What is the relation you expect between y- axis and x- axis?
(mu guess is you expect it to be flat.
Butters give you a nice graph.
My guess is that something major happened in your process between points 1392 and 1393.
And that an equation reproducing your points will be worthless (except for pinpointing the fact that something drastic happented
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 39839187
Here is one way to analyze the data in Excel.

1.  Calculate the time interval between successive events.
2.  Calculate the log of the time differences.
3.  Assign an Index to each log value.
4.  Copy the values of the Log and Index columns to two new columns.
5.  Sort the new columns by log value.
6.  Plot the sorted Log (Top) and Index (Bottom) columns.

There are a few outliers, but most of the Log values are uniformly distributed between -3 and +1.  This corresponds to time intervals between 0.001 and 1.0 seconds (or days or time units).

The Index plot suggests that small interval values may occur anywhere in the sequence.
As the values get larger, they are more likely to appear later.

I still don't know what the data means or what questions you are trying to answer.
Analysis-for-ExEx.pdf
0
 
LVL 21

Accepted Solution

by:
Ejgil Hedegaard earned 500 total points
ID: 39842182
Another approach is to find the number of values in a series of intervals, to see how the values are distributed, and find a matching curve, to give the probability for any time value to occur.

First step is to plot the data to get the frequency distribution, and see what it looks like.
I used a step of 10 up to 1060, counting how many values are in each interval, using the frequency function, divided with the number of datapoints (1396), giving the distribution.
The result for 10 is 0.2048, meaning that the probability is 20 % for a value between 0 and 10. For a value between 10 and 20 it is 11 % etc.

It looks exponential, but a check with the logarithmic function shows that it is only the case for the lower values, the upper values follows a more linear function, and a constant will be ok.
See the comments on the sheet.

I end up with a formula like this P(X) = a*Exp(-b*X) + c

The best approximation is then found by calculating the sum of P(X)-Data(X) squared, finding the minimum. For that I used the add-in Solver function.
It gives a correlation of 0.97 witch is ok with a dataset as this.
And the chart shows that the approximation is ok.
Distribution-strange-data.xlsx
0
 

Author Closing Comment

by:obad62
ID: 39849867
Thanks, That what I am looking for exactly.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Cascading dropdown 9 28
Boolean help 6 27
Access Excel export not behaving 2 25
Excel callender with date slider 5 27
Workbook link problems after copying tabs to a new workbook? David Miller (dlmille) Intro Have you either copied sheets to a new workbook, and after having saved and opened that workbook, you find that there are links back to the original sou…
This article provides a brief introduction to tissue engineering, the process by which organs can be grown artificially. It covers the problems with organ transplants, the tissue engineering process, and the current successes and problems of the tec…
The viewer will learn how to use the =DISCRINV command to create a discrete random variable, use this command to model a set of probabilities and outcomes in a Monte Carlo simulation, and learn how to find the standard deviation of a set of probabil…
This Micro Tutorial demonstrates how to create Excel charts: column, area, line, bar, and scatter charts. Formatting tips are provided as well.

910 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now