Here is an example in Excel using formulas.
The Within D column is how many points are within that distance of the point. (You can change the max difference in cell F2 and it will update everything).
itjockey,
Could you please post a sample workbook and describe the results you hope to receive from the macro?
I am guessing that you have a scatter chart (as opposed to line chart). I am also guessing that you want to distinguish points that are clustered as opposed to scattered.
One approach you might use is to establish criteria for clustered points, such as having more than 6 other points within a distance of 10% of square root(X-axis scale^2 + Y--axis scale^2). This criteria assumes that the scale (by which I mean the difference between maximum and minimum scale values on the chart) has comparable values on X-axis and Y-axis.
If X-axis and Y-axis have drastically different scale values, then you might use an alternative criteria, such as more than 6 other points within both 7% of X-axis scale and 7% of Y-axis scale.
The code will be easy once we agree on how to distinguish between clustered and scattered. That said, the number of points on your scatter chart will greatly affect how long it takes to make the calculations. This is because each of the n points on a chart must be compared against all n-1 other points. So the difficulty increases as the square of the number of points on the chart.
Sir Brad,
There is no x axis, for understanding sake there is only y axis which represents prices I.e. Points.
I need to find out which price point is minimum distance from all prices. If I average it out then result will not perfect as say 7 points are at 3% near by and 3 points are 10% away from these 7 points. So my average give me false result.i guess you understood my point.
there is no sample spreadsheet it is just thought. Criteria for find cluster point 10% is enough. You can take any number of random numbers. Still you want sample pls let me know.
One personal question.if you wish to share.
What is the meaning of "byundt".i guess there is some meaning behind that.wild guess ancient Egyptian civilization - Pyramid and all that stuff ;-)
itjockey,
If you are looking at prices, then I assume each category item (X-axis) has to be considered separately. If so, you might consider calculating the mean (average) and standard deviation for the prices on each item. Prices that are within two standard deviations of the mean might be considered "clustered," while the ones farther from the mean than that would be considered "scattered."
While I could generate data using random numbers, it would certainly be considered "scattered." And if I tried to create two populations of data, one that are clustered and the other that are scattered, then whatever assumptions I made to create the data would be echoed in the criteria to distinguish clustered from scattered. For this reason, I am reluctant to generate test data and hope that you might have real data to better focus this discussion.
Rather than trying to distinguish clustered from scattered data, another question you might ask is whether certain points should be considered "outliers." There are well-established techniques for doing so in statistics. Wikipedia discusses various means for distinguishing outliers in http://en.wikipedia.org/wiki/Outlier.
Even if we can agree on how to distinguish clustered from scattered data, I'm still not sure how you want to use that distinction. What should the results be from any macro?
Vis-Ã -vis your question on my screen name, the truth is more prosaic than pyramids. You can probably figure it out from this link.
Brad
0
With the new era of mobile computing, smartphones and tablets, wireless communications and cloud services, the USDA sought to take advantage of a mobilized workforce and the blurring lines between personal and corporate computing resources.
Sorry For Delay in Reply.
You took my question to new dimension. one quote comes in my mind "universe is not that big that you can imagine but it is that big that you cant imagine". Sir you have too much of knowledge & may others have too. but best part of you is "you believe in sharing".Sir your each & every post is very help full & knowledge added. it may obvious for you but for me it is add to knowledge part. frequently I saw your solutions which you solved for different authors.
Now back to question - there is no chart requirement but only for understanding. want to find cluster point (not cluster points). so In attached screen there is 2 Outlier cluster & one is actual cluster. from that cluster points want to find cluster point.
Criteria for finding cluster point - say X is cluster point then X have minimum % distance from all points excluding outlier (10 % variation).
Apology if I wrote anything wrong as I am not good writer & English is not my native language. and I am just Account graduate so may be some time I don't understand - statistic - coding which is obvious for you.
Since you are doing this in one dimension, when you have a cluster, the "cluster point" that has the least total distance from all the other points will be the point closest to the average of all the values.
For finding clusters, if you have a minimum distance D to consider, then you can just test each point and see how many points are within D of the point.
Here is an example in Excel using formulas.
The Within D column is how many points are within that distance of the point. (You can change the max difference in cell F2 and it will update everything).
The Deviation is the distance from the point to the average of the cluster (lower is better).
The Score is the cluster size/(deviation+1). The higher the score, the better the cluster point. A high score means the point is in the middle of a large cluster.
In your sample data, 54 is clearly the best candidate "cluster point" cluster.xlsx
Instead of B2-$F$2, you could use B2-(B2*$F$2) and put 10% in F2. (and do the same for the + case also). But then the farther a cluster is from 0 the more clustered you will think it is because the 10% is higher there. If that makes sense, then good.
If it should be consistent across the whole dataset, then maybe you should keep what I used but set F2 to be MAX(B:B)*.10 or something.
I just want calculate based on % least difference from points not numerical point base (Cell F2= 10% not like 8). as this is only example. if in case points values in column B is multiple of 1000 then max difference 8 in Cell F2 doesn't make any sense.
See attached I had done what you said. & produced proper result.
Just remember that when you use percents in that way it will not find many clusters near 0.
You could use 3% of the max value instead of 3% of the current point. It all depends on which makes sense for the data.
Excel can be a tricky bit of software to get your head around. Whilst youâ€™ll be able to eventually get to grips with the basic understanding of how to get by, there are a few Excel tips that not everybody will even know about let alone know how to dâ€¦
Do you use a spreadsheet like Microsoft's Excel? Have you ever wanted to link out to a non excel file on your computer or network drive? This is the way I found to do it!
The viewer will learn how to use a discrete random variable to simulate the return on an investment over a period of years, create a Monte Carlo simulation using the discrete random variable, and create a graph to represent the possible returns overâ€¦