?
Solved

Cluster Point

Posted on 2014-03-08
14
Medium Priority
?
378 Views
Last Modified: 2014-03-25
Hi Experts,


I just need to know how do I find cluster point from scattered points. i.e. which point have very less distance from other points.


Obviously using Excel.

Thanks
0
Comment
Question by:Naresh Patel
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 8
  • 4
  • 2
14 Comments
 
LVL 81

Expert Comment

by:byundt
ID: 39914934
itjockey,
Could you please post a sample workbook and describe the results you hope to receive from the macro?

I am guessing that you have a scatter chart (as opposed to line chart). I am also guessing that you want to distinguish points that are clustered as opposed to scattered.

One approach you might use is to establish criteria for clustered points, such as having more than 6 other points within a distance of 10% of square root(X-axis scale^2 + Y--axis scale^2). This criteria assumes that the scale (by which I mean the difference between maximum and minimum scale values on the chart) has comparable values on X-axis and Y-axis.

If X-axis and Y-axis have drastically different scale values, then you might use an alternative criteria, such as more than 6 other points within both 7% of X-axis scale and 7% of Y-axis scale.

The code will be easy once we agree on how to distinguish between clustered and scattered. That said, the number of points on your scatter chart will greatly affect how long it takes to make the calculations. This is because each of the n points on a chart must be compared against all n-1 other points. So the difficulty increases as the square of the number of points on the chart.

Brad
0
 
LVL 8

Author Comment

by:Naresh Patel
ID: 39915023
Sir Brad,
There is no x axis, for understanding sake there is only y axis which represents prices I.e. Points.
I need to find out which price point is minimum distance from all prices. If I average it out then result will not perfect as say 7 points are at 3% near by and 3 points are 10% away from these 7 points. So my average give me false result.i guess you understood my point.
there is no sample spreadsheet it is just thought. Criteria for find cluster point 10% is enough. You can take any number of random numbers. Still you want sample pls let me know.
One personal question.if you wish to share.
What is the meaning of "byundt".i guess there is some meaning behind that.wild guess ancient Egyptian civilization - Pyramid and all that stuff ;-)  

Thank You
0
 
LVL 81

Assisted Solution

by:byundt
byundt earned 600 total points
ID: 39915080
itjockey,
If you are looking at prices, then I assume each category item (X-axis) has to be considered separately. If so, you might consider calculating the mean (average) and standard deviation for the prices on each item. Prices that are within two standard deviations of the mean might be considered "clustered," while the ones farther from the mean than that would be considered "scattered."

While I could generate data using random numbers, it would certainly be considered "scattered." And if I tried to create two populations of data, one that are clustered and the other that are scattered, then whatever assumptions I made to create the data would be echoed in the criteria to distinguish clustered from scattered. For this reason, I am reluctant to generate test data and hope that you might have real data to better focus this discussion.

Rather than trying to distinguish clustered from scattered data, another question you might ask is whether certain points should be considered "outliers." There are well-established techniques for doing so in statistics. Wikipedia discusses various means for distinguishing outliers in http://en.wikipedia.org/wiki/Outlier.

Even if we can agree on how to distinguish clustered from scattered data, I'm still not sure how you want to use that distinction. What should the results be from any macro?

Vis-à-vis your question on my screen name, the truth is more prosaic than pyramids. You can probably figure it out from this link.

Brad
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 8

Author Comment

by:Naresh Patel
ID: 39916791
Sir Brad,

Sorry For Delay in Reply.
You took my question to new dimension. one quote comes in my mind "universe is not that big that you can imagine but it is that big that you cant  imagine". Sir you have too much of knowledge & may others have too. but best part of you is "you believe in sharing".Sir your each & every post is very  help full & knowledge added. it may obvious for you but for me it is add to knowledge part. frequently I saw your solutions which you solved for different authors.

Now back to question - there is no chart requirement but only for understanding. want to find cluster point (not cluster points). so In attached screen there is 2 Outlier cluster & one is actual cluster. from that cluster points want to find cluster point.

Criteria for finding cluster point - say X is cluster point then X have minimum % distance from all points excluding outlier (10 % variation).Screen Shot
Apology if I wrote anything wrong as I am not good writer & English is not my native language. and I am just Account graduate so may be some time I don't understand - statistic - coding which is obvious for you.

Thank You
Book1.xlsx
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 39917627
Since you are doing this in one dimension, when you have a cluster, the "cluster point" that has the least total distance from all the other points will be the point closest to the average of all the values.

For finding clusters, if you have a minimum distance D to consider, then you can just test each point and see how many points are within D of the point.
0
 
LVL 37

Accepted Solution

by:
TommySzalapski earned 1400 total points
ID: 39917640
Here is an example in Excel using formulas.
The Within D column is how many points are within that distance of the point. (You can change the max difference in cell F2 and it will update everything).

The Deviation is the distance from the point to the average of the cluster (lower is better).
The Score is the cluster size/(deviation+1). The higher the score, the better the cluster point. A high score means the point is in the middle of a large cluster.

In your sample data, 54 is clearly the best candidate "cluster point"
cluster.xlsx
0
 
LVL 8

Author Comment

by:Naresh Patel
ID: 39918237
Mr.TommySzalapski,

What if in case of i want count Difference in % not in point term. as in cell F2=10% rather than 8 points.


Thanks
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 39918833
It depends on how you define 10% near.

Instead of B2-$F$2, you could use B2-(B2*$F$2) and put 10% in F2. (and do the same for the + case also). But then the farther a cluster is from 0 the more clustered you will think it is because the 10% is higher there. If that makes sense, then good.

If it should be consistent across the whole dataset, then maybe you should keep what I used but set F2 to be MAX(B:B)*.10 or something.

What do you mean by 10% near?
0
 
LVL 8

Author Comment

by:Naresh Patel
ID: 39919707
I just want calculate based on % least difference from points not numerical point base (Cell F2= 10% not like 8). as this is only example. if in case points values in column B is multiple of 1000 then max difference 8 in Cell F2 doesn't make any sense.

See attached I had done what you said. & produced proper result.

Thank You
cluster.xlsx
0
 
LVL 8

Author Comment

by:Naresh Patel
ID: 39920252
Sir byundt / Mr. TommySzalapski,

Anything more you want to share or anything which I have to take in account in future. as I thing I got my answer. may I accept this question?


Thanks
0
 
LVL 37

Expert Comment

by:TommySzalapski
ID: 39920420
Just remember that when you use percents in that way it will not find many clusters near 0.
You could use 3% of the max value instead of 3% of the current point. It all depends on which makes sense for the data.

Glad we could help.
0
 
LVL 8

Author Comment

by:Naresh Patel
ID: 39920454
Got it !!!


Thank You Very Much
0
 
LVL 8

Author Closing Comment

by:Naresh Patel
ID: 39922569
Actually I am in dilemma how to split points.

Thanks
0
 
LVL 8

Author Comment

by:Naresh Patel
ID: 39953362
Sir byundt / Mr. TommySzalapski,

Any Views On This Question "Math Cross Check"

if have spare time then pls help me on this.

Thank You
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Excel can be a tricky bit of software to get your head around. Whilst you’ll be able to eventually get to grips with the basic understanding of how to get by, there are a few Excel tips that not everybody will even know about let alone know how to d…
When we purchase storage, we typically are advertised storage of 500GB, 1TB, 2TB and so on. However, when you actually install it into your computer, your 500GB HDD will actually show up as 465GB. Why? It has to do with the way people and computers…
This Micro Tutorial will demonstrate how to use longer labels with horizontal bar charts instead of the vertical column chart.
This Micro Tutorial will demonstrate the scrolling table in Microsoft Excel using the INDEX function.

765 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question