I am looking for an algorithm or a best approach for clustering coördinates
I have a number of coördinates and I want to group them in blocks of 10 or 20 (variable)
I want this to be optimised for shortest distance (minimum total (x1 - x2)² + (y1 - y2)² for all xi|yi in a cluster I assume)
cluster size needs to equal in number of coördinates, area should be minimal, number of coördinates should be a given constant
so given an array of coördinates and a cluster size, how do I cluster?
So you want to force the size of the cluster to be the same for every cluster?
One simple and easy to implement approach would be to pick the point with the minimum x,y coordinates then find the one closest to it. Make this a cluster, then find the next point which is closest to the average of all the points (centroid) currently in the cluster, then repeat until you get the size you want. Start the process again for the next cluster.
This will not guarantee the optimal solution, but the problem looks to be NP complete so there is no efficient way to guarantee an optimal solution. The above should get you close enough and is possibly the simplest way to get that close.
For a slightly more complicated version, look up the 'facility location' problem. A 'facility' in your case would be the centers of the clusters. You could put your restriction in for facility service size and minimize the distance to a facility instead of the number of facilities. There are lots of versions of the algorithm out there, many with code. It will give a slightly better result in most cases but will be more complex to implement.
0
Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.
A greedy algorithm might start with clusters of size 1, and at each step merge the clusters with the closest centroids.
If cluster size is paramount, you'd probably prefer joining clusters that are less than half the size of the largest cluster,
minimizing the spread of the cluster sizes.
While this is likely to converge to a non-optimal solution, it should be a reasonable place to start applying transforms to search for a better solution.
Using a quadtree may provide you with a good starting point in identifying your required clusters.
"A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are most often used to partition a two dimensional space by recursively subdividing it into four quadrants or regions."
"Each cell (or bucket) has a maximum capacity. When maximum capacity is reached, the bucket splits"
"The point quadtree is an adaptation of a binary tree used to represent two dimensional point data. It shares the features of all quadtrees but is a true tree as the center of a subdivision is always on a point. The tree shape depends on the order data is processed. It is often very efficient in comparing two dimensional ordered data points, usually operating in O(log n) time."
thank you all for your answers. I am still looking for implementing this (attention has been diverted) It seems that I can find what I need in the answers given. If it turns out I need more help, I will post another question. Thanks all for your contribution
0
Featured Post
Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.
One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.
One simple and easy to implement approach would be to pick the point with the minimum x,y coordinates then find the one closest to it. Make this a cluster, then find the next point which is closest to the average of all the points (centroid) currently in the cluster, then repeat until you get the size you want. Start the process again for the next cluster.
This will not guarantee the optimal solution, but the problem looks to be NP complete so there is no efficient way to guarantee an optimal solution. The above should get you close enough and is possibly the simplest way to get that close.