Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
• Status: Solved
• Priority: Medium
• Security: Public
• Views: 399

# Clustering co-ordinates

Hi all,

I am looking for an algorithm or a best approach for clustering coördinates
I have a number of coördinates and I want to group them in blocks of 10 or 20 (variable)
I want this to be optimised for shortest distance (minimum total  (x1 - x2)² + (y1 - y2)² for all xi|yi in a cluster I assume)
cluster size needs to equal in number of coördinates, area should be minimal, number of coördinates should be a given constant

so given an array of coördinates and a cluster size, how do I cluster?

Geert
0
Geert Bormans
4 Solutions

Commented:
So you want to force the size of the cluster to be the same for every cluster?

One simple and easy to implement approach would be to pick the point with the minimum x,y coordinates then find the one closest to it. Make this a cluster, then find the next point which is closest to the average of all the points (centroid) currently in the cluster, then repeat until you get the size you want. Start the process again for the next cluster.

This will not guarantee the optimal solution, but the problem looks to be NP complete so there is no efficient way to guarantee an optimal solution. The above should get you close enough and is possibly the simplest way to get that close.
0

Author Commented:
well, exact size of the cluster is more important than the ultimate minimal spread.
So this approach might actually work, I will give at try, thanks.

In the mean time, I still welcome other ideas of course
0

Commented:
For a slightly more complicated version, look up the 'facility location' problem. A 'facility' in your case would be the centers of the clusters. You could put your restriction in for facility service size and minimize the distance to a facility instead of the number of facilities. There are lots of versions of the algorithm out there, many with code. It will give a slightly better result in most cases but will be more complex to implement.
0

Commented:
A greedy algorithm might start with clusters of size 1, and at each step merge the clusters with the closest centroids.
If cluster size is paramount, you'd probably prefer joining clusters that are less than half the size of the largest cluster,
minimizing the spread of the cluster sizes.
While this is likely to converge to a non-optimal solution, it should be a reasonable place to start applying transforms to search for a better solution.
0

Commented:
Using a quadtree may provide you with a good starting point in identifying your required clusters.
"A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are most often used to partition a two dimensional space by recursively subdividing it into four quadrants or regions."

"Each cell (or bucket) has a maximum capacity. When maximum capacity is reached, the bucket splits"

"The point quadtree is an adaptation of a binary tree used to represent two dimensional point data. It shares the features of all quadtrees but is a true tree as the center of a subdivision is always on a point. The tree shape depends on the order data is processed. It is often very efficient in comparing two dimensional ordered data points, usually operating in O(log n) time."

Here is an article using quadtree application:
http://www.ucgis.org/membersonly/summer03workshop/Wang_Armstrong%20copy.PDF
0

Author Commented:
gentlemen, thanks for all answers so far.
I need to put some things in code now
I keep you posted, but I had to give attention to some other stuff
0

Author Commented:
thank you all for your answers. I am still looking for implementing this (attention has been diverted) It seems that I can find what I need in the answers given. If it turns out I need more help, I will post another question. Thanks all for your contribution
0

## Featured Post

Tackle projects and never again get stuck behind a technical roadblock.