Solved

Clustering co-ordinates

Posted on 2011-04-21
389 Views
Hi all,

I am looking for an algorithm or a best approach for clustering coördinates
I have a number of coördinates and I want to group them in blocks of 10 or 20 (variable)
I want this to be optimised for shortest distance (minimum total  (x1 - x2)² + (y1 - y2)² for all xi|yi in a cluster I assume)
cluster size needs to equal in number of coördinates, area should be minimal, number of coördinates should be a given constant

so given an array of coördinates and a cluster size, how do I cluster?

Geert
0
Question by:Geert Bormans

LVL 37

Accepted Solution

So you want to force the size of the cluster to be the same for every cluster?

One simple and easy to implement approach would be to pick the point with the minimum x,y coordinates then find the one closest to it. Make this a cluster, then find the next point which is closest to the average of all the points (centroid) currently in the cluster, then repeat until you get the size you want. Start the process again for the next cluster.

This will not guarantee the optimal solution, but the problem looks to be NP complete so there is no efficient way to guarantee an optimal solution. The above should get you close enough and is possibly the simplest way to get that close.
0

LVL 60

Author Comment

well, exact size of the cluster is more important than the ultimate minimal spread.
So this approach might actually work, I will give at try, thanks.

In the mean time, I still welcome other ideas of course
0

LVL 37

Assisted Solution

For a slightly more complicated version, look up the 'facility location' problem. A 'facility' in your case would be the centers of the clusters. You could put your restriction in for facility service size and minimize the distance to a facility instead of the number of facilities. There are lots of versions of the algorithm out there, many with code. It will give a slightly better result in most cases but will be more complex to implement.
0

LVL 84

Assisted Solution

A greedy algorithm might start with clusters of size 1, and at each step merge the clusters with the closest centroids.
If cluster size is paramount, you'd probably prefer joining clusters that are less than half the size of the largest cluster,
minimizing the spread of the cluster sizes.
While this is likely to converge to a non-optimal solution, it should be a reasonable place to start applying transforms to search for a better solution.
0

LVL 31

Assisted Solution

Using a quadtree may provide you with a good starting point in identifying your required clusters.
"A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are most often used to partition a two dimensional space by recursively subdividing it into four quadrants or regions."

"Each cell (or bucket) has a maximum capacity. When maximum capacity is reached, the bucket splits"

"The point quadtree is an adaptation of a binary tree used to represent two dimensional point data. It shares the features of all quadtrees but is a true tree as the center of a subdivision is always on a point. The tree shape depends on the order data is processed. It is often very efficient in comparing two dimensional ordered data points, usually operating in O(log n) time."

Here is an article using quadtree application:
http://www.ucgis.org/membersonly/summer03workshop/Wang_Armstrong%20copy.PDF
0

LVL 60

Author Comment

gentlemen, thanks for all answers so far.
I need to put some things in code now
I keep you posted, but I had to give attention to some other stuff
0

LVL 60

Author Closing Comment

thank you all for your answers. I am still looking for implementing this (attention has been diverted) It seems that I can find what I need in the answers given. If it turns out I need more help, I will post another question. Thanks all for your contribution
0

Featured Post

Displaying an arrayList in a listView using the default adapter is rarely the best solution. To get full control of your display data, and to be able to refresh it after editing, requires the use of a custom adapter.
"Disruption" is the most feared word for C-level executives these days. They agonize over their industry being disturbed by another player - most likely by startups.
In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF's Info Dictionary, as well as some other information, including the page count. We show how to isolate the page count in a…
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…