Solved

How do you pick "k" when running k-means clustering?

Posted on 2014-02-05
Last Modified: 2016-03-23
The more I Google, the more I get the sense that picking the number of clusters "k" for k-means clustering is more of an art than a precise science. Even Wikipedia lists many options with no clear winner; see "Determining the number of clusters in a data set" (Wikipedia, the free encyclopedia).

 

I'd love to hear from my Big Data colleagues across the firm how they pick the number of clusters "k" when running this very popular (and common) unsupervised machine learning algorithm. I've been more of a supervised-learning, classification type of guy up until now, so I'm hoping to benefit from your hard-earned best practices as I delve deeper into clustering.

 

Note: I'm using the "kmeans" tool in Mahout for Hadoop on a corpus of text documents transformed into sparse TF-IDF vectors. However, I suppose the technique for selecting a reasonable starting "k" should really be independent of the technology used to run the k-means clustering.
Question by:AlHal2
2 Comments
 

Accepted Solution

by:
TommySzalapski earned 250 total points
ID: 39837095
In the absence of any other experts, I'll throw in my 1.5 cents.

Yes, how you choose k should be independent of the tool.

How you pick k is really a combination of trial and error and what k means to your application.

It also depends on what kind of performance you need. The higher k is, the longer it will take to run the algorithm. In my research in sensor networks, we usually pick much lower values for k than you might use because the devices are more constrained.

You just have to look at what you have and what you need and make a decision. Sometimes a higher k will give better results; other times it muddies things.

You really just need to play around and see what you get.
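
One way to make the "play around and see what you get" step a bit more systematic is to run k-means over a range of k values and compare the within-cluster sum of squares, looking for the point where adding clusters stops helping much. This is the "elbow" heuristic, one of the options listed in the Wikipedia article the question mentions; it is not something the answer above prescribes. The sketch below is plain Python rather than Mahout, uses small dense 2-D points instead of sparse TF-IDF vectors, and the function names (kmeans_inertia, sweep_k, make_blobs) are made up for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, iters=50, seed=0):
    """Run basic Lloyd's k-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        # An empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def sweep_k(points, k_values, restarts=5):
    """For each k, keep the best (lowest) inertia over a few random restarts."""
    return {
        k: min(kmeans_inertia(points, k, seed=s) for s in range(restarts))
        for k in k_values
    }


def make_blobs(seed=42):
    """Toy data (an assumption for the demo): three Gaussian blobs in 2-D."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    return [
        (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in blob_centers
        for _ in range(30)
    ]


if __name__ == "__main__":
    points = make_blobs()
    for k, inertia in sweep_k(points, range(1, 7)).items():
        print(k, round(inertia, 2))
```

The inertia generally keeps shrinking as k grows, so the smallest value is not the answer; what matters is where the curve flattens, and whether the clusters at that k make sense for the application. Each iteration compares every point against every center, which is also why larger k values take longer to run.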

Author Closing Comment

by:AlHal2
ID: 39845959
thanks.