K-Means is an unsupervised clustering algorithm that groups the training data into K clusters.
How it works
- Randomly initialize the cluster centroids
- Repeat the following two steps until the centroids converge:
  - Assign each training example to its closest centroid
  - Set each centroid to the mean of the examples assigned to it
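The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the `max_iters` and `seed` parameters, the toy data, and the choice to leave an empty cluster's centroid unchanged are all assumptions made for the example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_points(points, centroids):
    """Assignment step: index of the closest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]

def update_centroids(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
        else:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization from the data
    labels = None
    for _ in range(max_iters):
        new_labels = assign_points(points, centroids)
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return centroids, labels

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Convergence is detected here by checking that the assignments no longer change. Checking that the centroids move less than a small tolerance would be an equally reasonable choice.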
Cost function
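One common way to write the K-means cost (sometimes called the distortion) is the mean squared distance between each example and its assigned centroid. The notation is assumed here rather than taken from these notes: $m$ is the number of training examples $x^{(i)}$, $c^{(i)}$ is the index of the cluster assigned to $x^{(i)}$, and $\mu_k$ is the centroid of cluster $k$.

```latex
J\left(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K\right)
  = \frac{1}{m} \sum_{i=1}^{m} \left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^2
```

The assignment step lowers $J$ (or leaves it unchanged) with the centroids held fixed, and the mean-update step does the same with the assignments held fixed, so $J$ never increases from one iteration to the next.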
Debugging
Random initialization
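Initialize the centroids by picking K distinct points at random from the training data.
K-means may converge to a bad local optimum of the cost function. To reduce this risk, we can run it with several random initializations and keep the result with the lowest cost. This makes a good solution more likely but does not guarantee the global optimum.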
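A minimal sketch of the restart strategy, assuming the cost is the mean squared distance to the assigned centroid. The names `run_kmeans`, `cost`, `best_of_restarts`, the `n_restarts` parameter, and the toy data are hypothetical and chosen for illustration.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def run_kmeans(points, k, rng, max_iters=100):
    """One K-means run from a random initialization drawn from the data."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

def cost(points, centroids, labels):
    """Mean squared distance from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels)) / len(points)

def best_of_restarts(points, k, n_restarts=10, seed=0):
    """Run K-means n_restarts times and keep the lowest-cost result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        centroids, labels = run_kmeans(points, k, rng)
        j = cost(points, centroids, labels)
        if best is None or j < best[0]:
            best = (j, centroids, labels)
    return best

# Hypothetical toy data: three well-separated groups of 2-D points.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
]
best_cost, best_centroids, best_labels = best_of_restarts(points, k=3)
print(best_cost, best_centroids)
```

The number of restarts is a trade-off: more restarts cost more compute but make it less likely that every run lands in a poor local optimum.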
Choosing the value of K
We can plot the cost against the number of clusters K. If the curve has an elbow, the K at the elbow could be a good choice. If there is no clear elbow, we should choose K based on the data and our purpose.
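The cost-versus-K comparison can be sketched as follows. This is an illustration under stated assumptions: the cost is the mean squared distance to the assigned centroid, each K uses the best of several random restarts, and the names `cost_for_k`, `n_restarts`, and the toy data are hypothetical. The values are printed rather than plotted to keep the example free of third-party libraries.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def run_kmeans(points, k, rng, max_iters=100):
    """One K-means run; returns the final cost (mean squared distance)."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels)) / len(points)

def cost_for_k(points, k, n_restarts=10, seed=0):
    """Lowest cost found for a given K over several random initializations."""
    rng = random.Random(seed)
    return min(run_kmeans(points, k, rng) for _ in range(n_restarts))

# Hypothetical toy data: three well-separated groups of 2-D points.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
    (20.0, 0.0), (20.0, 1.0), (21.0, 0.0),
]

costs = {k: cost_for_k(points, k) for k in range(1, 7)}
for k, j in costs.items():
    print(f"K={k}  cost={j:.3f}")
```

On data like this, with three clearly separated groups, one would expect the cost to drop sharply up to K=3 and change little afterwards. Real data often shows a much less distinct elbow.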