K-Means clustering in Python: centroids do not update after the first iteration
Hey everyone, I'm running into an issue that's driving me crazy, and I've tried everything I can think of. I'm implementing the K-Means clustering algorithm in Python with the `numpy` library, but the centroids do not seem to update after the first iteration. My dataset is a 2D array of points, and I've implemented the algorithm as follows:

```python
import numpy as np

class KMeans:
    def __init__(self, n_clusters, max_iter=100):
        self.n_clusters = n_clusters
        self.max_iter = max_iter

    def fit(self, X):
        # Randomly initialize centroids
        random_indices = np.random.choice(X.shape[0], self.n_clusters, replace=False)
        self.centroids = X[random_indices]

        for i in range(self.max_iter):
            # Assign clusters
            distances = np.linalg.norm(X[:, np.newaxis] - self.centroids, axis=2)
            self.labels = np.argmin(distances, axis=1)

            # Update centroids
            new_centroids = np.array([X[self.labels == j].mean(axis=0) for j in range(self.n_clusters)])
            if np.all(new_centroids == self.centroids):
                break  # Stop if centroids do not change
            self.centroids = new_centroids

# Sample usage
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
kmeans = KMeans(n_clusters=2)
kmeans.fit(X)
print(kmeans.centroids)
```

The problem occurs after the first iteration: the centroids remain unchanged even when there are clearly new clusters forming. I've checked that the `self.labels` array is being updated correctly, but the mean calculation for the centroids seems to be returning the old values. I've verified that my input data is correct and that the `numpy` operations should be behaving as expected. I've also added `print` statements to inspect `self.centroids` and `new_centroids`, and they show that the two are indeed the same after the first iteration, despite the labels being different.

Any insights into why the centroids are not updating would be greatly appreciated!
Is there something I'm missing in the centroid update step or in the way I'm calculating the means? Is there a better approach? For context: I'm developing on Ubuntu 20.04 with Python, and I'm coming from a different tech stack and still learning Python.
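To make this easier to reproduce, here is a minimal sketch of how the centroid movement could be traced per iteration. It is a debugging aid under stated assumptions, not a fix: `kmeans_trace`, `init_centroids`, `tol` and `shifts` are hypothetical names that are not part of the class above. It also differs from the class in three ways: it takes fixed starting centroids instead of a random draw, it compares against a tolerance instead of using exact `==`, and it keeps the previous centroid when a cluster ends up empty.

```python
import numpy as np


def kmeans_trace(X, init_centroids, max_iter=100, tol=1e-8):
    """Run the assign/update loop and record how far the centroids move.

    Hypothetical debugging helper; returns (centroids, labels, shifts),
    where shifts[i] is the total centroid movement in iteration i.
    """
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    shifts = []

    for _ in range(max_iter):
        # Assignment step: distance from every point to every centroid
        distances = np.linalg.norm(X[:, np.newaxis] - centroids, axis=2)
        labels = np.argmin(distances, axis=1)

        # Update step: an empty cluster keeps its previous centroid
        # (an assumption made here to avoid taking the mean of no points)
        new_centroids = centroids.copy()
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)

        # Record the movement, then stop once it falls below the tolerance
        shift = np.linalg.norm(new_centroids - centroids)
        shifts.append(shift)
        centroids = new_centroids
        if shift <= tol:
            break

    return centroids, labels, shifts


X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
# Fixed start (rows 0 and 1 of X) so the run is repeatable
centroids, labels, shifts = kmeans_trace(X, init_centroids=X[[0, 1]])
print("shift per iteration:", shifts)
print("final centroids:\n", centroids)
```

With this fixed start on the sample data, the centroids move once and then stay put, so the loop stops after two iterations. That is ordinary convergence on a six-point dataset. Whether the same thing happens with the random initialization in the class depends on which points are drawn.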