Implementing the K-Means Clustering Algorithm in Python - Convergence Problems with Custom Distance Metric

👀 Views: 16 💬 Answers: 1 📅 Created: 2025-06-09

I'm relatively new to this, so bear with me. I'm converting an old project and implementing the K-Means clustering algorithm in Python using NumPy, but I'm running into convergence issues when using a custom distance metric. My goal is to cluster data points based on a weighted Euclidean distance, but the centroids do not stabilize after several iterations. Here's the relevant code snippet:

```python
import numpy as np

def weighted_distance(a, b, weights):
    return np.sqrt(np.sum(weights * (a - b) ** 2))

class KMeans:
    def __init__(self, n_clusters=3, max_iter=100, weights=None):
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.weights = weights

    def fit(self, X):
        # Initialize centroids randomly
        np.random.seed(42)
        self.centroids = X[np.random.choice(X.shape[0], self.n_clusters, replace=False)]

        for i in range(self.max_iter):
            # Assign clusters based on weighted distance
            distances = np.array([
                [weighted_distance(x, centroid, self.weights) for centroid in self.centroids]
                for x in X
            ])
            labels = np.argmin(distances, axis=1)

            # Update centroids
            new_centroids = np.array([
                np.mean(X[labels == j], axis=0) for j in range(self.n_clusters)
            ])

            if np.all(new_centroids == self.centroids):
                break
            self.centroids = new_centroids

X = np.random.rand(100, 2)
weights = np.array([0.5, 2.0])
kmeans = KMeans(n_clusters=3, weights=weights)
kmeans.fit(X)
print(kmeans.centroids)
```

I have ensured that the weights array matches the number of dimensions in my data. However, I often find that the centroids oscillate between two positions and do not settle, even after the maximum number of iterations has completed. I've tried tweaking the learning rate and initializing the centroids in different ways, but nothing seems to stabilize the outcome. When I print the centroids after training, they sometimes contain NaN values, which confuses me.

Is there something I might be overlooking in my weight application or centroid update step? Any advice on debugging or improving this implementation would be greatly appreciated. Thanks in advance!
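
For anyone hitting the same symptoms, one likely source of the NaN values is the update step: `np.mean` over an empty selection (`X[labels == j]` when no point is assigned to cluster `j`) returns NaN. Once a centroid is NaN, the exact `==` check can never succeed, because NaN is not equal to itself, so the loop runs to `max_iter`. Below is a minimal sketch of a guarded update, not a definitive fix. The function names `weighted_sq_distances` and `fit_weighted_kmeans`, the `tol` parameter, and the choice to keep the previous centroid for an empty cluster are all assumptions made for illustration; re-seeding the empty cluster from a random data point would be another reasonable option.

```python
import numpy as np


def weighted_sq_distances(X, centroids, weights):
    """Squared weighted Euclidean distance from every point to every centroid."""
    # Broadcasting gives shape (n_samples, n_clusters, n_features).
    diff = X[:, None, :] - centroids[None, :, :]
    # The square root is skipped: it does not change which centroid is nearest.
    return np.sum(weights * diff ** 2, axis=2)


def fit_weighted_kmeans(X, n_clusters=3, weights=None, max_iter=100, tol=1e-8, seed=42):
    """Hypothetical helper: Lloyd iterations with an empty-cluster guard."""
    rng = np.random.default_rng(seed)
    if weights is None:
        weights = np.ones(X.shape[1])
    # Start from randomly chosen data points, as in the original snippet.
    centroids = X[rng.choice(X.shape[0], n_clusters, replace=False)].astype(float)

    for n_iter in range(1, max_iter + 1):
        labels = np.argmin(weighted_sq_distances(X, centroids, weights), axis=1)

        new_centroids = centroids.copy()
        for j in range(n_clusters):
            members = X[labels == j]
            # Assumption: an empty cluster keeps its previous centroid,
            # which avoids taking the mean of an empty selection.
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)

        # Compare with a tolerance instead of exact float equality.
        converged = np.allclose(new_centroids, centroids, atol=tol)
        centroids = new_centroids
        if converged:
            break

    return centroids, labels, n_iter


X = np.random.default_rng(0).random((100, 2))
weights = np.array([0.5, 2.0])
centroids, labels, n_iter = fit_weighted_kmeans(X, n_clusters=3, weights=weights)
print(centroids)
print("iterations:", n_iter)
```

If the NaN values disappear with the guard in place, the weights themselves are probably not the problem. With non-negative weights, this distance is the ordinary Euclidean distance on features scaled by `np.sqrt(weights)`, so the plain mean remains the matching centroid update.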