CodexBloom - Programming Q&A Platform

Problems with K-Means Clustering Convergence in Python - Inconsistent Results on Different Runs

👀 Views: 3786 đŸ’Ŧ Answers: 1 📅 Created: 2025-06-08
python machine-learning k-means

I'm implementing K-Means clustering with scikit-learn (version 1.0.2) and I'm running into a convergence issue: the algorithm produces different cluster centroids on different runs, even with the same input data. I've tried setting the `random_state` parameter to ensure reproducibility, but it still doesn't yield consistent results. Here's my code:

```python
import numpy as np
from sklearn.cluster import KMeans

# Sample data
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Applying KMeans
kmeans = KMeans(n_clusters=2, random_state=42)
kmeans.fit(X)

# Printing results
print("Centroids:", kmeans.cluster_centers_)
print("Labels:", kmeans.labels_)
```

The output varies every time I run it without specifying `random_state`, and even with it the centroids sometimes shift slightly. I've also checked the convergence criteria and made sure the maximum number of iterations (`max_iter`) is set to 300, but I still see some variability in the results.

Is there a way to ensure that K-Means converges to the same centroids on every run? Are there any best practices I should follow to stabilize the results? Could data normalization affect this behavior? I'm currently not normalizing my data before clustering.

Any insights or suggestions would be greatly appreciated! I'm working in a Windows 11 environment. Has anyone else encountered this?
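For context, here's a variant I've been experimenting with: standardizing the features first and pinning both `random_state` and `n_init` explicitly. This is just a sketch based on my reading of the scikit-learn API, so I'm not sure it's the right approach:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)

# Standardize features so both dimensions contribute equally to the
# Euclidean distances that K-Means minimizes.
X_scaled = StandardScaler().fit_transform(X)

# Fix the seed and make the number of restarts explicit; with a fixed
# random_state, each restart uses the same initial centroids, so the
# fit should be repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42, max_iter=300)
labels_a = kmeans.fit_predict(X_scaled)

# Refitting with the same seed should reproduce the same labels.
labels_b = KMeans(n_clusters=2, n_init=10, random_state=42,
                  max_iter=300).fit_predict(X_scaled)
assert (labels_a == labels_b).all()

print("Centroids:", kmeans.cluster_centers_)
print("Labels:", labels_a)
```

When I run this, the two fits agree, but I'd still like to understand whether this is guaranteed in general or whether floating-point/threading effects can still cause small shifts.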