CodexBloom - Programming Q&A Platform

Implementing K-Means Clustering in Python - Unstable Cluster Centers on Different Runs

👀 Views: 30 đŸ’Ŧ Answers: 1 📅 Created: 2025-06-11
k-means scikit-learn clustering Python

I've searched everywhere and can't find a clear answer. I'm relatively new to this, so bear with me. I'm implementing K-Means clustering in Python with scikit-learn (version 1.0.2), and the cluster centers I obtain are unstable across runs, even with the same input data. My dataset is small (150 samples, 4 features), and I suspect random centroid initialization is the cause. I set `n_init=10` and used the `random_state` parameter to ensure reproducibility, but I still observe different cluster centers.

Here's my implementation (with a simplified sample dataset):

```python
import numpy as np
from sklearn.cluster import KMeans

# Sample data
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

# K-Means clustering with a fixed seed
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
kmeans.fit(X)

print("Cluster centers:", kmeans.cluster_centers_)
print("Labels:", kmeans.labels_)
```

Despite setting `random_state`, running this whole block multiple times still gives different cluster centers most of the time. I've also experimented with different initializations via the `init` parameter (`'k-means++'` and `'random'`), but the inconsistency remains. Is there something I'm missing, or is this inherent to K-Means when the dataset is small? How can I stabilize the result?

Has anyone else encountered this? This is part of a larger CLI tool I'm building, and I've been using Python for about a year. I'm open to any suggestions.
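To make the comparison concrete, here's the minimal check I've been using to compare runs. The `sorted_centers` helper is just my own illustration (not part of scikit-learn): it sorts the centers into a canonical order, since two runs can produce the same centers with cluster labels 0 and 1 swapped, which would look like "different" results if compared naively.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same simplified sample data as above
X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

def sorted_centers(seed):
    # Fit K-Means with a fixed seed and return the centers sorted
    # lexicographically by coordinates, so runs can be compared
    # regardless of which cluster got which label.
    km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(X)
    centers = km.cluster_centers_
    order = np.lexsort(centers.T[::-1])  # primary key: first coordinate
    return centers[order]

# With an identical random_state, I'd expect repeated fits to match exactly
a = sorted_centers(42)
b = sorted_centers(42)
print(np.allclose(a, b))
```

Could label permutation across runs explain what I'm seeing, or should the raw `cluster_centers_` themselves be byte-for-byte identical when `random_state` is fixed?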