Trouble Implementing K-Means Clustering with Scikit-Learn - Clusters Not Converging
Hey everyone, I'm running into an issue that's driving me crazy. I'm currently trying to implement K-Means clustering using Scikit-Learn (v0.24.2) on a dataset with about 10,000 instances and 8 features, but the algorithm doesn't seem to converge to a stable solution. Specifically, the inertia values keep fluctuating significantly between iterations, which leads to inconsistent cluster assignments.

Here's how I set it up:

```python
from sklearn.cluster import KMeans
import numpy as np

# Sample random data for demonstration
np.random.seed(42)
data = np.random.rand(10000, 8)

# Initialize KMeans
kmeans = KMeans(n_clusters=5, random_state=42, n_init=10, max_iter=300)

# Fit the model
kmeans.fit(data)

# Check inertia (sum of squared distances to the nearest centroid)
inertia = kmeans.inertia_
print(f'Inertia: {inertia}')
```

I've tried increasing `n_init` to 20 and varying `max_iter`, but the problem persists. I also noticed that when I visualize the clusters using `matplotlib`, they appear very spread out and not as distinct as I expected.

```python
import matplotlib.pyplot as plt

# Plot only the first two of the eight features, colored by cluster label
plt.scatter(data[:, 0], data[:, 1], c=kmeans.labels_, cmap='viridis')
plt.title('K-Means Clustering Results')
plt.show()
```

I suspected it might be an issue with the scale of the features, since my data is not standardized, but the problem remains even after applying `StandardScaler`.

Any insights on how to troubleshoot this, or factors I might be overlooking, would be greatly appreciated! Are there any best practices for initializing centroids, or adjustments I should make to the algorithm parameters?

For context, I'm on CentOS using the latest version of Python.

Thanks in advance!
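For anyone trying to reproduce this, here is a minimal sketch of one way to check whether the fits are actually converging. It assumes the same kind of synthetic data as above, and the helper name `kmeans_diagnostics` is hypothetical; `n_iter_`, `inertia_`, `StandardScaler`, and `silhouette_score` are standard scikit-learn names. It refits with several seeds and reports the iteration count, inertia, and a sampled silhouette score for each.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def kmeans_diagnostics(data, n_clusters=5, seeds=(0, 1, 2), max_iter=300):
    """Fit KMeans once per seed and collect simple convergence diagnostics."""
    # Standardize so that no single feature dominates the distance metric
    scaled = StandardScaler().fit_transform(data)
    results = []
    for seed in seeds:
        km = KMeans(n_clusters=n_clusters, n_init=10,
                    max_iter=max_iter, random_state=seed)
        km.fit(scaled)
        results.append({
            "seed": seed,
            "inertia": km.inertia_,
            # Iterations used by the best of the n_init runs
            "n_iter": km.n_iter_,
            # Heuristic: stopping before the cap suggests the tolerance check fired
            "stopped_early": km.n_iter_ < max_iter,
            # Silhouette on a subsample to keep the pairwise distances cheap
            "silhouette": silhouette_score(scaled, km.labels_,
                                           sample_size=2000, random_state=0),
        })
    return results


if __name__ == "__main__":
    rng = np.random.RandomState(42)
    data = rng.rand(10000, 8)  # same shape as the sample data in the question
    for row in kmeans_diagnostics(data):
        print(row)
```

If `n_iter` stays well below `max_iter` and the inertia is close across seeds, the fits are likely converging, and spread-out plots more likely reflect the data than the optimizer. Uniform random data like the sample above has no real cluster structure, so low silhouette scores are expected there.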