Issue with TensorFlow model not converging during training on imbalanced dataset
Hey everyone, I'm stuck on something that should probably be simple. I'm training a TensorFlow model (version 2.10.0) for a binary classification problem, but it isn't converging. The dataset is highly imbalanced: around 90% of the samples belong to the negative class and only 10% to the positive class. I've applied class weights to address the imbalance, but the model still struggles to learn. Here's the snippet where I build and compile the model:

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.utils import class_weight

# Sample data with a 90/10 class imbalance
X = np.random.rand(1000, 20)
Y = np.array([0] * 900 + [1] * 100)

# Split the dataset
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

# Calculate class weights
class_weights = class_weight.compute_class_weight(
    'balanced', classes=np.unique(Y_train), y=Y_train
)
class_weights = {0: class_weights[0], 1: class_weights[1]}

# Build the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(32, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(X_train, Y_train,
          epochs=50,
          batch_size=32,
          class_weight=class_weights,
          validation_data=(X_test, Y_test))
```

Despite the class weights, training accuracy stays around 52% even after 50 epochs, and there's little to no improvement on the validation set. I've also tried different optimizers (Adam and SGD) with various learning rates, without much success. The model doesn't seem to overfit; the loss just stays stubbornly high. Has anyone else faced a similar issue with imbalanced datasets in TensorFlow?
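In case it's relevant, here's how I've been sanity-checking the model beyond plain accuracy, since with a 90/10 split an accuracy number alone can be misleading. The labels and probabilities below are random stand-ins (in my real run `y_prob` comes from `model.predict(X_test).ravel()`):

```python
import numpy as np
from sklearn.metrics import classification_report, roc_auc_score

# Stand-in data mimicking my 90/10 test split; replace with real labels
# and model.predict(X_test).ravel() in practice
rng = np.random.default_rng(42)
y_true = np.array([0] * 180 + [1] * 20)
y_prob = rng.random(200)          # placeholder for predicted probabilities
y_pred = (y_prob >= 0.5).astype(int)

# AUC and per-class precision/recall show whether the model actually
# separates the classes or just predicts the majority
auc = roc_auc_score(y_true, y_prob)
print(auc)
print(classification_report(y_true, y_pred, digits=3))
```

Per-class recall on the positive class is what I mostly watch, since it collapses to zero when the model defaults to predicting the majority class.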
Are there any best practices or additional techniques I could apply here to improve the model's learning? I'm also wondering whether oversampling with something like SMOTE would be beneficial in this scenario. For context: I'm using Python on Ubuntu, and this is part of a larger CLI tool I'm building. Am I missing something obvious? Any insights would be greatly appreciated!
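If SMOTE turns out to be overkill, would plain random oversampling be a reasonable first step? Here's a numpy-only sketch I put together to try (the `random_oversample` helper is my own, not from any library; SMOTE would interpolate new minority samples instead of duplicating existing ones):

```python
import numpy as np

def random_oversample(X, y, seed=42):
    """Duplicate minority-class rows until every class matches the
    majority-class count. A crude stand-in for SMOTE."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx_parts = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        # Sample (with replacement) enough extra rows to reach n_max
        extra = rng.choice(c_idx, size=n_max - len(c_idx), replace=True)
        idx_parts.append(np.concatenate([c_idx, extra]))
    idx = np.concatenate(idx_parts)
    rng.shuffle(idx)  # avoid feeding the model class-sorted batches
    return X[idx], y[idx]

# Same 90/10 sample data as in my snippet above
X = np.random.rand(1000, 20)
y = np.array([0] * 900 + [1] * 100)
X_bal, y_bal = random_oversample(X, y)
```

I'd apply this to the training split only (after `train_test_split`), so the test set keeps its original distribution.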