Unexpected Model Overfitting in TensorFlow with Early Stopping Callback
I'm working on a personal project and have hit a roadblock. I'm training a neural network with TensorFlow 2.9.1 for a classification task, and even with an early stopping callback the model overfits significantly after just a few epochs. I split my dataset into 70% training, 15% validation, and 15% test sets.

This is the model architecture:

```python
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# input_shape (number of input features) and num_classes are defined earlier from the dataset
model = Sequential([
    Dense(128, activation='relu', input_shape=(input_shape,)),
    Dropout(0.5),
    Dense(64, activation='relu'),
    Dense(num_classes, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

And this is the training code with the early stopping callback:

```python
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

history = model.fit(
    train_data, train_labels,
    epochs=50,
    validation_data=(val_data, val_labels),
    callbacks=[early_stopping]
)
```

Despite the early stopping, the validation loss keeps increasing while the training loss keeps decreasing; performance on the validation set starts deteriorating around epoch 5, which is concerning. I'm inspecting the loss values like this:

```python
print(history.history['loss'])
print(history.history['val_loss'])
```

The output shows that after epoch 4 the training loss has dropped to around 0.2 while the validation loss has risen to about 0.5. I tried increasing the dropout rate to 0.6, but it didn't seem to help, and I'm already normalizing my input data, yet the overfitting persists.

Could there be an underlying problem with my model architecture or training process that I'm missing? Is there a recommended strategy to improve generalization without having to give up too much model capacity?

For context, the project is a Python service running on Linux, and the behaviour is the same in both development and production. The way I build the splits and normalize the data is sketched below for reference.

Thanks in advance, I'm open to any suggestions!
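The 70/15/15 split is done roughly like this (a simplified version; `features` and `labels` stand in for my actual arrays):

```python
from sklearn.model_selection import train_test_split

# Carve off 30% for validation + test, then split that half-and-half to get 15% / 15%
train_data, rest_data, train_labels, rest_labels = train_test_split(
    features, labels, test_size=0.30, random_state=42, stratify=labels)
val_data, test_data, val_labels, test_labels = train_test_split(
    rest_data, rest_labels, test_size=0.50, random_state=42, stratify=rest_labels)
```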
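And the normalization is applied after the split, fitting the scaler on the training set only (again simplified, my real pipeline has a few more steps):

```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training split only, then reuse it for validation and test
scaler = StandardScaler()
train_data = scaler.fit_transform(train_data)
val_data = scaler.transform(val_data)
test_data = scaler.transform(test_data)
```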