Unexpected NaNs in TensorFlow model training when using Adam optimizer
I've been banging my head against this for hours. I'm training a neural network with TensorFlow 2.8.0, and the loss suddenly becomes NaN after a few epochs. The model is quite simple, just a few dense layers, and I'm using the Adam optimizer with a learning rate of 0.001. Here's how I'm defining the model:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(input_shape,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

I'm feeding the model normalized data, but I noticed that my input has some extreme outliers. I tried using `MinMaxScaler` to scale the features between 0 and 1:

```python
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
```

However, even with the scaling, the loss diverges and I get this warning:

```
UserWarning: WARNING - Loss is NaN. Model performance will be undefined.
```

I've tried playing around with the batch size and the learning rate, but I still see NaNs appearing. I also enabled gradient clipping to try to mitigate the problem:

```python
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, clipnorm=1.0)
```

But the NaNs keep coming back. I've confirmed that my labels are correct and that there isn't any division by zero happening in the model. What could be causing this, and how can I fix these NaN values during training? Any help or advice on debugging this would be appreciated!
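In case it's relevant, here's roughly the sanity check I've been running on the data before calling `fit`. This is just a sketch: it assumes `X_scaled` is the NumPy array from above and that `y` (not shown earlier) is my label array.

```python
import numpy as np

# NaN or Inf values in the features would propagate straight
# into the loss, even after MinMaxScaler.
print("NaNs in features:", np.isnan(X_scaled).any())
print("Infs in features:", np.isinf(X_scaled).any())
print("feature range:", X_scaled.min(), "to", X_scaled.max())

# 'y' is my label array (assumed here). For
# sparse_categorical_crossentropy the labels need to be
# integers in [0, num_classes); an out-of-range label can
# also produce NaN loss.
print("label range:", y.min(), "to", y.max())
```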
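Is there a better way to pinpoint *where* the NaN first appears? So far I've been experimenting with something like the following, based on my reading of the `tf.debugging` and callback docs; the `epochs` and `batch_size` values are just placeholders, and `X_scaled` / `y` are as above:

```python
import tensorflow as tf

# Raise an error at the first op that produces NaN/Inf, so the
# stack trace points at the offending layer. This slows training
# down a lot, so I only enable it while debugging.
tf.debugging.enable_check_numerics()

# Stop training as soon as the loss goes NaN instead of
# continuing with undefined weights.
model.fit(
    X_scaled, y,
    epochs=10,          # placeholder
    batch_size=32,      # placeholder
    callbacks=[tf.keras.callbacks.TerminateOnNaN()],
)
```

Am I on the right track with this, or is there a simpler approach I'm overlooking? Thanks for taking the time to read this!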