TensorFlow 2.12: tf.keras.layers.LSTM returning NaN loss during training with a custom loss function
I'm writing unit tests for my training pipeline and ran into a problem while training my LSTM model in TensorFlow 2.12. Despite normalizing my input data, I keep getting NaN values in the loss during training. I'm using a custom loss function that combines mean squared error with a penalty term intended to prevent overfitting, but it seems to destabilize the training process.

Here's my model configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.LSTM(128, return_sequences=True, input_shape=(timesteps, features)),
    layers.LSTM(64),
    layers.Dense(1)
])
```

And my custom loss function looks like this:

```python
def custom_loss(y_true, y_pred):
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    penalty = 0.01 * tf.reduce_mean(tf.square(y_pred))  # regularization term
    return mse + penalty
```

I'm compiling the model with the following parameters:

```python
model.compile(optimizer='adam', loss=custom_loss)
```

During training, I'm using the `model.fit` method like this:

```python
history = model.fit(train_data, train_labels, epochs=50, batch_size=32)
```

However, after a few epochs the loss becomes NaN, and I see the warning:

```
Invalid value encountered in reduce_mean
```

I tried using different initializers for the LSTM layers and adjusted the optimizer's learning rate, but the problem persists. Can anyone help me identify what might be going wrong, or suggest best practices to stabilize training in this situation? For context, the model will eventually run in a mobile app. Thanks in advance!

I've added some more details at the end of the post: my normalization step, one of the variants I tried, and an idea I haven't verified.
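For reference, here's roughly what my normalization step looks like. This is a simplified sketch: I'm assuming `train_data` is a NumPy array of shape `(samples, timesteps, features)`, and the epsilon is just a guard I added against constant features:

```python
import numpy as np

# Z-score normalization per feature, with statistics computed
# over the training split only (shape of mean/std: (1, 1, features)).
mean = train_data.mean(axis=(0, 1), keepdims=True)
std = train_data.std(axis=(0, 1), keepdims=True)
train_data = (train_data - mean) / (std + 1e-8)  # epsilon avoids division by zero
```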
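And this is one of the variants I tried when experimenting with initializers and the learning rate. The specific initializer choices and the `1e-4` rate are just examples of what I experimented with, not values I have any particular reason to believe in:

```python
import tensorflow as tf
from tensorflow.keras import layers, initializers

# Variant with explicit initializers and a smaller Adam learning rate.
model = tf.keras.Sequential([
    layers.LSTM(128, return_sequences=True,
                kernel_initializer=initializers.HeNormal(),
                recurrent_initializer=initializers.Orthogonal(),
                input_shape=(timesteps, features)),
    layers.LSTM(64),
    layers.Dense(1)
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss=custom_loss)
```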
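Finally, one thing I've been wondering about but haven't verified: would clipping the predictions inside the loss be a reasonable guard? Something like the sketch below, where the clip range is arbitrary:

```python
def custom_loss_clipped(y_true, y_pred):
    # Same loss as above, but with predictions clipped to a finite
    # range first, in case the LSTM outputs are exploding.
    y_pred = tf.clip_by_value(y_pred, -1e3, 1e3)
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    penalty = 0.01 * tf.reduce_mean(tf.square(y_pred))
    return mse + penalty
```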