Unexpected NaN values in model predictions using TensorFlow 2.8 with custom training loop
I'm working with a scenario where my TensorFlow 2.8 model produces NaN values in its predictions after training for a few epochs. I've implemented a custom training loop for a regression task, and despite normalizing my data and using the Adam optimizer with a learning rate of 0.001, I still run into this issue. Here's a snippet of my training loop:

```python
import tensorflow as tf
import numpy as np

# Sample data
X_train = np.random.rand(100, 10)
Y_train = np.random.rand(100)

# Define the model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1)
])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

# Custom training loop
for epoch in range(10):
    with tf.GradientTape() as tape:
        predictions = model(X_train, training=True)
        loss = tf.reduce_mean(tf.keras.losses.mean_squared_error(Y_train, predictions))
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    print(f'Epoch {epoch + 1}, Loss: {loss.numpy()}')
```

Despite the loss appearing reasonable during training, the predictions start becoming NaN after a few epochs. I've checked the data for NaN or infinite values, and everything seems fine. I also tried reducing the learning rate to 0.0001, but that didn't help either. Could this be related to exploding gradients, or is there something else I might be overlooking in the training process? Any insights would be greatly appreciated!
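For what it's worth, here is how I was thinking of instrumenting the loop to check whether the gradients are actually exploding. This is just a rough sketch that reuses `model`, `optimizer`, `X_train`, and `Y_train` from the snippet above; the `clip_norm=1.0` value is an arbitrary guess on my part, not something I've validated:

```python
for epoch in range(10):
    with tf.GradientTape() as tape:
        predictions = model(X_train, training=True)
        loss = tf.reduce_mean(tf.keras.losses.mean_squared_error(Y_train, predictions))
    gradients = tape.gradient(loss, model.trainable_variables)

    # Log the global gradient norm to see whether it grows epoch over epoch
    grad_norm = tf.linalg.global_norm(gradients)

    # Optionally clip by global norm before applying the update
    # (clip_norm=1.0 is an arbitrary starting value)
    clipped_gradients, _ = tf.clip_by_global_norm(gradients, clip_norm=1.0)
    optimizer.apply_gradients(zip(clipped_gradients, model.trainable_variables))

    # Flag the first epoch where the loss or gradients stop being finite
    if not bool(tf.math.is_finite(loss)) or not bool(tf.math.is_finite(grad_norm)):
        print(f'Non-finite values detected at epoch {epoch + 1}')

    print(f'Epoch {epoch + 1}, Loss: {loss.numpy()}, Grad norm: {grad_norm.numpy()}')
```

Would logging the gradient norm like this be a reasonable way to confirm or rule out exploding gradients, or is there a better diagnostic I should be using?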