Issues with Generative AI Model Fine-tuning in TensorFlow 2.6 - Unexpected NaN Loss
I'm collaborating on a project and have run into a strange issue. I'm relatively new to this, so bear with me: I'm coming from a different tech stack and still learning Python. I'm trying to fine-tune a pre-trained generative AI model using TensorFlow 2.6, but the loss becomes NaN after a few epochs. The model uses a custom loss function and an Adam optimizer with a learning rate of 0.0001, yet training still diverges.

Here's the relevant code snippet:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

vocab_size = 10000  # placeholder; the real value comes from my tokenizer

# Load a pre-trained model
base_model = models.load_model('path_to_pretrained_model')

# Stack new layers on top of the pre-trained base
model = models.Sequential([
    base_model,
    layers.Dense(256, activation='relu'),
    layers.Dense(vocab_size, activation='softmax')
])

# Custom loss function (wraps the built-in categorical crossentropy)
def custom_loss(y_true, y_pred):
    return tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y_true, y_pred))

# Compile the model
model.compile(loss=custom_loss,
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              metrics=['accuracy'])

# Train the model (train_dataset is a tf.data.Dataset built earlier in my script)
model.fit(train_dataset, epochs=10, steps_per_epoch=100)
```

What I've tried so far:

- Checked the input data for NaN values.
- Normalized the inputs.
- Added gradient clipping as a precaution by passing `clipnorm=1.0` to the optimizer.

None of this resolved the issue. The loss starts at a reasonable value but quickly escalates to NaN, and I see the following warning in the logs:

```
WARNING:tensorflow:Gradients do not exist for variables [<tf.Variable 'dense_1/kernel:0' shape=(128, 256) dtype=float32>] when optimizing.
```

Could this be related to the model's architecture or the way I'm handling the data? Any insight into what I'm doing wrong, and what the best practice is here, would be greatly appreciated.
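For completeness, here's roughly how I did the NaN check and normalization (a simplified sketch; `train_features` is a stand-in name for my actual NumPy array):

```python
import numpy as np

# Fail fast if any NaNs slipped into the raw features
assert not np.any(np.isnan(train_features)), "NaN found in input features"

# Standardize to zero mean / unit variance; the small epsilon
# guards against division by zero for constant features
mean = train_features.mean(axis=0)
std = train_features.std(axis=0) + 1e-8
train_features = (train_features - mean) / std
```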
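And this is the gradient-clipping variant of the compile step I tried (`clipnorm` caps the norm of each variable's gradient individually):

```python
# Same compile step as above, but with per-variable gradient norm clipping
model.compile(loss=custom_loss,
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001, clipnorm=1.0),
              metrics=['accuracy'])
```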
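My next debugging step is to list the trainable variables and confirm whether `dense_1` is actually connected to the loss, since the warning suggests it receives no gradient during backprop:

```python
# Print every trainable variable; dense_1/kernel should appear here
for var in model.trainable_variables:
    print(var.name, var.shape)
```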