Unexpected NaN Loss When Using Mixed Precision Training in TensorFlow 2.12 with Custom Model
I'm stuck on something that should probably be simple. I'm running into NaN loss values during training when using mixed precision in TensorFlow 2.12. I'm implementing a custom model using the Keras Functional API, and I've enabled mixed precision with the following lines:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, mixed_precision

# In TF 2.12 the experimental namespace is gone; the global policy is set directly
mixed_precision.set_global_policy('mixed_float16')
```

My model definition looks like this:

```python
def create_model():
    inputs = layers.Input(shape=(32, 32, 3))
    x = layers.Conv2D(32, (3, 3), activation='relu')(inputs)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Flatten()(x)
    # Output layer kept in float32 for numerical stability, per the mixed precision guide
    outputs = layers.Dense(10, activation='softmax', dtype='float32')(x)
    return models.Model(inputs, outputs)

model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

When I train the model, the loss turns to NaN after a few epochs, even though Keras should be applying loss scaling automatically under the `mixed_float16` policy (the optimizer gets wrapped in a `LossScaleOptimizer` at compile time). I've tried different learning rates and reduced the initial learning rate to 1e-5, but the problem persists. Here's how I'm fitting the model:

```python
train_dataset = ...  # Replace with actual train dataset
model.fit(train_dataset, epochs=10)
```

Additionally, I'm using the Adam optimizer with default settings. I've also checked the input data, and it's correctly normalized between 0 and 1. I suspect it might be related to the mixed precision setting, but I'm not sure how to debug this further or whether I need to adjust my model architecture. Any insights on how to resolve the NaN loss would be greatly appreciated! Is there a simpler solution I'm overlooking?
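For reference, this is the kind of sanity check I ran on my input batches before ruling out the data (a NumPy-only sketch; `check_batch` is just an illustrative helper I wrote, not part of any library):

```python
import numpy as np

def check_batch(x):
    """Sanity-check a batch: all values finite and normalized to [0, 1]."""
    x = np.asarray(x)
    assert np.isfinite(x).all(), "batch contains NaN or Inf"
    assert x.min() >= 0.0 and x.max() <= 1.0, "batch not normalized to [0, 1]"
    return True

# Example with a fake batch shaped like my 32x32x3 inputs
batch = np.random.rand(8, 32, 32, 3).astype('float32')
check_batch(batch)
```

Every batch I inspected passed this check, which is why I suspect the precision settings rather than the data itself.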