Unexpected NaN Loss When Using Mixed Precision Training in TensorFlow 2.12 with Custom Model
I'm stuck on something that should probably be simple. I'm running into NaN loss values during training when using mixed precision in TensorFlow 2.12. I'm implementing a custom model using the Keras Functional API, and I've enabled mixed precision with the following lines:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, mixed_precision

# In TF 2.12 the experimental namespace is gone; the global policy is set directly
mixed_precision.set_global_policy('mixed_float16')
```

My model definition looks like this:

```python
def create_model():
    inputs = layers.Input(shape=(32, 32, 3))
    x = layers.Conv2D(32, (3, 3), activation='relu')(inputs)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Flatten()(x)
    # Output layer kept in float32 for numerical stability, per the mixed precision guide
    outputs = layers.Dense(10, activation='softmax', dtype='float32')(x)
    return models.Model(inputs, outputs)

model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

When I train the model, the loss turns to NaN after a few epochs, even though Keras should be applying loss scaling automatically under the `mixed_float16` policy (the optimizer gets wrapped in a `LossScaleOptimizer` at compile time). I've tried different learning rates and reduced the initial learning rate to 1e-5, but the problem persists. Here's how I'm fitting the model:

```python
train_dataset = ...  # Replace with actual train dataset
model.fit(train_dataset, epochs=10)
```

Additionally, I'm using the Adam optimizer with default settings. I've also checked the input data, and it's correctly normalized between 0 and 1. I suspect it might be related to the mixed precision setting, but I'm not sure how to debug this further or whether I need to adjust my model architecture. Any insights on how to resolve the NaN loss would be greatly appreciated! Is there a simpler solution I'm overlooking?
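For reference, this is the kind of sanity check I ran on my input batches before ruling out the data (a NumPy-only sketch; `check_batch` is just an illustrative helper I wrote, not part of any library):

```python
import numpy as np

def check_batch(x):
    """Sanity-check a batch: all values finite and normalized to [0, 1]."""
    x = np.asarray(x)
    assert np.isfinite(x).all(), "batch contains NaN or Inf"
    assert x.min() >= 0.0 and x.max() <= 1.0, "batch not normalized to [0, 1]"
    return True

# Example with a fake batch shaped like my 32x32x3 inputs
batch = np.random.rand(8, 32, 32, 3).astype('float32')
check_batch(batch)
```

Every batch I inspected passed this check, which is why I suspect the precision settings rather than the data itself.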