CodexBloom - Programming Q&A Platform

Why does my TensorFlow model train slower with mixed precision in 2.8.0?

πŸ‘€ Views: 57 πŸ’¬ Answers: 1 πŸ“… Created: 2025-06-10
tensorflow mixed-precision performance Python

I've been struggling with this for a few days now and could really use some help. I've been experimenting with mixed precision training in TensorFlow 2.8.0 for a custom CNN, hoping to improve performance on an NVIDIA RTX 3060. I'm following the recommended practices and set up mixed precision with the `tf.keras.mixed_precision` API, but training is actually slower than with full precision. My training setup looks like this:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Set mixed precision
policy = tf.keras.mixed_precision.Policy('mixed_float16')
tf.keras.mixed_precision.set_global_policy(policy)

# Build a simple CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the model
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.Adam())
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(train_dataset, epochs=5, validation_data=val_dataset)
```

When I profile the training, GPU utilization is significantly lower than expected, and I often see the following warning:

```
WARNING:tensorflow:Gradients do not exist for variables [<tf.Variable 'conv2d/kernel:0' shape=(3, 3, 1, 32) dtype=float32>], hence the loss is not being returned.
```

I initially thought the loss scaling wasn't working correctly, so I tried setting the loss scale to both 'dynamic' and a fixed value, but neither improved the situation. I also made sure the input data is cast to float16 before it is fed into the model:

```python
def preprocess_image(image):
    image = tf.image.convert_image_dtype(image, tf.float16)
    return image
```

I've also considered whether this might be related to the TensorFlow version, but I'm on 2.8.0, which should support mixed precision well. Is there something I'm missing in my setup? Any tips on diagnosing this or optimizing performance further would be greatly appreciated. For context, I'm running Python on Ubuntu for development, and the same slowdown shows up in production on CentOS. What am I doing wrong? Any examples would be super helpful.
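
In case it helps with diagnosis, this is the kind of quick check I can run to confirm the policy is actually being applied per layer (a minimal sketch; `model` is the Sequential model from my snippet above):

```python
import tensorflow as tf

# Confirm which global policy is in effect (I expect "mixed_float16")
print(tf.keras.mixed_precision.global_policy())

# Under mixed precision, each layer's compute dtype should be float16
# while its variables stay in float32
for layer in model.layers:
    print(layer.name, layer.compute_dtype, layer.variable_dtype)
```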
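
I'm also wondering whether my float16 cast in `preprocess_image` is part of the problem, i.e. whether I should keep the inputs in float32 and let the first layer cast them under the policy, roughly like this (just a sketch of the alternative I have in mind, not something I've confirmed is correct):

```python
def preprocess_image(image):
    # Keep the input in float32; with the mixed_float16 policy the
    # model's first layer should cast it to float16 itself
    image = tf.image.convert_image_dtype(image, tf.float32)
    return image
```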