CodexBloom - Programming Q&A Platform

Error while fine-tuning a GenAI model with TensorFlow 2.11 and Transformers 4.20

πŸ‘€ Views: 86 πŸ’¬ Answers: 1 πŸ“… Created: 2025-06-07
tensorflow transformers fine-tuning python

I'm fairly new to this stack, and I'm running into a problem while trying to fine-tune a GenAI model with TensorFlow 2.11 and the Hugging Face Transformers library (version 4.20). My training script loads a pre-trained model and a custom dataset, but training fails with a `ValueError` about a shape mismatch:

```
ValueError: Shapes (32, 1) and (32, 6) are incompatible
```

The model is GPT-2, and my dataset contains six unique classes that I'm trying to classify into. Here's the part of the code where I prepare the dataset and the model:

```python
import tensorflow as tf
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

# Load tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
# GPT-2 ships without a pad token, so padding=True fails unless one is set
tokenizer.pad_token = tokenizer.eos_token
model = TFGPT2LMHeadModel.from_pretrained('gpt2')

# Prepare dataset (example texts; the real dataset is larger)
texts = ["sample text 1", "sample text 2", "sample text 3"]
labels = [0, 1, 2]  # corresponding class labels

inputs = tokenizer(texts, return_tensors='tf', padding=True, truncation=True)

# One-hot encode labels across the six classes
labels_one_hot = tf.keras.utils.to_categorical(labels, num_classes=6)

# Build, shuffle, and batch a tf.data.Dataset
dataset = tf.data.Dataset.from_tensor_slices((inputs['input_ids'], labels_one_hot))
train_dataset = dataset.shuffle(1000).batch(32)

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy')
```

I've checked the shapes: `inputs['input_ids']` comes out as `(32, 20)` (where 20 is the max sequence length), and `labels_one_hot` as `(32, 6)`. But as soon as I call `model.fit(train_dataset, epochs=3)`, I get the shape-mismatch error above.

I suspect the cause is how I've set up the model for a classification task, since GPT-2 is generally used for text generation; a quick output-shape check (first snippet at the end of this post) seems to support that. I've also tried subclassing `tf.keras.Model` to swap in a different output layer, but that leads to another error about the output shape not matching the expected dimensions; a rough reconstruction of that attempt is below as well.

Has anyone encountered this error when using Hugging Face Transformers with TensorFlow, and how can I properly configure the model for my classification task? Is there a simpler solution I'm overlooking? I'd appreciate any insights!
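To back up that suspicion, here's the sanity check I ran on the LM head's output. The shape in the comment is what I recall seeing; 50257 is GPT-2's vocabulary size:

```python
# What does the LM head actually emit for a batch of inputs?
out = model(inputs['input_ids'])
print(out.logits.shape)  # (batch, seq_len, 50257): per-token vocabulary logits, not 6 class scores
```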
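And here, roughly, is the subclassing attempt I mentioned. It's reconstructed from memory, so the details may differ from whatever produced the second error, and `GPT2Classifier` is my own class, not anything from the library:

```python
import tensorflow as tf
from transformers import TFGPT2Model

class GPT2Classifier(tf.keras.Model):
    """My attempt at bolting a classification head onto the bare GPT-2 encoder."""

    def __init__(self, num_classes=6):
        super().__init__()
        # TFGPT2Model is the transformer body without the language-modeling head
        self.gpt2 = TFGPT2Model.from_pretrained('gpt2')
        self.head = tf.keras.layers.Dense(num_classes)

    def call(self, input_ids, training=False):
        outputs = self.gpt2(input_ids, training=training)
        # (batch, seq_len, hidden) -> mean-pool over tokens -> (batch, hidden)
        pooled = tf.reduce_mean(outputs.last_hidden_state, axis=1)
        return self.head(pooled)  # (batch, num_classes) logits
```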
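Finally, while writing this up I noticed `TFGPT2ForSequenceClassification` in the Transformers docs, which sounds like it's built for exactly this case. I haven't verified it yet, so the usage below is my untested guess. Is that the intended route, and would it avoid the shape problem entirely?

```python
import tensorflow as tf
from transformers import GPT2Tokenizer, TFGPT2ForSequenceClassification

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token

# num_labels should (I think) attach a fresh 6-way classification head
model = TFGPT2ForSequenceClassification.from_pretrained('gpt2', num_labels=6)
# apparently needed so the model can locate the last non-padding token per sequence
model.config.pad_token_id = tokenizer.pad_token_id

model.compile(
    optimizer=tf.keras.optimizers.Adam(5e-5),
    # integer labels + raw logits, instead of one-hot labels + categorical_crossentropy
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```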