Pandas DataFrame String Manipulation with Complex Regex: Unexpected Results When Removing Substrings

👀 Views: 106 💬 Answers: 1 📅 Created: 2025-06-06

I'm stuck on something that should probably be simple, and I've been banging my head against it for hours. I'm trying to remove specific substrings from a column in my Pandas DataFrame using regex. The DataFrame has a column named 'text' that contains various strings, and I want to strip out any occurrence of a word followed by a number, for example 'foo123' or 'bar456'.

Here's the code I've been trying:

```python
import pandas as pd

data = {'text': ['foo123 is here', 'This is a test bar456', 'No matches here']}
df = pd.DataFrame(data)

# Attempting to remove words followed by numbers
pattern = r'\w+\d+'
df['text'] = df['text'].str.replace(pattern, '', regex=True)
```

However, the output still includes the original strings with the patterns. When I print the column, it shows:

```
0           foo123 is here
1    This is a test bar456
2          No matches here
Name: text, dtype: object
```

I expected to see the strings cleaned up, with at least 'foo123' and 'bar456' removed. I tried adjusting the regex pattern, but nothing seems to work.

Am I missing something in the regex syntax or in the way I'm applying it? Any insights or guidance would be much appreciated!

Environment: Pandas 1.3.3 on Ubuntu 20.04.
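
For reference, here is a minimal sketch of the result I'm after, assuming "a word followed by a number" means a whole token made of letters immediately followed by digits. The pattern `\b[A-Za-z]+\d+\b`, the whitespace cleanup, and the helper name `strip_word_number_tokens` are my own assumptions for illustration, not something taken from the Pandas docs. One detail I noticed while writing it: `\w` also matches digits, so `\w+\d+` would match a purely numeric token like '2024' as well, which is why the sketch uses an explicit letter class.

```python
import pandas as pd


def strip_word_number_tokens(series: pd.Series) -> pd.Series:
    """Remove letter+digit tokens such as 'foo123' (hypothetical helper)."""
    # Assumed pattern: one or more ASCII letters followed by one or more
    # digits, bounded on both sides so only whole tokens are removed.
    pattern = r'\b[A-Za-z]+\d+\b'
    cleaned = series.str.replace(pattern, '', regex=True)
    # Collapse the whitespace left behind by removed tokens and trim the ends.
    return cleaned.str.replace(r'\s+', ' ', regex=True).str.strip()


df = pd.DataFrame(
    {'text': ['foo123 is here', 'This is a test bar456', 'No matches here']}
)
# str.replace returns a new Series, so the result is assigned back.
df['text'] = strip_word_number_tokens(df['text'])
print(df['text'].tolist())
# → ['is here', 'This is a test', 'No matches here']
```

Printing `pd.__version__` right before the replacement might also help confirm that the interpreter running the script is really using the Pandas version I think it is.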