Pandas DataFrame String Manipulation with Complex Regex: Unexpected Results When Removing Substrings
I'm stuck on something that should probably be simple, and I've been banging my head against it for hours. I'm trying to remove specific substrings from a column in my Pandas DataFrame using a regex. The DataFrame has a column named 'text' that contains various strings, and I want to strip out every occurrence of a word followed by a number, for example 'foo123', 'bar456', etc.

Here's the code I've been trying:

```python
import pandas as pd

data = {'text': ['foo123 is here', 'This is a test bar456', 'No matches here']}
df = pd.DataFrame(data)

# Attempting to remove words followed by numbers
pattern = r'\w+\d+'
df['text'] = df['text'].str.replace(pattern, '', regex=True)
```

However, the output still includes the original strings with the patterns. When I print the column (`print(df['text'])`), it shows:

```
0 foo123 is here
1 This is a test bar456
2 No matches here
Name: text, dtype: object
```

I expected to see the strings cleaned up, with at least 'foo123' and 'bar456' removed. I'm running this on Pandas 1.3.3 on Ubuntu 20.04. I've tried adjusting the regex pattern, but nothing seems to work. Am I missing something in the regex syntax or in the way I'm applying it? What am I doing wrong? Any insights or guidance would be much appreciated!
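For comparison, here is a minimal sketch of the cleanup I'm after, run against the same sample data. It assumes that "a word followed by a number" means a single token of ASCII letters immediately followed by digits; the stricter pattern `\b[A-Za-z]+\d+\b` and the whitespace-collapsing step are my own assumptions, not part of the original attempt. It also shows that `\w` matches digits as well as letters, so `\w+\d+` will remove bare numbers too.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'text': ['foo123 is here', 'This is a test bar456', 'No matches here']})

# Assumed definition of "a word followed by a number": a whole token made of
# ASCII letters immediately followed by digits (e.g. 'foo123').
# \b keeps the match aligned to token boundaries.
pattern = r'\b[A-Za-z]+\d+\b'

df['text'] = (
    df['text']
    .str.replace(pattern, '', regex=True)  # remove tokens such as 'foo123'
    .str.replace(r'\s+', ' ', regex=True)  # collapse runs of whitespace left behind
    .str.strip()                           # trim leading/trailing spaces
)
print(df['text'].tolist())
# ['is here', 'This is a test', 'No matches here']

# Contrast: \w also matches digits, so the original pattern r'\w+\d+'
# removes bare numbers too, while the letters-then-digits pattern leaves them alone.
nums = pd.Series(['version 42'])
print(nums.str.replace(r'\w+\d+', '', regex=True).tolist())  # ['version ']
print(nums.str.replace(pattern, '', regex=True).tolist())     # ['version 42']
```

Passing `regex=True` explicitly is worthwhile either way, since the default for `str.replace` has not been the same across pandas versions.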