CodexBloom - Programming Q&A Platform

Regex Not Capturing Alphanumeric Strings with Optional Special Characters in Python

👀 Views: 0 💬 Answers: 1 📅 Created: 2025-06-12
regex python string-manipulation

I'm relatively new to this, so bear with me... I've been banging my head against this for hours.

I'm trying to extract alphanumeric strings from text, where the strings may also contain hyphens and underscores. For instance, I want to match strings like `abc123`, `abc-123`, and `abc_123`, but not strings like `abc.123` or `123!abc`.

I've tried the following regex pattern:

```python
import re

text = "Here are some codes: abc123, abc-123, abc_123, abc.123, and 123!abc"
pattern = r'[\w-]+'  # My regex pattern
matches = re.findall(pattern, text)
print(matches)
```

When I run this, I get the output:

```
['Here', 'are', 'some', 'codes', 'abc123', 'abc-123', 'abc_123', 'abc', '123', 'and', '123', 'abc']
```

As you can see, it captures every word, and instead of excluding the invalid strings it splits them into fragments such as `abc` and `123`, which I don't want. I tried changing the pattern to `r'[a-zA-Z0-9_-]+'`, but that didn't help either.

What regex pattern should I use so that I only capture entire alphanumeric strings that may contain hyphens or underscores, but nothing else?

I'm using Python 3.9 on macOS, and this needs to be efficient because the text can get quite large (up to several MB). It's part of a larger web app I'm building. Am I missing something obvious, or is there a better approach? Pointers to the right documentation would be greatly appreciated!
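For reference, here is a minimal sketch of one way to do the whole-token matching described above. It is not a definitive answer. It rests on two assumptions that the question does not state: tokens are delimited by whitespace or commas, and a "code" contains at least one digit (which is what keeps plain words like `Here` out of the result). The names `CODE_PATTERN` and `find_codes` are hypothetical.

```python
import re

# Assumptions (not from the question): tokens are separated by whitespace
# or commas, and a valid code contains at least one digit.
CODE_PATTERN = re.compile(
    r"(?<![^\s,])"           # preceded by whitespace, a comma, or start of text
    r"(?=[A-Za-z0-9_-]*\d)"  # the token contains at least one digit (assumption)
    r"[A-Za-z0-9_-]+"        # only letters, digits, underscores, and hyphens
    r"(?![^\s,])"            # followed by whitespace, a comma, or end of text
)


def find_codes(text):
    """Return whole tokens that use only the allowed characters and contain a digit."""
    return CODE_PATTERN.findall(text)


text = "Here are some codes: abc123, abc-123, abc_123, abc.123, and 123!abc"
print(find_codes(text))  # ['abc123', 'abc-123', 'abc_123']
```

The lookarounds are what stop `abc.123` and `123!abc` from being split into partial matches: a candidate is only accepted when the characters on both sides are delimiters. Compiling the pattern once and reusing it should be adequate for inputs of a few MB, and `CODE_PATTERN.finditer(text)` can be used instead of `findall` if building the full list at once is a concern.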