CodexBloom - Programming Q&A Platform

Regex Not Extracting Version Numbers from Mixed Strings in Python - Need Help with Overlapping Matches

👀 Views: 35 💬 Answers: 1 📅 Created: 2025-06-09
regex python string-manipulation

I'm updating my dependencies and have a quick question that's been bugging me. I've looked through the documentation and I'm still confused.

I'm trying to extract version numbers from a set of strings that might contain additional metadata. For instance, I have strings like 'Service v1.0.3 (Build 250)' and 'Service v1.0.3-beta (2023)', and I want to capture the version number while ignoring everything else. My current regex pattern is as follows:

```python
import re

strings = [
    'Service v1.0.3 (Build 250)',
    'Service v1.0.3-beta (2023)',
    'Service v2.1.0',
    'Service v2.1.0-alpha (Release candidate)'
]

pattern = r'v?(\d+)\.(\d+)\.(\d+)(?:[-\.\w]*)?'

for s in strings:
    match = re.search(pattern, s)
    if match:
        print(f'Extracted version: {match.group()}')
    else:
        print('No match found')
```

However, when I run this code, it only captures '1.0.3' but fails on '1.0.3-beta' and other variations. I expected it to handle the version number regardless of any additional descriptive tags. I also noticed that `match.group()` returns additional content in some cases, and I want strictly the version number without any extra characters.

I tried using `\b` word boundaries and other variations of the regex, but I keep hitting edge cases where it captures undesired portions or misses numbers altogether.

Can anyone suggest a more efficient regex pattern that correctly handles these scenarios, or point out what I might be doing wrong? I'm using Python 3.9.1, and the regex seems quite sluggish, especially with longer strings. This is part of a larger application I'm building. Is there a simpler solution I'm overlooking?

Thanks in advance!
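For reference, here is a minimal sketch of one possible direction, not a definitive fix. It assumes versions always have three dot-separated numeric fields and that any tag follows a single hyphen (as in `-beta`). The names `VERSION_RE` and `extract_version` and the `include_tag` parameter are hypothetical and not part of the original code. The idea is that `match.group()` returns the whole match, including the leading `v` and any trailing suffix, whereas a capturing group can isolate just the numeric core.

```python
import re

# Hypothetical pattern (an assumption, not from the original post):
#   \b                    start at a word boundary so "rev1.0.3" is skipped
#   v?                    optional "v" prefix, kept outside the capture group
#   (\d+\.\d+\.\d+)       group 1: the numeric core, e.g. "1.0.3"
#   (?:-([0-9A-Za-z.]+))? group 2: optional tag after a hyphen, e.g. "beta"
VERSION_RE = re.compile(r'\bv?(\d+\.\d+\.\d+)(?:-([0-9A-Za-z.]+))?')


def extract_version(text, include_tag=False):
    """Return the first version found in text, or None if there is none."""
    match = VERSION_RE.search(text)
    if match is None:
        return None
    core, tag = match.group(1), match.group(2)
    # Only append the tag when the caller asks for it and one was present.
    if include_tag and tag:
        return f'{core}-{tag}'
    return core


strings = [
    'Service v1.0.3 (Build 250)',
    'Service v1.0.3-beta (2023)',
    'Service v2.1.0',
    'Service v2.1.0-alpha (Release candidate)',
]

for s in strings:
    print(f'{s!r}: {extract_version(s)} / {extract_version(s, include_tag=True)}')
```

With the sample strings, `extract_version(s)` gives '1.0.3' or '2.1.0', and `include_tag=True` gives '1.0.3-beta' and '2.1.0-alpha' where a tag is present. Compiling the pattern once with `re.compile` and dropping the open-ended `[-\.\w]*` tail may also help a little with speed, though that would need measuring on the real data.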