CodexBloom - Programming Q&A Platform

Regex for Extracting Versions from a Complex String - Unexpected Matches

👀 Views: 14 💬 Answers: 1 📅 Created: 2025-06-03
regex javascript node.js

I'm trying to extract version numbers from a string that contains variously formatted data. The string might look something like this: `"Release 2.0.1 and Fix 2.0.1-alpha, also see Build 3.0.0"`. My goal is to match version numbers of the form `X.Y.Z` or `X.Y.Z-prerelease`. Initially, I wrote the following regex pattern:

```regex
\b(\d+)\.(\d+)\.(\d+)(?:-(\w+))?\b
```

Using this regex, I attempted to find all matches in JavaScript like so:

```javascript
const str = "Release 2.0.1 and Fix 2.0.1-alpha, also see Build 3.0.0";
const regex = /\b(\d+)\.(\d+)\.(\d+)(?:-(\w+))?\b/g;
const matches = str.match(regex);
console.log(matches);
```

The output is `['2.0.1', '2.0.1-alpha', '3.0.0']`, which is what I expected. However, I also need each version to be captured only once, with no overlaps (for example, the same version shouldn't be captured in two different formats). I tried to enforce uniqueness by converting the matches into a Set before returning them:

```javascript
const uniqueMatches = [...new Set(matches)];
console.log(uniqueMatches);
```

But I still have issues when the string contains more complex versions, such as feature-tagged versions like `2.0.1-feature1`, or versions with an extra dot like `2.0.1.5`. The regex gives unexpected results in those cases (for example, `2.0.1.5` still produces a match of `2.0.1`).

Is there a way to adjust my regex pattern to handle these edge cases more effectively? I'm using regex in a Node.js environment (version 14.17.0), and I want to cover these variations without introducing false positives.

Any help would be greatly appreciated! Thanks in advance! This is my first time working with JavaScript. What's the best practice here?
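For reference, here is a minimal sketch of one direction that might cover the edge cases described above. It rests on a few assumptions that are not stated in the question: that "the same version in different formats" means two matches sharing the same `X.Y.Z` core, that only the first occurrence of each core should be kept, and that a four-part string like `2.0.1.5` should be rejected outright rather than partially matched. The names `VERSION_RE` and `extractVersions` are hypothetical. It relies on lookbehind and `String.prototype.matchAll`, both of which are available in Node.js 14.

```javascript
// Sketch only: VERSION_RE and extractVersions are illustrative names,
// and the "keep first occurrence per X.Y.Z core" rule is an assumption.
const VERSION_RE =
  /(?<![\w.])(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?(?![\w-]|\.\d)/g;
// (?<![\w.])        not preceded by a word character or a dot
// (?![\w-]|\.\d)    not followed by a word character, a hyphen, or ".<digit>"

function extractVersions(text) {
  const seen = new Map(); // core "X.Y.Z" -> first match that used it
  for (const m of text.matchAll(VERSION_RE)) {
    const [full, major, minor, patch, prerelease] = m;
    const core = `${major}.${minor}.${patch}`;
    if (!seen.has(core)) {
      seen.set(core, { version: full, core, prerelease: prerelease || null });
    }
  }
  return [...seen.values()];
}

const str = "Release 2.0.1 and Fix 2.0.1-alpha, also see Build 3.0.0";
console.log(extractVersions(str).map((v) => v.version));
// → [ '2.0.1', '3.0.0' ]

console.log(extractVersions("Broken 2.0.1.5 and ok 2.0.1-feature1"));
// → [ { version: '2.0.1-feature1', core: '2.0.1', prerelease: 'feature1' } ]
```

If prerelease builds should count as distinct versions instead, keying the `Map` on the full match rather than on `core` would give that behaviour.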