CodexBloom - Programming Q&A Platform

Regex Not Matching URLs with Subdirectory Paths in Python - Need Help with Complex Patterns

👀 Views: 90 💬 Answers: 1 📅 Created: 2025-06-12
regex python url-matching Python

I'm trying to match URLs that include subdirectory paths using regex in Python, but I'm running into issues with certain edge cases. My regex pattern is supposed to capture URLs like `https://example.com/path/to/resource` and `http://example.org/another/path/`. However, when I test it, it fails to capture URLs when there are additional query parameters or fragments, like `https://example.com/path/to/resource?query=1#section`.

Here's the regex pattern I currently have:

```python
import re

pattern = r'^(https?://)?(www\.)?example\.com/(\w+/)*\w+/?(\?.*)?(#.*)?$'
```

I've tried adjusting the groups and using `.*` to be more inclusive, but I keep getting false negatives on URLs with query parameters and fragments. For instance, when I run:

```python
urls = [
    'https://example.com/path/to/resource',
    'http://example.org/another/path/',
    'https://example.com/path/to/resource?query=1#section',
    'https://example.com/path/',
    'https://example.com/'
]

# Report whether each URL matches the pattern defined above
for url in urls:
    if re.match(pattern, url):
        print(f'Matched: {url}')
    else:
        print(f'Not matched: {url}')
```

I get the following output:

```
Matched: https://example.com/path/to/resource
Not matched: http://example.org/another/path/
Matched: https://example.com/path/to/resource?query=1#section
Matched: https://example.com/path/
Matched: https://example.com/
```

As you can see, the URL with a different domain isn't matching, which is expected. However, I'm unsure how to modify my regex pattern so that it handles varied URL structures more gracefully. Any guidance on how to refine this regex would be greatly appreciated!

I'm working with Python in a Docker container on macOS.
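
For reference, here is a minimal sketch of one way the pattern could be refined. It is not a general URL validator. Tracing the original pattern by hand, two properties stand out: the `\w+` after `example\.com/` is mandatory, so at least one path segment is required, and `(\?.*)?` is greedy, so a fragment such as `#section` ends up inside the query group instead of the fragment group. The names `URL_PATTERN` and `match_url` are hypothetical, and the choice to allow dots and hyphens in path segments is an assumption rather than something stated in the question.

```python
import re

# Hypothetical refinement of the pattern from the question.
# Assumptions: the scheme and "www." stay optional, the host stays fixed to
# example.com, and path segments may contain word characters, dots, and hyphens.
URL_PATTERN = re.compile(
    r'^(?:https?://)?(?:www\.)?example\.com'
    r'(?P<path>(?:/[\w.-]+)*/?)'    # zero or more segments, optional trailing slash
    r'(?:\?(?P<query>[^#]*))?'      # query stops at the first '#'
    r'(?:#(?P<fragment>.*))?$'      # fragment runs to the end of the string
)


def match_url(url):
    """Return a dict with path, query, and fragment, or None if there is no match."""
    m = URL_PATTERN.match(url)
    return m.groupdict() if m else None


urls = [
    'https://example.com/path/to/resource',
    'http://example.org/another/path/',
    'https://example.com/path/to/resource?query=1#section',
    'https://example.com/path/',
    'https://example.com/',
]

for url in urls:
    print(url, '->', match_url(url))
```

Named groups keep the query and fragment separate, and making every path segment optional lets the bare root URL through. If validation of arbitrary URLs is the real goal, splitting with the standard library's `urllib.parse.urlsplit` and checking the resulting components may be more robust than a single regex.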