Regex Not Ignoring Comments in Markdown Parsing with Python - Trouble with Inline Code Blocks
I'm converting an old personal project into a Markdown parser in Python, and I can't get my regex to handle HTML comments correctly when they appear inside inline code spans.

My goal is to extract text from Markdown while dropping comments that look like `<!-- This is a comment -->`. However, if a comment is inside inline code (like `` `code with <!-- comment -->` ``), I want the parser to treat it as part of the code and leave it alone.

So far, I've been stripping comments with this regex:

```python
import re

comment_pattern = r'<!--.*?-->'
text_without_comments = re.sub(comment_pattern, '', markdown_text, flags=re.DOTALL)
```

This works for most input, but it can't tell whether a comment sits inside inline code. For example:

```markdown
Here is some text with `inline code <!-- this comment should be ignored -->` and it should capture `inline code <!-- this comment should be ignored -->` correctly.
```

After applying the regex, the comments inside the inline code are stripped too, so I lose part of the code content.

I've also tried a lookbehind assertion, but it doesn't give the result I need. I considered splitting the parsing into stages (for example, capturing all code segments first), but that adds complexity.

Is there a cleaner way to handle this? What regex pattern or strategy should I use so that comments are stripped only when they appear outside inline code spans?

I'm using the `re` module, and the code needs to run on Python 3.9 as well as 3.10. This is for a CLI tool and a mobile app. Am I missing something obvious? I've been using Python for about a year, so any guidance would be greatly appreciated!
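For reference, here is a minimal sketch of a single-pass version of the "capture code segments first" idea: one alternation that matches either a whole inline code span or an HTML comment, then keeps spans and drops comments in a replacement function. It assumes backtick code spans whose closing backtick run has the same length as the opening run; it does not try to handle fenced or indented code blocks or backslash-escaped backticks. `strip_comments_outside_code` is a hypothetical helper name, not part of any library.

```python
import re

# Hypothetical pattern: match an inline code span OR an HTML comment.
# Branch 1: a run of backticks (not preceded/followed by another backtick),
#           lazily any content, then a closing run of exactly the same length.
# Branch 2: an HTML comment, possibly spanning lines (re.DOTALL).
_SPAN_OR_COMMENT = re.compile(
    r"(?<!`)(?P<ticks>`+)(?!`).*?(?<!`)(?P=ticks)(?!`)"  # inline code span
    r"|<!--.*?-->",                                      # HTML comment
    re.DOTALL,
)


def strip_comments_outside_code(markdown_text: str) -> str:
    """Remove HTML comments, but keep anything inside inline code spans."""

    def keep_code(match: re.Match) -> str:
        # "ticks" only participates in code-span matches, so spans are
        # returned unchanged and comment matches are replaced with "".
        return match.group(0) if match.group("ticks") else ""

    return _SPAN_OR_COMMENT.sub(keep_code, markdown_text)


if __name__ == "__main__":
    sample = "Drop <!-- this --> but keep `code <!-- that -->`."
    print(strip_comments_outside_code(sample))
    # prints: Drop  but keep `code <!-- that -->`.
```

Because `re.sub` scans left to right and always takes the leftmost match, a comment that starts before a backtick is removed whole, and a code span that starts before a comment is kept whole, so no separate staging pass is needed. An unmatched backtick simply fails the first branch and is left as literal text.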