Regex for Extracting HTML Tags in Go - Issue with Nested Tags

👀 Views: 363 💬 Answers: 1 📅 Created: 2025-08-24

Hey everyone, I'm running into an issue that's driving me crazy. I'm trying to extract specific HTML tags from a string using regex in Go, and nested tags are causing my regex to produce unexpected results. My goal is to extract all `<div>` tags, including their contents, and to capture them accurately even when they are nested within one another.

Here's the regex pattern I've been using:

```go
pattern := `<div.*?>(.*?)</div>`
```

I've tried using the `(?s)` flag to allow `.` to match newlines, but I'm still getting unexpected matches or incomplete captures when the `<div>` tags are nested. For example, with the following input:

```html
<div class="outer">
  <div class="inner">
    Content here
  </div>
</div>
```

I'm only getting "Content here" as a match, and I need the outer `<div>` to capture everything, including the inner `<div>`.

I also experimented with a more complex regex:

```go
pattern := `<div.*?>(?s:(.*?)</div>)`
```

This still doesn't give the correct results. The match sometimes ends at the inner `</div>`, which leads to partial captures.

Could someone help me refine my regex pattern so that it properly extracts all nested `<div>` tags, including their complete contents? I'm using Go 1.19 and the standard `regexp` package. This is part of a larger service I'm building, and I'm coming from a different tech stack and still learning Go.

Any insights or suggestions would be greatly appreciated!
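For reference, here is a minimal sketch that reproduces the problem and shows one direction I'm considering. As far as I can tell, Go's `regexp` package uses RE2 syntax, which has no recursion or backreferences, so a single pattern can't balance nested tags. The sketch therefore only uses a regex to find each opening or closing `<div>` tag and pairs them with a stack. The names `divTag` and `extractDivs` are made up for illustration, and the code assumes lowercase tags, no `>` inside attribute values, and no self-closing `<div/>`.

```go
package main

import (
	"fmt"
	"regexp"
)

// divTag matches a single opening or closing <div> tag.
// Assumption: lowercase tag names and no '>' inside attribute values.
var divTag = regexp.MustCompile(`</?div\b[^>]*>`)

// extractDivs returns every <div>...</div> element found in src, nested
// ones included, in the order their closing tags appear (inner before outer).
func extractDivs(src string) []string {
	var out []string
	var starts []int // stack of byte offsets of still-open <div> tags

	for _, loc := range divTag.FindAllStringIndex(src, -1) {
		tag := src[loc[0]:loc[1]]
		if tag[1] != '/' {
			// Opening tag: remember where it starts.
			starts = append(starts, loc[0])
			continue
		}
		if len(starts) == 0 {
			continue // stray closing tag with no matching opener; skip it
		}
		// Closing tag: pair it with the most recent unmatched opener.
		begin := starts[len(starts)-1]
		starts = starts[:len(starts)-1]
		out = append(out, src[begin:loc[1]])
	}
	return out
}

func main() {
	input := "<div class=\"outer\">\n  <div class=\"inner\">\n    Content here\n  </div>\n</div>"

	// The pattern from the question, with (?s) so '.' also matches newlines.
	lazy := regexp.MustCompile(`(?s)<div.*?>(.*?)</div>`)
	fmt.Printf("lazy regex: %q\n", lazy.FindAllString(input, -1))

	// Stack-based pairing of opening and closing tags.
	for _, d := range extractDivs(input) {
		fmt.Printf("balanced:   %q\n", d)
	}
}
```

With this input, the lazy pattern starts at the outer `<div>` but stops at the first `</div>`, so the outer element is cut short, while the stack-based version returns the inner element and then the complete outer one. For real-world HTML an actual parser is probably the safer route; this is only meant to illustrate the nesting problem.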