CodexBloom - Programming Q&A Platform

Regex Issues Capturing Nested HTML Tags in JavaScript - Misalignment with Non-Greedy Matching

👀 Views: 4 💬 Answers: 1 📅 Created: 2025-06-03
regex javascript html

Hey everyone, I'm running into an issue that's driving me crazy. I'm trying to extract specific nested HTML tags using regex in my JavaScript application, but I'm running into problems with the greedy vs. non-greedy behavior of my patterns. I'm targeting a structure like this:

```html
<div>
  <span>Text 1</span>
  <div>
    <span>Text 2</span>
  </div>
</div>
```

I want to capture everything inside the outer `<div>` tag without including the inner `<div>`. My initial approach was:

```javascript
const htmlString = '<div><span>Text 1</span><div><span>Text 2</span></div></div>';
const regex = /<div>(.+)<\/div>/;
const match = htmlString.match(regex);
console.log(match);
```

The problem is that the regex captures everything between the first `<div>` and the last `</div>`, including the inner content. I expected to get just `'<span>Text 1</span><div><span>Text 2</span></div>'`, but instead I'm getting the entire string.

I tried switching to a non-greedy match like this:

```javascript
const regex = /<div>(.+?)<\/div>/;
```

Now I get `null` as the output. I believe this is because the pattern doesn't account for the nested structure. I've also tried `[^<]*`, but that is too restrictive for my needs.

I know regex isn't ideal for parsing HTML and that a DOM parser would be better, but I need a regex solution for compatibility with legacy systems that only accept regex patterns. How could I rewrite my regex to capture the content of the outer `<div>` without getting the nested content?

For context: I'm building a CLI tool on Ubuntu with Node.js v14.17.0, and I create my patterns with the `RegExp` constructor. Any examples would be super helpful. Thanks in advance!
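A quick way to see what each attempt actually produces is to log the capture group (`match[1]`) instead of the whole match array, whose first element (`match[0]`) is always the full matched text. The sketch below runs the three patterns from the question against the single-line sample string. For this exact input the greedy capture group already holds the content the question expected, and the lazy pattern does match rather than returning `null`: it simply stops at the first `</div>`, which closes the inner `<div>`. A `null` result in practice likely comes from a difference in the real input, which is an assumption here.

```javascript
// Sample input from the question (single line, no newlines).
const htmlString = '<div><span>Text 1</span><div><span>Text 2</span></div></div>';

// Greedy: .+ runs to the end of the string, then backtracks to the LAST </div>.
const greedy = htmlString.match(/<div>(.+)<\/div>/);

// Lazy: .+? expands only until the FIRST </div>, which closes the inner <div>.
const lazy = htmlString.match(/<div>(.+?)<\/div>/);

// [^<]* cannot cross any tag, and no <div> here contains plain text only.
const noTags = htmlString.match(/<div>([^<]*)<\/div>/);

// match[0] is the whole match; match[1] is the first capture group.
console.log(greedy[0]); // the entire input string
console.log(greedy[1]); // '<span>Text 1</span><div><span>Text 2</span></div>'
console.log(lazy[1]);   // '<span>Text 1</span><div><span>Text 2</span>'
console.log(noTags);    // null
```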
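One plausible source of a `null` result is line breaks in the real input, for example when the HTML is laid out over several lines like the structure in the question. In JavaScript, `.` does not match line terminators unless the `s` (dotAll) flag is set, so both `.+` patterns fail as soon as a newline follows `<div>`. The sketch below uses a hypothetical multi-line version of the sample markup; whether the legacy system feeds in such input, and whether it supports the `s` flag, are assumptions.

```javascript
// Hypothetical multi-line version of the question's HTML.
const multiline = [
  '<div>',
  '  <span>Text 1</span>',
  '  <div>',
  '    <span>Text 2</span>',
  '  </div>',
  '</div>',
].join('\n');

// '.' does not match '\n' by default, so both original patterns fail here.
const greedyDot = multiline.match(/<div>(.+)<\/div>/);  // null
const lazyDot = multiline.match(/<div>(.+?)<\/div>/);   // null

// The 's' (dotAll) flag lets '.' match line terminators (ES2018; Node 14 supports it).
const dotAll = multiline.match(/<div>(.+?)<\/div>/s);

// [\s\S] matches any character as well, for engines without the 's' flag.
const anyChar = multiline.match(/<div>([\s\S]+?)<\/div>/);
```

Note that this only fixes the newline problem: the lazy match still ends at the inner `</div>`, so the nesting issue remains.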
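JavaScript regular expressions have no recursion or balancing groups, so no single pattern can match arbitrarily deep nesting. If the maximum depth is known in advance, though, it can be spelled out by hand. The sketch below assumes at most one `<div>` nested inside the outer one, attribute-free `<div>` tags exactly as in the sample, and a legacy engine that supports negative lookahead; the names `notDivTag`, `innerDiv` and `outerDiv` are illustrative only. It builds the pattern with the `RegExp` constructor, as the question does, and uses a "tempered" token that matches any character not starting a `<div>` or `</div>` tag.

```javascript
// Building blocks for the RegExp constructor. String.raw keeps the
// backslashes in [\s\S] without doubling; '/' needs no escaping here.

// One character that does not start an opening or closing <div> tag.
// [\s\S] matches any character, including newlines.
const notDivTag = String.raw`(?:(?!</?div>)[\s\S])`;

// A <div> that contains no further <div> (nesting depth 1).
const innerDiv = `<div>${notDivTag}*</div>`;

// The outer <div>: its content is any mix of non-div characters and
// depth-1 <div> elements. This covers exactly ONE level of nesting.
const outerDiv = new RegExp(`<div>((?:${notDivTag}|${innerDiv})*)</div>`);

const htmlString = '<div><span>Text 1</span><div><span>Text 2</span></div></div>';
const match = htmlString.match(outerDiv);

// Full content of the outer <div>, inner <div> included.
console.log(match[1]); // '<span>Text 1</span><div><span>Text 2</span></div>'

// Content of the outer <div> with the depth-1 <div> elements removed
// (a second, global regex built from the same building block).
const withoutInner = match[1].replace(new RegExp(innerDiv, 'g'), '');
console.log(withoutInner); // '<span>Text 1</span>'
```

Each extra level of nesting means wrapping the inner building block once more in the same way. Given deeper input than it was built for, the pattern silently matches an inner pair instead of the outermost one (for `'<div><div><div>x</div></div></div>'` the capture is `'<div>x</div>'`), so a DOM parser remains the safer choice wherever it is allowed.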