Regex for Extracting HTML Tags with Attributes - Unexpected Matches in JavaScript

Created: 2025-06-06

I'm trying to extract all HTML tags, along with their attributes, from a string using a regex in JavaScript, but I'm getting unexpected matches. My goal is to capture opening tags like `<input type="text" value="example">` and ignore self-closing tags like `<img src="image.jpg" />`, as well as any tags that are not structured correctly.

Here's the regex I came up with:

```javascript
const regex = /<([a-zA-Z]+)([^>]*)>/g;
```

I initially thought this would work, but the matches include `<img />`, which I want to avoid. I test it on this string:

```javascript
const htmlString = '<div><input type="text" value="example"><img src="image.jpg" /></div>';
```

using:

```javascript
const matches = htmlString.match(regex);
console.log(matches);
```

The output is `['<div>', '<input type="text" value="example">', '<img src="image.jpg" />']`, which includes the `<img>` tag. I need only tags that are not self-closing to match.

I also tried modifying the regex to:

```javascript
const regex = /<([a-zA-Z]+)([^>]*)>(?<!/)/g;
```

but that resulted in no matches at all. I'm not sure how to construct the regex so it captures these tags while filtering out the self-closing ones. Any advice on refining it would be appreciated. Thanks!

The project is a mobile app built with JavaScript. This happens in both development and production on Ubuntu 20.04, and I'm working in an Ubuntu 22.04 environment. Am I approaching this the right way, and what's the correct way to implement this?
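A minimal sketch of one possible fix follows. It assumes simple, well-formed markup like the sample string and a runtime with ES2018 regex lookbehind and ES2020 `String.prototype.matchAll`. The second attempt seems to fail for two reasons. First, in a regex literal the unescaped `/` in `(?<!/)` ends the literal early, so that line does not parse as JavaScript. Second, even when escaped as `(?<!\/)`, a lookbehind placed after `>` only ever sees the `>` itself, so it filters nothing. Moving the lookbehind to just before `>` makes it inspect the last character inside the tag instead. The tag-name pattern is widened to allow digits (for tags like `h1`). `extractOpenTags` and `attrRegex` are hypothetical names for illustration, and the attribute regex only handles double-quoted values.

```javascript
// Opening-tag matcher (sketch). The negative lookbehind (?<!\/) is placed
// just BEFORE the closing ">", so it rejects tags whose last inner character
// is "/" (self-closing tags like <img ... />). The "/" is escaped so it does
// not terminate the regex literal. Closing tags like </div> never match
// because a letter must follow "<".
const openTagRegex = /<([a-zA-Z][a-zA-Z0-9]*)([^>]*)(?<!\/)>/g;

// Hypothetical attribute matcher: only name="value" pairs with double quotes.
const attrRegex = /([a-zA-Z_:][-a-zA-Z0-9_:.]*)\s*=\s*"([^"]*)"/g;

// Hypothetical helper: returns { name, attributes } for every
// non-self-closing opening tag found in `html`.
function extractOpenTags(html) {
  const tags = [];
  // matchAll (ES2020) yields every match together with its capture groups.
  for (const [, name, rawAttrs] of html.matchAll(openTagRegex)) {
    const attributes = {};
    for (const [, attrName, attrValue] of rawAttrs.matchAll(attrRegex)) {
      attributes[attrName] = attrValue;
    }
    tags.push({ name, attributes });
  }
  return tags;
}

const htmlString = '<div><input type="text" value="example"><img src="image.jpg" /></div>';

// Logs two entries: "div" with no attributes and "input" with type/value;
// the self-closing <img ... /> is skipped.
console.log(extractOpenTags(htmlString));

// The same regex with String.prototype.match, mirroring the original test:
// → ['<div>', '<input type="text" value="example">']
console.log(htmlString.match(openTagRegex));
```

This is still a regex over HTML, so it will misbehave on `>` inside attribute values, single-quoted or unquoted attributes, comments, and similar cases. If the runtime provides `DOMParser` (browsers do), parsing the string into a DOM and reading each element's `attributes` is generally more robust than any regex.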