Regex to Capture HTML Tags with Attributes in Ruby: Whitespace Handling

📅 Created: 2025-06-09

I'm working on a Ruby script to extract HTML tags along with their attributes from a string, but I'm running into issues with whitespace handling, which is causing my regex to produce unexpected results. My current regex pattern is as follows:

```ruby
pattern = /<\s*(\w+)([^>]*)>\s*/
```

This pattern is supposed to match tags like `<div class='container'>` and `<input type='text' />`, but when there's extra whitespace, such as in `< div class='container' >`, it does not match as expected. I've tried modifying the regex to account for whitespace, but I'm still running into problems. Here's what I attempted:

```ruby
pattern = /<\s*(\w+)\s*([^>]*)\s*>/
```

Yet, when I test it with the following string:

```ruby
html_string = "< div class='container' >Some text</ div> <input type='text' />"
```

I only get matches for the `<input>` tag, but not for the `<div>`. The output suggests that the regex is not capturing the `<div>` tag correctly because of the extra spaces. I also tried adding `\s*` directly after `<` and before `>`, but it's still not working as intended.

I am using Ruby 3.0.0, and I am looking for a solution that properly captures HTML tags with varying amounts of whitespace around them. Is there a more robust regex pattern that I can use here, or am I missing something in my implementation? For reference, this is a production service.
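For anyone reproducing this, here is a minimal sketch of one pattern that might cover the cases above. It assumes that closing tags such as `</ div>` should also be captured, which neither pattern in the question allows because nothing accepts the `/` before the tag name. The names `TAG_PATTERN` and `extract_tags` are hypothetical, chosen only for illustration.

```ruby
# Hypothetical pattern (not from the original script), written in extended
# mode (x) so each piece can be commented.
TAG_PATTERN = %r{
  <\s*        # opening bracket, then optional whitespace
  (/?)        # optional slash: present for a closing tag
  \s*
  (\w+)       # tag name
  ([^>]*?)    # raw attribute text, matched lazily
  \s*
  (/?)        # optional slash: present for a self-closing tag
  \s*>
}x

# Returns one hash per tag found in the string.
def extract_tags(html)
  html.scan(TAG_PATTERN).map do |closing, name, attrs, self_closing|
    {
      name: name,
      attributes: attrs.strip,
      closing: closing == "/",
      self_closing: self_closing == "/"
    }
  end
end

html_string = "< div class='container' >Some text</ div> <input type='text' />"
tags = extract_tags(html_string)

tags.each do |tag|
  puts "#{tag[:name]}: attributes=#{tag[:attributes].inspect} " \
       "closing=#{tag[:closing]} self_closing=#{tag[:self_closing]}"
end
```

With the test string from the question, this should yield three entries: the opening `div`, the closing `div`, and the self-closing `input`. The sketch still breaks on attribute values that contain `>`, so for a production service a real HTML parser is likely the safer choice.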