Regex Alternative for Safe HTML Tag Stripping in Ruby - Unexpected Tag Retention

👀 Views: 6 💬 Answers: 1 📅 Created: 2025-06-08

This might be a silly question, but I'm trying to strip specific HTML tags from a string in Ruby, and my regex is not behaving as I expected. I want to remove `<script>` and `<style>` tags along with their contents from a given HTML string to prevent any JS or CSS execution. This is the pattern I'm using:

```ruby
html_string = '<div>Hello World!<script>alert("test");</script><style>body {background: red;}</style></div>'
cleaned_string = html_string.gsub(/<script.*?>.*?<\/script>|<style.*?>.*?<\/style>/m, '')
```

I expected `cleaned_string` to be `<div>Hello World!</div>`, but instead I get `<div>Hello World!</div><script>alert("test");</script><style>body {background: red;}</style>`. It seems the regex is not matching the tags and their contents, so they are retained instead of removed.

I've tried various combinations of greedy and non-greedy quantifiers but can't seem to get it to work correctly. Could the problem stem from the way I'm using the `gsub` method, or from the regex pattern itself?

Environment: Ruby 3.0.0 on Windows 10. This is part of a larger project I'm refactoring.

Any guidance on how to implement this properly, or a pointer to the right documentation, would be greatly appreciated!
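For comparison, here is a minimal sketch of a slightly stricter version of the same idea. The helper name `strip_script_and_style` is hypothetical and not part of the original code. It assumes the goal is only to drop `<script>` and `<style>` elements from reasonably well-formed markup. It is not a full sanitizer, and an HTML parser would likely be more robust against malformed or adversarial input.

```ruby
# Hypothetical helper (name is an assumption, not from the original post).
# Removes <script>...</script> and <style>...</style> blocks, contents included.
def strip_script_and_style(html)
  # \b      keeps "<script" from matching longer tag names such as <scripted>
  # [^>]*   keeps the attribute match inside the opening tag
  # \1      requires the closing tag to match the captured opening tag name
  # /m      lets . match newlines; /i makes tag names case-insensitive
  pattern = %r{<(script|style)\b[^>]*>.*?</\1\s*>}mi
  html.gsub(pattern, '')
end

html_string = '<div>Hello World!<script>alert("test");</script><style>body {background: red;}</style></div>'
cleaned_string = strip_script_and_style(html_string)
puts cleaned_string
# prints: <div>Hello World!</div>
```

Note that `gsub` returns a new string and leaves `html_string` unchanged, so any later code must read `cleaned_string` (or use `gsub!`) to see the stripped result.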