CodexBloom - Programming Q&A Platform

Regex for Extracting Email Addresses from HTML Content: Handling Edge Cases in JavaScript

👀 Views: 46 💬 Answers: 1 📅 Created: 2025-06-08
regex javascript html

I'm trying to extract email addresses from a block of HTML content using a regex pattern in JavaScript, but I'm running into unexpected behavior with certain edge cases. My current regex looks like this:

```javascript
const htmlContent = `Contact us at support@example.com or sales@my-site.org. Follow us on social media!`;
const regex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
const matches = htmlContent.match(regex);
console.log(matches);
```

This works fine for standard email formats, but it fails when I have email addresses enclosed in angle brackets, like `<info@example.com>`. When I run the code, it only returns:

```
[ 'support@example.com', 'sales@my-site.org' ]
```

I expected it to also match `info@example.com` when it's enclosed as `<info@example.com>`, but it doesn't. I've tried modifying the regex to handle the brackets, like this:

```javascript
const regexWithBrackets = /<\s*([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})\s*>/g;
const matchesWithBrackets = htmlContent.match(regexWithBrackets);
console.log(matchesWithBrackets);
```

But with this change, I only get `null` as a result.

How can I modify my regex to correctly extract email addresses regardless of whether they are surrounded by angle brackets? Additionally, are there any best practices for using regex on HTML content?

I'm using Node.js version 14.17.0. For reference, this is a production mobile app. Any examples would be super helpful. Thanks in advance!
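For anyone landing on this question, here is a minimal sketch of one possible approach, not a definitive fix. It rests on a few assumptions: the bracketed addresses may appear in the HTML source either as literal `<...>` or as the entities `&lt;...&gt;`, and only the bare address is wanted. The helper name `extractEmails` and the sample string are hypothetical, made up for illustration. Two things are worth noting. First, `match()` with the `g` flag returns only full matches and yields `null` when nothing matches, so in the sample string as posted (which contains no bracketed address) the bracket-only pattern would be expected to return `null`. Second, `matchAll()` exposes capture groups, which lets the brackets be optional while still returning just the address.

```javascript
// Hypothetical helper (not from the original post): collects e-mail-like
// strings from text, whether or not they are wrapped in angle brackets.
function extractEmails(text) {
  // Assumption: brackets may arrive as HTML entities, so decode those first.
  const decoded = text.replace(/&lt;/g, '<').replace(/&gt;/g, '>');

  // Optional "<" before and ">" after; the address itself is capture group 1.
  const pattern = /<?\s*([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})\s*>?/g;

  // matchAll() yields one match object per hit, including capture groups.
  const found = Array.from(decoded.matchAll(pattern), (m) => m[1]);

  // Drop duplicates while keeping first-seen order.
  return [...new Set(found)];
}

// Illustrative input only; the markup here is invented for the example.
const sampleHtml =
  'Contact us at support@example.com, <info@example.com> or &lt;sales@my-site.org&gt;.';
console.log(extractEmails(sampleHtml));
```

The pattern is a pragmatic approximation rather than a full address validator, and it treats the input as plain text. If the addresses can live inside attributes such as `href="mailto:..."`, or the markup is complex, running the pattern over text taken from a proper HTML parser is likely to be more robust than applying a regex to raw HTML.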