CodexBloom - Programming Q&A Platform

Issues with XML Entity Resolution in Java - Special Characters Not Parsing Correctly

👀 Views: 43 💬 Answers: 1 📅 Created: 2025-06-11
xml java documentbuilder

I'm refactoring my project and performance testing it, and part of that work parses an XML document with Java's built-in `DocumentBuilder` and an `EntityResolver`. Special characters are not coming out the way I expect. My XML contains `&`, `<`, and `>` encoded as `&amp;`, `&lt;`, and `&gt;`, but after parsing, these characters appear unescaped in the resulting `Document`. I'm using Java 11 with the following code:

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import java.io.StringReader;

public class XmlParser {
    public static void main(String[] args) throws Exception {
        String xml = "<root><value>&amp; &lt; &gt;</value></root>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(new MyEntityResolver());
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}

class MyEntityResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        return null; // No external DTD, so return null (use the parser's default behavior)
    }
}
```

Even with the custom `EntityResolver` set, the special characters are printed as plain text rather than as entity references. The output I see is `& < >` instead of the `&amp; &lt; &gt;` I expected. I've made sure the XML is well-formed and valid, and I've tried several different input strings, but the result is always the same.

Is there a specific configuration or method I'm missing to handle entity resolution properly in this context? Any insights or suggestions would be greatly appreciated. For context, I'm running Java on CentOS.
Thanks, I really appreciate it!
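For reference, here is a minimal sketch (assuming the JDK's built-in JAXP parser and `Transformer`, nothing else) of the parse/serialize round trip. A conforming XML parser always expands the predefined entities `&amp;`, `&lt;`, and `&gt;`, so DOM text nodes hold the plain characters `& < >`; the escaped form only reappears when the DOM is written back out as markup. The class and method names `EntityRoundTrip`, `parse`, and `serialize` are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EntityRoundTrip {
    // Parses the XML; the parser expands &amp; &lt; &gt; into & < > in the DOM's text nodes.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Serializes the DOM back to markup; the serializer escapes & and < again
    // because they cannot appear literally in XML character data.
    static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root><value>&amp; &lt; &gt;</value></root>");
        // Decoded character data, as stored in the DOM.
        System.out.println(doc.getDocumentElement().getTextContent());
        // Re-escaped markup, as written by the serializer.
        System.out.println(serialize(doc));
    }
}
```

The first line printed is the decoded text `& < >` (the same result as in the question); the second is markup in which at least `&` and `<` are escaped again.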
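A second sketch probes when an `EntityResolver` is actually consulted. It assumes the JDK's default non-validating `DocumentBuilder`, which still loads an external DTD named in a `DOCTYPE`. The resolver records each `systemId` it receives and serves a tiny in-memory DTD. The file name `example.dtd`, the entity `company`, and the names `ResolverProbe` and `parseAndRecord` are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class ResolverProbe {
    // Parses xml, returns every systemId the EntityResolver was asked to resolve,
    // and appends the root element's text content to textOut.
    static List<String> parseAndRecord(String xml, StringBuilder textOut) throws Exception {
        List<String> calls = new ArrayList<>();
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> {
            calls.add(systemId);
            // Serve an in-memory DTD instead of reading a file; "company" is a hypothetical entity.
            return new InputSource(new StringReader("<!ENTITY company \"Acme Corp\">"));
        });
        textOut.append(builder.parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getTextContent());
        return calls;
    }

    public static void main(String[] args) throws Exception {
        // Predefined entities only: no DOCTYPE, so the resolver is never consulted.
        StringBuilder plain = new StringBuilder();
        List<String> plainCalls =
                parseAndRecord("<root><value>&amp; &lt; &gt;</value></root>", plain);
        System.out.println(plainCalls + " -> " + plain);

        // External DTD: the resolver is asked once, for the (possibly expanded) "example.dtd".
        StringBuilder withDtd = new StringBuilder();
        String xml = "<!DOCTYPE root SYSTEM \"example.dtd\"><root>&company; &amp; more</root>";
        List<String> dtdCalls = parseAndRecord(xml, withDtd);
        System.out.println(dtdCalls + " -> " + withDtd);
    }
}
```

With the question's input, the recorded list stays empty, because `&amp;`, `&lt;`, and `&gt;` are handled by the parser itself. The resolver only comes into play for external entities such as the DTD in the second call. The `systemId` it receives may be expanded against the working directory, so only its ending is predictable.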