CWE-116: Improper Encoding or Escaping of Output

Description

The product prepares a structured message for communication with another component, but encoding or escaping of the data is either missing or done incorrectly. As a result, the intended structure of the message is not preserved.

Improper encoding or escaping can allow attackers to change the commands that are sent to another component, inserting malicious commands instead. Most products follow a certain protocol that uses structured messages for communication between components, such as queries or commands. These structured messages can contain raw data interspersed with metadata or control information. For example, "GET /index.html HTTP/1.1" is a structured message containing a command ("GET") with a single argument ("/index.html") and metadata about which protocol version is being used ("HTTP/1.1"). If an application uses attacker-supplied inputs to construct a structured message without properly encoding or escaping, then the attacker could insert special characters that will cause the data to be interpreted as control information or metadata. Consequently, the component that receives the output will perform the wrong operations, or otherwise interpret the data incorrectly.
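The request-line example above can be sketched in a few lines of Python. This is an illustration only: `build_request_line_unsafe` and `build_request_line` are hypothetical helper names, and the sketch assumes percent-encoding (via the standard library's `urllib.parse.quote`) is the encoding the receiving component expects for the path argument.

```python
from urllib.parse import quote

def build_request_line_unsafe(path: str) -> str:
    # Raw concatenation: a space or CR/LF inside `path` is indistinguishable
    # from the delimiters of the protocol itself.
    return "GET " + path + " HTTP/1.1\r\n"

def build_request_line(path: str) -> str:
    # Percent-encode everything except "/" so the data stays inside the
    # single argument slot of the request line.
    return "GET " + quote(path, safe="/") + " HTTP/1.1\r\n"

# Attacker-supplied "path" that tries to smuggle in a header line.
malicious = "/index.html HTTP/1.1\r\nX-Injected: yes"
print(repr(build_request_line_unsafe(malicious)))
print(repr(build_request_line(malicious)))
```

In the unsafe variant the receiver sees two lines; in the encoded variant the CR, LF, and spaces arrive as `%0D`, `%0A`, and `%20`, so the message keeps its intended structure of one command, one argument, and one version token.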

Potential Impact

Integrity

Modify Application Data

Integrity, Confidentiality, Availability, Access Control

Execute Unauthorized Code or Commands

Confidentiality

Bypass Protection Mechanism

Demonstrative Examples

This code displays an email address that was submitted as part of a form.

Bad

<% String email = request.getParameter("email"); %>...Email Address: <%= email %>

The value read from the form parameter is reflected back to the client browser without having been encoded prior to output, allowing various XSS attacks (CWE-79).
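A minimal sketch of the corresponding fix, written in Python rather than JSP for brevity. `render_email` is a hypothetical function; it assumes the value is placed in HTML element content, where the standard library's `html.escape` is an appropriate encoder. Other contexts (attribute values, JavaScript, URLs) generally need different encoders.

```python
import html

def render_email(email: str) -> str:
    # html.escape encodes &, < and >, plus both quote characters by default,
    # so the value cannot open or close a tag.
    return "Email Address: " + html.escape(email)

print(render_email("<script>alert(1)</script>"))
# prints: Email Address: &lt;script&gt;alert(1)&lt;/script&gt;
```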

Consider a chat application in which a front-end web application communicates with a back-end server. The back-end is legacy code that does not perform authentication or authorization, so the front-end must implement it. The chat protocol supports two commands, SAY and BAN, although only administrators can use the BAN command. Each argument must be separated by a single space. The raw inputs are URL-encoded. The messaging protocol allows multiple commands to be specified on the same line if they are separated by a "|" character.

First, let's look at the back-end command processor code:

Bad

$inputString = readLineFromFileHandle($serverFH);

# generate an array of strings separated by the "|" character.
@commands = split(/\|/, $inputString);
foreach $cmd (@commands) {
    # separate the operator from its arguments based on a single whitespace
    ($operator, $args) = split(/ /, $cmd, 2);
    $args = UrlDecode($args);
    if ($operator eq "BAN") {
        ExecuteBan($args);
    }
    elsif ($operator eq "SAY") {
        ExecuteSay($args);
    }
}

The front end web application receives a command, encodes it for sending to the server, performs the authorization check, and sends the command to the server.

Bad

$inputString = GetUntrustedArgument("command");
($cmd, $argstr) = split(/\s+/, $inputString, 2);

# removes extra whitespace and also changes CRLF's to spaces
$argstr =~ s/\s+/ /gs;
$argstr = UrlEncode($argstr);
if (($cmd eq "BAN") && (! IsAdministrator($username))) {
    die "Error: you are not the admin.\n";
}

# communicate with file server using a file handle
$fh = GetServerFileHandle("myserver");
print $fh "$cmd $argstr\n";

It is clear that, while the protocol and back-end allow multiple commands to be sent in a single request, the front end only intends to send a single command. However, the UrlEncode function could leave the "|" character intact. If an attacker provides:

Attack

SAY hello world|BAN user12

then the front end will see this as a "SAY" command, and the $argstr will look like "hello world|BAN user12". Since the command is "SAY", the administrator check that guards the "BAN" command is never triggered, and the front end will send the URL-encoded command to the back end:

Result

SAY hello%20world|BAN%20user12

The back end, however, will treat these as two separate commands:

Result

SAY hello world
BAN user12

Notice, however, that if the front end properly encodes the "|" with "%7C", then the back end will only process a single command.

This example takes user input, passes it through an encoding scheme, then lists the contents of the user's home directory based on the user name.

Bad

sub GetUntrustedInput {
    return($ARGV[0]);
}

sub encode {
    my($str) = @_;
    $str =~ s/\&/\&amp;/gs;
    $str =~ s/\"/\&quot;/gs;
    $str =~ s/\'/\&apos;/gs;
    $str =~ s/\</\&lt;/gs;
    $str =~ s/\>/\&gt;/gs;
    return($str);
}

sub doit {
    my $uname = encode(GetUntrustedInput("username"));
    print "<b>Welcome, $uname!</b><p>\n";
    system("cd /home/$uname; /bin/ls -l");
}

The programmer attempts to encode dangerous characters; however, the denylist for encoding is incomplete (CWE-184), and an attacker can still pass a semicolon, resulting in a chain with OS command injection (CWE-78).

Additionally, the encoding routine is used inappropriately with command execution. An attacker doesn't even need to insert their own semicolon. The attacker can instead leverage the encoding routine to provide the semicolon to separate the commands. If an attacker supplies a string of the form:

Attack

' pwd

then the program will encode the apostrophe and insert the semicolon, which functions as a command separator when passed to the system function. This allows the attacker to complete the command injection.
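One possible alternative, sketched in Python under stated assumptions: validate the user name against a strict allowlist and pass the command as an argument vector, so no shell ever parses the data. `build_ls_argv` and the `USERNAME_RE` pattern are hypothetical; the pattern is an assumed account-name policy, not one taken from the example.

```python
import html
import re

# Assumed account-name policy (hypothetical): lowercase letter or "_" first,
# then up to 31 lowercase letters, digits, "_" or "-".
USERNAME_RE = re.compile(r"[a-z_][a-z0-9_-]{0,31}")

def build_ls_argv(username: str) -> list:
    # Allowlist check: reject anything that is not a plain account name.
    if not USERNAME_RE.fullmatch(username):
        raise ValueError("invalid username")
    # An argument vector has no shell parsing step, so ";" has no meaning.
    return ["/bin/ls", "-l", "--", "/home/" + username]

# HTML encoding is the wrong encoder for a shell: the entity it emits for
# the apostrophe contains a ";" of its own.
print(html.escape("' pwd"))
print(build_ls_argv("alice"))
```

The resulting list would typically be handed to `subprocess.run` without `shell=True`. HTML encoding remains appropriate for the welcome banner, but only for that output.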

Mitigations & Prevention

Architecture and Design

Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, consider using the ESAPI Encoding control [REF-45] or a similar tool, library, or framework. These will help the programmer encode outputs in a manner less prone to error. Alternately, use built-in functions, but consider using wrappers in case those functions are discovered to have a vulnerability.

Architecture and Design

If available, use structured mechanisms that automatically enforce the separation between data and code. These mechanisms may be able to provide the relevant quoting, encoding, and validation automatically, instead of relying on the developer to provide this capability at every point where output is generated. For example, stored procedures can enforce database query structure and reduce the likelihood of SQL injection.
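A minimal sketch of such a structured mechanism, using a parameterized query instead of a stored procedure. It relies on Python's standard `sqlite3` module; the `users` table, its contents, and `find_user` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user(conn, name):
    # The "?" placeholder sends `name` to the database as data; it is never
    # parsed as part of the SQL text, so no manual escaping is needed.
    cur = conn.execute("SELECT name, role FROM users WHERE name = ?", (name,))
    return cur.fetchall()

print(find_user(conn, "alice"))        # prints: [('alice', 'admin')]
print(find_user(conn, "' OR '1'='1"))  # prints: []
```

The injection attempt is simply compared against the `name` column as a literal string and matches nothing.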

Architecture and Design, Implementation

Understand the context in which your data will be used and the encoding that will be expected. This is especially important when transmitting data between different components, or when generating outputs that can contain multiple encodings at the same time, such as web pages or multi-part mail messages. Study all expected communication protocols and data representations to determine the required encoding strategies.

Architecture and Design

In some cases, input validation may be an important strategy when output encoding is not a complete solution. For example, you may be providing the same output that will be processed by multiple consumers that use different encodings or representations. In other cases, you may be required to allow user-supplied input to contain control information, such as limited HTML tags that support formatting in a wiki or bulletin board. When this type of requirement must be met, use an extremely strict allowlist to limit which control sequences can be used.
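One way to allow a few formatting tags while still encoding everything else is to encode first and then restore only exact matches from an allowlist. The sketch below is illustrative: the tag set in `ALLOWED_TAG_RE` and the function `render_limited_html` are assumptions, not part of any particular wiki or bulletin board.

```python
import html
import re

# Hypothetical allowlist: bare formatting tags only, no attributes.
ALLOWED_TAG_RE = re.compile(r"&lt;(/?)(b|i|em|strong)&gt;")

def render_limited_html(text: str) -> str:
    # Step 1: encode everything, so no input character is treated as markup.
    escaped = html.escape(text)
    # Step 2: restore only exact matches for the allowlisted tags.
    return ALLOWED_TAG_RE.sub(r"<\1\2>", escaped)

print(render_limited_html("<b>hi</b><script>x</script>"))
# prints: <b>hi</b>&lt;script&gt;x&lt;/script&gt;
```

A tag with attributes, such as `<b onclick="x">`, does not match the pattern and stays encoded. This sketch does not check that tags are balanced; a production system would more likely rely on a vetted sanitizer library.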

Architecture and Design

Use input validation as a defense-in-depth measure to reduce the likelihood of output encoding errors (see CWE-20).

Requirements

Fully specify which encodings are required by components that will be communicating with each other.

Implementation

When exchanging data between components, ensure that both components are using the same character encoding. Ensure that the proper encoding is applied at each interface. Explicitly set the encoding you are using whenever the protocol allows you to do so.
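As a small illustration of setting the encoding explicitly, the following Python sketch builds the headers and body of an HTML response so that the declared charset and the bytes actually sent always agree. `build_html_response` is a hypothetical helper and is not tied to any web framework.

```python
def build_html_response(body_text: str, charset: str = "utf-8"):
    # Encode the body with the same charset that the header declares, so the
    # browser never has to guess the encoding.
    body = body_text.encode(charset)
    headers = {
        "Content-Type": "text/html; charset=" + charset,
        "Content-Length": str(len(body)),
    }
    return headers, body

headers, body = build_html_response("<p>café</p>")
print(headers["Content-Type"])  # prints: text/html; charset=utf-8
```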

Detection Methods

Automated Static Analysis (Effectiveness: Moderate): This weakness can often be detected using automated static analysis tools. Many modern tools use data flow analysis or constraint-based techniques to minimize the number of false positives.
Automated Dynamic Analysis: This weakness can be detected using dynamic tools and techniques that interact with the software using large test suites with many diverse inputs, such as fuzz testing (fuzzing), robustness testing, and fault injection. The software's operation may slow down, but it should not become unstable, crash, or generate incorrect results.
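A toy example of the dynamic approach: a seeded random test that feeds many inputs through an encoder and checks two properties. No raw delimiter may survive encoding, and decoding must return the original data. `fuzz_encoder`, the delimiter set, and the alphabet are all assumptions chosen for illustration; real fuzzers are considerably more capable.

```python
import random
from urllib.parse import quote, unquote

DELIMITERS = "| \r\n"  # assumed protocol delimiters

def fuzz_encoder(encode, trials: int = 2000, seed: int = 0) -> int:
    """Return the number of generated inputs for which `encode` misbehaves."""
    rng = random.Random(seed)
    alphabet = "abc123%&;'\"<>" + DELIMITERS
    failures = 0
    for _ in range(trials):
        arg = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 20)))
        wire = encode(arg)
        # Property 1: no raw delimiter survives encoding.
        # Property 2: decoding returns the original data unchanged.
        if any(d in wire for d in DELIMITERS) or unquote(wire) != arg:
            failures += 1
    return failures

print(fuzz_encoder(lambda a: quote(a, safe="")))   # strict encoder
print(fuzz_encoder(lambda a: quote(a, safe="|")))  # leaves "|" unencoded
```

The first encoder should report zero failures, while the second, which leaves "|" intact, is flagged on every generated input that contains that character.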

Real-World CVE Examples

CVE ID	Description
CVE-2021-41232	Chain: authentication routine in Go-based agile development product does not escape user name (CWE-116), allowing LDAP injection (CWE-90)
CVE-2008-4636	OS command injection in backup software using shell metacharacters in a filename; correct behavior would require that this filename could not be changed.
CVE-2008-0769	Web application does not set the charset when sending a page to a browser, allowing for XSS exploitation when a browser chooses an unexpected encoding.
CVE-2008-0005	Program does not set the charset when sending a page to a browser, allowing for XSS exploitation when a browser chooses an unexpected encoding.
CVE-2008-5573	SQL injection via password parameter; a strong password might contain "&"
CVE-2008-3773	Cross-site scripting in chat application via a message subject, which normally might contain "&" and other XSS-related characters.
CVE-2008-0757	Cross-site scripting in chat application via a message, which normally might be allowed to contain arbitrary content.
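To illustrate the kind of escaping that is missing in an LDAP injection chain such as the first entry above, here is a minimal Python sketch of search-filter escaping following the rules in RFC 4515. The function names and the `uid` attribute are hypothetical; this is not the code of the affected product, and an LDAP client library's own escaping helper is usually preferable where one is available.

```python
def escape_ldap_filter_value(value: str) -> str:
    # RFC 4515: these characters must be written as a backslash followed by
    # their two-digit hexadecimal code when used inside a filter value.
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)

def build_user_filter(username: str) -> str:
    # The user name can no longer close the filter or add a wildcard.
    return "(uid=" + escape_ldap_filter_value(username) + ")"

print(build_user_filter("*)(uid=*"))  # prints: (uid=\2a\29\28uid=\2a)
```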

Taxonomy Mappings

WASC: 22 — Improper Output Handling
The CERT Oracle Secure Coding Standard for Java (2011): IDS00-J — Sanitize untrusted data passed across a trust boundary
The CERT Oracle Secure Coding Standard for Java (2011): IDS05-J — Use a subset of ASCII for file and path names
SEI CERT Oracle Coding Standard for Java: IDS00-J — Prevent SQL injection
SEI CERT Perl Coding Standard: IDS33-PL — Sanitize untrusted data passed across a trust boundary

Frequently Asked Questions

What is CWE-116?

CWE-116 (Improper Encoding or Escaping of Output) is a software weakness identified by MITRE's Common Weakness Enumeration. It is classified as a Class-level weakness. The product prepares a structured message for communication with another component, but encoding or escaping of the data is either missing or done incorrectly. As a result, the intended structure of the message is not preserved.

How can CWE-116 be exploited?

Attackers can exploit CWE-116 (Improper Encoding or Escaping of Output) to modify application data, execute unauthorized code or commands, or bypass protection mechanisms. This weakness is typically introduced during the Implementation and Operation phases of software development.

How do I prevent CWE-116?

Key mitigations include: Use a vetted library or framework that does not allow this weakness to occur or provides constructs that make this weakness easier to avoid. For example, consider using the ESAPI Encoding control [REF-45] or a similar tool, library, or framework.

What is the severity of CWE-116?

CWE-116 is classified as a Class-level weakness (High abstraction). It has been observed in 7 real-world CVEs.

Related Weaknesses

CWE-707: Improper Neutralization

CWE-74: Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection')
