CWE-20: Improper Input Validation

Description

The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.

Input validation is a frequently used technique for checking potentially dangerous inputs in order to ensure that the inputs are safe for processing within the code, or when communicating with other components. Input can consist of raw data and metadata, and the data can be simple or structured. Structured data can be composed of many nested layers that combine metadata and raw data with other simple or structured data. Many properties of raw data or metadata may need to be validated upon entry into the code. Implied or derived properties of data must often be calculated or inferred by the code itself. Errors in deriving properties may be considered a contributing factor to improper input validation.

Potential Impact

Availability

DoS: Crash, Exit, or Restart; DoS: Resource Consumption (CPU); DoS: Resource Consumption (Memory)

Confidentiality

Read Memory, Read Files or Directories

Integrity, Confidentiality, Availability

Modify Memory, Execute Unauthorized Code or Commands

Demonstrative Examples

This example demonstrates a shopping interaction in which the user is free to specify the quantity of items to be purchased and a total is calculated.

Bad

    ...
    public static final double price = 20.00;
    int quantity = currentUser.getAttribute("quantity");
    double total = price * quantity;
    chargeUser(total);
    ...

The user has no control over the price variable; however, the code does not prevent a negative value from being specified for quantity. If an attacker were to provide a negative value, then the total would be negative and the user would have their account credited instead of debited.
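One way to close this gap is to range-check the quantity before the total is computed. The following is a minimal sketch in Python rather than Java; `parse_quantity`, `order_total`, and `MAX_QUANTITY` are hypothetical names, and the upper bound of 100 is an assumed business rule that the example above does not specify.

```python
from decimal import Decimal

PRICE = Decimal("20.00")   # fixed by the application, not by the user
MAX_QUANTITY = 100         # assumed business limit for this sketch


def parse_quantity(raw: str) -> int:
    """Convert untrusted text to an int and enforce 1 <= quantity <= MAX_QUANTITY."""
    try:
        quantity = int(raw.strip())
    except ValueError:
        raise ValueError("quantity must be an integer") from None
    if not 1 <= quantity <= MAX_QUANTITY:
        raise ValueError(f"quantity must be between 1 and {MAX_QUANTITY}")
    return quantity


def order_total(raw_quantity: str) -> Decimal:
    """Compute the amount to charge only after the quantity has been validated."""
    return PRICE * parse_quantity(raw_quantity)
```

Rejecting zero and negative quantities up front means the charge can never become a credit, whatever the caller does with the total afterwards.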

This example asks the user for a height and width of an m X n game board with a maximum dimension of 100 squares.

Bad

    ...
    #define MAX_DIM 100
    ...
    /* board dimensions */
    int m, n, error;
    board_square_t *board;

    printf("Please specify the board height: \n");
    error = scanf("%d", &m);
    if ( EOF == error ) {
        die("No integer passed: Die evil hacker!\n");
    }
    printf("Please specify the board width: \n");
    error = scanf("%d", &n);
    if ( EOF == error ) {
        die("No integer passed: Die evil hacker!\n");
    }
    if ( m > MAX_DIM || n > MAX_DIM ) {
        die("Value too large: Die evil hacker!\n");
    }
    board = (board_square_t*) malloc( m * n * sizeof(board_square_t));
    ...

While this code checks to make sure the user cannot specify large, positive integers and consume too much memory, it does not check for negative values supplied by the user. As a result, an attacker can perform a resource consumption (CWE-400) attack against this program by specifying two large negative values that will not overflow (the product of two negative numbers is positive), resulting in a very large memory allocation (CWE-789) and possibly a system crash. Alternatively, an attacker can provide very large negative values which will cause an integer overflow (CWE-190), and unexpected behavior will follow depending on how the values are treated in the remainder of the program.
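A two-sided range check addresses both problems described above. The sketch below is in Python for brevity and only illustrates the check itself; `parse_dimension` and `board_cell_count` are hypothetical names, and the lower bound of 1 is an assumption about what a sensible board looks like.

```python
MAX_DIM = 100  # same limit as in the example above


def parse_dimension(raw: str, name: str) -> int:
    """Convert untrusted text to an int and enforce 1 <= value <= MAX_DIM."""
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} is not an integer") from None
    if not 1 <= value <= MAX_DIM:
        raise ValueError(f"{name} must be between 1 and {MAX_DIM}")
    return value


def board_cell_count(raw_height: str, raw_width: str) -> int:
    """Return the number of squares to allocate, bounded by MAX_DIM * MAX_DIM."""
    m = parse_dimension(raw_height, "height")
    n = parse_dimension(raw_width, "width")
    return m * n
```

Because both factors are known to lie between 1 and MAX_DIM before they are multiplied, the product cannot be negative and cannot exceed 10,000 squares.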

The following example shows a PHP application in which the programmer attempts to display a user's birthday and homepage.

Bad

    $birthday = $_GET['birthday'];
    $homepage = $_GET['homepage'];
    echo "Birthday: $birthday<br>Homepage: <a href=$homepage>click here</a>";

The programmer intended for $birthday to be in a date format and $homepage to be a valid URL. However, since the values are derived from an HTTP request, if an attacker can trick a victim into clicking a crafted URL with <script> tags providing the values for birthday and/or homepage, then the script will run on the client's browser when the web server echoes the content. Notice that even if the programmer were to defend the $birthday variable by restricting input to digits and dashes, it would still be possible for an attacker to provide a string of the form:

Attack

2009-01-09--

If this data were used in a SQL statement, the trailing "--" would cause the database to treat the remainder of the statement as a comment. The comment could disable other security-related logic in the statement. In this case, encoding combined with input validation would be a more useful protection mechanism.

Furthermore, XSS (CWE-79) and SQL injection (CWE-89) are just two of the potential consequences when input validation is not used. Depending on the context of the code, CRLF Injection (CWE-93), Argument Injection (CWE-88), or Command Injection (CWE-77) may also be possible.
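The combination of validation and output encoding recommended above can be sketched as follows. This is an illustration in Python, not a port of the PHP page; `validate_birthday`, `validate_homepage`, and `render_profile` are hypothetical names, and restricting homepages to http and https is an assumed policy.

```python
import html
import re
from datetime import date
from urllib.parse import urlsplit

_DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def validate_birthday(raw: str) -> date:
    """Accept only a real calendar date written as YYYY-MM-DD."""
    match = _DATE_RE.fullmatch(raw)
    if match is None:
        raise ValueError("birthday must look like YYYY-MM-DD")
    year, month, day = map(int, match.groups())
    return date(year, month, day)  # raises ValueError for impossible dates


def validate_homepage(raw: str) -> str:
    """Accept only absolute http(s) URLs (assumed policy for this sketch)."""
    parts = urlsplit(raw)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("homepage must be an absolute http(s) URL")
    return raw


def render_profile(raw_birthday: str, raw_homepage: str) -> str:
    """Validate both fields, then HTML-encode them at the point of output."""
    birthday = validate_birthday(raw_birthday)
    homepage = validate_homepage(raw_homepage)
    return (
        f"Birthday: {html.escape(birthday.isoformat())}<br>"
        f'Homepage: <a href="{html.escape(homepage, quote=True)}">click here</a>'
    )
```

Validation alone is not enough here: a URL can pass the scheme and host checks and still contain quote or angle-bracket characters, which is why the value is also escaped and placed inside a quoted attribute.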

The following example takes a user-supplied value to allocate an array of objects and then operates on the array.

Bad

    private void buildList ( int untrustedListSize ) {
        if ( 0 > untrustedListSize ) {
            die("Negative value supplied for list size, die evil hacker!");
        }
        Widget[] list = new Widget [ untrustedListSize ];
        list[0] = new Widget();
    }

This example attempts to build a list from a user-specified value, and even checks to ensure a non-negative value is supplied. If, however, a 0 value is provided, the code will build an array of size 0 and then try to store a new Widget in the first location, causing an exception to be thrown.
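A check that covers the full range of acceptable sizes avoids the empty-array case. The sketch below is in Python; `build_list` mirrors the method above, the `Widget` class is a stand-in, and `MAX_LIST_SIZE` is an assumed upper bound that the original example does not define.

```python
MAX_LIST_SIZE = 1_000  # assumed upper bound for this sketch


class Widget:
    """Stand-in for the Widget class in the example above."""


def build_list(untrusted_list_size: int) -> list:
    """Allocate the list only when the size is within 1..MAX_LIST_SIZE."""
    if not 1 <= untrusted_list_size <= MAX_LIST_SIZE:
        raise ValueError(f"list size must be between 1 and {MAX_LIST_SIZE}")
    widgets = [None] * untrusted_list_size
    widgets[0] = Widget()  # safe: the list has at least one slot
    return widgets
```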

Mitigations & Prevention

Architecture and Design

Consider using language-theoretic security (LangSec) techniques that characterize inputs using a formal language and build "recognizers" for that language. This requires parsing to be a distinct layer that enforces a boundary between raw input and internal data representations, instead of allowing parser code to be scattered throughout the program, where it could be subject to errors or inconsistencies that create weaknesses. [REF-1109] [REF-1110] [REF-1111]
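As a small illustration of the idea, the sketch below defines a tiny input language and a single recognizer for it, so that the rest of the code only ever sees a parsed value. The grammar, `BoardSize`, `recognize_board_size`, and `cell_count` are all assumptions made for this example; a regular expression is used as the recognizer because the assumed language is regular.

```python
import re
from dataclasses import dataclass

# Grammar assumed for this sketch:
#   size   := DIGITS "x" DIGITS
#   DIGITS := one to three ASCII digits
_SIZE_RE = re.compile(r"([0-9]{1,3})x([0-9]{1,3})")


@dataclass(frozen=True)
class BoardSize:
    height: int
    width: int


def recognize_board_size(raw: str) -> BoardSize:
    """The only place where raw text is turned into a BoardSize."""
    match = _SIZE_RE.fullmatch(raw)
    if match is None:
        raise ValueError("input is not in the board-size language")
    height, width = (int(group) for group in match.groups())
    if not (1 <= height <= 100 and 1 <= width <= 100):
        raise ValueError("dimension out of range")
    return BoardSize(height, width)


def cell_count(size: BoardSize) -> int:
    """Internal code works on the parsed type and never sees raw input."""
    return size.height * size.width
```

Keeping the recognizer in one function means that every caller receives either a fully checked `BoardSize` or an error, with no partially parsed state in between.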

Architecture and Design

Use an input validation framework such as Struts or the OWASP ESAPI Validation API. Note that using a framework does not automatically address all input validation problems; be mindful of weaknesses that could arise from misusing the framework itself (CWE-1173).

Architecture and Design, Implementation

Understand all the potential areas where untrusted inputs can enter the product, including but not limited to: parameters or arguments, cookies, anything read from the network, environment variables, reverse DNS lookups, query results, request headers, URL components, e-mail, files, filenames, databases, and any external systems that provide data to the application. Remember that such inputs may be obtained indirectly through API calls.

Implementation (Effectiveness: High)

Assume all input is malicious. Use an "accept known good" input validation strategy, i.e., use a list of acceptable inputs that strictly conform to specifications. Reject any input that does not strictly conform to specifications, or transform it into something that does. When performing input validation, consider all potentially relevant properties, including length, type of input, the full range of acceptable values, missing or extra inputs, syntax, consistency across relat...
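An "accept known good" check can be sketched as an allowlist of exact values or a pattern that the whole input must match. The example below is in Python; the allowed colors, the username rule, and the function names are assumptions chosen for illustration.

```python
import re

# Assumed specification for this sketch: a closed set of values and a
# username made of 3 to 16 lowercase ASCII letters, digits, or underscores,
# starting with a letter.
ALLOWED_COLORS = frozenset({"red", "green", "blue"})
USERNAME_RE = re.compile(r"[a-z][a-z0-9_]{2,15}")


def validate_color(raw: str) -> str:
    """Accept only values that appear in the allowlist."""
    if raw not in ALLOWED_COLORS:
        raise ValueError("unknown color")
    return raw


def validate_username(raw: str) -> str:
    """Accept only input that matches the specification in full."""
    if USERNAME_RE.fullmatch(raw) is None:
        raise ValueError("username does not conform to the specification")
    return raw
```

Using `fullmatch` rather than a search means that unexpected characters anywhere in the input, including a trailing newline, cause rejection.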

Architecture and Design

For any security checks that are performed on the client side, ensure that these checks are duplicated on the server side, in order to avoid CWE-602. Attackers can bypass the client-side checks by modifying values after the checks have been performed, or by changing the client to remove the client-side checks entirely. Then, these modified values would be submitted to the server. Even though client-side checks provide minimal benefits with respect to server-side security, the...

Implementation

When your application combines data from multiple sources, perform the validation after the sources have been combined. The individual data elements may pass the validation step but violate the intended restrictions after they have been combined.
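A calendar date is a simple case where each field can pass its own check while the combination is invalid. The sketch below is in Python; `field_checks_pass` and `combine_date` are hypothetical names, and the year range is an assumption.

```python
from datetime import date


def field_checks_pass(year: int, month: int, day: int) -> bool:
    """Per-field checks only: each value is plausible on its own."""
    return 1900 <= year <= 2100 and 1 <= month <= 12 and 1 <= day <= 31


def combine_date(year: int, month: int, day: int) -> date:
    """Validate the fields individually, then validate the combined value."""
    if not field_checks_pass(year, month, day):
        raise ValueError("field out of range")
    # date() raises ValueError when the combination is not a real date,
    # for example 30 February.
    return date(year, month, day)
```

Here 30 February passes every per-field check and is rejected only when the fields are combined.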

Implementation

Be especially careful to validate all input when invoking code that crosses language boundaries, such as from an interpreted language to native code. This could create an unexpected interaction between the language boundaries. Ensure that you are not violating any of the expectations of the language with which you are interfacing. For example, even though Java may not be susceptible to buffer overflows, providing a large argument in a call to native code might trigger an overflow.

Implementation

Directly convert your input type into the expected data type, such as using a conversion function that translates a string into a number. After converting to the expected data type, ensure that the input's values fall within the expected range of allowable values and that multi-field consistencies are maintained.
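The order of operations described above (convert, range-check, then check consistency across fields) might look like the following Python sketch. The page-range scenario, `parse_int_in_range`, `parse_page_range`, and the default limit are assumptions for illustration.

```python
def parse_int_in_range(raw: str, low: int, high: int) -> int:
    """Convert text to int first, then check the allowed range."""
    try:
        value = int(raw)
    except ValueError:
        raise ValueError("not an integer") from None
    if not low <= value <= high:
        raise ValueError(f"value must be between {low} and {high}")
    return value


def parse_page_range(raw_first: str, raw_last: str, max_page: int = 10_000) -> tuple:
    """Validate each field, then the relationship between the two fields."""
    first = parse_int_in_range(raw_first, 1, max_page)
    last = parse_int_in_range(raw_last, 1, max_page)
    if first > last:
        raise ValueError("first page must not come after last page")
    return first, last
```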

Implementation

Inputs should be decoded and canonicalized to the application's current internal representation before being validated (CWE-180, CWE-181). Make sure that your application does not inadvertently decode the same input twice (CWE-174). Such errors could be used to bypass allowlist schemes by introducing dangerous inputs after they have been checked. Use libraries such as the OWASP ESAPI Canonicalization control. Consider performing repeated canonicalization until your input does...
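The sketch below shows decoding before validation for percent-encoded input, using Python's standard `urllib.parse.unquote`. It interprets repeated canonicalization as decoding until the value stops changing, which is an assumption; `canonicalize`, `validate_relative_path`, and the limit of three rounds are hypothetical, and only forward-slash paths are considered.

```python
from urllib.parse import unquote

MAX_DECODE_ROUNDS = 3  # assumed limit for this sketch


def canonicalize(raw: str) -> str:
    """Percent-decode until the value stops changing; refuse deeper nesting."""
    current = raw
    for _ in range(MAX_DECODE_ROUNDS):
        decoded = unquote(current)
        if decoded == current:
            return current
        current = decoded
    raise ValueError("input is encoded too many times")


def validate_relative_path(raw: str) -> str:
    """Validate the canonical form, never the encoded form."""
    path = canonicalize(raw)
    if path.startswith("/") or ".." in path.split("/"):
        raise ValueError("path escapes the allowed directory")
    return path
```

Validating the encoded form instead would let a doubly encoded "../" through, since it contains no dots or slashes until it has been decoded twice.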

Implementation

When exchanging data between components, ensure that both components are using the same character encoding. Ensure that the proper encoding is applied at each interface. Explicitly set the encoding you are using whenever the protocol allows you to do so.
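Setting the encoding explicitly, and failing on bytes that are not valid in it, can be sketched as follows in Python. The choice of UTF-8 and the function names `decode_utf8_strict` and `encode_for_peer` are assumptions for this example.

```python
def decode_utf8_strict(payload: bytes) -> str:
    """Decode with an explicit encoding and reject malformed byte sequences."""
    try:
        return payload.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        raise ValueError("payload is not valid UTF-8") from None


def encode_for_peer(text: str) -> bytes:
    """Encode with the same explicit encoding the receiving component expects."""
    return text.encode("utf-8")
```

Strict decoding also rejects overlong sequences such as the bytes C0 AF, which some lenient decoders have historically treated as "/".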

Detection Methods

Automated Static Analysis — Some instances of improper input validation can be detected using automated static analysis. A static analysis tool might allow the user to specify which application-specific methods or functions perform input validation; the tool might also have built-in knowledge of validation...
Manual Static Analysis — When custom input validation is required, such as when enforcing business rules, manual analysis is necessary to ensure that the validation is properly implemented.
Fuzzing — Fuzzing techniques can be useful for detecting input validation errors. When unexpected inputs are provided to the software, the software should not crash or otherwise become unstable, and it should generate application-controlled error messages. If exceptions or interpreter-generated error messages...
Automated Static Analysis - Binary or Bytecode (Effectiveness: SOAR Partial) — According to SOAR [REF-1479], detection techniques in this category may be useful.
Manual Static Analysis - Binary or Bytecode (Effectiveness: SOAR Partial) — According to SOAR [REF-1479], detection techniques in this category may be useful.
Dynamic Analysis with Automated Results Interpretation (Effectiveness: High) — According to SOAR [REF-1479], detection techniques in this category may be useful.
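To illustrate the fuzzing approach listed above, the sketch below feeds random printable strings to a validation function and records any failure that is not an application-controlled rejection. It is a toy harness written for this example, not a substitute for a real fuzzing tool; `fuzz`, `parse_quantity`, and the use of `ValueError` as the controlled error are assumptions.

```python
import random
import string


def parse_quantity(raw: str) -> int:
    """Example target: accepts integers from 1 to 100, else raises ValueError."""
    value = int(raw)
    if not 1 <= value <= 100:
        raise ValueError("quantity out of range")
    return value


def fuzz(target, rounds: int = 1000, seed: int = 0) -> list:
    """Call target with random strings; collect anything other than ValueError."""
    rng = random.Random(seed)
    unexpected = []
    for _ in range(rounds):
        length = rng.randint(0, 12)
        candidate = "".join(rng.choice(string.printable) for _ in range(length))
        try:
            target(candidate)
        except ValueError:
            pass  # controlled rejection: the behavior we want
        except Exception as exc:  # any other exception is a finding
            unexpected.append((candidate, exc))
    return unexpected
```

An empty result from `fuzz(parse_quantity)` only means that no unexpected exception was seen for these inputs; it does not show that the validation logic is correct.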

Real-World CVE Examples

CVE ID	Description
CVE-2024-37032	Large language model (LLM) management tool does not validate the format of a digest value (CWE-1287) from a private, untrusted model registry, enabling relative...
CVE-2022-45918	Chain: a learning management tool debugger uses external input to locate previous session logs (CWE-73) and does not properly validate the given path (CWE-20), allowing for filesystem path traversal u...
CVE-2021-30860	Chain: improper input validation (CWE-20) leads to integer overflow (CWE-190) in mobile OS, as exploited in the wild per CISA KEV.
CVE-2021-30663	Chain: improper input validation (CWE-20) leads to integer overflow (CWE-190) in mobile OS, as exploited in the wild per CISA KEV.
CVE-2021-22205	Chain: backslash followed by a newline can bypass a validation step (CWE-20), leading to eval injection (CWE-95), as exploited in the wild per CISA KEV.
CVE-2021-21220	Chain: insufficient input validation (CWE-20) in browser allows heap corruption (CWE-787), as exploited in the wild per CISA KEV.
CVE-2020-9054	Chain: improper input validation (CWE-20) in username parameter, leading to OS command injection (CWE-78), as exploited in the wild per CISA KEV.
CVE-2020-3452	Chain: security product has improper input validation (CWE-20) leading to directory traversal (CWE-22), as exploited in the wild per CISA KEV.
CVE-2020-3161	Improper input validation of HTTP requests in IP phone, as exploited in the wild per CISA KEV.
CVE-2020-3580	Chain: improper input validation (CWE-20) in firewall product leads to XSS (CWE-79), as exploited in the wild per CISA KEV.
CVE-2021-37147	Chain: caching proxy server has improper input validation (CWE-20) of headers, allowing HTTP response smuggling (CWE-444) using an "LF line ending"
CVE-2008-5305	Eval injection in Perl program using an ID that should only contain hyphens and numbers.
CVE-2008-2223	SQL injection through an ID that was supposed to be numeric.
CVE-2008-3477	Lack of input validation in spreadsheet program leads to buffer overflows, integer overflows, array index errors, and memory corruption.
CVE-2008-3843	Insufficient validation enables XSS.

Showing 15 of 43 observed examples.

Taxonomy Mappings

7 Pernicious Kingdoms — Input validation and representation
OWASP Top Ten 2004: A1 — Unvalidated Input
CERT C Secure Coding: ERR07-C — Prefer functions that support error checking over equivalent functions that don't
CERT C Secure Coding: FIO30-C — Exclude user input from format strings
CERT C Secure Coding: MEM10-C — Define and use a pointer validation function
WASC: 20 — Improper Input Handling
Software Fault Patterns: SFP25 — Tainted input to variable

Frequently Asked Questions

What is CWE-20?

CWE-20 (Improper Input Validation) is a software weakness identified by MITRE's Common Weakness Enumeration. It is classified as a Class-level weakness. The product receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.

How can CWE-20 be exploited?

Attackers can exploit CWE-20 (Improper Input Validation) to cause denial of service (crash, exit, or restart; CPU or memory resource consumption), read memory, files, or directories, modify memory, or execute unauthorized code or commands. This weakness is typically introduced during the Architecture and Design or Implementation phases of software development.

How do I prevent CWE-20?

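Key mitigations include: using language-theoretic security (LangSec) techniques so that parsing forms a distinct layer between raw input and internal data representations; using an input validation framework; assuming all input is malicious and applying an "accept known good" validation strategy; duplicating client-side checks on the server side; and decoding and canonicalizing inputs before validating them.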

What is the severity of CWE-20?

CWE-20 is classified as a Class-level weakness (High abstraction). It has been observed in 43 real-world CVEs.

Related Weaknesses

CWE-707: Improper Neutralization

CWE-345: Insufficient Verification of Data Authenticity

CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')

CWE-41: Improper Resolution of Path Equivalence

CWE-74: Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection')

CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer

CWE-770: Allocation of Resources Without Limits or Throttling
