CWE-1007: Insufficient Visual Distinction of Homoglyphs Presented to User

Description

The product displays information or identifiers to a user, but the display mechanism does not make it easy for the user to distinguish between visually similar or identical glyphs (homoglyphs), which may cause the user to misinterpret a glyph and perform an unintended, insecure action.

Some glyphs, pictures, or icons can be semantically distinct to a program, while appearing very similar or identical to a human user. These are referred to as homoglyphs. For example, the lowercase "l" (ell) and uppercase "I" (eye) have different character codes, but these characters can be displayed in exactly the same way to a user, depending on the font. This can also occur between different character sets. For example, the Latin capital letter "A" and the Greek capital letter "Α" (Alpha) are treated as distinct by programs, but may be displayed in exactly the same way to a user. Accent marks may also cause letters to appear very similar, such as the Latin capital letter A with a grave accent ("À") and its counterpart with an acute accent ("Á"). Adversaries can exploit this visual similarity for attacks such as phishing, for example by providing a link to an attacker-controlled hostname that looks like a hostname that the victim trusts. In a different use of homoglyphs, an adversary may create a back door username that is visually similar to the username of a regular user, which makes it more difficult for a system administrator to detect the malicious username while reviewing logs.
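The distinction a program sees can be made visible by printing code points and Unicode names. The following is a minimal sketch using Python's standard unicodedata module; the describe helper is a name chosen here for illustration, not part of any library.

```python
import unicodedata

# Pairs mentioned in the description: they can render alike
# but are distinct code points to a program.
pairs = [
    ("l", "I"),            # lowercase ell vs. uppercase eye
    ("A", "\u0391"),       # Latin capital A vs. Greek capital Alpha
    ("\u00c0", "\u00c1"),  # A with grave accent vs. A with acute accent
]


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for left, right in pairs:
    print(f"{describe(left)}  vs.  {describe(right)}  equal: {left == right}")
```

Every pair compares as unequal, even where a given font draws the two characters identically.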

Potential Impact

Scope: Integrity, Confidentiality

Impact: Other

Demonstrative Examples

The following looks like a simple, trusted URL that a user may frequently access.

Attack

http://www.еxаmрlе.соm

However, the URL above contains Cyrillic characters that look identical to the expected ASCII characters. As a result, most users cannot distinguish between the two and assume that the URL is trusted and safe. The "e" is actually the "CYRILLIC SMALL LETTER IE", which is represented in HTML as the character reference &#x435;, while the "a" is actually the "CYRILLIC SMALL LETTER A", which is represented in HTML as &#x430;. The "p", "c", and "o" are also Cyrillic characters in this example. Viewing the source reveals a URL of "http://www.&#x435;x&#x430;m&#x440;l&#x435;.&#x441;&#x43e;m". An adversary can use this approach in an attack such as phishing to drive traffic to a malicious website.
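One way to expose such a hostname is to convert it to its ASCII (Punycode) form. The sketch below uses Python's built-in "idna" codec, which implements the older IDNA 2003 rules; the to_ascii helper and the variable names are illustrative assumptions. The spoofed hostname is written with escapes so that the Cyrillic letters are explicit.

```python
# Hostname from the example above; Cyrillic letters are written as escapes.
spoofed = "www.\u0435x\u0430m\u0440l\u0435.\u0441\u043em"
genuine = "www.example.com"


def to_ascii(hostname):
    """Encode a hostname to ASCII using Python's built-in IDNA codec."""
    return hostname.encode("idna").decode("ascii")


print(spoofed == genuine)  # the two strings differ despite looking alike
print(to_ascii(genuine))   # an all-ASCII hostname is returned unchanged
print(to_ascii(spoofed))   # labels containing non-ASCII become "xn--" labels
```

In the encoded form, the labels that contain Cyrillic letters begin with "xn--", which no longer resembles the trusted name.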

The following displays an example of how creating usernames containing homoglyphs can lead to log forgery.

Assume an adversary visits a legitimate, trusted domain and creates an account named "admin", except the 'a' and 'i' characters are Cyrillic characters instead of the expected ASCII. Any actions the adversary performs will be saved to the log file and look like they came from a legitimate administrator account.

Result

123.123.123.123 аdmіn [17/Jul/2017:09:05:49 -0400] "GET /example/users/userlist HTTP/1.1" 401 12846
123.123.123.123 аdmіn [17/Jul/2017:09:06:51 -0400] "GET /example/users/userlist HTTP/1.1" 200 4523
123.123.123.123 admin [17/Jul/2017:09:10:02 -0400] "GET /example/users/editusers HTTP/1.1" 200 6291
123.123.123.123 аdmіn [17/Jul/2017:09:10:02 -0400] "GET /example/users/editusers HTTP/1.1" 200 6291

Upon closer inspection, the account that generated three of these log entries is "&#x430;dm&#x456;n". Only the third log entry was generated by the legitimate admin account. This makes it more difficult to determine which actions were performed by the adversary and which were performed by the legitimate "admin" account.
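A log viewer can make the difference visible by escaping every non-ASCII character before display. The following is a minimal sketch of that idea, not a description of any particular logging product; render_for_review is a hypothetical helper name.

```python
def render_for_review(field):
    """Replace every non-ASCII character with a visible backslash escape."""
    return field.encode("ascii", "backslashreplace").decode("ascii")


spoofed_user = "\u0430dm\u0456n"  # Cyrillic a and i, as in the example above
real_user = "admin"               # all ASCII

print(render_for_review(spoofed_user))  # prints \u0430dm\u0456n
print(render_for_review(real_user))     # prints admin
```

Escaping trades readability of legitimate non-ASCII names for unambiguous review, so it may suit an audit view better than everyday display.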

Mitigations & Prevention

Implementation

Use a browser that displays Punycode for IDNs in the URL and status bars, or that color-codes the different scripts in URLs. Due to the prominence of homoglyph attacks, several browsers now help safeguard against this attack by displaying Punycode. For example, Mozilla Firefox and Google Chrome will display IDNs as Punycode if the top-level domain does not restrict which characters can be used in domain names or if labels mix scripts for different languages.
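The mixed-script rule can be sketched roughly as follows. This is a simplified illustration and not the algorithm any browser actually uses: Python's standard library does not expose the Unicode Script property, so the first word of each character's Unicode name is used as an approximation. The function names are assumptions made for this example.

```python
import unicodedata


def scripts_in(label):
    """Approximate the scripts in a label from each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def display_form(hostname):
    """Return Punycode if any label mixes scripts, else the hostname as-is."""
    if any(len(scripts_in(label)) > 1 for label in hostname.split(".")):
        return hostname.encode("idna").decode("ascii")
    return hostname


print(display_form("www.example.com"))
print(display_form("www.\u0435x\u0430m\u0440l\u0435.\u0441\u043em"))
```

A label written entirely in one non-Latin script passes this check unchanged, so a mixed-script test alone does not catch whole-script look-alikes.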

Implementation

Use an email client that has strict filters and prevents messages that mix character sets from ending up in a user's inbox. Certain email clients, such as Google's Gmail, prevent the use of non-Latin characters in email addresses or in links contained within emails. This helps prevent homoglyph attacks by flagging these emails and redirecting them to a user's spam folder.

Detection Methods

Manual Dynamic Analysis (Effectiveness: Moderate): If the product uses user accounts, attempt to submit a username that contains homoglyphs. Similarly, check whether links containing homoglyphs can be sent via email, web browsers, or other mechanisms.
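Test usernames for this kind of check can be generated from a look-alike table. The sketch below assumes a small hand-picked Latin-to-Cyrillic map that is far from exhaustive, and accepts_username is a hypothetical stand-in for the system under test (here, an ASCII-only policy).

```python
from itertools import product

# Small hand-picked Latin-to-Cyrillic look-alike map (illustrative only).
LOOKALIKES = {
    "a": "\u0430", "c": "\u0441", "e": "\u0435",
    "i": "\u0456", "o": "\u043e", "p": "\u0440",
}


def homoglyph_variants(name):
    """Return every spelling of name with one or more look-alike swaps."""
    choices = [
        (ch, LOOKALIKES[ch]) if ch in LOOKALIKES else (ch,)
        for ch in name
    ]
    spellings = ("".join(combo) for combo in product(*choices))
    return [s for s in spellings if s != name]


def accepts_username(name):
    """Hypothetical stand-in for the system under test: ASCII-only policy."""
    return name.isascii()


for candidate in homoglyph_variants("admin"):
    escaped = candidate.encode("unicode_escape").decode("ascii")
    print(escaped, "accepted" if accepts_username(candidate) else "rejected")
```

In a real assessment, accepts_username would be replaced by a call to the registration interface being tested, and any accepted variant would indicate the weakness.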

Real-World CVE Examples

CVE ID	Description
CVE-2013-7236	web forum allows impersonation of users with homoglyphs in account names
CVE-2012-0584	Improper character restriction in URLs in web browser
CVE-2009-0652	Incomplete denylist does not include homoglyphs of "/" and "?" characters in URLs
CVE-2017-5015	web browser does not convert hyphens to punycode, allowing IDN spoofing in URLs
CVE-2005-0233	homoglyph spoofing using punycode in URLs and certificates
CVE-2005-0234	homoglyph spoofing using punycode in URLs and certificates
CVE-2005-0235	homoglyph spoofing using punycode in URLs and certificates

Frequently Asked Questions

What is CWE-1007?

CWE-1007 (Insufficient Visual Distinction of Homoglyphs Presented to User) is a software weakness identified by MITRE's Common Weakness Enumeration. It is classified as a Base-level weakness. The product displays information or identifiers to a user, but the display mechanism does not make it easy for the user to distinguish between visually similar or identical glyphs (homoglyphs), which...

How can CWE-1007 be exploited?

Attackers can exploit CWE-1007 (Insufficient Visual Distinction of Homoglyphs Presented to User) to affect integrity and confidentiality; the technical impact is categorized as "Other". This weakness is typically introduced during the Architecture and Design or Implementation phases of software development.

How do I prevent CWE-1007?

Key mitigations include using a browser that displays Punycode for IDNs in the URL and status bars (or that color-codes the different scripts in URLs), and using an email client with strict filters that keep messages mixing character sets out of the user's inbox.

What is the severity of CWE-1007?

CWE-1007 is classified as a Base-level weakness (Medium abstraction). It has been observed in 7 real-world CVEs.