The King of Vulnerabilities: Unicode Compiler Vulnerabilities Threaten Global Software Code

Recently, researchers at the University of Cambridge discovered a vulnerability that affects most of today’s compilers and many software development environments. The vulnerability stems from a component of Unicode, the digital text encoding standard. Unicode currently defines more than 143,000 characters across 154 different language scripts (in addition to many non-script character sets, such as emoji).

In short, almost all compilers (programs that convert human-readable source code into machine code a computer can execute) are open to this attack, in which an adversary can introduce targeted vulnerabilities into any software without being detected. Disclosure of the vulnerability was coordinated with multiple organizations, some of which are now releasing updates to mitigate it.

The vulnerability has been named “Trojan Source.” Specifically, the weakness involves Unicode’s bidirectional or “Bidi” algorithm, which handles the display of text containing mixed scripts with different display orders, such as Arabic (read from right to left) and English (read from left to right).

But computer systems need a deterministic way to resolve conflicting directionality in text. Enter the “Bidi override,” which can be used to make left-to-right text read right to left, and vice versa.

“In some cases, the default ordering set by the Bidi algorithm may not be sufficient,” the Cambridge researchers wrote. “For these situations, Bidi override control characters make it possible to switch the display order of groups of characters.”

Bidi overrides even allow characters from a single script to be displayed in an order different from their logical encoding. As the researchers pointed out, this feature has previously been used to disguise the file extensions of malware spread via email.
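The file-extension trick can be illustrated with a short Python sketch. The file name below is hypothetical, and `naive_display` is only a crude stand-in for real rendering: it simply reverses everything after the first RIGHT-TO-LEFT OVERRIDE character, which is far simpler than the full Unicode Bidi algorithm but is enough to show the gap between logical order and display order.

```python
import unicodedata

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE, one of the Bidi override control characters


def naive_display(text):
    """Crude approximation of rendering: reverse everything after the first RLO.

    This is NOT the Unicode Bidi algorithm, only an illustration.
    """
    head, sep, tail = text.partition(RLO)
    return head + tail[::-1] if sep else text


# Hypothetical file name as stored on disk (logical order).
logical_name = "invoice_" + RLO + "fdp.exe"

print(unicodedata.name(RLO))          # RIGHT-TO-LEFT OVERRIDE
print(logical_name.endswith(".exe"))  # True: the operating system sees an .exe
print(naive_display(logical_name))    # invoice_exe.pdf: roughly what a person sees
```

The operating system acts on the logical order, while the person reading the name sees the display order, and that mismatch is what the disguise relies on.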

The problem is that most programming languages allow developers to put these Bidi override control characters in comments and strings. This is bad, because most programming languages allow comments, and all text in comments (including control characters) is ignored by compilers and interpreters. Equally bad, most programming languages allow string literals to contain arbitrary characters, including control characters.

This is the first “simple and elegant” super-vulnerability to endanger almost all software.

Ross Anderson, a professor of computer security at the University of Cambridge and a co-author of the study, said: “So you can use them in source code that appears harmless to a human reviewer but (secretly) does something nasty. For projects like Linux and Webkit, this is bad news. These projects accept code contributions from anyone and merge them into critical code after manual review. As far as I know, this is the first vulnerability to affect almost all (software).”

The research paper called the vulnerability “Trojan Source” and pointed out that although comments and strings have specific syntax to indicate their start and end positions, Bidi overrides do not abide by these boundaries. The paper states:

“So, if Bidi control characters are placed exclusively within comments and strings, we can sneak them into source code in a way that most compilers will accept. Our main insight is that we can reorder source code characters so that the resulting display order also looks like syntactically valid source code.”

“Combining all of this, we arrive at a new type of supply chain attack on source code. By injecting Unicode Bidi control characters into comments and strings, an attacker can produce syntactically valid source code in most modern computer languages in which the display order of the characters presents logic that differs from the actual logic. In effect, we have surreptitiously turned program A into program B.”
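To make the idea concrete, here is a minimal Python sketch of an early-return pattern of this kind. The function name, the `bank` dictionary, and the amounts are hypothetical and chosen only for illustration. A RIGHT-TO-LEFT ISOLATE character is placed inside the docstring so that, in many editors, the trailing `;return` is likely to be drawn as if it were part of the docstring text. The parser, however, follows the logical order and sees a real `return` statement.

```python
import ast

RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE

# Hypothetical function under review. The RLI sits inside the docstring, and
# the logical text after the closing quotes is an actual "return" statement.
source = (
    "def subtract_funds(bank, account, amount):\n"
    f"    ''' Subtract funds from bank account then {RLI}''' ;return\n"
    "    bank[account] -= amount\n"
)


def statement_types(src):
    """Return the AST node type of each statement in the first function."""
    func = ast.parse(src).body[0]
    return [type(node).__name__ for node in func.body]


def run_demo():
    """Execute the function and report the balance afterwards."""
    namespace = {}
    exec(source, namespace)
    bank = {"alice": 100}
    namespace["subtract_funds"](bank, "alice", 30)
    return bank["alice"]


print(statement_types(source))  # ['Expr', 'Return', 'AugAssign']
print(run_demo())               # 100: the subtraction is never reached
```

The parsed statement list shows the `Return` sitting directly after the docstring, so the line that appears to do the work is dead code.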

Anderson said that such an attack may be difficult for human code reviewers to detect because the rendered source code looks completely acceptable.

“If the logical change is subtle enough to be undetected in subsequent tests, the attacker may introduce targeted vulnerabilities without being discovered,” he said.

It is also worrying that Bidi control characters persist through the copy-and-paste functions of most modern browsers, editors, and operating systems.

“Any developer who copies code from an untrusted source into a protected code base may inadvertently introduce an invisible vulnerability,” Anderson pointed out. “This kind of code copying is a significant source of real-world security vulnerabilities.”

Matthew Green, an associate professor at the Johns Hopkins Information Security Institute, said the Cambridge research clearly shows that most compilers can be tricked with Unicode into processing code differently from what a reader would expect.

“Before reading this paper, the idea that Unicode could be exploited in some way would not have surprised me,” Green said. “What does surprise me is how many compilers will happily parse Unicode without any defenses, and how effective their right-to-left encoding technique is at sneaking code into code bases. This is a very clever technique that people had never thought was possible.”

Green said the good news is that the researchers conducted an extensive vulnerability scan and were unable to find evidence that anyone was exploiting the vulnerability. However:

“The bad news is that there were no defenses against it, and now that people know about it, criminals may start to exploit it,” Green said. “I hope compiler and code editor developers can fix this problem quickly! But because some people do not update their development tools regularly, there will be some risk for a while at least.”

Anderson noted that so far, about half of the organizations responsible for maintaining the affected programming languages have promised to provide patches, but the others are dragging their feet.

“We will monitor their deployment in the next few days,” Anderson said. “We also expect GitHub, GitLab, and Atlassian to take action, so their tools should be able to detect attacks on code in languages that still lack bidirectional character filtering.”
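As a rough idea of what bidirectional character filtering could look like, the following Python sketch scans source text and reports every Bidi embedding, override, and isolate control character it finds. The function name and the report format are assumptions made for this illustration; they do not describe how any of the tools mentioned above actually work.

```python
import unicodedata

# Bidirectional classes of the embedding, override, and isolate controls
# (U+202A to U+202E and U+2066 to U+2069).
BIDI_CONTROL_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI",
}


def find_bidi_controls(text):
    """Return (line, column, code point, name) for each Bidi control character."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES:
                findings.append(
                    (line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
                )
    return findings


# Hypothetical snippet with an override hidden in a comment on line 2.
sample = "x = 1\n# ok \u202e comment\n"
print(find_bidi_controls(sample))
# [(2, 6, 'U+202E', 'RIGHT-TO-LEFT OVERRIDE')]
```

A check like this could run in a pre-commit hook or code review tool. Whether to reject such characters outright or only warn is a policy choice, since legitimate right-to-left text in strings and comments may need them.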

As for what should be done about Trojan Source, the researchers urge governments and companies that rely on critical software to determine the security posture of their suppliers, put pressure on them to deploy adequate defenses, and ensure that every link in the toolchain is covered.

The paper concludes:

“The Trojan Source vulnerability affects almost all computer languages, which makes it a rare opportunity to compare responses across platforms and vendors throughout the technology ecosystem. Because powerful supply chain attacks can easily be launched using these techniques, it is vital that all organizations participating in the software supply chain implement defenses.”

Nicholas Weaver, a lecturer in the Department of Computer Science at the University of California, Berkeley, said:

“The coordinated disclosure process for this vulnerability will be an excellent model for observing how we fix such problems. The vulnerability is real, but it also highlights an even larger vulnerability in the dependencies that modern code relies on.”

As of press time, Rust has issued a security bulletin for this vulnerability. The associated CVE identifiers are CVE-2021-42574 and CVE-2021-42694.
