Undefined Behaviors in ISO-C: Their Effect on Your Embedded Software Part 2
In Part 1 of this blog, I defined what I mean by undefined behaviors and provided some examples. I also showed how difficult it is to manage some of these behaviors and piqued your interest by hinting that there may even be a positive use for these behaviors. Let’s continue!
How C compilers (ab)use undefined behavior
The purpose of an optimizing compiler is to minimize the time needed to execute the program and to minimize the size of the executable code, within the constraints imposed by the ISO-C language specification.
Most optimizing compilers will remove the following code fragment.
for ( int i=0; i<N; i++ );
In the (far) past such constructs were used to create a delay in program execution. Today most compilers detect that this loop does not affect the observable state of the program and will not emit object code for the loop.
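If a busy-wait delay really is intended, the loop has to affect the observable state of the program. A minimal sketch of one way to do that (the function name is only illustrative) is to qualify the loop counter as volatile, which forces the compiler to perform every access to the counter:
void crude_delay(int n) {
    volatile int i;              /* accesses to a volatile object are observable behavior */
    for (i = 0; i < n; i++)
        ;                        /* the loop is no longer removed; the actual delay remains compiler and target dependent */
}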
It is easy to trigger undefined behavior
The results printed by the following program may differ when compiled with different compilers or with the same compiler using different optimization settings.
#include <stdio.h>

int main(void) {
    int arr[] = { 0, 2, 4, 6, 8 };
    int i = 1;
    printf("%d\n", i + arr[++i] + arr[i++]); // unsequenced modifications of "i": undefined behavior
    return 0;
}
Although the results are different, none of them is incorrect. This is due to ISO/IEC 9899:2011 paragraph 6.5p2 (unfortunately the official ISO/IEC 9899:2011 standard document is not freely downloadable; use this link to access a draft standard, whose paragraph numbers and text are usually the same), which specifies that the behavior is undefined when a side effect on a scalar object, here the increments of "i", is unsequenced relative to another side effect on, or a value computation using, the same object. Therefore in the above example the value of "i" is likely (the behavior is undefined, so other values are also possible) either 1, 3 or 2, depending on whether the compiler evaluates the expression from left to right, from right to left, or in some other order. The value of "arr[++i]" is likely either 4, 6 or 8, the value of "arr[i++]" is likely either 2, 4 or 6, and the sum is derived from an arbitrary combination of these values.
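If the intent was a strict left-to-right evaluation, the fix is to split the expression so that every modification of "i" is separated by a sequence point. A minimal sketch of that interpretation (the intermediate variable names are only illustrative):
#include <stdio.h>

int main(void) {
    int arr[] = { 0, 2, 4, 6, 8 };
    int i = 1;
    int a = i;                   // 1
    int b = arr[++i];            // i becomes 2, arr[2] == 4
    int c = arr[i++];            // arr[2] == 4, i becomes 3
    printf("%d\n", a + b + c);   // always prints 9
    return 0;
}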
Compilers may exploit undefined signed integer overflow behavior
Consider the following code fragment.
int foo(unsigned char x) {   // 1
    int val = 10;            // 2
    val += x;                // 3
    if (val < 10)            // 4
        bar();               // 5
    return val;              // 6
}                            // 7
Some compilers will optimize away the test "if (val < 10)" and the call to the function "bar()". A compiler is permitted to perform this optimization since ISO/IEC 9899:2011 paragraph 6.5p5 specifies: "If the result of an expression is not in the range of representable values for its type, the behavior is undefined."
The value of "x" cannot be negative and, given that signed integer overflow is undefined behavior, the compiler can assume that at line 4 the value of "val" is always equal or greater than 10. Thus the "if" statement and the call to the function "bar()" can be ignored by the compiler since the "if" has no side effects and its condition will never be satisfied. The optimization as performed in this example seems harmless, but it can have undesired side effects as shown in a next example.
Safe and unsafe arithmetic overflow tests
Arithmetic overflow checks for signed addition can be implemented in several ways. A safe way is to compare against a known threshold before performing the addition, by writing something like "if (a > INT_MAX - 100) OVERFLOW;".
However, “experienced” C programmers may take two's-complement wraparound for granted and test whether the sum is smaller than one of the addends, by writing "if (a + 100 < a) OVERFLOW;".
In the latter case an optimizing compiler may assume that the if statement is always false and emit no object code for it. A compiler is permitted to perform this optimization because the value of "a + 100" is always greater than or equal to "a" unless an overflow occurs, in which case the behavior is undefined and anything is allowed to happen, including ignoring this case.
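The two variants side by side, with "OVERFLOW" replaced by a hypothetical error handler used here only for illustration:
#include <limits.h>

extern void handle_overflow(void);   /* hypothetical error handler */

void add_checked(int a) {
    /* Safe: the comparison itself cannot overflow. */
    if (a > INT_MAX - 100)
        handle_overflow();

    /* Unsafe: relies on signed wraparound, which is undefined behavior;
       an optimizing compiler may remove this test entirely. */
    if (a + 100 < a)
        handle_overflow();
}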
C compilers may silently discard some wraparound checks
This example is taken from the CERT Vulnerability Notes Database. Some C compilers will optimize away the pointer arithmetic overflow test from the following code fragment without providing a proper diagnostic message. These compilers assume that the value of "buf+len" is always greater than or equal to the value of "buf".
char *buf;
int len;
[...]
len = 1 << 30;
[...]
if (buf + len < buf)     /* wrap check */
    [...overflow occurred...]
A compiler is permitted to perform the optimization since ISO/IEC 9899:2011 paragraph 6.5.6p8 specifies: “When an expression that has integer type is added to or subtracted from a pointer, …, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.”
Wrapping checks that use methods similar to the one described above depend on undefined behavior and may be vulnerable to buffer overflows. However the optimization applied by the compiler is not the root cause of the problem. A compiler designed for safety critical software development shall emit a diagnostic message in such cases and warn the programmer about potential side effects of the applied optimization.
After such a warning the programmer can (easily) solve the problem by casting the objects of type "char *" to "uintptr_t" before the comparison. The faulty wrapping check listed above should be rewritten as:
#include <stdint.h>
[...]
if ((uintptr_t)buf + len < (uintptr_t)buf)
[...]
Notice that only signed integer overflow is undefined; unsigned arithmetic wraps around, so it is advisable to use unsigned integer types to track buffer sizes.
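An alternative that avoids overflowing pointer arithmetic altogether is to compare sizes instead of pointers. A minimal sketch, assuming the total buffer size and the number of bytes already used are tracked as size_t (all names are illustrative):
#include <stddef.h>

/* Returns 1 if "len" more bytes fit into a buffer of "bufsize" bytes of
   which "used" bytes are already occupied; assumes used <= bufsize.
   All operands are unsigned, so the arithmetic is well defined. */
int fits(size_t bufsize, size_t used, size_t len) {
    return len <= bufsize - used;
}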
C compilers may silently discard some null pointer checks
Some C compilers will optimize away the NULL pointer check at line 3 from the following code fragment without providing a proper diagnostic message.
int foo(struct bar *s) {   // 1
    int x = s->f;          // 2
    if (!s) return ERROR;  // 3
    [... use s ...]        // 4
}
A compiler is permitted to perform this optimization. At line 2 the pointer "s" is dereferenced, and undefined behavior occurs if the value of the pointer is NULL; therefore the compiler can assume that "s" is not NULL. Under this assumption the if statement at line 3 is always false, so the compiler will not emit object code for line 3 and thereby removes the very code that was supposed to detect and prevent the undefined behavior.
Also in this case the optimization applied by the compiler is not the root cause of the problem. A compiler designed for safety critical software development shall emit a diagnostic message in such cases and warn the programmer about potential side effects of the applied optimization.
After such a warning the programmer can solve the problem by moving the NULL pointer check before the pointer dereference. The faulty check listed above should be rewritten as:
int foo(struct bar *s) {   // 1
    int x;                 // 2
    if (!s) return ERROR;  // 3
    x = s->f;              // 4
    [...]                  // 5
}
Other undefined behaviors that may lead to removal of code fragments
Shifting an n-bit integer by n or more bits causes undefined behavior. Some compilers will assume that the shift amount is at most n-1, and use that information to optimize the code.
Out-of-bounds array access causes undefined behavior. Some compilers assume that a variable used as an array index never exceeds the array bounds, and use that information to optimize the code.
Such optimizations are allowed; however, proper diagnostic messages are highly desirable so that the programmer can verify whether the resulting code behaves as intended.
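As an illustration of the shift case, consider the following sketch (the function is hypothetical). Because the range check is placed after the shift, a compiler may assume that "shift" is at most 31 and silently discard the check:
#include <stdint.h>

uint32_t shift_checked_too_late(uint32_t value, unsigned int shift) {
    uint32_t result = value << shift;   /* undefined behavior if shift >= 32 */
    if (shift >= 32)                    /* this check may be optimized away */
        return 0;
    return result;
}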
If you want to read more about the topic of this blog, the scientific article What every compiler writer should know about programmers -- or -- “Optimization” based on undefined behaviour hurts performance could be of interest to you.
Conclusions
Undefined behavior represents a dangerous trade-off: it sacrifices program correctness in favor of performance and compiler simplicity. In essence, undefined behavior creates a separation between an unsafe operation and the error check that is supposed to guard it. Due to this separation the check can be optimized, which can result in its partial or complete elimination.
Compiler developers typically take pride in implementing advanced optimizations and take a legalistic approach when interpreting the ISO-C standard to improve benchmark results. This can lead to situations where the behavior of the optimized program does not correspond with the intentions of the programmer.
Some people argue that a compiler should preserve the intent of the code. But the compiler does not, and should not, speculate about presumed intent. The correctness of a compiler should be judged solely by how closely it conforms to the standard. Other quality attributes such as the speed and size of the executable code affect the economic competitiveness of the compiler, and whether or not a compiler is suited for the development of safety critical software largely depends on the quality of its diagnostic messages.
Compiler developers shall carefully balance the diverse requirements of all stakeholders and not exploit the weaknesses in the ISO-C language definition in their attempt to increase benchmark scores. (This should not be read as an excuse for bad benchmark performance. TASKING compilers obtain best in class scores on code speed as well as code size, while keeping your code safe.)
MISRA and CERT have defined additional rules for safe and secure coding in the C programming language. Optimizing compilers that have built-in checks to detect violations of these rules warn the programmer when optimizations are applied that may change the intent of the program, and are well suited for the development of safety critical code.