Finding Undefined Behavior in C Code

Undefined behavior (UB) in C[1] invalidates all program guarantees and can lead to serious security issues. It occurs in many forms, most frequently as:

  • Spatial memory issues: out-of-bounds reads and writes, or even the mere creation of pointers to invalid memory
  • Temporal memory issues: use after free, double free, use after return, reads of uninitialized memory
  • Signed integer overflow
  • Strict aliasing violations
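To make one of these categories concrete, here is a minimal sketch of how signed integer overflow can be avoided by checking the operands before the addition is evaluated. The helper name checked_add_int is hypothetical and not taken from vis.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: adds a and b without ever evaluating an
 * overflowing signed expression. Returns false if the mathematical
 * sum does not fit in an int; *result is left untouched in that case. */
static bool checked_add_int(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *result = a + b; /* cannot overflow after the check above */
    return true;
}
```

The point of the design is that the range test itself only uses expressions that stay representable, so no UB is triggered while checking for it.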

Fortunately, both static (compile-time) and dynamic (run-time) analysis approaches and tools are available to combat these issues. This post describes my experience with mitigating UB in the vis editor project, starting with the dynamic analysis tools:

  • Valgrind, a dynamic binary instrumentation tool, can detect memory leaks, accesses to uninitialized memory, out-of-bounds operations and use-after-free errors. Because it works as a just-in-time binary translator, much of the original program structure is lost: for example, overflows of local variables aren’t detected because the explicit stack layout is unavailable. Similarly, UB that is present at the source level but optimized away by subsequent compiler passes goes unnoticed.

  • ASan, UBSan, MSan, TSan: these sanitizers combine compiler instrumentation with a run-time library. In the case of AddressSanitizer, the checks inserted during compilation only verify whether the accessed memory is addressable, not whether it lies within the storage bounds of the underlying C object. As a result, an overflow that lands in another valid memory region, for example an adjacent buffer inside the same struct, can go undetected.

  • TIS Analyzer, based on the open source TIS Kernel and Frama-C’s Value Analysis plug-in, is rooted in abstract interpretation. In its interpreter mode it does not consider all possible program inputs, but instead analyzes one concrete execution.

    Because it interprets each C statement against an accurate memory model, it can detect a wide range of UB, including uses (not just dereferences) of invalid pointers, comparisons of unrelated pointers and unsequenced variable accesses.
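The difference between "addressable" and "within the bounds of the C object" can be illustrated with a small sketch. The struct and function names below are hypothetical. An unchecked write such as r->name[10] = 'x' is out of bounds for name, yet it typically lands inside tag, which is perfectly addressable memory, so a check that only asks whether the address is accessible has nothing to complain about. An explicit length check sidesteps the problem:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record with two adjacent buffers in one object. */
struct record {
    char name[8];
    char tag[8];
};

/* Copies len bytes into r->name only if they fit into that member.
 * Without the check, a too-long copy would spill into r->tag. */
static bool record_set_name(struct record *r, const char *src, size_t len)
{
    if (len > sizeof r->name)
        return false;
    memcpy(r->name, src, len);
    return true;
}
```

A tool that tracks the bounds of each C object, as described for TIS Analyzer above, is in a better position to flag the unchecked variant.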

While Valgrind and the compiler sanitizers can be applied to large projects with little effort, TIS Analyzer has only limited support for the standard library. It may need simple stubs (in vis’ case for time(3) and labs(3)) or measures to avoid code paths that use unsupported features such as mmap(2).
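Such stubs can be very small. The following is only a sketch of what they might look like: the stub_ names are hypothetical, and how a stub is wired into the analysis (for example via a macro or a separate source file) is an assumption that depends on the setup.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for time(3): always reports the same
 * instant, which also keeps the analyzed execution deterministic. */
static time_t stub_time(time_t *out)
{
    time_t fixed = (time_t)0;
    if (out)
        *out = fixed;
    return fixed;
}

/* Hypothetical stand-in for labs(3). As with the real function,
 * the result for LONG_MIN is not representable, so callers must
 * not pass that value. */
static long stub_labs(long value)
{
    return value < 0 ? -value : value;
}
```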

Within the core text management data structure used by vis, the TIS Analyzer CI service found two cases of UB, both related to the handling of NULL pointers. The first concerns the creation (not the dereference) of an invalid pointer through invalid pointer arithmetic (NULL+0); see the C11 specification. The second was an invalid pointer comparison (NULL < NULL), where only == and != are compliant C11.
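Both patterns can be avoided by testing for NULL before doing arithmetic or relational comparisons. The helpers below are a minimal sketch with hypothetical names; they are not the actual fix applied in vis.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: never forms NULL + offset. The caller must
 * still guarantee that offset stays within the pointed-to object. */
static const char *ptr_advance(const char *base, size_t offset)
{
    if (!base)
        return NULL;
    return base + offset;
}

/* Hypothetical helper: relational operators are only applied once
 * all pointers are known to be non-NULL. It assumes that p, begin
 * and end all point into (or one past the end of) the same array. */
static bool ptr_in_range(const char *p, const char *begin, const char *end)
{
    if (!p || !begin || !end)
        return false;
    return begin <= p && p < end;
}
```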

All of these dynamic analysis tools need suitable program input to exercise a wide range of possible execution paths. Toward that goal, coverage-guided fuzzing tools such as afl-fuzz (for input files or standard I/O streams) and libFuzzer (for API calls) can be used.
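For API-level fuzzing, libFuzzer calls a function named LLVMFuzzerTestOneInput with each generated input. The harness below is a minimal sketch: count_lines is a hypothetical function under test standing in for a real API, not something from vis.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical function under test: counts newline bytes. */
static size_t count_lines(const uint8_t *data, size_t size)
{
    size_t lines = 0;
    for (size_t i = 0; i < size; i++)
        if (data[i] == '\n')
            lines++;
    return lines;
}

/* libFuzzer entry point, invoked once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    size_t lines = count_lines(data, size);
    if (lines > size)
        abort(); /* invariant violated: report as a crash */
    return 0;
}
```

With clang, such a harness is typically built together with the sanitizers, for example with -fsanitize=fuzzer,address,undefined, so that UB reached by a generated input is reported.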

Static analysis tools such as Coverity Scan, PVS-Studio and LGTM rely solely on the source code. Instead of focusing on one concrete execution, they approximate the behavior of the whole program, aiming for precision and few false positives. These tools are generally based on data flow analysis with varying degrees of built-in knowledge about API misuse. LGTM/Semmle also provides a query language (CodeQL) for variant analysis, i.e. for finding similar issues throughout a code base.

Besides undefined behavior there is also implementation-defined behavior, meaning that systems are allowed to behave differently but need to document their choice. Testing on a diverse software (kernel, libc, compiler) and hardware (word size, endianness, alignment, handling of unaligned accesses) ecosystem with different characteristics is therefore desirable. The Sourcehut CI service provides convenient shell access to various operating systems, while the Debian package build farm covers many different hardware architectures.
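Code that depends on such platform characteristics can often be rewritten so that it does not. As a minimal sketch (load_le32 is a hypothetical name), a 32-bit little-endian value can be decoded byte by byte, which gives the same result regardless of the host's endianness and places no alignment requirement on the buffer:

```c
#include <stdint.h>

/* Hypothetical helper: decodes 4 bytes stored in little-endian
 * order. Works on any host byte order and for unaligned buffers. */
static uint32_t load_le32(const unsigned char *bytes)
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

The common alternative of casting the buffer to uint32_t * and dereferencing it would depend on the host byte order and could additionally run into alignment and strict aliasing problems.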

All mentioned tools and services have successfully been used in the development of my C projects and I encourage you to check them out too.

  1. See also: A Guide to Undefined Behavior in C and C++ (Part 1, Part 2, Part 3) and What Every C Programmer Should Know About Undefined Behavior (Part 1, Part 2, Part 3).