The Death of Optimizing Compilers...Or is it?

September 6, 2017

Recently I came across the tutorial "The Death of Optimizing Compilers" by Daniel J. Bernstein, professor of mathematics and computer science at the Eindhoven University of Technology and research professor at the University of Illinois at Chicago. The tutorial was originally presented at ETAPS, the European Joint Conferences on Theory and Practice of Software. Since I earn my living at TASKING, a compiler company, it made sense to check his arguments to decide whether it is time for a career change.

Bernstein's presentation runs to 115 pages and is not easy to read, so I'll summarize his ideas:

1. Optimization of software is still needed despite the continuous improvements in processing speed.
2. Code is either hot, and therefore worth optimizing by hand, or else cold, and therefore not worth optimizing at all.
3. Today's optimizing compilers cannot exploit the features of advanced computing architectures, such as vectorization, many threads/cores, the memory hierarchy, ring- and mesh-connected multiprocessor networks, larger-scale parallelism, and larger-scale networking.
4. Bernstein questions the usefulness of optimizing compilers with the argument "Which compiler can, for instance, take Netlib LAPACK/BLAS and run serial Linpack as fast as OpenBLAS?".
5. To facilitate high-quality optimization, the compiler needs to know properties of the data and whether certain cases can arise, and therefore needs to be in a dialog with the programmer.
6. No good language exists in which to have such a dialog. Here Bernstein is influenced by Hoare's idea that, ideally, a language should be designed so that an optimizing compiler can describe its optimizations in the source language.
7. Future compilers should therefore give the programmer a means to write beautifully structured, but possibly inefficient, programs, and then to specify interactively the transformations that make them efficient.

Bernstein's first statement is consistent with TASKING's data. During the past two decades, our clients in the automotive powertrain segment have been asking for more processing speed, realized either in silicon or through compiler optimizations. In the ADAS L4-L5 segment the ultimate requirement is to get HPC performance out of a single chip, at an affordable price.

Bernstein's second statement can be refuted with the following arguments. In embedded systems, size matters. The cost of hardware must be multiplied by the production volume, which in the automotive sector ranges from thousands up to a hundred million. Therefore all code needs to be optimized towards a tiny memory footprint, completely independent of the hot/cold issue. Also, the statement that code is either hot or cold does not match the application profiles we receive from our customers. Furthermore, the assumption that software engineers who can, and are willing to, write efficient assembly code are widely available is incorrect. Such skills are scarce and expensive. In addition, Bernstein does not take into account the costs of software maintenance and of porting the software to other architectures, both of which are inversely proportional to the abstraction level of the programming language.

Today most software is created via "model-driven development", in which a description in an application-domain-specific language is transformed into C/C++ and then into binary code, and in which one relies on automated optimization processes to break down the abstraction layers. Bernstein completely neglects this trend.

Bernstein's third statement is partially correct. Today's optimizing compilers do support automated vectorization, and the SIMD instruction sequences they emit make efficient use of the underlying hardware. However, current optimizing compilers are not able to judiciously allocate code over the nodes in multi-core and HPC systems.
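To make the vectorization point concrete, here is a minimal sketch of the kind of loop that compilers such as GCC and Clang can usually turn into SIMD code on their own at higher optimization levels (for example `-O3`). The function name `saxpy` and its signature are my own illustration, not taken from Bernstein's slides. Whether a particular compiler actually vectorizes it depends on the target and the flags, so it is worth checking the vectorization report (for example GCC's `-fopt-info-vec` or Clang's `-Rpass=loop-vectorize`).

```c
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * The `restrict` qualifiers promise that x and y do not overlap, which
 * removes an aliasing concern that could otherwise prevent or complicate
 * vectorization of this loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);

    /* Unit-stride accesses and no dependence between iterations:
     * a typical candidate for automatic vectorization. */
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The source stays plain, portable C; the choice of SIMD width and instructions is left entirely to the compiler.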

Bernstein's fourth statement, about the optimization of the Netlib LAPACK and BLAS, requires some nuance. Personally, I would leave the optimization of LAPACK to the compiler, since it is very laborious and difficult to beat an optimizing compiler on this type of general-purpose code. For BLAS it is different: for this type of code, small nested loops iterating over multiple large arrays, hand-made optimizations may be justified, especially for large production volumes or data sets.
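As an illustration of what "small nested loops iterating over multiple large arrays" and a hand-made optimization can look like, the sketch below shows a textbook matrix product next to a manually reordered variant. Both functions (`matmul_naive`, `matmul_ikj`) are hypothetical examples of mine, not code from Netlib or OpenBLAS, and the reordering is only the simplest of the transformations a tuned library applies (blocking, unrolling and hand-written SIMD kernels go much further).

```c
#include <assert.h>
#include <stddef.h>

/* Textbook i-j-k product c = a * b for n x n row-major matrices.
 * The innermost loop reads b column-wise, i.e. with stride n. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Hand-reordered i-k-j variant: the innermost loop now walks rows of
 * b and c with unit stride, which is usually friendlier to caches and
 * to the vectorizer. The result is the same as matmul_naive. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n * n; ++i) {
        c[i] = 0.0;
    }
    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

Whether the reordered version is actually faster, and by how much, depends on the matrix size, the cache hierarchy and the compiler, so it should be measured rather than assumed.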

Bernstein's fifth argument is already covered by current compilers and programming languages. OpenMP and other language extensions allow the user to annotate the source code with information that guides the compiler's optimization processes.
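A small sketch of such an annotation, assuming a compiler with OpenMP 4.0 support (for GCC and Clang, enabled with `-fopenmp` or `-fopenmp-simd`). The function `dot` is a hypothetical example of mine. The pragma tells the compiler that the loop may be executed with SIMD instructions and that `sum` is a reduction variable whose partial sums may be combined in any order, something the compiler may not assume on its own for floating-point code.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two vectors of length n. */
double dot(size_t n, const double *x, const double *y)
{
    assert(x != NULL && y != NULL);

    double sum = 0.0;

    /* Programmer-supplied knowledge: iterations are independent apart
     * from the reduction on `sum`, so reordering the additions is
     * acceptable here. */
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

When OpenMP is not enabled, the pragma is typically ignored and the function remains ordinary serial C, which is part of the appeal of this annotation style.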

Bernstein's concluding statements are a request for a new programming language, one that is sequentially consistent, to match the limitations of our human reasoning capabilities, and that simultaneously allows the specification of coarse- and fine-grained parallelism. It seems that no one knows what such a programming language would look like. In essence, this is also the holy grail of electronic system-level synthesis (ESL).

Given the above, it is valid to conclude that the economics of compiler optimization are still outstanding. An optimizing compiler significantly reduces execution time and memory footprint. This reduces the cost and energy consumption of the CPU and its memories. It also reduces the need to optimize code by hand, which requires scarce expertise and increases maintenance and porting costs. On a global scale the economic benefits are also excellent, since roughly 10^7 software engineers apply, on a daily basis, the automated optimizations developed by roughly 10^3 compiler engineers.

So, regarding the need for a career change: it is safe to say that optimizing compilers will not disappear. However, compilers will change. Current academic compiler research is moving in the direction of domain-specific compilers and formally verified compilers.