The Balancing Act
Developers creating embedded advanced driver assistance system (ADAS) solutions walk a performance-tuning tightrope. They must continually balance the size of the compiled program against its execution speed. In the automotive space, devices are small by nature, and memory is a constrained resource. Code must therefore be highly efficient and make optimal use of the various types of memory available (CPU registers, RAM, etc.). In addition, many embedded ADAS solutions are safety-critical: acceptable response times are often measured in microseconds, so code must run quickly. Slow code can create safety risks and jeopardize achieving the required Automotive Safety Integrity Level (ASIL).
Now, one might intuitively assume that the smaller a section of code is, the faster it ought to run. After all, a 3,500-pound sports car is faster than a multi-ton diesel truck. A hummingbird is faster than a condor. But in the world of embedded ADAS solution development, the “small equals fast” rule does not always apply.
Embedded software developers must therefore find a balance between execution speed and code size that fits their specific needs. Performance tuning, or “optimization,” can occur at several levels, including line, function, and memory:
- Line-level example: Loops look compact, but they can run slower than simply performing the operations on a relatively short list of elements one after the other, without the loop (known as loop unrolling), because every iteration pays for updating and testing the loop counter. Of course, the unrolled approach wouldn’t be practical if the list of elements were long.
- Function-level example: Inline functions often execute faster because they avoid call-and-return overhead, but the compiler copies the function body into every call site. This duplicated code drives up generated code size and can increase virtual-memory page faults.
- Memory-level example: Developers can choose “packed” data (stored as compactly as possible, not aligned on word boundaries, slower to access) or “unpacked” data (stored less compactly but aligned on word boundaries, faster to access). Where data sits in the memory hierarchy matters as well: according to one source, a CPU cache hit can cost your program 10–20 clock cycles, an external cache hit 20–40 clock cycles, and a page fault as many as one million clock cycles.
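To make the line-level trade-off concrete, here is a minimal C sketch contrasting a rolled summation loop with a fully unrolled version for a fixed four-element list and a partially unrolled version for arbitrary lengths. The function names are hypothetical, chosen only for illustration. Whether the unrolled forms are actually faster depends on the compiler and target; many compilers can unroll loops themselves (GCC, for example, offers `-funroll-loops`), so it is worth measuring before unrolling by hand.

```c
#include <stddef.h>
#include <stdint.h>

/* Rolled loop: compact, but every element costs a counter update
 * plus a compare-and-branch. */
int32_t sum_rolled(const int32_t *v, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += v[i];
    }
    return sum;
}

/* Fully unrolled for a fixed, short list of four elements: no loop
 * counter or branch, but code size grows with every element added. */
int32_t sum4_unrolled(const int32_t v[4])
{
    return v[0] + v[1] + v[2] + v[3];
}

/* Partially unrolled by 4 for arbitrary n: roughly a quarter as many
 * loop-control branches, while code size stays bounded. */
int32_t sum_unrolled4(const int32_t *v, size_t n)
{
    int32_t sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    }
    for (; i < n; ++i) {   /* handle the 0-3 leftover elements */
        sum += v[i];
    }
    return sum;
}
```

For example, `sum_unrolled4(a, 7)` handles one group of four elements in the main loop and the remaining three in the cleanup loop.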
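The function-level trade-off can be sketched in C as follows, assuming a hypothetical `clamp_i32` helper and two hypothetical callers. Note that `inline` is only a hint: the compiler still decides whether to inline, and size-oriented settings such as GCC's `-Os` typically make it inline less aggressively.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a value to [lo, hi].
 * 'static inline' invites the compiler to paste this body into each
 * caller, removing call/return overhead at the cost of larger code. */
static inline int32_t clamp_i32(int32_t x, int32_t lo, int32_t hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Each caller below may receive its own copy of clamp_i32's body,
 * so the speed gain is paid for in duplicated instructions. */
int32_t clamp_speed(int32_t kph)    { return clamp_i32(kph, 0, 250); }
int32_t clamp_steering(int32_t deg) { return clamp_i32(deg, -540, 540); }
```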
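The packed-versus-unpacked choice can be illustrated with a minimal C sketch. It assumes a GCC- or Clang-compatible compiler, because `__attribute__((packed))` is a compiler extension rather than standard C; other toolchains may spell this differently, so check the compiler manual. The struct and field names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Unpacked (natural alignment): the compiler inserts padding so that
 * 'value' starts on a 4-byte boundary. On common ABIs where uint32_t
 * is 4-byte aligned this is 12 bytes: 1 + 3 padding + 4 + 2 + 2
 * trailing padding. */
struct sample_unpacked {
    uint8_t  id;
    uint32_t value;
    uint16_t flags;
};

/* Packed (GCC/Clang extension): no padding, only 7 bytes, but 'value'
 * now sits at offset 1. On targets without fast unaligned loads the
 * compiler must emit extra, narrower loads to read it. */
struct sample_packed {
    uint8_t  id;
    uint32_t value;
    uint16_t flags;
} __attribute__((packed));
```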
Another complicating factor for embedded ADAS solution developers is the diversity of devices in the automotive industry. A desktop developer writes for the PC, and the resulting application will work on virtually any other PC. In the automotive world, however, an application that runs well on one device may not run well on another, which is why compilers optimized for a particular device or family of devices are so important.
Profilers Can Help
Sometimes you can improve performance with simple alternative programming techniques, such as deferring sorting to a less time-critical moment, or reducing the number of items to be sorted by pre-processing them. But hand-optimizing code by guesswork is less efficient than using a profiler, a tool that analyzes your code and pinpoints the problem areas.
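The pre-processing idea can be sketched as follows: rather than sorting every detected object, filter out those that cannot matter and sort only the remainder with the standard C `qsort`. The `object_t` type, the range threshold, and `sort_nearby` are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record for a detected object. */
typedef struct {
    int   id;
    float distance_m;
} object_t;

/* qsort comparator: ascending distance, returns -1, 0 or 1. */
static int by_distance(const void *a, const void *b)
{
    float da = ((const object_t *)a)->distance_m;
    float db = ((const object_t *)b)->distance_m;
    return (da > db) - (da < db);
}

/* Pre-processing step: copy only objects closer than max_range_m into
 * 'out' (which must hold at least n entries), then sort just that
 * shorter list. Returns the number of objects written to 'out'. */
size_t sort_nearby(const object_t *in, size_t n, float max_range_m,
                   object_t *out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; ++i) {
        if (in[i].distance_m < max_range_m) {
            out[m++] = in[i];
        }
    }
    qsort(out, m, sizeof out[0], by_distance);
    return m;
}
```

The filtering pass is a single linear scan, while sorting costs roughly O(n log n) comparisons, so removing items before the sort cuts the sorting work at least in proportion to the items removed.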
Of course, before you begin performance tuning, it helps to have a baseline: be certain that your unoptimized code works correctly (debugging), know which parts of your application have to be fast (thorough design specs), and know the original size and speed of your code (benchmarking).
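A baseline speed measurement can start as simply as timing repeated calls. This minimal sketch uses standard C `clock()`, which measures processor time on hosted systems; on a bare-metal ECU you would more likely read a hardware timer or cycle counter, whose access is device-specific and not shown here. `bench_us` and `example_work` are hypothetical names.

```c
#include <stddef.h>
#include <time.h>

/* Signature of the code under test (hypothetical). */
typedef void (*work_fn)(void);

/* volatile so the compiler cannot optimize the example work away. */
static volatile unsigned long sink;

/* Example workload standing in for real application code. */
void example_work(void)
{
    unsigned long acc = 0;
    for (unsigned i = 0; i < 1000; ++i) {
        acc += i;
    }
    sink = acc;
}

/* Run 'fn' 'reps' times and return the average processor time per
 * call in microseconds, as measured by clock(). Returns 0.0 if
 * reps is 0. */
double bench_us(work_fn fn, unsigned reps)
{
    clock_t start = clock();
    for (unsigned r = 0; r < reps; ++r) {
        fn();
    }
    clock_t stop = clock();
    return reps ? 1e6 * (double)(stop - start) / CLOCKS_PER_SEC / reps : 0.0;
}
```

For the size half of the baseline, a tool such as GNU binutils `size` reports the text, data, and bss section sizes of a compiled object file or executable.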
Desktop developers can use a tool such as Intel® VTune™ Amplifier for performance tuning, but until now the choices for optimizing embedded software have been few. TASKING® will soon release the TASKING Embedded Profiler, which makes VTune-type functionality available for AURIX® and AURIX 2G devices. The Profiler verifies that the clocks are configured properly for a benchmark run (no oscilloscope needed) and guides you through a few easy steps that pinpoint the source lines associated with the greatest slowdown. The tool identifies the root cause of the slowdown and gives simple instructions for solving the problem. After you apply the suggested mitigation, the tool can confirm that the problem has indeed been fixed. All of this happens non-intrusively, with real data collected from the application running on the real device.
Be a Better Tightrope Walker!
Using a profiler, even non-expert users can often speed up previously untuned applications by several hundred percent in about half an hour. If you’d like to try out the TASKING Embedded Profiler, contact firstname.lastname@example.org, and a team member will get back to you right away.
About the Author
Mark Forbes graduated from Bradley University with a BS in Electrical Engineering and has been in the EDA industry for over 30 years.