How Compiler Vectorization in Computer Architecture Can Help You

August 10, 2017 Tasking


What if I told you there was a way to make your code faster while keeping it small? Single instruction, multiple data (SIMD) computing is a form of parallel computing in which one instruction operates on several data elements at once. It can improve code performance without the code growth that other speed optimizations often bring. Vectorization is the technique that takes advantage of SIMD hardware: it turns loops that handle one element per iteration into code that handles several. It can be implemented by hand by the programmer or carried out automatically by the compiler. As always, there are pros and cons that you should consider carefully before using this optimization yourself.

Parallel Computing

I’m sure you know the saying, “you can’t have your cake and eat it too.” Parallel computing lets you challenge that assumption a little. When used correctly, parallelism can make your code run faster without making it much larger.

There are multiple types of parallel computing. Besides ordinary sequential execution (single instruction, single data, or SISD), these include SIMD, multiple instruction, single data (MISD), and multiple instruction, multiple data (MIMD). The kind we’re primarily talking about is SIMD. The acronym tells us that we perform only one kind of operation at a time (single instruction), but we carry it out on several pieces of data at once (multiple data). This helps when large amounts of data all need to go through the same algorithm. Vectorization applies this approach to loops in our code, where the same operations are repeated on successive array elements.
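To make the SIMD idea concrete, here is a minimal sketch in C. It assumes the GCC/Clang vector extension (`__attribute__((vector_size(16)))`), which is not standard C, and uses a hypothetical type `v4i32` that packs four 32-bit integers into 128 bits. The scalar function does one add per element; the SIMD function does one vector add covering four elements. Whether that vector add becomes a single machine instruction depends on the target: if the processor lacks suitable SIMD support, the compiler splits it into scalar operations.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type: four 32-bit integers in one 128-bit vector.
   Uses the GCC/Clang vector extension, not standard C. */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* Scalar (SISD) version: one add per element. */
void add_scalar(const int32_t *a, const int32_t *b, int32_t *out, int n) {
    for (int i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* SIMD version of the same operation for exactly four elements:
   one vector add applies the same instruction to all four lanes. */
void add4_simd(const int32_t a[4], const int32_t b[4], int32_t out[4]) {
    v4i32 va, vb;
    memcpy(&va, a, sizeof va);   /* load four lanes from a */
    memcpy(&vb, b, sizeof vb);   /* load four lanes from b */
    v4i32 vs = va + vb;          /* single instruction, multiple data */
    memcpy(out, &vs, sizeof vs); /* store four lanes to out */
}
```

Both functions produce the same results; the difference is how many elements each add instruction processes.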



Nowhere in embedded systems is the size-versus-speed conundrum more important than in cars with advanced driver assistance systems (ADAS). Some features, like the anti-lock braking system (ABS), need to operate at peak performance. Other bells and whistles, like multimedia, should be optimized for size rather than speed. With vectorization, you may be able to get both instead of choosing one or the other. Of course, there are always circumstances where this optimization shouldn’t be used.

Vectorization takes data that a loop would process one element at a time and processes several elements at once: it loads them into wide vector registers and operates on all of them with a single SIMD instruction. If each instruction handles four elements, the loop needs only about a quarter as many iterations, although real speedups are usually smaller because of memory access costs and leftover elements. Vectorization also keeps code small compared with loop unrolling. Unrolling reduces loop overhead by duplicating the loop body several times; vectorization instead covers several elements with a single copy of each instruction. The result is the same, just with less code. Vectorization can be a simple way to speed up your code while keeping it compact, especially if you let your compiler do it for you. Many compilers include vectorization optimizations, but some programmers still prefer to implement them by hand.
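The contrast between unrolling and vectorization can be sketched in C. As before, this assumes the GCC/Clang vector extension, and the function names are illustrative only. Both functions compute `out[i] = a[i] * k + b[i]` four elements per iteration and finish any leftover elements with a scalar loop. The unrolled version repeats the statement four times in its body, while the vectorized version needs one vector statement.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-lane type (GCC/Clang vector extension). */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* Manual 4x unrolling: fewer loop-control branches, but the body
   contains four copies of the statement. */
void scale_add_unrolled(const int32_t *a, const int32_t *b, int32_t *out,
                        int32_t k, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = a[i]     * k + b[i];
        out[i + 1] = a[i + 1] * k + b[i + 1];
        out[i + 2] = a[i + 2] * k + b[i + 2];
        out[i + 3] = a[i + 3] * k + b[i + 3];
    }
    for (; i < n; i++) {           /* leftover elements */
        out[i] = a[i] * k + b[i];
    }
}

/* Vectorized: one vector statement covers four elements per iteration. */
void scale_add_vector(const int32_t *a, const int32_t *b, int32_t *out,
                      int32_t k, int n) {
    v4i32 vk = {k, k, k, k};       /* broadcast k into every lane */
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        v4i32 va, vb;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        v4i32 vr = va * vk + vb;   /* four multiply-adds at once */
        memcpy(out + i, &vr, sizeof vr);
    }
    for (; i < n; i++) {           /* scalar tail for leftover elements */
        out[i] = a[i] * k + b[i];
    }
}
```

In practice you would usually write only the plain scalar loop and let the compiler vectorize it. With GCC, for example, `-O3` enables loop vectorization and `-fopt-info-vec` reports which loops were vectorized; other compilers have their own options, so check your compiler’s manual.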

No matter how you implement vectorization, check your target architecture and your loop dependencies. The benefit depends on how many data elements fit in the processor’s vector registers: a 128-bit register, for example, holds four 32-bit integers but eight 16-bit ones. If you want to perform operations on 32-bit integers on a 16-bit system, vectorization may not be the answer. To get the most out of parallel computing, match your data types and computations to what your processor handles well. Your compiler will also analyze your code to make sure vectorizing it does not break any dependencies. These occur when different loop iterations read or modify the same data, for example when one iteration uses the result of the previous one, or when two pointers may refer to overlapping memory. Dependencies can interfere with many kinds of loop optimizations, so it’s a good idea to avoid them wherever possible.
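The difference between a loop a compiler can vectorize and one it generally cannot is easiest to see side by side. This is a minimal sketch with illustrative function names. The first loop has no loop-carried dependency, and the C99 `restrict` qualifier tells the compiler the arrays do not overlap, which removes one common obstacle to vectorization. The second loop is a running sum in which each iteration needs the result of the previous one, so its iterations cannot simply execute side by side in vector lanes.

```c
#include <stddef.h>

/* No loop-carried dependency: iteration i reads a[i] and b[i] and writes
   only out[i]. 'restrict' promises the arrays don't overlap, so the
   compiler may process several iterations at once. */
void add_independent(const int *restrict a, const int *restrict b,
                     int *restrict out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Loop-carried dependency: a[i] depends on the freshly updated a[i - 1],
   so each iteration must wait for the one before it. */
void running_sum(int *a, size_t n) {
    for (size_t i = 1; i < n; i++) {
        a[i] = a[i] + a[i - 1];
    }
}
```

Note that `restrict` is a promise made by the programmer: calling `add_independent` with overlapping arrays is undefined behavior, so use it only when you know the data really is separate.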


Cost of Vectorization

Of course, in life there are no free lunches. Parallel computing often means using multiple processors or cores, and vectorization needs a processor with SIMD hardware. Both can increase your costs and power consumption.

As a developer, you may not be extremely worried about costs, but I’m sure your accountants are. Using several processors, or processors with multiple cores, will increase expenditures. If you’re operating on a tight budget, check whether you can do without one before you specify a multicore processor.

Power consumption is always a concern in embedded systems, but especially in vehicles laden with electronic equipment. That’s one reason researchers have been developing low-power memory for embedded systems. It’s also why Tesla is developing new batteries for cars. While vectorization may not increase your power requirements, using multiple cores will, especially if you use some of the cores to check the others for errors. So be aware that when you tack on more processors, you’re going to use more power.

Parallel computing has a long history, but this old technology is still extremely useful. It can help you increase your code’s performance without increasing its size. This is the idea behind vectorizing loops in your code: vectorization can use fewer instructions than loop unrolling while achieving similar speedups. Everything has a cost, though, and the cost here is money and power.

Compiler optimization is extremely useful, but only if your compiler can work intelligently with your code. TASKING develops compilers, and other tools like static analyzers and standalone debuggers, specifically for the automotive industry. That way, you can be confident that your code is optimized for your unique application.

Have more questions about parallel optimizations? Call an expert at TASKING.
