The Internet of Things and Tuning

With billions of devices coming online in the next decade, a software infrastructure is needed that can capture, store, and analyze the data they produce. Connecting all of these devices, from simple items such as a sensor all the way up to a car, requires a variety of technologies. A wide range of software is needed to ensure that the data can be collected and acted upon with confidence.

One of the most interesting and ultimately most visible applications of the Internet of Things (IoT) is the autonomous car. A complex set of actions must be reliable, from the sensors placed in the car, to the communication with other autonomous cars, all the way back to the car manufacturer. From the actions of the car to the storage and analytics in the cloud, software running on highly reliable hardware forms the backbone of this complex infrastructure.

Autonomous driving on a large scale requires more than just the sensors and the limited computing power in the car. There need to be defined methods for communicating with data centers located elsewhere, as well as reliable two-way communication. A number of tools can be critical to the successful implementation of an IoT system.

Building and Optimizing the code –

  • Intel C++ Compilers
  • Intel Math Kernel Library
  • Intel Threading Building Blocks
  • Intel Integrated Performance Primitives

Debugging and Understanding the code –

  • Intel System Debugger
  • Third-party debuggers

Analyzing and Tuning the code –

  • Intel VTune Amplifier
  • Intel Energy Profiler
  • Intel Performance Snapshot
  • Intel Inspector
  • Intel Graphics Performance Analyzers

When developing IoT applications, performance is important, but so is the power used to achieve that performance. Since different CPUs and associated hardware are used in an end-to-end system, understanding the capabilities of the various components is important. For example, the cycles per instruction (CPI) for the same code will differ depending on the type of hardware being used. Likewise, a hotspot on one architecture might not be a hotspot on another. The same reasoning extends to using all of the computing power available throughout the system. Achieving a speedup of 8X on an 8-core system is excellent, since it matches ideal linear scaling, whereas a speedup of 10X on a 16-core system is only 62.5% of the ideal and would not be considered a success.
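As a rough illustration of the two metrics in the paragraph above, the following Python sketch computes CPI from raw counter values and the parallel efficiency of a measured speedup. The function names and the counter values are hypothetical, chosen only for illustration; in practice the cycle and instruction counts would come from a profiler.

```python
def cycles_per_instruction(cycles, instructions):
    """CPI: average number of clock cycles spent per retired instruction."""
    return cycles / instructions


def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    return speedup / cores


# Hypothetical counter values for the same workload on two devices.
cpi_device_a = cycles_per_instruction(cycles=3_000_000, instructions=2_000_000)
cpi_device_b = cycles_per_instruction(cycles=5_000_000, instructions=2_000_000)

# The two scaling cases mentioned in the text.
eff_8_cores = parallel_efficiency(speedup=8, cores=8)
eff_16_cores = parallel_efficiency(speedup=10, cores=16)

print(f"CPI device A: {cpi_device_a:.2f}, device B: {cpi_device_b:.2f}")
print(f"Efficiency on 8 cores:  {eff_8_cores:.1%}")
print(f"Efficiency on 16 cores: {eff_16_cores:.1%}")
```

Comparing efficiency rather than raw speedup makes it clear why the larger speedup on the 16-core system is the weaker result: a sizable share of the available cores is effectively idle.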

More detailed analysis can be obtained with the other tools mentioned above. For example, understanding how the pipeline slots are being utilized can point to significant performance gains in the application. If pipeline slots are stalled, for instance while waiting on data from memory, performance will suffer. Likewise, understanding where cache misses occur can lead to a better organization of the data, which can increase performance while reducing memory-to-CPU latency.
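To make the link between cache misses and data organization concrete, here is a minimal sketch of a toy direct-mapped cache that counts misses for two traversal orders over the same two-dimensional array. The cache geometry (64 lines of 64 bytes), the array shape, and all names are assumptions made for illustration; real caches are set-associative and use hardware prefetching, so actual miss counts will differ.

```python
def count_misses(addresses, line_bytes=64, num_lines=64):
    """Count misses in a toy direct-mapped cache for a sequence of byte addresses."""
    cache = [None] * num_lines       # memory line currently held in each slot
    misses = 0
    for addr in addresses:
        line = addr // line_bytes    # memory line that contains this byte
        slot = line % num_lines      # direct-mapped: exactly one candidate slot
        if cache[slot] != line:      # slot holds a different line (or nothing)
            misses += 1
            cache[slot] = line
    return misses


# Hypothetical 64 x 64 array of 8-byte values stored row by row.
ROWS, COLS, ELEM_BYTES = 64, 64, 8


def address(row, col):
    """Byte offset of element (row, col) in row-major storage."""
    return (row * COLS + col) * ELEM_BYTES


# Same elements, visited in two different orders.
row_order = [address(r, c) for r in range(ROWS) for c in range(COLS)]
col_order = [address(r, c) for c in range(COLS) for r in range(ROWS)]

row_misses = count_misses(row_order)
col_misses = count_misses(col_order)

print(f"row-wise traversal:    {row_misses} misses out of {len(row_order)} accesses")
print(f"column-wise traversal: {col_misses} misses out of {len(col_order)} accesses")
```

In this toy model the row-wise traversal misses once per cache line (512 misses), while the column-wise traversal misses on every access (4096 misses), even though both loops touch exactly the same data. Matching the access order to the storage layout is one example of the kind of data reorganization a cache-miss analysis can suggest.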

The goal is to understand all aspects of an IoT system so that the CPUs are used to their maximum potential. With state-of-the-art tools, performance across the entire system can be examined and improved. Understanding the application remains key to high-performance IoT environments.
