arXiv:1707.04566 Abstract | arXiv Analytics

arXiv:1707.04566 [cs.PF]Abstract References Reviews Resources

Pushing the Limits of Online Auto-tuning: Machine Code Optimization in Short-Running Kernels

Fernando Endo, Damien Couroussé, Henri-Pierre Charles

Published 2017-07-14Version 1

We propose an online auto-tuning approach for computing kernels. Unlike existing online auto-tuners, which regenerate code through long compilation chains from source to binary, our approach deploys auto-tuning directly at the level of machine code generation. This allows auto-tuning to pay off even in very short-running applications. As a proof of concept, we demonstrate our approach on two benchmarks that run for only hundreds of milliseconds to a few seconds. In a CPU-bound kernel, the average speedups are 1.10 to 1.58 depending on the target micro-architecture, and up to 2.53 in the most favourable conditions (all run-time overheads included). In a memory-bound kernel, which is less favourable to our run-time auto-tuning optimizations, the average speedups are 1.04 to 1.10, and up to 1.30 in the best configuration. Despite the short execution times of our benchmarks, the overhead of our run-time auto-tuning accounts for only 0.2% to 4.2% of the total application execution time. By simulating the CPU-bound application on 11 different CPUs, we show that, despite the clear hardware disadvantage of in-order (io) cores compared with equivalent out-of-order (ooo) cores, online auto-tuning on io CPUs achieves an average speedup of 1.03 and a 39% improvement in energy efficiency over the SIMD reference on ooo CPUs.
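The general idea of online auto-tuning (trying code variants while the application runs and keeping the fastest one) can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation: their approach generates machine code at run time, whereas here pre-written Python functions (the `sum_squares_*` names are assumptions) stand in for generated variants, and the `OnlineTuner` class with its explore-then-commit policy is an assumed, deliberately simple tuning strategy.

```python
import time
from typing import Callable, List, Optional, Sequence

# A kernel variant maps input data to a result. In this sketch, plain Python
# functions stand in for machine-code variants that a run-time code generator
# would emit (hypothetical simplification).
Kernel = Callable[[Sequence[float]], float]


def sum_squares_simple(xs: Sequence[float]) -> float:
    """Reference variant: straightforward loop."""
    total = 0.0
    for x in xs:
        total += x * x
    return total


def sum_squares_unrolled(xs: Sequence[float]) -> float:
    """Variant with the loop unrolled by 4 (one example tuning knob)."""
    total = 0.0
    n = len(xs)
    i = 0
    while i + 4 <= n:
        a, b, c, d = xs[i], xs[i + 1], xs[i + 2], xs[i + 3]
        total += a * a + b * b + c * c + d * d
        i += 4
    while i < n:  # remaining elements when n is not a multiple of 4
        total += xs[i] * xs[i]
        i += 1
    return total


def sum_squares_builtin(xs: Sequence[float]) -> float:
    """Variant delegating the loop to the built-in sum()."""
    return sum(x * x for x in xs)


class OnlineTuner:
    """Times each variant on live calls, then commits to the fastest one."""

    def __init__(self, variants: List[Kernel], trials_per_variant: int = 3):
        self.variants = variants
        self.trials = trials_per_variant
        self.timings = [0.0] * len(variants)  # accumulated time per variant
        self.calls = 0
        self.best: Optional[Kernel] = None

    def __call__(self, xs: Sequence[float]) -> float:
        if self.best is not None:
            return self.best(xs)  # tuning finished: no further overhead
        # Exploration: round-robin so each variant runs `trials` times.
        # Every call still computes a valid result, so no work is wasted.
        idx = self.calls % len(self.variants)
        start = time.perf_counter()
        result = self.variants[idx](xs)
        self.timings[idx] += time.perf_counter() - start
        self.calls += 1
        if self.calls == self.trials * len(self.variants):
            fastest = min(range(len(self.variants)), key=self.timings.__getitem__)
            self.best = self.variants[fastest]
        return result


if __name__ == "__main__":
    data = [float(i % 17) for i in range(10_000)]
    tuner = OnlineTuner([sum_squares_simple, sum_squares_unrolled, sum_squares_builtin])
    for _ in range(20):
        result = tuner(data)
    print(tuner.best.__name__, result)
```

Because every exploration call still returns a valid result, the tuning cost appears only as the extra time spent in slower variants. This is the kind of run-time overhead that the figures above report as a fraction of total execution time.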

Comments: Extension of a Conference Paper published in the proceedings of MCSoC-16: IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, Lyon, France, 2016

Categories: cs.PF

Keywords: machine code optimization, online auto-tuning, short-running kernels, average speedup, total application execution times

Tags: conference paper
