Unchaining Large-Scale Datacenter Computing: Reengineering a Server Ecosystem
Speaker: Michael Gschwind – Yorktown Heights, NY, United States
Topic(s): Networks and Communications
One particularly pervasive dependence is the byte ordering of data, which affects the layout of data both in memory and in disk-based data repositories. While Power server environments have used big-endian ordering to date, OpenPOWER defines a new little-endian execution environment based on the Linux operating system. In addition to exploiting hardware support for little-endian data layout, compiler built-in functions handle transformations of data orderings that cannot readily be changed through hardware reconfiguration, such as the ordering of vector elements in the SIMD execution units.
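To make the byte-ordering dependence concrete, the sketch below shows how the same four bytes decode to different 32-bit values depending on the assumed ordering, and how code can read big-endian on-disk data correctly regardless of the host's byte order. The helper names (`host_is_little_endian`, `bswap32`, `load_be32`, `load_le32`) are illustrative assumptions, not part of any OpenPOWER API; GCC and Clang also provide the built-in `__builtin_bswap32` for the byte reversal.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte of a word first. */
static int host_is_little_endian(void) {
    const uint32_t probe = 1u;
    unsigned char first;
    memcpy(&first, &probe, 1); /* inspect the byte at the lowest address */
    return first == 1;
}

/* Portable 32-bit byte reversal: 0x11223344 becomes 0x44332211. */
static uint32_t bswap32(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Decode a big-endian 32-bit field (e.g. a record written by a big-endian
   server). Shifts operate on values, so the result is host-order independent. */
static uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode a little-endian 32-bit field from the same kind of buffer. */
static uint32_t load_le32(const unsigned char *p) {
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

For the bytes `11 22 33 44`, `load_be32` yields `0x11223344` and `load_le32` yields `0x44332211`; a data repository written under one convention must be decoded explicitly when read under the other.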
In addition to introducing a new data layout, the new little-endian environment also introduces a new Application Binary Interface (ABI) governing the interoperation of program modules, including module organization, function-calling conventions, and register usage conventions. Although the new ABI is not directly tied to the little-endian data format, the creation of a new environment offered a suitable opportunity to introduce it.
In addition, the new OpenPOWER software environment includes two new SIMD vector programming APIs optimized for the little-endian programming environment; they use fully little-endian conventions for referencing data structures and vector elements within the Power SIMD vector processing unit. Where necessary, the compiler translates these little-endian conventions into the underlying big-endian hardware conventions. This is particularly useful for writing native little-endian SIMD vector applications and for porting SIMD vector code from other little-endian platforms.
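The translation from little-endian element numbering to big-endian hardware numbering can be illustrated with a small software model. In the Power ISA, vector elements are numbered from the left (element 0 is the most significant), whereas a little-endian programming model numbers element 0 as the one at the lowest address. The sketch below assumes a four-word vector and a hypothetical `vreg_model` type standing in for a register; it shows the index remapping an LE-aware compiler would apply when lowering element accesses and a splat (hardware splat instructions such as `vspltw` take a big-endian element index). It is an illustration, not the compiler's actual implementation.

```c
#include <stdint.h>

/* Hypothetical model (not a real API): a 128-bit vector register holding
   four 32-bit words in hardware (big-endian) element order, so be_word[0]
   is the leftmost, most significant word. */
typedef struct {
    uint32_t be_word[4];
} vreg_model;

/* Map a little-endian element index to the hardware index for an
   n-element vector: LE element i is BE element n-1-i. */
static unsigned le_to_be_index(unsigned i, unsigned n) {
    return n - 1u - i;
}

/* LE-convention element read, lowered onto BE hardware numbering. */
static uint32_t extract_word_le(const vreg_model *v, unsigned i) {
    return v->be_word[le_to_be_index(i, 4u)];
}

/* LE-convention element write, lowered the same way. */
static void insert_word_le(vreg_model *v, unsigned i, uint32_t x) {
    v->be_word[le_to_be_index(i, 4u)] = x;
}

/* LE-convention splat: replicate LE element i into every lane. A hardware
   splat takes a BE index, so the remapped index is what must be encoded. */
static vreg_model splat_word_le(const vreg_model *v, unsigned i) {
    uint32_t x = v->be_word[le_to_be_index(i, 4u)];
    vreg_model r = {{x, x, x, x}};
    return r;
}
```

Because the remapping is a compile-time constant for constant indices, it costs nothing at run time; only operations whose semantics depend on element positions need this treatment.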
To implement the new SIMD API efficiently, we extend compiler optimizations to operate on the vector intermediate representation and eliminate data-reformatting primitives. In addition to providing a framework for SIMD portability and for optimizing SIMD vector layouts, we implement a novel vector operator optimization pass and measure its effectiveness: our implementation eliminates all data reformatting from application vector kernels, yielding a speedup of up to 65% on a POWER8 microarchitecture with two fully symmetric vector execution units.
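The general idea behind eliminating data reformatting can be sketched with a toy IR. On POWER8 in little-endian mode, a vector load such as `lxvd2x` delivers the two doublewords in swapped order, so straightforward code generation follows each load with a doubleword swap (`xxswapd`) and precedes each store with another. If every operation in between is lane-wise (lane k of the result depends only on lane k of the inputs), the computation commutes with the swap and all swaps can be dropped. The sketch below is a simplified illustration of that reasoning under stated assumptions (one straight-line region, operations listed in order, a hypothetical `vop` enum); it is not the optimization pass described in the lecture.

```c
#include <stddef.h>

#define MAX_REGION_OPS 256

/* Hypothetical, highly simplified IR for one straight-line vector region. */
typedef enum {
    OP_LOAD_SWAPPED,  /* LE load delivering swapped doublewords (lxvd2x-like) */
    OP_STORE_SWAPPED, /* LE store expecting swapped doublewords (stxvd2x-like) */
    OP_SWAP,          /* doubleword swap restoring LE element order */
    OP_ADD,           /* lane-wise */
    OP_MUL,           /* lane-wise */
    OP_SPLAT          /* lane-sensitive: reads one specific element */
} vop;

static int is_lane_wise(vop op) { return op == OP_ADD || op == OP_MUL; }

/* Removes all OP_SWAPs from ops[0..n) if safe under this model and returns the
   new length; otherwise returns n and leaves ops unchanged. Safe means: every
   load is followed by its own swap, every store is preceded by its own swap,
   no other swaps exist, and all remaining operations are lane-wise. */
static size_t eliminate_swaps(vop *ops, size_t n) {
    unsigned char claimed[MAX_REGION_OPS] = {0};
    if (n > MAX_REGION_OPS) return n;
    for (size_t i = 0; i < n; i++) {
        switch (ops[i]) {
        case OP_LOAD_SWAPPED:
            if (i + 1 >= n || ops[i + 1] != OP_SWAP || claimed[i + 1]) return n;
            claimed[i + 1] = 1; /* this swap belongs to the load */
            break;
        case OP_STORE_SWAPPED:
            if (i == 0 || ops[i - 1] != OP_SWAP || claimed[i - 1]) return n;
            claimed[i - 1] = 1; /* this swap belongs to the store */
            break;
        case OP_SWAP:
            break; /* ownership checked below */
        default:
            if (!is_lane_wise(ops[i])) return n; /* order is observable */
        }
    }
    for (size_t i = 0; i < n; i++)
        if (ops[i] == OP_SWAP && !claimed[i]) return n; /* stray swap */
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (ops[i] != OP_SWAP) ops[out++] = ops[i];
    return out;
}
```

For example, the region load, swap, load, swap, add, swap, store reduces to load, load, add, store, while a region containing a lane-sensitive splat is left untouched. A production pass would work on dataflow webs rather than a flat list and could also remap lane-sensitive operations instead of giving up.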
About this Lecture
Number of Slides: 50
Duration: 30 minutes
Languages Available: English