Unchaining Large-Scale Datacenter Computing: Reengineering a Server Ecosystem

Speaker:  Michael Gschwind – Yorktown Heights, NY, United States
Topic(s):  Networks and Communications

Abstract

Over the past several years, a new class of computing solutions has emerged in the form of dedicated datacenter-scale computing platforms that power services such as search and social computing. Datacenter-level applications most often involve data discovery and/or serving from large repositories. Applications may be written either in traditional object-oriented languages such as C++, or in newer dynamic scripting languages such as JavaScript, PHP, Python, and Ruby. Because many datacenters use custom-designed servers, these applications have suffered from lock-in to merchant-silicon processors optimized for desktop environments. We reengineered the Power server ecosystem to simplify porting of software stacks and entire systems.

One particularly pervasive dependence is the byte ordering of data. Byte ordering affects the layout of data both in memory and in disk-based data repositories. While Power server environments have used big-endian ordering to date, OpenPOWER defines a new little-endian execution environment based on the Linux operating system. In addition to exploiting hardware support for little-endian data layout, compiler built-in functions handle transformations of data orderings that cannot be readily changed with hardware reconfiguration, such as the ordering of vector elements in the SIMD execution units.
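
As a minimal illustration of why byte ordering is such a pervasive dependence, the sketch below uses Python's standard `struct` module to lay out the same 32-bit value in both orders. The value and the helper name `reinterpret` are chosen purely for illustration and are not taken from the lecture.

```python
import struct

VALUE = 0x0A0B0C0D

# ">I" packs an unsigned 32-bit integer most significant byte first,
# "<I" packs it least significant byte first.
big = struct.pack(">I", VALUE)
little = struct.pack("<I", VALUE)


def reinterpret(data: bytes, fmt: str) -> int:
    """Decode 4 bytes using the given struct format (illustrative helper)."""
    return struct.unpack(fmt, data)[0]


# Data written by a big-endian system and read back with little-endian
# conventions comes out byte-swapped unless it is explicitly converted.
misread = reinterpret(big, "<I")

print(big.hex(), little.hex(), hex(misread))
```

The same mismatch applies to on-disk repositories: a file written in one byte order has to be either converted or read with explicit byte-order handling after a port.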

In addition to introducing a new data layout, the new little-endian environment also introduces a new Application Binary Interface (ABI) governing the interoperation of program modules, including module organization, function calling and register usage conventions, and so forth. While not directly linked to the introduction of a little-endian data format, the creation of a new environment offered a suitable opportunity to introduce a new ABI.

The new OpenPOWER software environment also includes two new SIMD vector programming APIs optimized for the little-endian programming environment. They use fully little-endian conventions for referencing data structures and vector elements within the Power SIMD vector processing unit. Where necessary, the compiler translates these new little-endian conventions to the underlying big-endian hardware conventions. This is particularly useful when writing native little-endian SIMD vector applications, or when porting SIMD vector code from other little-endian platforms.
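
The translation between element-numbering conventions can be pictured with a toy model. The sketch below is an assumption-laden simplification, not the actual OpenPOWER API or compiler logic: a vector register is modeled as a Python list in hardware (big-endian) element order, and the hypothetical helpers `load_le`, `le_to_hw_index`, and `extract_le` show how a little-endian element index might be mapped onto it.

```python
def load_le(memory):
    """Model loading an array into a register held in hardware element order.

    Assumption: the element at the lowest address lands in the least
    significant position, which big-endian hardware numbers last.
    """
    return list(reversed(memory))


def le_to_hw_index(i, n):
    """Map little-endian element index i to the hardware element index."""
    if not 0 <= i < n:
        raise IndexError("element index out of range")
    return n - 1 - i


def extract_le(register, i):
    """Return element i using little-endian numbering (hypothetical helper)."""
    return register[le_to_hw_index(i, len(register))]


memory = [10, 20, 30, 40]   # a[0] .. a[3] as the programmer sees them
register = load_le(memory)  # the same data in hardware element order

# With the index translation, element i of the register is a[i] again.
print([extract_le(register, i) for i in range(len(memory))])
```

In this model the programmer keeps using the array indices of the source code, and the index mapping is the only place that knows about the hardware numbering.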

To implement the new SIMD API efficiently, we extend the compiler to optimize its vector intermediate representation and eliminate data reformatting primitives. In addition to providing a framework for SIMD portability and for optimizing SIMD vector layouts, we implement a novel vector operator optimization pass and measure its effectiveness: our implementation eliminates all data reformatting from application vector kernels, resulting in a speedup of up to 65% on a POWER8 microarchitecture with two fully symmetric vector execution units.
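
The general idea of removing reformatting operations from an intermediate representation can be sketched on a toy expression tree. This is not the optimization pass described in the lecture. It is a simplified illustration that assumes a single reformatting primitive, `swap` (modeled here as an element reversal), and relies on two facts: two swaps in a row cancel, and element-wise operations give the same result whether the swap is applied before or after them. The tuple-based IR and all names are hypothetical.

```python
LANEWISE = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}


def simplify(expr):
    """Push swaps outward through element-wise ops and cancel swap pairs."""
    op = expr[0]
    if op == "load":
        return expr
    args = [simplify(a) for a in expr[1:]]
    if op == "swap":
        inner = args[0]
        # swap(swap(x)) is the identity
        return inner[1] if inner[0] == "swap" else ("swap", inner)
    if op in LANEWISE and all(a[0] == "swap" for a in args):
        # op(swap(x), swap(y)) == swap(op(x, y)) for element-wise ops
        return ("swap", (op, *[a[1] for a in args]))
    return (op, *args)


def evaluate(expr, env):
    """Reference interpreter used to check that simplify keeps the result."""
    op = expr[0]
    if op == "load":
        return list(env[expr[1]])
    vals = [evaluate(a, env) for a in expr[1:]]
    if op == "swap":
        return vals[0][::-1]
    return [LANEWISE[op](x, y) for x, y in zip(*vals)]


def count_swaps(expr):
    """Count swap nodes in an expression tree."""
    if expr[0] == "load":
        return 0
    return (expr[0] == "swap") + sum(count_swaps(a) for a in expr[1:])


# A kernel where every load is swapped and the result is swapped back.
a, b, c = ("load", "a"), ("load", "b"), ("load", "c")
kernel = ("swap", ("add", ("mul", ("swap", a), ("swap", b)), ("swap", c)))
optimized = simplify(kernel)

env = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]}
print(count_swaps(kernel), count_swaps(optimized))
print(evaluate(kernel, env) == evaluate(optimized, env))
```

A production compiler would also have to handle operations that are sensitive to element order, such as permutes or element extraction, which this sketch deliberately leaves out.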

About this Lecture

Number of Slides:  50
Duration:  30 minutes
Languages Available:  English
Last Updated: 
