Keeping Science on Keel when Software Moves

Speaker:  Ganesh Lalitha Gopalakrishnan – Salt Lake City, UT, United States
Topic(s):  Software Engineering and Programming

Abstract

Significant investments are made in the creation and maintenance of high-performance computing software, involving dozens of computer scientists and domain scientists working over multiple years. During this period, the computing hardware keeps changing, so the software must be ported and tuned. Unfortunately, this process can change the computed numerical answers. This can undermine trust in the software, because results established with prior software versions are often taken as ground truth.

More frequent porting of computational software will be necessary in the coming years, as heterogeneous computing platforms and their compilers become essential for obtaining performance gains in this "End of Moore's Law" era.

Discovering which software units (files, functions) must be adjusted to re-obtain trusted answers is painful and laborious – especially with critically important (yet numerically sensitive) codes such as climate simulation codes.

We present our tool, FLiT, which helps pre-test one's software across multiple compilers and platforms to determine the range of possible answers yielded under different optimizations. FLiT also includes a bisection search method that can efficiently locate the units most susceptible to such changes. This can help the developer make these units more tolerant to porting by changing precision, re-expressing the underlying algorithm, etc.
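
The idea of bisecting over software units can be illustrated with a minimal sketch. It is not FLiT's actual algorithm or interface: the function `find_variable_units`, the oracle `differs`, and the unit names are hypothetical, and the sketch assumes that each culprit unit changes the result on its own, with no interactions between units. The oracle stands in for "build the given units with the suspect compilation and everything else with the trusted one, run the test, and report whether the answer changed".

```python
from typing import Callable, List, Sequence


def find_variable_units(
    units: Sequence[str],
    differs: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Return the units whose suspect compilation changes the result.

    Assumption (not taken from FLiT): each culprit unit changes the
    result by itself, so a subset tests clean only if it has no culprit.
    """
    if not units or not differs(units):
        return []            # nothing in this subset changes the answer
    if len(units) == 1:
        return [units[0]]    # narrowed down to a single culprit
    mid = len(units) // 2
    return (find_variable_units(units[:mid], differs)
            + find_variable_units(units[mid:], differs))


# Toy stand-in for a real build-and-run step (hypothetical unit names).
all_units = [f"unit_{i:02d}.cpp" for i in range(64)]
culprits = {"unit_10.cpp", "unit_47.cpp"}
calls = 0


def toy_differs(subset: Sequence[str]) -> bool:
    """Pretend to build and run; report whether the answer changed."""
    global calls
    calls += 1
    return any(unit in culprits for unit in subset)


found = find_variable_units(all_units, toy_differs)
print(found, "located with", calls, "test runs")
```

In this toy run, the two culprits are located with fewer test runs than checking each of the 64 units individually. In practice each oracle call is a full build and run, so the number of calls is the cost that matters.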

Experiments on realistic libraries including MFEM (a finite element code), Laghos (a simulator of compressible gas dynamics), and LULESH (a popular miniapp from Livermore) have validated the promise of FLiT. In MFEM, FLiT identified a situation where the answer drifted by nearly a factor of two. This drift was traced to a numerically non-robust calculation, and is being discussed with the developers. In Laghos, one compilation caused the density of the simulated gas to become negative (an impossibility).

Many of the blemishes discovered by FLiT were attributed to poor floating-point coding practices and have since been corrected. FLiT is now being enhanced to be easy to use for anyone, including those who prefer to retain their own build and test infrastructure. FLiT also helps discover combinations of optimization flags that yield substantial performance gains while causing only acceptable degrees of result change. This tradeoff space has hitherto been difficult to navigate.
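
The performance-versus-reproducibility tradeoff can be pictured as a simple selection problem. The following is a minimal sketch, not FLiT's interface: the `Compilation` record, the function `fastest_acceptable`, the labels, and all numbers are made-up placeholders for illustration, and "acceptable" is assumed to mean a relative error within a user-chosen tolerance of a trusted baseline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Compilation:
    label: str        # hypothetical name for a compiler/flag combination
    runtime_s: float  # run time in seconds under this compilation
    result: float     # computed answer under this compilation


def fastest_acceptable(
    candidates: List[Compilation],
    baseline: float,
    rel_tol: float,
) -> Optional[Compilation]:
    """Pick the fastest compilation whose result stays within rel_tol
    (relative error) of the trusted baseline; None if there is none."""
    acceptable = [
        c for c in candidates
        if abs(c.result - baseline) <= rel_tol * abs(baseline)
    ]
    return min(acceptable, key=lambda c: c.runtime_s, default=None)


# Placeholder data, invented purely for illustration (not measurements).
candidates = [
    Compilation("build-A", runtime_s=10.0, result=1.0),
    Compilation("build-B", runtime_s=7.0, result=1.0000001),
    Compilation("build-C", runtime_s=4.0, result=1.9),
]

best = fastest_acceptable(candidates, baseline=1.0, rel_tol=1e-6)
print(best.label if best else "no acceptable compilation")
```

Here the fastest candidate is rejected because its answer drifts far from the baseline, so the selection falls to the next fastest one. The right tolerance is application specific and has to come from the domain scientist.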

This talk will take the audience through the basics of floating-point arithmetic, how typical compiler optimizations affect performance and the computed results, and the details of FLiT's efficient bisection search mechanisms. Attendees will also learn how FLiT's search differs from Delta Debugging.
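
One basic fact behind such result changes is that floating-point addition is not associative, so reordering operations can change the answer. The following is a minimal sketch, assuming IEEE 754 double precision (the usual Python `float`). The variable names are illustrative, and the two evaluation orders are written out by hand to mimic what a reassociating optimization might do; no compiler is involved.

```python
# Same three numbers, summed in two different orders.
a, b, c = 0.1, 0.2, 0.3
left_first = (a + b) + c
right_first = a + (b + c)
print(left_first == right_first)   # False
print(left_first, right_first)     # 0.6000000000000001 0.6

# A small term can be absorbed entirely by a large one.
big, small = 1e16, 1.0
absorbed = (big + small) - big     # small is lost when added to big
preserved = (big - big) + small    # small survives in this order
print(absorbed, preserved)         # 0.0 1.0
```

The first pair differs only in the last digit, but the second pair differs completely, which hints at how a reordering inside a numerically sensitive calculation can produce a large change in the final answer.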

The main take-away message from this talk is that we can see the world through the lens of science only to the extent, and with the fidelity, that the supporting software allows. Thus, to keep science on keel, we must safeguard our investments in software, and help ensure that it remains reliable throughout its life cycle and performance tuning.

About this Lecture

Number of Slides:  60
Duration:  50 minutes
Languages Available:  English
Last Updated: 
