Project Overview

The HPC community is focused on achieving exaflop performance, roughly a 30-fold improvement over the best supercomputers of the last decade. Because of practical, financial, and environmental concerns, the Department of Energy has set a power limit of 20 megawatts for an exaflop system. Since today's top machines consume between 5 and 20 megawatts yet remain far from exaflop performance, significant hardware and software advances are necessary.

One promising hardware direction is overprovisioned systems — machines containing more nodes than can be fully powered simultaneously. While overprovisioned systems have the potential to significantly improve power and performance, software will need to be redesigned to support them.
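
To make the idea of overprovisioning concrete, the short sketch below works through the arithmetic. It is illustrative only: the function name, the system power bound, the node count, and the per-node power caps are all hypothetical values chosen for round numbers, not measurements or settings from this project.

```python
def usable_nodes(system_bound_w: int, per_node_cap_w: int, total_nodes: int) -> int:
    """Return how many nodes can run at once if each is capped at per_node_cap_w.

    All inputs are hypothetical; a real system would also budget power for
    the network, storage, and cooling, which this sketch ignores.
    """
    if per_node_cap_w <= 0:
        raise ValueError("per-node power cap must be positive")
    # The system-wide bound limits the node count; so does the hardware installed.
    return min(total_nodes, system_bound_w // per_node_cap_w)


# Hypothetical cluster: 1000 installed nodes, 200 W peak per node,
# and a 100 kW system-wide power bound.
SYSTEM_BOUND_W = 100_000
TOTAL_NODES = 1000
PEAK_NODE_W = 200

# At peak power, the bound cannot cover every node: the system is overprovisioned.
nodes_at_peak = usable_nodes(SYSTEM_BOUND_W, PEAK_NODE_W, TOTAL_NODES)

# Lowering the per-node cap lets more nodes run under the same bound.
configs = {cap: usable_nodes(SYSTEM_BOUND_W, cap, TOTAL_NODES) for cap in (200, 125, 100)}

for cap, nodes in configs.items():
    print(f"cap {cap} W per node: {nodes} nodes usable")
```

Under these assumed numbers, the same power bound supports either fewer nodes at full power or more nodes at reduced power. Which configuration performs best depends on the application, which is why software support is needed to choose among them.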

The focus of this project is to design and implement software infrastructure supporting overprovisioned systems. The key advance is support for system-wide optimizations that span multiple applications — in stark contrast to the current per-application optimization focus in HPC. The developed software consists of a job profiler, a multi-job scheduler, and a cluster-wide run-time system that jointly optimizes multiple applications.

This project is supported by the National Science Foundation (NSF) under grant no. CNS-1526015. Any opinions, findings, and conclusions expressed are those of the author(s) and do not necessarily reflect the views of NSF.

People
Faculty
David K. Lowenthal
Graduate Students
Publications
I/O Aware Power Shifting
L. Savoie, D. Lowenthal, B. de Supinski, T. Islam, K. Mohror, B. Rountree, M. Schulz
IPDPS, May 2016