Ultra-Scale Memory Analysis for High-End Computing

Start Date: 04/01/2007
End Date: 07/01/2009

The performance limitations of large-scale systems must be overcome to meet the computational demands of distributed simulations critical to DOE's mission. Memory costs dominate time-to-solution in emerging ultra-scale systems and applications. The complexity of ultra-scale systems demands fine-grain profiling of the critical data paths to and from local and remote memory. Profiling these data paths for millions of processes on thousands of nodes will produce intractably large performance data sets. Yet profiled performance data is critical to hand-tuned optimization of codes with unacceptable time-to-solution.
We will build novel profiling and analysis technologies that coordinate the use of microprocessor hardware counters to improve understanding of distributed memory performance and enable optimization of ultra-scale applications and systems. We will design, implement, and validate 1) technologies to improve understanding of local and remote memory performance, 2) technologies scalable to thousands of nodes and millions of processes, and 3) analytical models of local and remote memory performance for evaluation and optimization. We will work directly with researchers at ORNL and LANL to optimize production codes, ensuring direct impact in the DOE scientific community. Graduate students will play an active role in the research effort, eventually taking their place among the next generation of high-end computing specialists.

Grant Institution: Department of Energy

Amount: $230,204

People associated with this grant:

Kirk Cameron