Seminar: Resilience-slack-aware Server Memory Architectures

Event Date: Fri, 2017-09-15 11:15 - 12:15

Location: 2150 Torgersen

Speaker: Steve Jian, CS@VT

Abstract:

Modern server memory systems contain substantial redundant data (e.g.,
12.5% to 40%) to ensure continued correct operation when memory hardware
failures occur during system lifetime. Because a server can crash due to
just a single bit of uncorrectable memory error and because predicting
future fault locations is difficult due to the random nature of hardware
failures, modern servers protect *all* memory locations with uniform
redundant data even though on average only a *few* memory locations
experience hardware faults. Unfortunately, under current server memory
architectures, redundant data do not provide any benefits unless the
locations they protect contain error(s); as such the vast majority (e.g.,
97%) of the substantial (i.e., 12.5% to 40%) redundant data in server
memory systems are currently wasted.

I am currently researching server memory architectures that leverage the
redundant server memory data protecting error-free locations to perform
useful computations to improve system performance, while still ensuring the
same hardware failure protection when faults do suddenly occur; I refer to
them as resilience-slack-aware server memory architectures. This talk will
focus on one such new architecture, which uses redundant data to retrieve
program data from memory chips that are inaccessible due to memory refresh
to effectively transform memory refresh operations from blocking to
nonblocking; preliminary evaluation shows that the proposed
Redundant-data-enabled Nonblocking Refresh improves average system
performance by up to 12%, 22%, and 48% for the latest generation of memory
and for the next two generations, respectively, while neither requiring
additional data nor affecting server memory error rate.  I will end the
talk by briefly summarizing other resilience-slack-aware server memory
architectures currently under exploration, such as Redundant-data-enabled
Tag-lookup-free Multi-tiered Memory, Redundant-data-enabled Memory
Prefetching for Pointer Chasing Applications, and Redundant-data-enabled
Low-Cost Bimodal Pointer Safety Checks.
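
The idea of reading around a chip that is busy refreshing can be sketched
with a toy model. The abstract does not say which redundancy code the
proposed architecture uses, so the sketch below assumes the simplest
possible one: a single XOR parity chunk stored on one extra chip. Real
server memory protection uses stronger codes, and the function names
(write_line, read_line) are hypothetical, chosen only for illustration.

```python
from functools import reduce
from operator import xor


def parity(chunks):
    """XOR all chunks together (assumed toy redundancy code)."""
    return reduce(xor, chunks, 0)


def write_line(data_chunks):
    """Store one chunk per data chip, plus a parity chunk on an extra chip."""
    return list(data_chunks) + [parity(data_chunks)]


def read_line(chips, refreshing=None):
    """Return the data chunks of one memory line.

    If the data chip at index `refreshing` is unavailable, its chunk is
    rebuilt from the other data chips and the parity chip instead of
    waiting for the refresh to finish.
    """
    n_data = len(chips) - 1  # last entry is the parity chip
    data = []
    for i in range(n_data):
        if i == refreshing:
            # Never touch chips[i]: XOR of every other chip recovers it.
            others = [c for j, c in enumerate(chips) if j != i]
            data.append(parity(others))
        else:
            data.append(chips[i])
    return data


line = write_line([0x11, 0x22, 0x33, 0x44])
recovered = read_line(line, refreshing=2)
```

In this toy model the read completes without touching the refreshing chip.
The sketch does not model what happens when a hardware fault coincides
with a refresh, which the proposed architecture must handle to keep the
same failure protection.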

Bio:

Steve (Xun) Jian joined the Department of Computer Science at Virginia Tech
as an assistant professor in Fall 2017. He obtained his PhD from the
University of Illinois at Urbana-Champaign.  He works in the area of
computer architecture with a special focus on server architectures for
improving the future scaling of data centers and HPC systems. His graduate
work has been recognized by several best paper awards (SRC TECHCON 2014,
IEEE CAL 2013, SRC TECHCON 2015). He was one of the invitees to the
Heidelberg Laureate Forum and is a recipient of the M. E. Van Valkenburg
Graduate Research Award, one of the highest awards in ECE Illinois to
recognize graduate research excellence in the areas of circuits, systems,
or computers.