Predictable high-performance computing using feedback control and admission control (J IEEE Trans Par Dis 2011)


Historically, batch scheduling has dominated the management of High-Performance Computing (HPC) resources. One of the most significant limitations of this approach is its inability to predict both the start time and the end time of jobs. Although existing research, such as resource reservation and queue-time prediction, partially addresses this issue, a more predictable HPC system is needed, particularly for an emerging class of adaptive real-time HPC applications. This paper presents the design and implementation of a predictable HPC system using feedback control and admission control. By creating a virtualized application layer and opportunistically multiplexing concurrent applications through the application of formal control theory, we regulate a job's progress so that the job meets its deadline without requiring exclusive access to resources, even in the presence of a wide class of unexpected events. Admission control regulates access to resources when they are oversubscribed. Our experimental results using five widely used applications show that the feedback and admission controllers achieve a highly predictable HPC system. The designed feedback controller regulates an HPC job's progress accurately and close to the theoretical prediction, thereby showing the successful application of classic control theory to HPC workloads. In week-long experiments, over 90 percent of jobs met their deadlines, and the jobs that missed their deadlines still finished close to the requested deadlines (12.4 percent error).
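
The two mechanisms named in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's controller, which the abstract does not specify. It assumes a simple proportional feedback law that adjusts a job's share of one machine so that its progress tracks a straight line to the deadline, and a sum-of-shares admission test. The names Job, required_share, admit, and run, the gain kp, and the slowdown disturbance are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    work: float         # total work units (hypothetical unit)
    deadline: float     # seconds after the job starts
    done: float = 0.0   # work completed so far


def required_share(job: Job, speed: float = 1.0) -> float:
    """Machine share needed to finish exactly at the deadline."""
    return job.work / (job.deadline * speed)


def admit(running, candidate, capacity: float = 1.0) -> bool:
    """Assumed admission test: admit only if the summed shares fit."""
    demand = sum(required_share(j) for j in running)
    return demand + required_share(candidate) <= capacity


def run(job: Job, kp: float = 10.0, dt: float = 1.0, speed: float = 1.0,
        slowdown=lambda t: 1.0) -> float:
    """Simulate one job under proportional feedback; return finish time.

    kp = 0 disables feedback (fixed share). slowdown(t) models an
    unexpected event that scales the job's effective speed at time t.
    """
    base = required_share(job, speed)   # feedforward share
    t = 0.0
    while job.done < job.work:
        target = min(t / job.deadline, 1.0)       # reference progress
        error = target - job.done / job.work      # positive when behind
        share = min(max(base + kp * error, 0.0), 1.0)
        job.done += share * speed * slowdown(t) * dt
        t += dt
    return t


# Hypothetical disturbance: the job runs at half speed for 50 seconds.
def half_speed(t: float) -> float:
    return 0.5 if 50 <= t < 100 else 1.0


open_loop = run(Job(work=100, deadline=200), kp=0.0, slowdown=half_speed)
closed_loop = run(Job(work=100, deadline=200), slowdown=half_speed)
print(open_loop, closed_loop)
```

Under these assumed parameters, the fixed-share run finishes well after the 200-second deadline, while the feedback-controlled run finishes within a few seconds of it. A real system would also need to divide capacity among concurrent jobs, which this single-job sketch leaves out.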

IEEE Transactions on Parallel and Distributed Systems