When Every FLOPS Counts: GPU Sharing Strategies at CERN
GPUs and accelerators are changing traditional High Energy Physics (HEP) deployments while also being the key to enabling efficient machine learning. However, their high cost and growing demand oblige service managers to look for ways to maximize hardware utilization through sharing. While the existing methods are flexible and easy to use, complex use cases still require building custom components on top of the existing device plugin API.
In this talk we explore interesting CERN GPU use cases, the infrastructure needed to accommodate them, and a new, exciting way of allocating and sharing GPUs: Dynamic Resource Allocation (DRA). We go over the main options for GPU sharing and scheduling: time-slicing, Multi-Process Service (MPS), and Multi-Instance GPU (MIG). We cover the features and limitations of each option and present extensive benchmark results that helped us assign each of our ML and scientific workloads to the most appropriate layout. Finally, we describe how managing GPUs in a centralized way improves resource utilization across interactive and batch workloads while optimizing costs in the long run.
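As a flavor of what DRA looks like in practice, the following is a minimal sketch of requesting a GPU through a ResourceClaimTemplate instead of the classic `nvidia.com/gpu` extended resource. The API group version (`resource.k8s.io/v1beta1`) and the `gpu.nvidia.com` device class are assumptions tied to a recent Kubernetes release and the NVIDIA DRA driver; adjust them to your cluster.

```yaml
# Hypothetical example: a claim template asking for one GPU
# from the NVIDIA DRA driver's device class.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com
---
# A pod references the template; the scheduler allocates a
# matching device and wires it into the container.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-consumer
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
```

Compared to the device plugin API, the claim carries structured parameters, which is what makes it possible to express sharing layouts such as MIG partitions without custom components.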