Scheduling for Reduced Tail Latencies in Highly Utilised Datacenters

This idea was proposed in 2023 as a Cambridge Computer Science PhD topic and has been completed by Smita Vijayakumar. It was co-supervised with Evangelia Kalyvianaki.

Modern datacenters have become the backbone for running diverse workloads that increasingly comprise data-parallel computational jobs. Due to their ease of use and the diversity of resources they host, there has been an exponential rise in the demand for datacenters, leading to a high volume of traffic. Datacenters execute thousands of jobs by scheduling billions of tasks every day. To meet these demands, datacenter providers operate their clusters at high levels of utilisation. We show that under such conditions existing scheduling designs impose large wait times on tail tasks. This leads to large tail task completion times and consequently elevated job completion times, which can potentially cost datacenter providers millions of dollars in total cost of operations.

This PhD explores a new decentralised scheduling model, Murmuration, that uses multiple communicating scheduler instances to ensure tasks are scheduled in a manner that reduces their total wait times. It achieves this by scheduling all tasks of a job such that their start times are as close together as possible, thereby ensuring small tail task completion times and better average job completion times.
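The reasoning behind aligning start times can be illustrated with a small sketch. It is not Murmuration's algorithm. It only computes the two quantities the abstract talks about, assuming a job finishes when its last (tail) task finishes. The function names, task durations and start times are hypothetical values chosen for illustration.

```python
def job_completion_time(start_times, durations):
    """A job completes when its tail task completes (assumed model)."""
    return max(start + duration for start, duration in zip(start_times, durations))


def start_time_spread(start_times):
    """Gap between the earliest and the latest task start within one job."""
    return max(start_times) - min(start_times)


# Hypothetical job with four equal-length tasks (arbitrary time units).
durations = [5, 5, 5, 5]

# Three tasks start immediately, but one waits a long time in a busy queue.
straggling = [0, 0, 0, 9]

# All tasks wait a little, but start together.
aligned = [2, 2, 2, 2]

for name, starts in [("straggling", straggling), ("aligned", aligned)]:
    print(
        name,
        "spread:", start_time_spread(starts),
        "total wait:", sum(starts),
        "job completion time:", job_completion_time(starts, durations),
    )
```

In this toy case the aligned placement has a lower total wait (8 versus 9) and halves the job completion time (7 versus 14), because no single tail task holds the job back.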

# 1st Sep 2023

ideas cloud distributed idea-done idea-phd scheduling systems

Anil Madhavapeddy, Professor of Planetary Computing

Related News

Scheduling for Reduced Tail Task Latencies in Highly Utilized Datacenters / Nov 2024

Interspatial OS / Jan 2018