This idea was proposed in 2023 as a Cambridge Computer Science PhD topic and has been completed by Smita Vijayakumar. It was supervised by Evangelia Kalyvianaki and Anil Madhavapeddy as part of the Interspatial OS project.
Modern datacenters have become the backbone for running diverse workloads that increasingly comprise data-parallel computational jobs. Owing to their ease of use and the diversity of resources they host, there has been an exponential rise in the demand for datacenters, leading to a high volume of traffic. Datacenters execute thousands of jobs by scheduling billions of tasks every day. To meet these demands, datacenter providers operate their clusters at high levels of utilisation. We show that under such conditions existing scheduling designs impose large wait times on tail tasks. This leads to large tail task completion times and consequently elevated job completion times, which can potentially cost datacenter providers millions of dollars in total cost of operations.
This PhD explores a new decentralised scheduling model, Murmuration, that uses multiple communicating scheduler instances to ensure tasks are scheduled in a manner that reduces their total wait times. It achieves this by scheduling all tasks of a job such that their start times are as close together as possible, thereby ensuring small tail task completion times and better average job completion times.
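To see why closely grouped start times matter, consider a toy model in which a job finishes only when its last (tail) task finishes. The sketch below is not Murmuration's algorithm: it is a centralised illustration with full knowledge of worker queues, and every name in it (`place_round_robin`, `place_earliest_start`, `free_at`, `summarise`) is hypothetical. It assumes that all tasks of a job run for the same duration and that each worker runs one task at a time.

```python
import heapq


def place_round_robin(free_at, n_tasks, duration):
    """Baseline: task i goes to worker i mod N, ignoring how busy workers are."""
    free = list(free_at)  # free[w] = time at which worker w can start new work
    starts = []
    for i in range(n_tasks):
        w = i % len(free)
        starts.append(free[w])
        free[w] += duration
    return starts


def place_earliest_start(free_at, n_tasks, duration):
    """Assign each task to whichever worker can start it soonest."""
    heap = [(t, w) for w, t in enumerate(free_at)]
    heapq.heapify(heap)
    starts = []
    for _ in range(n_tasks):
        t, w = heapq.heappop(heap)
        starts.append(t)
        # The chosen worker is busy until this task finishes.
        heapq.heappush(heap, (t + duration, w))
    return starts


def summarise(starts, duration):
    """Spread of start times, and completion time set by the tail task."""
    return {
        "start_spread": max(starts) - min(starts),
        "job_completion": max(starts) + duration,
    }


# Hypothetical cluster state: three lightly loaded workers and one backlogged one.
free_at = [0.0, 1.0, 2.0, 30.0]
duration = 5.0

rr = summarise(place_round_robin(free_at, 4, duration), duration)
es = summarise(place_earliest_start(free_at, 4, duration), duration)
print("round robin:   ", rr)
print("earliest start:", es)
```

In this example the round-robin baseline places one task behind the backlogged worker, so the start times spread over 30 time units and the job completes at time 35. Grouping the start times keeps the spread to 5 units and the job completes at time 10, even though one worker runs two tasks back to back. A decentralised scheduler has to approximate this outcome without a single global view of the queues, which is where communication between scheduler instances comes in.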