Paper on scheduling for reduced tail task latencies / Nov 2024

Smita Vijayakumar travelled to Seattle to present her PhD research on Murmuration at SOCC 2024. Murmuration is a new scheduler for Kubernetes that achieves 15% to 25% faster job completion times than the default scheduler, across different job arrival characteristics, in heavily utilized datacenters. The key insight is that existing schedulers impose large wait times on tail tasks in highly utilized clusters, and since a job only finishes when its last task does, this leads to long job completion times. Murmuration employs multiple communicating schedulers to place tasks so that their start times are as close together as possible, which keeps tail task completion times small. Our evaluation shows it scales to workloads with millions of tasks and, with queue re-ordering enhancements, achieves up to 100x better median job completion time than current schedulers on industry workloads.
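To make the tail-task point concrete, here is a toy sketch of my own (it is not Murmuration's algorithm and not taken from the paper). It only assumes that a job completes when its slowest task finishes; the function name, durations and start times are all made up for illustration.

```python
def job_completion_time(start_times, durations):
    """A job is done only when its last task finishes."""
    return max(start + duration for start, duration in zip(start_times, durations))


# Hypothetical job with four tasks of 10 seconds each.
durations = [10, 10, 10, 10]

# Three tasks start immediately, but one tail task waits 30s in a busy queue.
spread_starts = [0, 0, 0, 30]

# Every task starts a little later, but all of them start together.
aligned_starts = [5, 5, 5, 5]

jct_spread = job_completion_time(spread_starts, durations)
jct_aligned = job_completion_time(aligned_starts, durations)

print(f"spread starts:  job completes at t={jct_spread}s")
print(f"aligned starts: job completes at t={jct_aligned}s")
```

Even though the aligned job starts every task later than most of the spread job's tasks, it finishes sooner, because the single delayed tail task dominates the completion time in the spread case.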

Unfortunately, the videos from SOCC don't seem to be online yet (I could only find SOCC 2020), but I'll update this if they do show up.

# 1st Nov 2024 / cloud, distributed, scheduling, systems
