Benchmarking Ultra-Low-Power μNPUs
Josh Millar, Yushan Huang, Sarab Sethi, Hamed Haddadi, and Anil Madhavapeddy.
Paper in the proceedings of the 31st Annual International Conference on Mobile Computing and Networking.
Efficient on-device neural network (NN) inference offers predictable latency, improved privacy and reliability, and lower operating costs for vendors than cloud-based inference. This has sparked recent development of microcontroller-scale NN accelerators, also known as neural processing units (μNPUs), designed specifically for ultra-low-power applications. We present the first comparative evaluation of a number of commercially available μNPUs, including the first independent benchmarks for multiple platforms. To ensure fairness, we develop and open-source a model compilation pipeline supporting consistent benchmarking of quantized models across diverse microcontroller hardware.
Our resulting analysis uncovers both expected performance trends and surprising disparities between hardware specifications and actual performance, including certain μNPUs exhibiting unexpected scaling behaviors with model complexity. This work provides a foundation for ongoing evaluation of μNPU platforms, alongside offering practical insights for both hardware and software developers in this rapidly evolving space.
Older versions
There are earlier revisions of this paper available below for historical reasons. Please cite the latest version of the paper above instead of these.
This is v1 of the publication from Mar 2025.
Benchmarking Ultra-Low-Power μNPUs
Josh Millar, Yushan Huang, Sarab Sethi, Hamed Haddadi, and Anil Madhavapeddy.
Working paper at arXiv.
