New preprint on benchmarking ultra-low power neural accelerators
Josh Millar just released our latest preprint on how to make sense of the growing number of dedicated, ultra-low-power 'neural network accelerators' found in many modern embedded chipsets. My interest in this stems from wanting to decouple low-latency local environments from the cloud, which requires fast tensor operations in hardware. Josh surveyed a large number of interesting NPUs in modern low-cost chips, ranging from ESP32-based boards to ARM-based ones. These vary considerably in their tradeoffs, from the operations they support (which determines which models can run on them) to the amount of available memory and CPU power.
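The operator-coverage tradeoff can be sketched as a simple set check: a model is fully offloadable to an NPU only if every operator in its graph is in the accelerator's supported set, and anything missing falls back to the CPU. The operator sets and device names below are purely illustrative assumptions, not figures from the paper or any datasheet:

```python
# Hypothetical sketch of operator-coverage checking for µNPUs.
# The supported-op sets below are illustrative only, NOT taken
# from any real datasheet or from the preprint.
SUPPORTED_OPS = {
    "npu_a": {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED", "ADD"},
    "npu_b": {"CONV_2D", "DEPTHWISE_CONV_2D", "FULLY_CONNECTED",
              "ADD", "AVERAGE_POOL_2D", "SOFTMAX"},
}

def unsupported_ops(model_ops, npu):
    """Return the model's operators that the named NPU cannot execute."""
    return set(model_ops) - SUPPORTED_OPS[npu]

# A toy CNN's operator list:
model = ["CONV_2D", "AVERAGE_POOL_2D", "FULLY_CONNECTED", "SOFTMAX"]

for npu in sorted(SUPPORTED_OPS):
    missing = unsupported_ops(model, npu)
    if missing:
        print(f"{npu}: CPU fallback needed for {sorted(missing)}")
    else:
        print(f"{npu}: fully offloadable")
```

In practice the per-operator partitioning is done by the vendor's compiler or delegate, but the same subset logic decides how much of the model actually runs on the accelerator.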
Benchmarking Ultra-Low-Power µNPUs
Josh Millar, Yushan Huang, Sarab Sethi, Hamed Haddadi and Anil Madhavapeddy.
Working paper at arXiv.