Hardware Aware Efficient Model Design for Deployment on MCUs Using MicroNAS
(Room 303B)
04 Nov 25
3:00 PM
-
3:25 PM
Tracks:
Embedded AI: AI Efficiency
Deploying modern deep learning models, particularly object detectors, on microcontrollers (MCUs) is constrained by tight limits on Flash, RAM, power, and latency, as well as a lack of support for advanced operations. State-of-the-art models such as those from the YOLO family or EfficientDet are too large and complex for direct deployment. Current approaches fall into two categories: handcrafted models such as MobileNet- and EfficientNet-based detectors, and compression of larger models via pruning, quantization, decomposition, and knowledge distillation. While handcrafted models offer simplicity, they often compromise accuracy; compressed models retain performance but require careful optimization and fine-tuning. We present benchmarks such as MCU-Bench and YOLO-Bench to highlight the variability of model performance across hardware. To address these challenges, we introduce MicroNAS, a hardware-aware neural architecture search (NAS) method that compresses pre-trained models under MCU constraints such as RAM or latency. Using mixed-precision activations, decomposition, and hardware-in-the-loop search, MicroNAS achieves significant compression (2–30×) and/or speedups (2–3×). These results underscore the need for hardware-aware optimization, especially as MCUs and NPUs grow increasingly heterogeneous: reducing a model's complexity does not necessarily reduce its memory utilization or latency, so performance is hard to predict without measuring on the target hardware.
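The hardware-in-the-loop idea can be sketched as a constrained search loop: sample a compression configuration, measure (or here, simulate) its RAM and latency on the target device, and keep the most accurate candidate that fits the budget. The sketch below is illustrative only, not the MicroNAS implementation; `measure_on_device`, `accuracy_proxy`, and all cost models are invented stand-ins for real on-device benchmarking and validation.

```python
import random

def measure_on_device(config):
    """Stand-in for flashing the candidate model and benchmarking it on the
    real MCU. Returns (ram_kb, latency_ms) from simple analytic proxies;
    in practice these numbers come from hardware-in-the-loop measurement."""
    width, bits = config["width"], config["act_bits"]
    ram_kb = 512 * width * bits / 8          # activation RAM scales with width and precision
    latency_ms = 100 * width * (bits / 8) ** 0.5
    return ram_kb, latency_ms

def accuracy_proxy(config):
    """Stand-in for a quick validation score of the compressed model."""
    return 0.9 * config["width"] + 0.01 * config["act_bits"]

def search(ram_limit_kb, latency_limit_ms, trials=200, seed=0):
    """Random search over width multipliers and activation bit-widths,
    keeping only candidates that satisfy the MCU constraints."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(trials):
        config = {"width": rng.choice([0.25, 0.5, 0.75, 1.0]),
                  "act_bits": rng.choice([4, 8, 16])}
        ram_kb, latency_ms = measure_on_device(config)
        if ram_kb <= ram_limit_kb and latency_ms <= latency_limit_ms:
            score = accuracy_proxy(config)
            if score > best_score:
                best, best_score = config, score
    return best

print(search(ram_limit_kb=256, latency_limit_ms=100))
```

The same loop structure accommodates richer search spaces (per-layer precision, decomposition ranks) by enlarging the sampled configuration; the key point is that feasibility is decided by measured hardware numbers, not by a complexity proxy such as parameter count.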