DSNS: Deep Sub-band Noise Suppression for Efficient Edge Processing
(Room 303A)
05 Nov 25
10:50 AM - 11:15 AM
Tracks:
Hardware - System Design
Speech enhancement (SE) aims to improve the perceptual quality and intelligibility of speech signals contaminated by additive background noise. It is an indispensable component of speech recognition systems, hearing aids, and modern smart speakers. In recent years, deep neural network (DNN)-based speech denoising methods have proven superior to traditional signal-processing approaches in low signal-to-noise-ratio conditions with non-stationary noises such as barking dogs, chirping birds, and keyboard typing. However, their high computational complexity and large model size severely limit deployment on low-power edge devices for real-time applications.
To this end, we will present the deep sub-band noise suppression (DSNS) architecture, a low-complexity, lightweight model for online inference. DSNS is a two-stream network that operates on sub-band features in the time-frequency domain. We will describe the architecture in detail, demonstrate the effectiveness of our SE system, and report extensive validation with objective metrics and subjective listening tests. We will also survey popular speech enhancement architectures, their trends, and the associated challenges, and detail the real-time streaming implementation of DSNS on edge devices. With 2.06M parameters, our model achieves performance competitive with recent state-of-the-art models on benchmark datasets.
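For context, the short Python sketch below shows one common way sub-band features can be derived from a noisy waveform, assuming an STFT front end and uniform grouping of frequency bins into bands; the transform, band layout, and network details used in DSNS itself may differ.

# Minimal sketch of time-frequency sub-band feature extraction.
# Assumptions (not the talk's implementation): 16 kHz audio, Hann-windowed
# STFT, and uniform contiguous band grouping with per-band averaging.
import numpy as np

def stft_magnitude(x, frame_len=512, hop=256):
    """Frame the signal, apply a Hann window, and return |STFT| (frames x bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))   # shape: (n_frames, frame_len//2 + 1)

def subband_features(mag, n_bands=8):
    """Group frequency bins into contiguous sub-bands and average each band,
    giving a compact (n_frames, n_bands) representation for a lightweight model."""
    n_bins = mag.shape[-1]
    edges = np.linspace(0, n_bins, n_bands + 1, dtype=int)
    return np.stack([mag[:, lo:hi].mean(axis=-1)
                     for lo, hi in zip(edges[:-1], edges[1:])], axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.standard_normal(16000)            # 1 s of synthetic noise at 16 kHz
    mag = stft_magnitude(noisy)
    feats = subband_features(mag)
    print(mag.shape, feats.shape)                 # (61, 257) (61, 8)

Feeding such band-level features (rather than all frequency bins) is one way a sub-band model can keep its input dimensionality, and hence its parameter count and compute, small enough for edge deployment.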