AIStorm’s SpectroMic KWS is a key word spotting (KWS) solution that combines the AIS240A SpectroMic, which integrates a MEMS microphone, a smart voice activity detector (VAD) and a charge‑domain spectral engine, with AI model libraries compatible with popular microcontrollers. This allows rapid deployment of KWS solutions in IoT AI applications.
Key advantages of SpectroMic™ KWS
How It Works
Traditional analog MEMS microphones stream a continuous analog signal that an always‑awake MCU must digitize; the alternative, digital microphones, consume hundreds of µW and add cost. Even when these legacy microphones offer a voice‑activity detector (VAD), background noise often pushes them into high‑power mode, and their slow recovery from VAD mode can miss the first syllables of words, or entire words. SpectroMic addresses these problems: its charge‑domain spectral engine turns incoming sound into a compact spectral image and makes it available digitally over the SPI bus, while a smart VAD adapts to ambient noise and can wake the MCU. For smart‑speaker‑enabled devices, only the required spectra need to be stored in the rolling buffer, minimizing power and memory while still providing the restored digital time‑domain information that branded online smart speaker verification systems require when necessary.
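To make the idea of a "compact spectral image" concrete, the Python sketch below computes one digitally: it splits audio into overlapping frames, applies a Hann window and a DFT to each, and sums the power into a few equal-width bands on a log scale. SpectroMic performs its conversion in the charge domain and AIStorm's spectral format is not described here, so the function name `spectral_image` and the frame length, hop and band count are hypothetical choices for illustration only.

```python
import cmath
import math


def spectral_image(samples, frame_len=256, hop=160, n_bands=16):
    """Return one list of n_bands log10 band energies per frame.

    Digital illustration only (hypothetical parameters): SpectroMic computes
    its spectra in the charge domain. At 16 kHz, frame_len=256 gives a bin
    spacing of 62.5 Hz and hop=160 gives one frame every 10 ms.
    """
    # Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    n_bins = frame_len // 2            # positive-frequency bins, Nyquist bin excluded
    bins_per_band = n_bins // n_bands  # equal-width bands (a mel scale is another option)
    image = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT: power in each positive-frequency bin.
        power = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))) ** 2
            for k in range(n_bins)
        ]
        # Sum adjacent bins into bands; the small offset avoids log10(0) on silence.
        bands = [
            math.log10(sum(power[b * bins_per_band:(b + 1) * bins_per_band]) + 1e-12)
            for b in range(n_bands)
        ]
        image.append(bands)
    return image


# Example: 0.1 s of a 1250 Hz tone sampled at 16 kHz (each band spans 500 Hz).
tone = [math.sin(2 * math.pi * 1250 * n / 16000) for n in range(1600)]
image = spectral_image(tone)
print(len(image), len(image[0]))  # 9 16
```

With these parameters, each 10 ms hop of 160 audio samples is summarized by 16 band energies, and the tone's energy lands in band 2 (1000 to 1500 Hz). This is the kind of data reduction that lets a small MCU work on spectra rather than raw audio.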
Background Noise Adaptation
In the video to the left, SpectroMic is adapting to background noise in a bar. At first, the two LEDs show that SpectroMic is being triggered almost continuously. After a short time, however, SpectroMic adapts to the background noise and the LEDs go dark, indicating that there is no spectral content of interest. Once adapted, SpectroMic returns to drawing its 18 µA input current until spectra of interest are found.
Despite the background noise, SpectroMic still processes words of interest spoken over the noise of the bar, as the LED activity in response to the spoken words shows. In fact, the LED colors indicate which word was recognized.
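The adaptation seen in the bar video can be approximated with a generic adaptive-threshold energy detector: track a slowly moving noise floor and flag only frames that rise a fixed margin above it. This is a common textbook technique, not AIStorm's VAD algorithm; `frame_energy_db`, `AdaptiveEnergyDetector` and its `alpha`, `margin_db` and initial floor values are hypothetical.

```python
import math


def frame_energy_db(samples):
    """Mean-square energy of one audio frame in dB, relative to full scale 1.0."""
    mean_sq = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_sq + 1e-12)


class AdaptiveEnergyDetector:
    """Flags frames whose energy exceeds a slowly adapting noise floor.

    Generic adaptive-threshold technique, not AIStorm's VAD; the default
    tuning values are hypothetical.
    """

    def __init__(self, alpha=0.05, margin_db=6.0, initial_floor_db=-100.0):
        self.alpha = alpha            # per-frame adaptation rate of the noise floor
        self.margin_db = margin_db    # how far above the floor counts as "of interest"
        self.floor_db = initial_floor_db

    def update(self, energy_db):
        """Return True if this frame is active, then move the floor toward it."""
        active = energy_db > self.floor_db + self.margin_db
        # Exponential moving average: the floor follows the ambient level, so
        # persistent background noise stops triggering after a while.
        self.floor_db += self.alpha * (energy_db - self.floor_db)
        return active


# Example: steady -40 dB background, then a -25 dB word on top of it.
det = AdaptiveEnergyDetector()
noise_flags = [det.update(-40.0) for _ in range(200)]
speech_flags = [det.update(-25.0) for _ in range(10)]
print(noise_flags[0], any(noise_flags[100:]), all(speech_flags))  # True False True
```

Like the LEDs in the video, the detector fires on the first background frames and then goes quiet as the floor catches up, while louder speech on top of the background still triggers it. Practical designs often slow or freeze floor updates during detected activity so that long utterances do not raise the floor.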
Google 10 Dataset Example
In the video below, SpectroMic KWS is recognizing words from the Google 10 dataset. Note how quickly the words are recognized (see the red box showing the recognized words). In this example, the Google 10 model uses only 23.7k parameters and runs on a very low-cost Raspberry Pi microcontroller (RP2040), with an inference time of 261 ms and an accuracy of 90.16%. This model is available for download, and more complex models are also available for Raspberry Pi and other microcontrollers. This implementation demonstrates how SpectroMic’s spectral engine offloads spectral processing from the microcontroller, so that even a low-end, low-cost microcontroller, or a portion of the resources of a host microcontroller doing other tasks, can implement voice interaction.
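A quick back-of-the-envelope calculation puts the 23.7k-parameter figure in perspective. The sketch below assumes one byte per parameter for an int8-quantized model and four bytes for float32 (the published model's numeric format is not stated here), and compares weight storage with the RP2040's 264 kB of on-chip SRAM purely for scale; in practice weights may instead be read from external flash. The names `weight_bytes` and `N_PARAMS` are illustrative.

```python
RP2040_SRAM_BYTES = 264 * 1024  # RP2040 on-chip SRAM (264 kB)
N_PARAMS = 23_700               # "23.7k parameters", as rounded in the text


def weight_bytes(n_params, bytes_per_param):
    """Storage for the model weights alone (no activations, buffers or code)."""
    return n_params * bytes_per_param


for name, width in (("int8", 1), ("float32", 4)):
    size = weight_bytes(N_PARAMS, width)
    share = 100 * size / RP2040_SRAM_BYTES
    print(f"{name}: {size} bytes ({share:.1f}% of RP2040 SRAM)")
# int8: 23700 bytes (8.8% of RP2040 SRAM)
# float32: 94800 bytes (35.1% of RP2040 SRAM)
```

Under either assumption the weights leave most of the chip's SRAM free, which fits the point that the model can share a host microcontroller with other work.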
Spectral Rolling Buffer (Charge Domain)
In smart speaker applications such as Alexa(TM), Bixby(TM) or Siri(TM), the edge device must maintain a rolling buffer so that online systems can verify a word before it is accepted. For example, the device might continuously store the last 2 seconds of acoustic information; once the local edge system believes it has identified a word, that 2 s of information and the candidate word are sent to the cloud for verification. Normally, a 16 kHz ADC runs continuously to store this information, requiring 32 kB of memory and consuming significant power. SpectroMic KWS can reduce this power by 10x and the memory by 8x by storing only the necessary components of the spectral engine’s digital output. These online systems do not need all the spectral content that makes sound pleasant to the ear; they need only the minimum their AI algorithms require to identify a word. This information is a subset of the full bandwidth stored by a standard system, and it is further compacted by conversion to AIStorm’s spectral format.
*Alexa, Bixby and Siri are trademarks of Amazon, Samsung and Apple, respectively.
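A rolling buffer of spectral frames is naturally a fixed-capacity ring buffer. The sketch below uses Python's `collections.deque` with `maxlen`, which discards the oldest frame automatically once the buffer is full. The class `SpectralRollingBuffer`, the frame rate (100 frames/s) and the frame size (20 bytes) are hypothetical, chosen so that 2 s of frames is 8x smaller than 32,000 bytes, the size of 2 s of 16 kHz audio at one byte per sample (one reading of the 32 kB figure). AIStorm's actual spectral frame format is not described here.

```python
from collections import deque


class SpectralRollingBuffer:
    """Keeps the most recent `seconds` of spectral frames (hypothetical format)."""

    def __init__(self, seconds=2.0, frames_per_second=100, bytes_per_frame=20):
        self.bytes_per_frame = bytes_per_frame
        self._frames = deque(maxlen=int(seconds * frames_per_second))

    def push(self, frame: bytes) -> None:
        """Append the newest frame; the oldest is dropped automatically when full."""
        if len(frame) != self.bytes_per_frame:
            raise ValueError(f"expected {self.bytes_per_frame}-byte frame, got {len(frame)}")
        self._frames.append(frame)

    def snapshot(self) -> bytes:
        """Buffered frames, oldest first, e.g. to send for cloud verification."""
        return b"".join(self._frames)

    def capacity_bytes(self) -> int:
        return self._frames.maxlen * self.bytes_per_frame


# 16 kHz x 2 s x 1 byte/sample: assumed interpretation of the "32 kB" above.
pcm_bytes = 16_000 * 2 * 1
buf = SpectralRollingBuffer()
print(buf.capacity_bytes(), pcm_bytes // buf.capacity_bytes())  # 4000 8
```

Because `deque` handles eviction, pushing a frame every 10 ms keeps exactly the last 2 s available, and `snapshot()` can be called at the moment the local model reports a candidate word.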
Package
SpectroMic is packaged in a standard 5.5×5.5 mm microphone “can” assembly. It requires only a minimal number of external components, minimizing board area.
Key Word Spotting (KWS) · Sound Spotting (e.g. Glass Break, Gunshot) · Heart Rate Variability (HRV) · Vibration Monitoring · Audio Monitoring