Deep learning for decomposing sound into vector audio

This is an idea proposed in 2024 as a Cambridge Computer Science Part III or MPhil project, and is available to be worked on. It will be supervised by Trevor Agus and Anil Madhavapeddy as part of my Interspatial OS project.

Summary

All that we hear is mediated through cues transmitted to the brain from the cochlea, which acts like a bank of auditory filters centred at a wide range of centre frequencies. Much of our knowledge of hearing comes from psychoacoustical experiments that use simple sounds, such as sine waves, whose synthesis parameters are closely related to the cues available beyond the cochlea. For recorded sounds, however, many more types of cue are present, and our ability to study them is limited by the extent to which they can be manipulated in a controlled fashion. [1] [2]
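To make the filterbank analogy concrete, the sketch below computes centre frequencies spaced evenly on the ERB-number scale (Glasberg & Moore), a common engineering stand-in for cochlear filter spacing. The frequency range and number of channels here are illustrative assumptions, not project requirements.

```python
import numpy as np

def erb_space(low_hz=50.0, high_hz=8000.0, n_channels=32):
    """Centre frequencies evenly spaced on the ERB-number scale
    (Glasberg & Moore), a rough model of cochlear filter spacing."""
    hz_to_erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    erb_to_hz = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    erbs = np.linspace(hz_to_erb(low_hz), hz_to_erb(high_hz), n_channels)
    return erb_to_hz(erbs)

print(erb_space()[:4])  # a few of the lowest centre frequencies, in Hz
```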

The goal of this project is to apply deep learning tools to explore the extent to which recorded sounds, such as speech, music and noise, can be decomposed into components, such as modulated sine waves, that dominate independent regions of activity on the cochlea. The training data would come from combinations of basic sounds with known synthesis parameters and the corresponding output of a differentiable auditory filterbank, which has recently become available (Famularo et al. [3]). The ability to control perceptually relevant parameters of arbitrarily complex sounds would be a powerful tool in hearing research, and may have further applications in data compression and artificially generated sound.
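As a rough illustration of how such training data might be assembled, the sketch below mixes a few amplitude-modulated sine waves with known, randomly drawn synthesis parameters and passes the mixture through a conventional gammatone filterbank, standing in for the differentiable frontend of Famularo et al. [3]. The sample rate, parameter ranges, channel spacing and filter order are assumptions for illustration only, not details from the proposal.

```python
import numpy as np
from scipy.signal import fftconvolve

FS = 16_000  # sample rate in Hz; an assumption for this sketch

def gammatone_ir(fc, fs=FS, dur=0.05, order=4, b=1.019):
    """Impulse response of a 4th-order gammatone filter at centre
    frequency fc -- a standard model of a single cochlear channel."""
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 * (4.37e-3 * fc + 1.0)  # equivalent rectangular bandwidth (Hz)
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.max(np.abs(ir))

def make_training_example(n_components=3, dur=1.0, seed=0):
    """Return (mixture, parameters, cochleagram) for one synthetic example.
    Each component is an AM sine wave with known carrier frequency,
    modulation rate and modulation depth."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * FS)) / FS
    mixture = np.zeros_like(t)
    params = []
    for _ in range(n_components):
        fc = rng.uniform(200.0, 4000.0)   # carrier frequency (Hz)
        fm = rng.uniform(2.0, 20.0)       # modulation rate (Hz)
        depth = rng.uniform(0.2, 1.0)     # modulation depth
        mixture += (1.0 + depth * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)
        params.append((fc, fm, depth))
    # Log-spaced centre frequencies as a simple stand-in for cochlear spacing.
    centres = np.geomspace(100.0, 6000.0, 32)
    cochleagram = np.stack([fftconvolve(mixture, gammatone_ir(fc), mode="same")
                            for fc in centres])
    return mixture, np.array(params), cochleagram
```

A decomposition network would then be trained to map the filterbank output (in the project itself, the output of the differentiable frontend) back to the known component parameters.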

  1. McDermott, J.H. and Simoncelli, E.P., Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis. Neuron, 2011. 71(5): p. 926-940.

  2. Agus, T.R., et al., Fast recognition of musical sounds based on timbre. J. Acoust. Soc. Am., 2012. 131(5): p. 4124-4133.

  3. Famularo, R.L., et al., Biomimetic frontend for differentiable audio processing. Pre-print, 2024.

Related Ideas