Deep learning for decomposing sound into vector audio
This is an idea proposed in 2024 as a Cambridge Computer Science Part III or MPhil project, and has expired.
All that we hear is mediated through cues transmitted to the brain from the
cochlea, which acts like a bank of auditory filters centred at a wide range of
centre frequencies. A lot of our knowledge of hearing comes from
psychoacoustical experiments that involve simple sounds, like sine waves, whose
synthesis parameters are closely related to cues available beyond the cochlea.
However, for recorded sounds, many types of cue are available, but our understanding of how these cues are used is limited by the extent to which they can be manipulated in a controlled fashion (McDermott, J.H. and Simoncelli, E.P., Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis, Neuron, 2011, 71(5): 926-40; Agus, T.R., et al., Fast recognition of musical sounds based on timbre, J. Acoust. Soc. Am., 2012, 131(5): 4124-33).
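To make the "bank of auditory filters" picture concrete, the short sketch below computes centre frequencies spaced uniformly on the ERB-rate scale, as is commonly done in gammatone filterbank models of cochlear filtering; the channel count and frequency range are arbitrary illustrative choices, not part of the project specification.

```python
# Illustrative only: centre frequencies spaced uniformly on the ERB-rate scale,
# as used in gammatone filterbank models of cochlear filtering (Glasberg & Moore, 1990).
import numpy as np

def erb_rate(f_hz):
    """Convert frequency in Hz to the ERB-rate (ERB-number) scale."""
    return 21.4 * np.log10(4.37e-3 * f_hz + 1.0)

def inverse_erb_rate(erb):
    """Convert an ERB-rate value back to frequency in Hz."""
    return (10.0 ** (erb / 21.4) - 1.0) / 4.37e-3

def centre_frequencies(f_low=50.0, f_high=8000.0, n_channels=32):
    """Centre frequencies uniformly spaced on the ERB-rate scale."""
    erbs = np.linspace(erb_rate(f_low), erb_rate(f_high), n_channels)
    return inverse_erb_rate(erbs)

if __name__ == "__main__":
    cfs = centre_frequencies()
    print(np.round(cfs, 1))  # 32 centre frequencies from 50 Hz to 8 kHz
```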
The goal of this project is to apply deep learning tools to explore the extent
to which recorded sounds, such as speech, music and noise, can be decomposed
into components, such as modulated sine waves, that dominate independent
regions of activity on the cochlea. The training data would come from
combinations of basic sounds with known synthesis parameters and the
corresponding output from a differentiable auditory filterbank, which has
recently become available (Famularo, R.L., et al., Biomimetic frontend for differentiable audio processing, pre-print, 2024).
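As a minimal sketch of what such training data might look like, the code below mixes a few amplitude-modulated sine waves with known synthesis parameters and passes the mixture through a placeholder `filterbank` callable; the parameter ranges, the component model and the `filterbank` interface are assumptions for illustration only and do not describe the Famularo et al. frontend.

```python
# Illustrative sketch: synthetic training pairs of (filterbank response, known
# synthesis parameters) built from mixtures of amplitude-modulated sine waves.
# The `filterbank` callable is a stand-in for the differentiable auditory
# frontend; nothing here reflects the actual Famularo et al. API.
import numpy as np

SR = 16000   # sample rate (Hz), assumed
DUR = 0.5    # clip duration (s), assumed

def am_sine(carrier_hz, mod_hz, mod_depth, rng):
    """One amplitude-modulated sine component with a random phase."""
    t = np.arange(int(SR * DUR)) / SR
    envelope = 1.0 + mod_depth * np.sin(2 * np.pi * mod_hz * t)
    return envelope * np.sin(2 * np.pi * carrier_hz * t + rng.uniform(0, 2 * np.pi))

def make_example(filterbank, n_components=3, seed=None):
    """Mix a few components with known parameters and pass them through the filterbank."""
    rng = np.random.default_rng(seed)
    params, mix = [], np.zeros(int(SR * DUR))
    for _ in range(n_components):
        carrier = rng.uniform(100.0, 4000.0)   # carrier frequency (Hz)
        mod = rng.uniform(2.0, 32.0)           # modulation rate (Hz)
        depth = rng.uniform(0.0, 1.0)          # modulation depth
        params.append((carrier, mod, depth))
        mix += am_sine(carrier, mod, depth, rng)
    mix /= np.max(np.abs(mix))                 # normalise the mixture
    return filterbank(mix), np.array(params)   # (input features, regression targets)
```

A model would then be trained to recover the known synthesis parameters (or the assignment of filterbank channels to components) from the filterbank response alone, before moving on to recorded speech, music and noise.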
(Note: this will be co-supervised with faculty from Queen's University, Belfast)