Spatial Audio for HEAR 2021

Dear all,

When I was going through the abstract of the HEAR 2021 challenge, the question posed there (“where does the sound come from?”) made me think about the spatial cues (ITDs, ILDs) present in an audio file. Having worked with HRTFs before, I thought it could be of great value if the learnt audio representation also captured the originating location of the sound in space (azimuthal direction of arrival, elevation, etc.). Such spatial cues usually require the audio file to have at least two channels (stereo); however, the API specifies that models accept only mono audio.
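For readers unfamiliar with these cues, here is a minimal sketch of how ITD and ILD can be estimated from a stereo pair: the ITD from the lag of the cross-correlation peak, and the ILD from the ratio of channel RMS levels. The synthetic signal and all parameters below are illustrative, not part of the challenge.

```python
import numpy as np

def estimate_itd_ild(left, right, sr):
    """Estimate the interaural time difference (ITD, in seconds) from the
    peak of the cross-correlation, and the interaural level difference
    (ILD, in dB) from the ratio of channel RMS levels."""
    corr = np.correlate(left, right, mode="full")
    # Lag of the correlation peak; negative means the left channel leads.
    lag = np.argmax(corr) - (len(right) - 1)
    itd = lag / sr
    rms_left = np.sqrt(np.mean(left ** 2))
    rms_right = np.sqrt(np.mean(right ** 2))
    ild_db = 20 * np.log10(rms_left / rms_right)
    return itd, ild_db

# Synthetic stereo pair: the right channel is the left one delayed by
# 5 samples and attenuated to half amplitude, roughly what a source on
# the listener's left would produce.
sr = 16000
rng = np.random.default_rng(0)
left = rng.standard_normal(2048)
right = 0.5 * np.roll(left, 5)

itd, ild = estimate_itd_ild(left, right, sr)
# itd is about -5 / 16000 s (left leads); ild is about +6 dB (left louder).
```

A mono input collapses the two channels, so both of these quantities are lost, which is what motivates the question above.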

Hence, I wanted to check if there would be any emphasis on extracting spatial information from the embeddings in any of the evaluation tasks.



Most of the tasks do not involve spatial information. However, there are one or two secret tasks that do. Since evaluation is based on classification-style learning, for those tasks our downstream predictor would use your embeddings to classify which of several microphones recorded the audio clip.
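To make the evaluation setup concrete, here is a toy sketch of such a downstream probe. It uses synthetic embeddings and a nearest-centroid classifier as a stand-in for the challenge's actual shallow predictor; every name, dimension, and the "microphone signature" model are illustrative assumptions, not details of HEAR 2021.

```python
import numpy as np

rng = np.random.default_rng(42)
n_mics, emb_dim, n_per_mic = 3, 128, 50

# Pretend each microphone imprints a fixed channel "signature" on the
# embedding; in the real evaluation, embeddings come from the submitted model.
signatures = rng.standard_normal((n_mics, emb_dim))
X = np.concatenate([
    signatures[m] + 0.5 * rng.standard_normal((n_per_mic, emb_dim))
    for m in range(n_mics)
])
y = np.repeat(np.arange(n_mics), n_per_mic)

# Nearest-centroid probe: assign each clip to the closest class mean.
centroids = np.stack([X[y == m].mean(axis=0) for m in range(n_mics)])
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
pred = dists.argmin(axis=1)
accuracy = (pred == y).mean()
```

The point of the sketch is only that the classification target is the recording microphone, so any channel-specific information the embedding preserves is what such a task would reward.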