Automatic Upmixing of Legacy Audio Content for Metaverse Applications Using a Machine Learning Approach

Supervisor

Damian Murphy

Project Partner

BBC R&D

Project Summary

Immersive and interactive 360-degree media content creation demands new approaches and methods for efficient and effective workflows that rise to the challenge of a user-centric, rather than author-centric, audiovisual perspective. Existing sound design practice emerges from traditional and established disciplines (e.g. film, TV), and recent research has identified some of the challenges sound designers face in creating content for new immersive media experiences. These include the time-consuming nature of sound spatialisation, the lack of available spatial sound effects libraries, the integration of legacy stereo content into spatial productions, and the lack of standardisation and quality in the methods and tools used to emulate auditory source distance. This project will seek to address some of these issues by exploiting an existing proof-of-concept machine learning approach to enable the upmixing of legacy 2-channel stereo audio to spatial formats suitable for new and future metaverse experiences.

Project Description

Virtual Production and Metaverse content generation has the potential to extract new value from existing databases of digital assets and related content. Sound design for new immersive media experiences is still a relatively new area of practice with less well-defined methods, requiring a new and still emerging set of skills and tools [1]. In particular, it has been noted that audio practitioners face challenges in repurposing existing 2-channel stereo sound libraries for 360-degree interactive soundfield applications [2]. To this end, a novel method of automatic upmixing from 2-channel stereo to 4-channel Ambisonic B-format has been proposed and tested [3]. This method uses deep learning to predict 360-degree time-frequency directional metadata from 2-channel stereo audio, which is then used to extract and remap the frequency components of a source signal to a target spherical harmonic representation. Results show that the system can learn and generalise the time-frequency content of sound sources present within a scene, but has difficulty generalising to the ambient noise present within the scenes.
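To illustrate the general idea (a minimal sketch only, not the published method of [3]): once per-time-frequency-bin direction estimates are available, such as those a trained network might predict, a stereo STFT can be remapped onto the four first-order Ambisonic B-format channels. The function name, the simple passive downmix, and the ACN/SN3D channel convention below are all assumptions made for the sake of the example.

# Illustrative sketch: remap a stereo STFT to first-order Ambisonic B-format
# (ACN order W, Y, Z, X with SN3D normalisation) using per-bin directions.
import numpy as np

def stereo_to_foa(stft_left, stft_right, azimuth, elevation):
    """stft_left, stft_right : complex arrays of shape (frames, bins).
    azimuth, elevation : per-bin direction estimates in radians, same shape."""
    mono = 0.5 * (stft_left + stft_right)            # crude single-source downmix
    w = mono                                         # omnidirectional component
    y = mono * np.sin(azimuth) * np.cos(elevation)   # left-right
    z = mono * np.sin(elevation)                     # up-down
    x = mono * np.cos(azimuth) * np.cos(elevation)   # front-back
    return np.stack([w, y, z, x], axis=0)            # shape (4, frames, bins)

# Toy usage with random data standing in for a real recording and model output.
frames, bins = 100, 513
rng = np.random.default_rng(0)
L = rng.standard_normal((frames, bins)) + 1j * rng.standard_normal((frames, bins))
R = rng.standard_normal((frames, bins)) + 1j * rng.standard_normal((frames, bins))
az = rng.uniform(-np.pi, np.pi, (frames, bins))          # stand-in for predicted metadata
el = rng.uniform(-np.pi / 2, np.pi / 2, (frames, bins))
foa = stereo_to_foa(L, R, az, el)
print(foa.shape)  # (4, 100, 513)

In practice the research in [3] learns the directional metadata from data rather than assuming it, and the extraction and remapping stages are considerably more involved than this single-source downmix.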

This research was co-created and conducted in partnership with BBC R&D. Now that a proof-of-concept machine-learning architecture has been developed, there is considerable scope for further work to refine the approach, to test its capabilities under a range of different input and output parameters (for instance, it currently supports only one 2-channel stereo microphone configuration, although other configurations were captured as part of the baseline training data), to audition the results with panels of listeners, and to consider how this methodology might be integrated into future sound design workflows. One aim of this work would be the re-authoring of the original BBC Sound Effects Recordings so that they are suitable for future immersive formats. This in turn leads to associated research into the ethical, procedural, and legal issues of reusing such historical content, including but not limited to copyright, IPR, GDPR, and data protection.

Student Skills Requirements

The project will encompass research and development as well as practice-based approaches, with user studies as needed. The successful applicant should have a strong interest in sound, music, and immersive audio technology, an understanding of the application of machine learning to audio-related tasks, and good programming skills. This project is highly multi-disciplinary in nature, and we welcome applicants from a broad range of core research backgrounds and interests, extending from audio signal processing and machine learning to user experience design, human-computer interaction, and relevant creative practice. Suitable candidates must have (or expect to obtain) a minimum of a UK upper second-class honours degree (2:1) or equivalent in Computer Science, Electronic Engineering, Music Technology, or a related subject. Prior research or industry experience would also be an advantage.

References

[1] Popp, C., Murphy, D.T., “Creating Audio Object-Focused Acoustic Environments for Room-Scale Virtual Reality”, Applied Sciences, 12(14), 7306, 2022. DOI: https://doi.org/10.3390/app12147306

[2] Turner, D., Pike, C., Baume, C., and Murphy, D.T., “Spatial Audio Production for Immersive Media Experiences: Perspectives on Practice-led Approaches to Designing Immersive Media Content”, The Soundtrack, 13(1), pp. 73-94, 2022. DOI: https://doi.org/10.1386/ts_00017_1

[3] Turner, D., “An Investigation into the Application of Machine Learning to Spatial Audio for Immersive Media”, PhD Thesis, University of York, 2023.

How to apply

Please go to the School of Physics, Engineering and Technology website and apply for a PhD in Music Technology. When applying, please select ‘CDT in Sound Interactions in the Metaverse’ as your source of funding. You do not need to provide a research proposal; just enter the name of the project you are applying for.