Past Projects

Musical Interactions in Networked Experiences using Real-time Virtual Audio (MINERVA)

We will develop long-term sustainable solutions using hybrid paradigms whereby in-person interaction at musical events will co-exist with remote engagement of both performers and listeners through the use of VR.

Immersive and Interactive Audio

Building Upon the Limitations of Sound Localisation in Elevation: Finding Directional Enhancement Through Vibrotactile Induction

Our current solutions to the problem of sound localisation in elevation rely on asymmetrical head positioning and sight verification, so they don’t necessarily lend themselves to fixed visual stimuli in immersive media platforms such as gaming, art or film. This project will explore the multimodalities of a vibrotactile solution across the surface area of the scalp. Can this method in fact fill in the areas of the binaural hearing system where neurological signal processing becomes inaccurate? The aim of this venture is to build a new branching standard dataset to apply to the HRTF database: that is, a computational algorithm that represents both the binaural hearing model and a vibrotactile mapping of the scalp. This is to discover whether these methods can enhance our ability to perceive directionality inside the area of elevation where a fixed head position can rule out our current multisensory solutions. Can this data-driven movement change the way in which product development occurs, autonomous systems learn, sensory diagnostics are performed, or even varieties of certain simulations are carried out?

Immersive and Interactive Audio

The Ghosts in the Machine: Conveying the power of opera as a storytelling artform through interactive immersive audio

Project Partners: Opera North, XR Stories, James Bulley, Lusion

This project seeks to explore the potential of interactive, spatial audio technology to draw a tech-savvy demographic to engage with a classical opera. The eerie score of Opera North’s 2020 production of The Turn of the Screw takes on a new life in an interactive trailer. This captures the haunting atmosphere of the opera by bringing together anechoic vocal recordings, recordings made on location at Yorkshire Sculpture Park and stunning visuals produced by Lusion using the latest Web generative graphics technology. The recordings are spatialised using HRTFs built into the Web Audio API. The user hears the characters and instruments moving as they move through the visual environment. Further to this, they can interact with the audio, looking around and ‘zooming’ in on the unnerving anechoic vocal lines. The unnerving and ambiguous narrative is thus enhanced by manipulating the way it is listened to, with the listener able to control the perspective through which this thrilling story is told. The trailer can be watched here.

Voice Science | Health and Wellbeing

Improving cleft palate treatment with acoustics: Detecting Velo-Pharyngeal Insufficiency using external noise excitation

Speech problems caused by velopharyngeal insufficiency (VPI) are the commonest complication of a cleft lip/palate. Existing diagnosis and treatment methods require access to highly specialised speech and language therapists and use equipment that is invasive and uncomfortable. This project aims to develop a new, non-invasive technique using vocal tract transfer functions to indicate hyper- and hypo-nasality.

Voice Science | Immersive and Interactive Audio

Dynamic Interactive Voice for 3D Audio Communication

Within virtual and augmented reality (VR/AR) environments, sound sources are usually modelled with a very simple radiation pattern that assumes that the source radiates equally well in all directions. In reality, however, sound source directivity is extremely complex, is frequency dependent and has strong spatial properties. This is particularly true of the human voice, and there is an increasing demand within the VR/AR industry to create more plausible and realistic source directivity patterns for use with next-generation cinematic, gaming, social media, e-learning and tele-visual services, amongst many other applications. This project will look at the development of a more sophisticated model for the directivity of the human voice.

Immersive and Interactive Audio

Creative affordances of orchestrated devices for immersive and interactive audio and audio-visual experiences

Project Partners: BBC R&D, XR Stories

This project will explore the creative possibilities of orchestrated media devices (such as smartphones, tablets, virtual assistants) and object-based media (where content and content metadata are considered separately and in parallel) for the delivery of immersive and interactive audio and audio-visual storytelling experiences. The project will focus on development and testing of new orchestrated audio and audio-visual storytelling experiences from a production and/or an end-user viewpoint. On the production side, aspects such as workflow, tools, and how to express, capture, and represent creative intent will be investigated. The experience of the end user will be considered in order to clearly ascertain, demonstrate, and improve upon the tangible benefit to audiences of this type of content reproduction.

Immersive and Interactive Audio | Voice Science

Lag-free audio communication in a multi-user virtual reality environment

Project Partner: XR Stories

Vocal interaction is a fundamental aspect of any multi-user virtual environment. In any networked system, however, an unavoidable latency is associated with the transport of audio between users of shared virtual environments. Such latency is recognised as having a significant effect on the ability of multiple users to engage in ‘natural’ vocal interaction, where no latency-coping technique need be employed on the part of users.
This project investigates the limits of latency within which ‘natural’ vocal interaction is possible, and the relationship between the shared acoustic environment presented and these latency limits. This research will assess the capacity of state-of-the-art networking technology to operate within these limits, and culminate in the specification and deployment of such a system. Such work has implications across a wide range of Virtual, Augmented and Mixed Reality applications, including Telecommunications, Healthcare, Education and Arts and Entertainment Media, as well as informing Adaptive and Intelligent System Design.

Immersive and Interactive Audio

Evaluation of Spatial Audio Codecs

Project Partner: Google

This project seeks to evaluate spatial audio quality for delivery in the browser using Ambisonic systems and low bit-rate compression schemes. The aim is to improve objective audio quality metrics through extensive spatial audio quality assessment. This is achieved through subjective timbral and localisation accuracy studies with different Ambisonic orders and compression rates. The listening tests are undertaken within our 50-channel spherical loudspeaker array housed at the AudioLab. The evaluation consists of loudspeaker listening tests as well as headphone listening using generic head-related impulse responses (HRIRs) and measured individualised HRIRs using different Ambisonic orders, soundscapes and compression parameters. The results of the tests inform the development of next-generation spatial audio delivery through the browser.

Immersive and Interactive Audio

Recording Workflows for Virtual and Augmented Reality

Project Partner: Abbey Road Studios

As VR technologies move towards systems that can implement 6DOF video capture, it follows that good strategies must be employed to create effective 6DOF audio capture. In a musical context, this means that if we record an ensemble then we must give the end user the potential to move close and even around audio sources with a high degree of plausibility to match the visuals. This research looks at parameterisation of an acoustic environment and recording strategies for live music performance to enable full 6DOF VR content. The work also investigates how recording strategies can be deployed for augmented reality scenarios, where the musicians are brought to your living room.

Immersive and Interactive Audio

Improvements in Binaural Based Ambisonic Decoding

Binaural based Ambisonic decoding is widely used in immersive applications such as virtual reality due to its sound field rotation capabilities. Ambisonics can theoretically recreate the original sound field exactly at low frequencies, but high frequencies are inaccurate due to the limited spatial accuracy of reproducing a physical sound field with a finite number of transducers, which in practice causes localisation blur, reduced lateralisation and comb filtering spectral artefacts. The standard approach to improving Ambisonic reproduction is to increase the order of Ambisonics, but this comes at the expense of more convolutions in binaural decoding. Therefore, this work concentrates on improving high frequency rendering of binaural Ambisonic decoding within the same order of Ambisonics, using HRTF pre-processing techniques such as time alignment, ILD optimisation and both diffuse-field and directional equalisation.
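The cost of raising the Ambisonic order can be made concrete with a little arithmetic. The sketch below assumes a full 3D Ambisonic representation, which uses (N + 1)^2 channels at order N, and a binaural decoder that convolves every channel with one filter per ear. The function names are illustrative and are not taken from the project.

```python
def ambisonic_channels(order: int) -> int:
    """Number of spherical-harmonic channels for full 3D Ambisonics of a given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def binaural_convolutions(order: int, ears: int = 2) -> int:
    """Convolutions needed per rendered block, assuming one filter per channel per ear."""
    return ears * ambisonic_channels(order)


# Compare first-, third- and fifth-order rendering.
for n in (1, 3, 5):
    print(n, ambisonic_channels(n), binaural_convolutions(n))
```

Under these assumptions, moving from first to fifth order raises the convolution count from 8 to 72, which is why improving high-frequency rendering without raising the order is attractive.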

Immersive and Interactive Audio

Optimised Binaural Rendering for Everyone

3-D Audio reproduction over headphones is at the heart of virtual reality (VR) and augmented reality (AR) experiences. However, there are many challenges involved in the accurate or plausible rendering of virtual sound scenes for everyone. In the first instance, individualised head-related transfer functions (HRTFs) are crucial for three-dimensional (3D) audio reproduction through headphones or stereo loudspeakers to aid good timbral and localisation fidelity. Similarly, accurate but efficient rendering of immersive sound scenes in VR relies on convincing reverberation. For AR this becomes even more problematic, as the virtual rendering must match well to real world acoustics. This project seeks to address these issues through optimised binaural selection, binaural filter modification and real-time adaptive room reverberation. The work focuses largely on the use of machine learning to help the development of new strategies for binaural rendering in VR and AR.

Immersive and Interactive Audio

3D Boundary Localisation and Room Geometry Estimation from Spatial Room Impulse Responses

Room impulse response analysis provides a unique insight into the reverberant characteristics of an environment, and is used in numerous aspects of acoustics. In this project we explore the inverse acoustic analysis problem of geometry inference, for convex and non-convex room shapes, using spatial room impulse responses. Through spatiotemporal decomposition of the spatial room impulse response, the time- and direction-of-arrival of individual reflections is estimated. This spatiotemporal information is then used to define an inverse acoustic model which estimates the most likely reflection path for each detected reflection. The reflective boundaries present within the measurement environment are then inferred from the inverse model, and processed such that a watertight estimation of the room’s geometry is produced.
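As a rough illustration of how a single reflection constrains a boundary, the sketch below uses the textbook image-source relation rather than the project's own inverse model. It assumes a first-order specular reflection, known source and receiver positions, and a speed of sound of 343 m/s. All names and the example measurement are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def image_source(receiver, doa, toa, c=SPEED_OF_SOUND):
    """Place the image source a distance c * toa from the receiver along the arrival direction."""
    norm = math.sqrt(sum(x * x for x in doa))
    return tuple(r + c * toa * x / norm for r, x in zip(receiver, doa))


def boundary_plane(source, image):
    """Return (point, unit normal) of the plane midway between the source and its image."""
    diff = tuple(i - s for s, i in zip(source, image))
    length = math.sqrt(sum(x * x for x in diff))
    point = tuple((s + i) / 2 for s, i in zip(source, image))
    normal = tuple(x / length for x in diff)
    return point, normal


# Hypothetical measurement: source 1 m in front of a wall at x = 3, receiver at the origin.
source = (2.0, 0.0, 0.0)
receiver = (0.0, 0.0, 0.0)
toa = 4.0 / SPEED_OF_SOUND  # the reflected path is 4 m long
doa = (1.0, 0.0, 0.0)       # the reflection arrives from the +x direction

point, normal = boundary_plane(source, image_source(receiver, doa, toa))
print(point, normal)
```

The recovered plane passes through a point close to (3, 0, 0) with its normal along the x axis, matching the wall assumed in the example. Higher-order reflections and non-convex rooms need the more careful path estimation described above.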

Environmental Soundscapes

Evaluation of Environmental Soundscapes

In this project, we’re researching how machine learning and spatial audio can be used to assist in our understanding of the environmental sound of the places we live in. The prevailing measures of environmental sound consider only absolute sound pressure levels with no consideration for the content of those sounds, which have been shown to be critical for human soundscape perception. The EigenScape database of acoustic scenes was recorded in high-order Ambisonic format using the mh Acoustics Eigenmike for this project, and we are analysing these recordings using spatial audio techniques.

Environmental Soundscapes

Auralisation of Environmental Soundscapes

Auralisation is the technique of creating audible sound files from numerical data. It is used to create the experience of acoustic phenomena in a virtual space. In this project, the application of auralisation has been extended from indoor to outdoor environments, with a specific focus on traffic noise rendering. A traffic flow sound synthesizer is developed in this project, which can be used to generate a variety of audible experiences based on different road traffic conditions. We are running listening tests on the plausibility of the synthetic traffic flow sound from a psychoacoustical perspective.

Immersive and Interactive Audio | Environmental Soundscapes

Intelligent Sound Design

Project Partner: BBC R&D

In recent years, artificial intelligence techniques such as machine learning have been applied to a range of technical and creative audio problems such as automatic mixing, digital effects application, music composition, and soundscape classification and analysis. These tasks mainly sit within the realms of what can be broadly defined as intelligent music production and environmental soundscapes. What has been less explored is the application of artificial intelligence techniques in the design of soundscapes for the emerging immersive content within the areas of film, TV, and radio dramas. This project aims, through the study of production workflows relating to the sound design of immersive content, to introduce novel tools utilising current AI techniques that will serve to enhance current practices and further expand the creative possibilities currently available to sound designers.

Health and Wellbeing

Noise at Night Time

Around a third of your lifetime is spent sleeping, and good quality sleep may be crucial for good brain function. Poor quality of sleep has been linked to a wide range of illnesses such as hypertension and heart disease. Environmental noise is becoming more pervasive, and that is a big concern to the councils, governments and engineering groups who plan and build airports, railways and cities. Current techniques for predicting the impact of noise on an urban area are based on noise dose concepts that are relatively crude, and on curves intended to represent the relationship between awakening and noise dose that are overly simplistic. The Noise at Night Time project is an exploration of environmental noise and sleep, with the aim of improving the way we can use objective measurement and mathematical models to optimise the control of environmental noise sources in future urban designs.

Immersive and Interactive Audio | Health and Wellbeing


Children with autism experience problems in social-emotional interaction and communication, and display repetitive behaviours. Additionally, they often have difficulties in processing everyday sensory information, with a particular issue being extreme hypersensitivity to sounds. Unfortunately, common environmental sounds (e.g. vacuum cleaners or toilet flushing) can trigger severe fear-related behaviours such as crying or hitting the ears. Presently, the most successful therapy to tackle this is systematic desensitisation. Based on the idea that the child is afraid of the sound, it involves gradually exposing them to the feared noise during positive activities such as play. SoundFields is an interactive immersive experience developed to address auditory hypersensitivities in autistic children. It creates realistic and dynamic representations of problematic auditory stimuli using virtual 3D audio and incorporates them into a computer game platform, thereby creating a fun and engaging environment for exposure therapy. By using this technology, therapy could become more accessible, improving quality of life for the child and their family.

Health and Wellbeing


The key to effective design of new anticancer drugs is the ability to target specific biomolecules by carefully engineering the 3-D shape of a new drug molecule to reach the optimal docking between a drug and its target enzyme. Currently, chemists and biochemists can ‘tune’ this energetic interaction purely with the aid of visual software. The process is imperfect, slow and often misses some of the principal electronic interactions. However, recent research shows that auditory display (displaying data as sound) has potential alongside visual representations to improve the efficiency and speed of this aspect of drug design. The project combines an audio-visual game and a spatial audio-visual performance to engage the public with the challenges of the drug design process. Video: documentary, performance.

Voice Science

Sing from Your Seat

Funder: AHRC

This project gives an entirely new user community access to the exciting outputs of the Hills are Alive research project. In addition to taking a bespoke VR experience to care homes in York to engage directly with the new audience of elderly residents, we will develop creative and innovative engagements with the care home sector, whereby care homes can provide our experience as an activity via new technology workflows that enable free, easy access to the material within their resource and skills requirements. Project website:

Immersive and Interactive Audio


The Architexture series is a collection of vocal compositions specifically designed to make use of particular acoustic spaces, blurring the lines between architecture, composition, acoustics and technology. All works in the series have been composed by Professor Ambrose Field, with acoustic measurements, analysis and design carried out by a team of AudioLab researchers. Architexture I was a large-scale polyphonic work for a ten-voice choir, designed for and performed in York Guildhall. Architexture II was a compositional work for a six-voice choir intertwined with technology to bring the past to life. It was designed for a space knocked down nearly 500 years ago – St Mary’s Abbey in York. Architexture III is an upcoming compositional work for a four-voice ensemble, making use of augmented reality technology to bring a selection of real acoustic spaces to life from a single space.

Voice Science

Optimising blend in close harmony unaccompanied singing

Barbershop is defined by unaccompanied, close harmony, homophonic, four note chords. The melody line is in the second voice (typically below the 1st and 2nd formants), precise tuning using just intonation is paramount and vibrato is eliminated. Good barbershop singing produces “lock and ring” where the overtones of one voice part are precisely tuned to reinforce the fundamentals and overtones of other voice parts resulting in ‘expanded sound’. As part of this research it is postulated that an analysis of the frequency spectrum of sustained chords produced by a vocal ensemble will give a useful objective descriptor for characterising an ensemble’s vocal “footprint”. Furthermore, by identifying the key parameters which correlate with human assessments of “good” blend, strong indicators for improving the performance of an ensemble will be identifiable.
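The way just intonation lets overtones coincide can be shown with simple ratios. The sketch below assumes a dominant seventh chord tuned to the integer ratios 4:5:6:7, a tuning commonly associated with barbershop singing, and a 220 Hz root chosen purely for illustration. It is not the analysis method used in this research.

```python
from math import gcd

# Assumed tuning: a just-intonation dominant seventh with frequency ratios 4:5:6:7.
CHORD = {"root": 4, "third": 5, "fifth": 6, "seventh": 7}


def chord_frequencies(root_hz):
    """Fundamental frequency of each voice part for a given root frequency."""
    return {name: root_hz * ratio / CHORD["root"] for name, ratio in CHORD.items()}


def first_shared_harmonic(ratio_a, ratio_b):
    """Lowest harmonic numbers (m, n) such that m * ratio_a == n * ratio_b."""
    common = ratio_a * ratio_b // gcd(ratio_a, ratio_b)
    return common // ratio_a, common // ratio_b


freqs = chord_frequencies(220.0)
m, n = first_shared_harmonic(CHORD["root"], CHORD["fifth"])
print(freqs)
print(m * freqs["root"], n * freqs["fifth"])
```

With these ratios the third harmonic of the root and the second harmonic of the fifth fall on exactly the same frequency, which is the kind of reinforcement the "lock and ring" description refers to. In a tempered tuning the two would differ slightly and beat against each other.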

Gender Diversity In Audio

We are all affected by the male dominance of the music technology and audio industries. We want to do something about it by characterising the current state of the industry and investigating ways forward to improve gender diversity in the future.

Immersive and Interactive Audio

Near Field Binaural Improvements

Binaural techniques are at the heart of many immersive audio techniques; however, they are susceptible to various factors which degrade the quality of the audio. In this project, we are looking at possible improvements to loudspeaker-based binaural techniques using properties of the near field, specifically by means of computational simulation.

Immersive and Interactive Audio

Single-Channel Audio Source Separation

The estimation and extraction of sounds coming from different sources in single-channel audio recordings is a challenging task. The algorithm being developed decomposes the original input mixture into a set of note events, which are estimated and extracted following an automatic iterative approach. Note events are considered to be harmonic sounds, characterised by a continuous pitch trajectory, that can be clustered to form individual sources. Results so far show the benefits of the proposed method in applications such as multipitch estimation, mono-to-stereo conversion, wav-to-midi conversion, harmonic and percussive source separation, lead and accompaniment separation, among many others.

Environmental Soundscapes

Bioacoustic Identification of Species

Human activities such as deforestation are contributing to the loss of whole species. In order to gauge the rate of biodiversity loss and assess conservation attempts, rich biodiversity monitoring data is required. The current amount of available monitoring data is insufficient, continuous data is difficult to collect, and trained taxonomists are required for accurate species identifications. This project is using audio signal processing and deep learning techniques to engineer systems that automatically identify species by recording and processing their social calls. Components of this project include: the collection of a UK species call audio recording database, the development of intelligent sound recognition algorithms, and the construction of species identification and soundscape analysis systems embedded in micro-controller systems and smartphones.

Immersive and Interactive Audio

Affect and Emotion in Games Sound Design

Funder: EPSRC IGGI Programme

Recent advances in high definition video displays and 3-D headsets, coupled with motion tracking and biosensor technologies, have enabled video games to reach unprecedented levels of visual immersion and interaction. There is little research, however, on how the aural feedback of the player, which can help assess their emotional state, can be used to inform the game intelligence and affect the emotive impact of the game. Furthermore, improvements in domestic surround sound and binaural technology are paving the way for fully enveloping and realistic soundtracks that extend the gameplay beyond the visual and can significantly enhance the emotive experience. This project investigated how current sensor and tracking technologies could be enhanced through analysis of player aural reactions such that the game intelligence can, in turn, provoke a conditional response via the reproduced soundtrack. This research was conducted in conjunction with the EPSRC-funded Intelligent Games and Games Intelligence (IGGI) programme.

Immersive and Interactive Audio

SADIE: Spatial Audio for Domestic Interactive Entertainment

The SADIE project addressed the improvement of spatial audio quality for immersive interactive media experiences in the home. It undertook novel science to improve soundfield immersion and the formation of 3-D sound sources beyond the horizontal plane, lifting the constraints of loudspeaker placement and dynamic source-listener movements whilst conserving good sound reproduction quality. The research pioneered new methods for soundfield rendering formed through characterisation of the cues required for perception of sources with height in dynamic listening. The work has had a significant transformative impact on headphone based binaural reproduction through the adoption of the SADIE filters into the Google VR pipeline. To learn more go to

Immersive and Interactive Audio

The Morphoacoustics of Human Hearing

Practically all external sounds reach the human auditory system via the left and right ear canals. Despite being limited to only two channels of auditory information, we are capable of determining the direction of a sound source typically to within a few degrees. Acoustically, this impressive performance can largely be accounted for by the auditory spatial cues of inter-aural time difference, inter-aural level difference and the pinna (outer ear) cues. The cues are embedded in a family of acoustic filters known as head-related transfer functions (HRTFs). HRTFs are created as a result of the complex shape (or morphology) of the ear flaps, the head and upper torso. With sufficient knowledge of an individual’s morphology it is possible to calculate the associated unique set of HRTFs. This is currently a difficult and computationally intensive process, but nevertheless may ultimately be easier than measuring them acoustically. Finding an efficient way to estimate individualised HRTFs is viewed by many as the key to achieving widespread exploitation of 3D personal audio. Our research is contributing to this goal in several ways.
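To give a sense of scale for one of these cues, the sketch below estimates the inter-aural time difference with Woodworth's spherical-head approximation, ITD = (a / c)(θ + sin θ). This is a textbook simplification, not the morphology-based calculation described above. It assumes a rigid spherical head, a distant source in the horizontal plane, a head radius of 8.75 cm and a speed of sound of 343 m/s.

```python
import math


def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Inter-aural time difference in seconds for a far-field source.

    azimuth_deg is measured from straight ahead (0) to directly to one side (90).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return head_radius / c * (theta + math.sin(theta))


# Print the estimated ITD in microseconds for a few source directions.
for az in (0, 30, 60, 90):
    print(az, round(woodworth_itd(az) * 1e6), "microseconds")
```

Under these assumptions the difference is zero for a source straight ahead and grows to roughly two thirds of a millisecond for a source directly to one side. Individual head and ear shapes shift these values, which is part of what individualised HRTFs capture.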

Immersive and Interactive Audio


What would it be like to listen to a mountain? Or hear rock formations as they grow? Eonsounds explores different techniques for representing geological processes such as meteorite impacts, sea level change, erosion, mountain building, earthquakes, volcanoes and plate tectonic motions in sound and music. Our aim is to use stratigraphic timelines outlining various geological processes as inspiration for musicians and sound-artists, in order to open up new ways of understanding geological time, and to examine the relationship between geological and musical processes. In particular, we are interested in exploring the links between geology and musical systems which employ complex systems of time-division, such as the tala system in Hindustani classical music.

Immersive and Interactive Audio | Voice Science

Virtual Singing Studio

How is musical performance, and singing performance in particular, affected by the room acoustic characteristics of the performance space? The Virtual Singing Studio, a loudspeaker-based room acoustics simulation, is aurally interactive in real-time. In other words, singers can perform in the Virtual Singing Studio – 16 loudspeakers in a studio room – but hear themselves as if they are performing in a real performance space such as a church, concert hall or theatre. This allows us to make more true-to-life recordings and analyse the conscious and unconscious changes that singers make to their performance as they adjust to the acoustic characteristics of the venue in which they are performing (virtually).

Health and Wellbeing

SINGSVR: Simulating Inclusive Natural Group Singing in VR

This project sought to trial a semi-controlled protocol to assess the potential benefits of group singing in a real compared to a virtual environment. Two new community choirs were formed and undertook weekly rehearsals for 6 weeks. Rehearsals alternated between a real condition (when participants were in situ with the other singers taking part live as part of the group) and a virtual reality experience (during which the singer took part in a physically solitary setting, wearing a virtual reality headset and participating as part of a recording of the yoked choir). Measures associated with stress and arousal levels (heart rate and galvanic skin response), emotional response and singing voice parameters were taken during and after each rehearsal.

Voice Science

Expressive Non-Verbal Communication in Ensemble Performance

This project exploits electrolaryngography and acoustic analysis to explore synchronisation and other features of expressive ensemble performance in vocal ensembles. This work is part of the AHRC-funded WRoCAH Network ‘Expressive non-verbal communication in ensemble performance’ with the Universities of Leeds and Sheffield.

Voice Science

Tuning in Singing Ensemble Performance

Working closely with the Music Department at the University, in particular Robert Hollingworth’s MA in Solo Voice Ensemble Singing, this project conducts experiments into choral blend (including parameters such as tuning and vibrato) and the impact of pedagogy in ensemble singing. Acoustic analysis and electrolaryngography are employed to analyse aspects of voice that characterise good ensemble singing.

Voice Science

Perception and Production of Resonance Tuning in Soprano Singing

This project utilises 3D MRI and direct measurement of the transfer function of a singer’s vocal tract, combined with acoustic analysis, to investigate the methods used by professional female opera singers to sing high notes. In particular, based on the acoustic challenges of singing vowels at high fundamental frequencies, this work considers how sopranos alter the shape of their vocal tract to produce sound more efficiently and how this impacts on the perception of their overall voice quality.

Voice Science


This interdisciplinary project involves development of an algorithm to discriminate babble from other sounds that infants make, in real time, based on the input from an iPad’s inbuilt microphone. BabblePlay is the current app, which rewards the infant with moving images when they produce voiced sounds.
Infant babble (sequences of consonants and vowels, e.g. /bababa/) is thought to underpin the development of accurate consonant production. The age at which babble begins and the extent of babble can reliably predict later progress in speech development. BabblePlay responds to an infant’s voiced utterances, but not other vocalizations (squeals, bangs, unvoiced sounds), visually reinforcing naturally occurring babble. In the future, we plan to develop this as a clinical device for infant populations whose babble and first words are delayed.
Project website:

Voice Science

The Hills are Alive: Combining the Benefits of Outdoor Environments and Group Singing through Immersive Experiences

This project uses immersive virtual reality technology to bring to life a key moment in the formative history of the UK’s National Trust. The main outcome of the research is an immersive and interactive virtual reality installation where individuals can participate in a group singing event on a Lake District mountain summit in virtual reality, commemorating the 1923 gift of land to the nation in memory of those who lost their lives in the Great War 1914-18. The project draws together the various threads of the challenge in a unique way, uniting memory, place and performance in a compelling realisation of the power of immersive experiences to transform historical events through cultural participation. Current research identifies important health and wellbeing benefits associated with both fell-walking and group singing, and this project provides opportunities not just for able-bodied participants to enjoy the multiple wellbeing benefits of singing on mountain summits, but uniquely will provide opportunities for those otherwise unable to access such activities to do so through an immersive virtual reality experience. This project also develops relationships with cultural heritage partners in the Lake District and beyond, and develops a workflow for the capture and virtual reality reproduction of choir performances in inaccessible but culturally significant locations, to permit the application of the technology in a wide range of installations and exhibitions in the future. We have taken these experiences as an installation to Keswick Museum and to the Lakes Alive Festival, and over 200 people have tried and enjoyed our virtual choir. This project won New Partnership of the Year at the Cumbria Life Awards in 2019 and a National Trust Outstanding Achievement Award.

Health and Wellbeing | Voice Science

WISHED – Well-being: Investigating Singing Health in Ensembles through Digital Technologies

Imagine singing in a choir, in the performance venue of your choice, reaping the rewards of this shared experience from the comfort of your own home. This project uses state of the art Virtual Reality (VR) technology to quantify the health and wellbeing benefits of group singing. We have developed a prototype immersive virtual reality reconstruction of a choral singing scenario, where a player can sing along with the rest of a virtual quartet using an immersive audio-visual system. This work explores the use of VR technologies to design an experiment that investigates the social and interactive aspects of singing, whereby participants perform both as part of a live quartet and in a virtual setting. For the latter, they will wear a VR headset that allows for complete immersion in the virtual performance venue. The user hears and sees themselves as a member of a vocal quartet, singing with pre-recorded singers. Measures of health and wellbeing will include self-reported stress, arousal and enjoyment levels as well as heart-rate, galvanic skin response, neural activity and vocal fold activity.

Immersive and Interactive Audio

Evaluating the Perceived Quality of Binaural Technology

The primary aim of this work is to improve the quality of headphone listening experiences for entertainment media audiences by developing and evaluating binaural technology. The ambition is to understand how best to apply binaural techniques in these applications. This will be driven primarily through evaluation by human listeners of the perceived quality of binaural technology systems, as well as their application to the presentation of audio or audiovisual entertainment media. It will also be driven by the context in which this applied project operates, i.e. the technological and economic factors that influence the way in which binaural technology may be applied in practice.

Immersive and Interactive Audio

The Impact of Multichannel Game Audio on the Quality of Player Experience and In-game Performance

Multichannel audio is a term used in reference to a collection of techniques designed to present sound to a listener from all directions. This can be done either over a collection of loudspeakers surrounding the listener, or over a pair of headphones by virtualising sound sources at specific positions. The most popular commercial example is surround-sound, a technique whereby sounds that make up an auditory scene are divided among a defined group of audio channels and played back over an array of loudspeakers. Interactive video games are well suited to this kind of audio presentation, due to the way in which in-game sounds react dynamically to player actions. Employing multichannel game audio gives the potential of immersive and enveloping soundscapes whilst also adding possible tactical advantages. However, it is unclear whether these factors actually impact a player’s overall experience. There is a general consensus in the wider gaming community that surround-sound audio is beneficial for gameplay but there is very little academic work to back this up. It is therefore important to investigate empirically how players react to multichannel game audio, which is the main motivation for this project. The aim was to find out whether a surround-sound system can outperform other systems with fewer audio channels (such as mono and stereo). This was done by performing listening tests that assessed the perceived spatial sound quality and preferences towards some commonly used multichannel systems for game audio playback over both loudspeakers and headphones. There was also a focus on how multichannel audio might influence the success of a player in a game, based on their in-game score and their navigation within a virtual world. Results suggest that surround-sound game audio is preferable to more regularly used two-channel stereo systems, because it is perceived to have higher spatial sound quality and there is an improvement in player performance.
This illustrates the potential for multichannel game audio as a tool to positively influence player experiences, a core goal many game designers strive to achieve.

Immersive and Interactive Audio

FDTD Schemes for 3D Room Acoustic Simulation

Finite Difference Time Domain (FDTD) methods have been demonstrated as being appropriate for modelling sound propagation in bounded environments. Work at York has focused on a number of key areas including the development of frequency dependent absorbing and diffusing boundary conditions, sound source directivity and spatial encoding of the simulated soundfield. However, despite giving good results, comparable with both actual measurements and other modelling methods, the synthesis of full audio bandwidth impulse responses suitable for auralization is a computationally intensive and demanding process, making it prohibitive for large spaces. Although parallel implementations for GPUs have brought computation times down to reasonable limits, memory bandwidth puts a practical upper limit on the maximum frequency that can be reproduced accurately. New methods take a hybrid approach, combining both geometric (e.g. ray-tracing) and wave-based (FDTD) modelling strategies optimised across frequency bands. Alternatively, perceptual methods consider what needs to be modelled, rather than what should be modelled, with a view to the development of real-time, interactive auralization design tools.
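To make the idea of an FDTD scheme concrete, the following is a minimal sketch in one dimension rather than the 3D case described above. It is not the York implementation: the function name `fdtd_1d`, its parameters, the unit-impulse excitation and the fixed zero-pressure boundaries are all assumptions chosen for illustration. Each time step updates the pressure at a grid point from its two neighbours and its own previous two values.

```python
def fdtd_1d(num_points, num_steps, courant=1.0, source_index=None):
    """Run a leapfrog FDTD scheme for the 1D wave equation (illustrative only).

    courant is the Courant number c * dt / dx; the scheme is stable for
    values in (0, 1]. Both ends of the grid are held at zero pressure,
    which is a simplifying assumption rather than a realistic boundary.
    """
    if not 0.0 < courant <= 1.0:
        raise ValueError("courant must be in (0, 1] for a stable scheme")
    if source_index is None:
        source_index = num_points // 2

    previous = [0.0] * num_points   # pressure at time step n - 1
    current = [0.0] * num_points    # pressure at time step n
    current[source_index] = 1.0     # hypothetical unit impulse excitation

    lam2 = courant ** 2
    for _ in range(num_steps):
        following = [0.0] * num_points  # pressure at time step n + 1
        for i in range(1, num_points - 1):
            # Second-order centred differences in both time and space.
            following[i] = (
                2.0 * current[i]
                - previous[i]
                + lam2 * (current[i + 1] - 2.0 * current[i] + current[i - 1])
            )
        previous, current = current, following
    return current


if __name__ == "__main__":
    field = fdtd_1d(num_points=101, num_steps=20)
    print(field[70], field[71])
```

With `courant=1.0` the wavefront in this 1D sketch advances exactly one grid point per time step. In 3D the stability limit is tighter and the memory cost grows with the cube of the grid resolution, which is consistent with the computational burden noted above.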

Environmental Soundscapes

Open Air Acoustic Impulse Response Library

The AHRC-funded Open Acoustic Impulse Response Library (OpenAIR) project brings together resources relating to much of our work in Virtual Acoustics. Sites that we have acoustically surveyed, together with results generated from acoustic models, are documented and archived, providing a rich resource of impulse response measurements for commercial audio software and computer games developers, composers, musicians and sound artists. The impulse response data is available in various common spatial audio formats (stereo, B-format, 5.1), and access to the raw audio information is also possible for deriving new versions where needed. The impulse response database is also supported with a variety of anechoic material. Most recent work has involved developing OpenAIR for the Web Audio framework. OpenAIR data can be found in leading digital audio workstations, including Ableton Live, Propellerheads Reason and Presonus Studio One, and has also been used by Codemasters in their driving games and by the BBC.
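A common way to use an impulse response with anechoic material is convolution: every sample of the dry recording triggers a scaled copy of the room's response. The sketch below shows this in its simplest direct form. It does not use any OpenAIR interface; the function name `convolve` and the short sample lists are hypothetical, and loading real audio files is left out.

```python
def convolve(dry, impulse_response):
    """Convolve a dry (anechoic) signal with an impulse response.

    Both arguments are plain lists of float samples at the same sample
    rate. The output has len(dry) + len(impulse_response) - 1 samples.
    """
    if not dry or not impulse_response:
        return []
    wet = [0.0] * (len(dry) + len(impulse_response) - 1)
    for n, sample in enumerate(dry):
        for k, tap in enumerate(impulse_response):
            # Each input sample adds a scaled, delayed copy of the response.
            wet[n + k] += sample * tap
    return wet


if __name__ == "__main__":
    dry_signal = [1.0, 0.5]        # hypothetical two-sample dry signal
    room = [1.0, 0.0, 0.25]        # hypothetical direct sound plus one echo
    print(convolve(dry_signal, room))
```

Direct convolution like this costs one multiplication per pair of samples, so for impulse responses lasting several seconds an FFT-based method would normally be preferred in practice.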

Environmental Soundscapes

Auralisation of an Urban Soundscape

A novel approach to qualitative soundscape assessment is presented in the form of a virtual sound walk. The virtual sound walk method combines acoustic modelling and auralisation techniques with qualitative assessment methodologies from the field of soundscape research. A case study is presented in which the method is used to predict the impact of noise interventions on the perceived sound quality in an urban environment. Prior to the auralisation, a real sound walk is carried out in the case study environment. By comparison of the results from the two sound walks, it is found that the virtual sound walk elicits experiences similar to those of the real sound walk. The suitability of the approach for assessing the impact of potential noise treatments is also evaluated and found to have merit, though it is acknowledged that further refinements are needed. It is thought that the virtual sound walk may be able to assist acousticians and urban designers in a practical sense, as well as helping to promote human-environment interaction in other creative applications.

Environmental Soundscapes

Strategies for Environmental Sound Measurement, Modelling and Evaluation

This project investigates the measurement, modelling, and evaluation of environmental sound. In each of these areas, this body of work aims to make use of soundscape methodologies in order to develop an understanding of different aspects of our relationship with our sonic environments. This approach is representative of the nature of soundscape research, which makes use of elements of many other research areas, including acoustics, psychology, sociology, and musicology. The majority of prior acoustic measurement research has considered indoor recording, often of music, and measurement of acoustic parameters of indoor spaces such as concert halls and other performance spaces. One strand of this research has investigated how best to apply such techniques to the recording of environmental sound, and to the measurement of the acoustic impulse responses of outdoor spaces. Similarly, the majority of prior work in the field of acoustic modelling has also focussed mainly on indoor spaces. This project introduced the Waveguide Web, a novel method for the acoustic modelling of sparsely reflecting outdoor spaces. In the evaluation of sound, recent years have seen the development of soundscape techniques for the subjective rating of environmental sound, allowing for a better understanding of our relationship with our sonic surroundings. This project focussed on how best to improve these approaches in a suitably robust and intuitive manner, including the integration of visual stimuli in order to investigate the multi-modal perception of our surroundings. The aim of this work in making contributions to these three fields of environmental sound research is, in part, to highlight the importance of developing a comprehensive understanding of our sonic environments.
Such an understanding could ultimately lead to the alleviation of noise problems, encourage greater engagement with environmental sound in the wider population, and allow for the design of more positive, restorative, soundscapes.

Environmental Soundscapes

Listening to the Commons

Listening to the Commons was an AHRC-funded project that built on the work spearheaded by the AHRC ‘St Stephen’s Chapel’ project. The project aimed to recover the soundscape of debate as experienced by women listening through a ventilator in the old House of Commons c. 1800-34. It was a collaborative project which highlighted the deep history of women’s participation in politics by developing and adapting visual models of the 1834 House of Commons into acoustic models to create contemporary auralizations (aural reconstructions) of speeches and debate. The project built on the University of York and UK Parliament collaboration established through the AHRC St Stephen’s Chapel Project while establishing new interdisciplinary links between the Department of History, the Digital Creativity Lab and the Department of Electronics at the University of York. The results of the project were incorporated into the Vote 100 exhibition at Parliament in June 2018, allowing visitors to engage with the digital outputs and recover women’s experience of politics c. 1800-34.

Environmental Soundscapes

Psychoacoustic Perception of Geometric Acoustic Modeling

This project investigates whether reliability in objective acoustic metrics obtained for an auralized space implies accuracy and reliability in terms of the subjective listening experience. Auralizations can be created based either on impulse response measurements of an existing space, or simulations using computer-based acoustic models. Validations of these methods usually focus on the observation of standard objective acoustic measures and how these vary under certain conditions. However, for accurate and believable auralization, the subjective quality of the resulting virtual auditory environment should be considered as being at least as important, if not more so. This study is focused for the most part on St. Margaret’s Church, York, UK. Impulse responses have been acquired in the actual space and virtual acoustic models created using CATT-Acoustic and ODEON-Auditorium auralization software, both based on geometric acoustic algorithms. Variations in objective acoustic parameters are examined by changing the physical characteristics of the space, the receiver position and the sound source orientation. It is hypothesised that the perceptual accuracy of the auralizations depends on optimising the model to minimise observed changes in objective acoustic parameters. This objective evaluation is used to ascertain the behaviour of certain standard acoustic parameters. From these results, impulse responses with suitable acoustic values are selected for subjective evaluation via listening tests. These acoustic parameters, in combination with the physical factors that influence them, are examined, and the importance of variation in these values in relation to our perception of the result is investigated.
Conclusions are drawn for both measurement and modelling approaches, demonstrating that model optimisation based on key acoustic parameters is not sufficient to guarantee perceptual accuracy, as perceptual differences are still evident when only a single acoustic parameter demonstrates a difference of greater than 1 just noticeable difference (JND). It is also essential to add that the overall perception of the changes in the acoustic parameters is independent of the auralization technique used. These results aim to give some confidence to acoustic designers working in architectural and archaeoacoustic design in terms of how their models might be best created for optimal perceptual presentation.
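The comparison against 1 JND can be expressed as a simple ratio: the absolute difference between two values of an acoustic parameter, divided by the JND for that parameter. The sketch below illustrates this under stated assumptions. The function names are hypothetical, the JND is supplied by the caller in the same units as the parameter, and the 1 dB figure and clarity values in the usage lines are illustrative rather than results from the project.

```python
def difference_in_jnds(measured, modelled, jnd):
    """Return the gap between two parameter values in units of JND.

    jnd is the just noticeable difference for the parameter, expressed
    in the same units as measured and modelled (an assumed input).
    """
    if jnd <= 0:
        raise ValueError("jnd must be positive")
    return abs(measured - modelled) / jnd


def exceeds_one_jnd(measured, modelled, jnd):
    """Report whether the two values differ by more than 1 JND."""
    return difference_in_jnds(measured, modelled, jnd) > 1.0


if __name__ == "__main__":
    # Hypothetical clarity values in dB with an assumed JND of 1 dB.
    print(difference_in_jnds(2.0, 3.5, 1.0))
    print(exceeds_one_jnd(2.0, 3.5, 1.0))
```

A check of this kind only flags where a model and a measurement diverge in objective terms; as the conclusions above note, staying within such thresholds does not by itself guarantee perceptual accuracy.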

Health and Wellbeing

Enhancing Audio Description

‘Enhancing Audio Description’ or ‘Enhancing AD’ is a project that seeks to explore how sound design techniques can be used to rethink accessibility to film and television for visually impaired audiences. Research includes the application of surround sound rendering, interactive media systems and first person narration. The project stems from the idea that accessibility should not be an afterthought; it should instead be an intrinsic part of the creative workflows involved in film and TV productions. With this idea in mind, the project is conducted in collaboration with a Project Advisory Panel that includes representatives from both the film/television industry and the accessibility sector. ‘Enhancing AD’ is funded by the Arts and Humanities Research Council (AH/N003713/1).

Health and Wellbeing

Spatially Informed Hearing Aid Algorithms

The healthy human hearing system is capable of performing well in a variety of adverse acoustic conditions. A listener who has a hearing deficit, however, even if it affects only one ear, typically finds it much more difficult to understand a conversation in the presence of competing sounds. Binaural hearing provides the auditory system with a means of distinguishing one sound from another based on their different locations. It also plays an important role in increasing intelligibility in the presence of room reverberation. We are investigating a wide variety of spatial cues and evaluating their potential for improving the intelligibility of speech in challenging acoustic environments. Our goal is to develop a binaural audio algorithm suitable for implementing in a binaural hearing aid. To this end, we are investigating ways of optimising the intelligibility of wanted speech by adaptively identifying the most important cues for doing so, depending on the acoustic environment.

Voice Science

Vocal tract modelling using FDTD schemes

Articulatory speech synthesis has the potential to offer more natural sounding synthetic speech than established concatenative or parametric synthesis methods. Time-domain acoustic models are particularly suited to the dynamic nature of the speech signal, and recent work has demonstrated the potential of dynamic vocal tract models that accurately reproduce the vocal tract geometry. This work presented a dynamic 3D digital waveguide mesh (DWM) vocal tract model, capable of movement to produce diphthongs. The technique was compared to existing dynamic 2D and static 3D DWM models, for both monophthongs and diphthongs. The results showed that the proposed model provides improved formant accuracy over existing DWM vocal tract models. Furthermore, the computational requirements of the proposed method are significantly lower than those of comparable dynamic simulation techniques. This work represents another step toward a fully functional articulatory vocal tract model which will lead to more natural speech synthesis systems for use across society.