Efficient Volumetric Scene-based audio with ZYLIA 6 Degrees of Freedom solution
What is the difference between Object-based audio (OBA) and Volumetric Scene-based audio (VSBA)?
The most popular method of producing a soundtrack for games is known as Object-based audio. In this technique, the entire audio consists of individual sound assets with metadata describing their relationships and associations. Rendering these sound assets on the user's device means assembling these objects (sound + metadata) to create an overall user experience. The rendering of objects is flexible and responsive to the user, environmental, and platform-specific factors [ref.].
In practice, if an audio designer wants to create an ambience for an adventure in a jungle, he or she needs to use several individual sound objects, for example, the wind rustling through the trees, sounds of wild animals, the sound of a waterfall, the buzzing of mosquitoes, etc. The complexity of Object-based rendering increases with the number of sound objects. The more individual objects there are (that is, the more complex the audio scene), the higher the CPU usage and hence the power consumption, which can be problematic on mobile devices or when transmission bandwidth is limited.
A complementary approach for games is Volumetric Scene-based audio, especially if the goal is to achieve natural behavior of the sound (reflections, diffraction). VSBA is a set of 3D sound technologies based on Higher-Order Ambisonics (HOA), a format for modeling 3D audio fields defined on the surface of a sphere. It allows for accurate capturing, efficient delivery, and compelling reproduction of 3D sound fields on any device (headphones, loudspeakers, etc.). VSBA and HOA are deeply interrelated; therefore, these two terms are often used interchangeably. Higher-Order Ambisonics is an ideal format for productions that involve large numbers of audio sources, typically held in many stems. While transmitting all these sources plus meta-information may be prohibitive with OBA, the Volumetric Scene-based approach limits the number of PCM (Pulse-Code Modulation) channels transmitted to the end-user by delivering them as compact HOA signals [ref.].
ZYLIA's interpolation algorithm for 6DoF 3D audio
Creating a sound ambience for an adventure in a jungle through Volumetric Scene-based audio can be as simple as taking multiple HOA microphones to a natural environment that produces the desired soundscape and recording an entire 360° audio sphere around the devices. The main advantage of this approach is that the complexity of the VSBA rendering does not increase with the number of objects. This is because the source signals are converted to a fixed number of HOA signals that depends only on the HOA order, not on the number of objects present in the scene. This is in contrast with OBA, where rendering complexity increases as the number of objects increases. Note that Object-based audio scenes can profit from this advantage by being converted to HOA signals, i.e., Volumetric Scene-based audio assets.
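The fixed channel count mentioned above can be illustrated with a short sketch. It relies on the standard relation that a full-sphere Ambisonics stream of order N carries (N + 1)² signals. The function names are hypothetical, and the comparison assumes one mono channel per object, with metadata not counted.

```python
def hoa_channel_count(order: int) -> int:
    """Number of signals in a full-sphere Ambisonics stream of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def transmitted_channels(num_objects: int, hoa_order: int) -> dict:
    """Compare channel counts for the two approaches.

    Assumption: each object is a single mono channel (metadata ignored),
    while the scene-based stream always carries (order + 1)^2 channels.
    """
    return {"oba": num_objects, "vsba": hoa_channel_count(hoa_order)}


# The object-based count grows with the scene; the 3rd-order HOA count stays at 16.
for objects in (4, 16, 64, 256):
    print(objects, transmitted_channels(objects, hoa_order=3))
```

In this simplified view, the scene-based stream becomes the smaller one as soon as the scene holds more than 16 objects at 3rd order.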
To summarize, the advantages of the Volumetric Scene-based audio approach affecting the CPU and power consumption are:
Zylia 6 Degrees of Freedom Navigable Audio
One of the most innovative and efficient tools for producing Volumetric Scene-based audio is the ZYLIA 6 Degrees of Freedom Navigable Audio solution. It is based on several Higher Order Ambisonics microphones, which capture large sound scenes in high resolution, and a set of software for recording, synchronizing signals, converting audio to B-Format, and rendering HOA files. The Renderer can also be used independently of the 6DoF hardware to create navigable 3D assets for audio game design.
ZYLIA 6 DoF HOA Renderer is a Max/MSP plugin available for macOS and Windows. It allows processing and rendering ZYLIA Navigable 3D Audio content. With this plugin, users can play back the synchronized Ambisonics files, change the listener’s position, and interpolate multiple Ambisonics spheres. The plugin is also available for Wwise, allowing developers to use ZYLIA Navigable Audio technology in various game engines.
Watch the comparison between Object-based audio and Volumetric Scene-based audio produced with the Zylia 6 Degrees of Freedom Navigable Audio solution. Notice how the 6DoF approach reduces CPU usage during sound rendering.
Volumetric Scene-based audio and Higher Order Ambisonics can be used for many different purposes, not only for creating soundtracks for games. This format is very efficient when producing audio for:
We are happy to announce the new release of the ZYLIA 6DoF Recording Application in version 1.0.0 for Linux and macOS. This application is a part of the ZYLIA 6DoF Navigable Audio solution (ZYLIA 6DoF VR/AR set). It replaces the command line toolkit for the recording and synchronization process. For your comfort, this application has a graphical user interface, so there is no need to use the command line anymore.
This application offers all features of the ZYLIA 6DoF Recording Toolkit such as:
Additionally, a few new features have been added.
Configure your session.
Make the recording.
Synchronize raw audio files.
The command-line application will remain available but will not be developed further.
We are happy to announce the release of ZYLIA 6DoF HOA Renderer for Max/MSP v2.0 (macOS, Windows).
This software is a key element of the ZYLIA 6DoF Navigable Audio system. It allows you to reproduce the sound field in a given location based on Ambisonics signals recorded with ZM-1S microphones. The plugin works in the Max/MSP environment, so you can use this tool directly in your project. Please refer to the provided example project with our plugin and the manual for ZYLIA 6DoF HOA Renderer.
The newest ZYLIA 6DoF HOA Renderer for Max/MSP has a lot of improvements.
The new User Interface can be accessed in a separate window. It allows you to set up the whole configuration directly in the plugin, without using the message mechanism of Max/MSP. The UI also allows you to lock the microphones’ positions, preventing unintentional changes to the scene configuration.
The next improvement is an increased number of supported signals: you can now pass up to 30 HOA signals to the plugin and create a 6DoF experience on much larger scenes.
If you would like to test this plugin, you can also use a 7-day free trial, and play around with our 6DoF audio rendering algorithm. The test recording data for the plugin can be found on our webpage.
Behind the scenes of the orchestra recording made with 30 Ambisonics microphones. How did we create a virtual stage with navigable audio?
Zylia, in collaboration with the Poznań Philharmonic Orchestra, presented the world's first navigable audio in a live-recorded performance of a large classical orchestra. 34 musicians on stage and 30 ZYLIA 3rd-order Ambisonics microphones made it possible to create a virtual concert hall, where each listener can choose their own audio path and get a real being-there sound experience.
ZYLIA 6 Degrees of Freedom Navigable Audio is a solution based on Ambisonics technology that allows recording an entire sound field around and within any performance imaginable. For the average listener, this means that while listening to a live-recorded concert they can walk through the audio space freely. For instance, they can approach the stage, or even step onto the stage to stand next to a musician. At every point, the sound they hear will be a bit different, as in real life. Right now, this is the only technology of its kind in the world.
The “6 Degrees of Freedom” in the name of Zylia’s solution refers to six kinds of possible movement: up and down, left and right, forward and backward, rotation left and right, tilting forward and backward, and rolling sideways. In post-production, the exact positions of the microphones placed in the concert hall are mirrored in the virtual space through the ZYLIA software. Once this is done, the listener can create their own audio path, moving in the six ways mentioned above, and choose any listening spot they want.
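As a rough illustration, the six kinds of movement can be held in a small data structure: three translations and three rotations. This is only a sketch. The class name, the mapping of axes to directions, and the use of metres and degrees are assumptions made for the example, not part of the ZYLIA software.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ListenerPose:
    """Hypothetical 6DoF listener pose (axis conventions are assumed)."""
    x: float = 0.0      # left / right, metres
    y: float = 0.0      # up / down, metres
    z: float = 0.0      # forward / backward, metres
    yaw: float = 0.0    # rotation left / right, degrees
    pitch: float = 0.0  # tilting forward / backward, degrees
    roll: float = 0.0   # rolling sideways, degrees

    def translate(self, dx=0.0, dy=0.0, dz=0.0):
        """Return a new pose shifted along the three translation axes."""
        return replace(self, x=self.x + dx, y=self.y + dy, z=self.z + dz)

    def rotate(self, dyaw=0.0, dpitch=0.0, droll=0.0):
        """Return a new pose turned about the three rotation axes.

        Angles are wrapped into the range [0, 360).
        """
        return replace(
            self,
            yaw=(self.yaw + dyaw) % 360.0,
            pitch=(self.pitch + dpitch) % 360.0,
            roll=(self.roll + droll) % 360.0,
        )


# Walk two metres towards the stage, then turn a quarter circle.
pose = ListenerPose().translate(dz=2.0).rotate(dyaw=90.0)
print(pose)
```

A listener's audio path through the virtual hall is then simply a sequence of such poses over time.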
6DoF sound can be produced with an object-based approach, by placing pre-recorded mono or stereo files in a virtual space and then rendering the paths and reflections of each wave in this synthetic environment. Our approach, in contrast, uses multiple Ambisonics microphones, which allows us to capture sound in almost every place in the room simultaneously. It thus provides 6DoF sound composed only of real-life audio recorded in a real acoustic environment.
How was it recorded?
* Two MacBook Pros for recording
* A single PC Linux workstation serving as a backup for recordings
* 30 ZM-1S mics – 3rd order Ambisonics microphones with synchronization
* 600 audio channels – 20 channels from each ZM-1S mic multiplied by 30 units
* 3 hours of recordings, 700 GB of audio data
Microphone array placement
The placement of 30 ZM-1S microphones on the stage and in front of it.
To be able to choose the best versions of the performances, the Orchestra played the Overture nine times and the Aria eight times, with three additional overdubs.
Alongside the audio recording, we captured video to document the event. The film crew placed four static cameras in front of the stage and on the balconies. One cameraman moved along a pre-planned path on the stage. Additionally, we placed two 360-degree cameras among the musicians.
Our chief recording engineer made sure that everything was ready (static cameras, moving camera operator, 360-degree cameras, and recording engineers) and then gave the Conductor a sign to begin the performance. When the LED rings on the 30 arrays turned red, everybody knew that the recording had started.
A large amount of data makes it possible to explore the same moment in endless ways. Recording all 19 takes of the two music pieces resulted in 700 GB of stored audio. The entire recording and preparation process was filmed with several cameras, yielding around 650 GB of video. In total, we gathered almost 1.5 TB of data.
Post-processing and preparing data for the ZYLIA 6DoF renderer
First, we had to prepare the 3D model of the stage. The model of the concert hall was redesigned to match the real-life dimensions. Then we placed the microphones and musicians according to accurate measurements. When this was done, specific parameters of the interpolation algorithm in the ZYLIA 6DoF HOA Renderer had to be set. The next task was the most difficult in post-production: matching the real camera sequences with the sequences from the VR environment in Unreal Engine. After this painstaking process of matching the paths of the virtual and real cameras, a connection between Unreal and Wwise was established. This allowed us to render the sound along the defined path in Unreal, just as if someone were walking there in VR. Last but not least, we had to synchronize and combine the real and virtual video with the desired audio.
The outcome of this project is presented in “The Walk Through The Music” movie, where we can enter the music spectacle from the audience position and move around artists on the stage.
You can also watch the “Making of” movie to get more detailed information on what the setup looked like.
Zylia is building a future of immersive and fully navigable audio for Virtual Reality by creating an installation of 53 3rd order Ambisonics microphone arrays
Zylia has introduced a multi-level six-degrees-of-freedom (6DoF) microphone array installation for navigable, live-recorded audio.
What does it mean? We are working on technology that gives people the possibility to listen to a concert or live performance from any point in the audio scene. With our technology you can record an audio scene from different points in the space: the center of the stage, the middle of a string quartet, the audience, or backstage. Audio recorded in this way can be used together with virtual reality projections, allowing the user to move freely around the space, experience the audio scene naturally, and listen to it from different perspectives.
6DoF installation and test setup
The first step of the test recordings was to install 9 3rd order Ambisonics microphone arrays on the same level and record musicians playing their performance. Such an approach allowed the listener to move around those 9 points and listen to the music from different perspectives. However, placing the microphones on a single level introduced limitations in terms of audio resolution in the vertical plane.
Since we like challenges, we decided to increase the number of microphones to 53 and build an installation on five different levels. This allowed us to move freely in every direction of the recorded scene in a truly immersive experience. The second idea behind this test setup was to check the limits of Ambisonics recordings in order to achieve a fully navigable audio scene. We placed the microphone arrays densely in the recorded scene and obtained a spatial audio image of very high resolution.
We used 53 19-channel mic arrays – which gave us 1007 audio channels recorded simultaneously. Microphones were connected to a USB hub and the recordings were operated via a single laptop.
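The channel arithmetic can be checked with a few lines of code. The sketch below also estimates how many Ambisonics signals result once every array has been converted to 3rd order, using the standard (N + 1)² channel count per stream. The constant and function names are chosen only for this example.

```python
MIC_ARRAYS = 53          # arrays in the installation (from the text)
CAPSULES_PER_ARRAY = 19  # raw channels per array (from the text)
HOA_ORDER = 3            # Ambisonics order of each array


def raw_channel_count(arrays: int, capsules: int) -> int:
    """Total number of raw microphone channels recorded simultaneously."""
    return arrays * capsules


def hoa_channel_total(arrays: int, order: int) -> int:
    """Total Ambisonics channels after each array is converted separately."""
    return arrays * (order + 1) ** 2


print(raw_channel_count(MIC_ARRAYS, CAPSULES_PER_ARRAY))  # 1007 raw channels
print(hoa_channel_total(MIC_ARRAYS, HOA_ORDER))           # 848 HOA channels
```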
The audio recorded from each microphone array was converted to 3rd order Ambisonics using our ZYLIA Ambisonics Converter plugin (this can be done in real time or offline). After the recording, we used our interpolation software. This software is a Max/MSP plugin that generates a 3rd order Ambisonics sphere for your current position based on the signals from all microphones. When you put on your headphones and VR headset and move around the space, the algorithm in Max/MSP takes your position and interpolates the 3D sound for that spot.
We used 3rd order Ambisonics microphone arrays. This is important because the higher the order, the more precisely sound can be localized around the listener. We are able to recreate the sound with a very high spatial resolution, which influences the audio quality, an extremely important aspect for listeners.
With this simple approach, you can record the natural audio scene for your VR/AR productions and use it right away without a complicated post-production workflow. You can record live events and stream audio directly to listeners, letting them freely choose their position in this real-time recorded space for an ultimate immersive audio experience.
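The text does not describe how the interpolation works internally. Purely to illustrate the idea of deriving one Ambisonics frame from several microphone positions, the sketch below uses plain inverse-distance weighting. This is a swapped-in, simplified technique, not ZYLIA's algorithm, and all function names and parameters are hypothetical. Simple blending of this kind also ignores the change in source direction as the listener moves, which a real renderer would have to handle.

```python
import math


def idw_weights(listener, mic_positions, power=2.0, eps=1e-9):
    """Inverse-distance weights of each microphone for a listener position.

    Weights sum to 1. If the listener stands exactly on a microphone,
    that microphone receives all the weight.
    """
    dists = [math.dist(listener, p) for p in mic_positions]
    for i, d in enumerate(dists):
        if d < eps:
            return [1.0 if j == i else 0.0 for j in range(len(dists))]
    raw = [1.0 / d ** power for d in dists]
    total = sum(raw)
    return [w / total for w in raw]


def interpolate_frame(listener, mic_positions, hoa_frames, power=2.0):
    """Blend one frame of HOA samples from all microphones.

    hoa_frames[m][c] is the sample of Ambisonics channel c at microphone m.
    Returns one blended sample per Ambisonics channel.
    """
    weights = idw_weights(listener, mic_positions, power)
    num_channels = len(hoa_frames[0])
    return [
        sum(w * frame[c] for w, frame in zip(weights, hoa_frames))
        for c in range(num_channels)
    ]


# Two microphones one metre apart, listener halfway between them.
mics = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frames = [[1.0] * 16, [3.0] * 16]  # 16 channels = 3rd order
print(interpolate_frame((0.5, 0.0, 0.0), mics, frames))
```

In a real-time setting, a function like this would be evaluated for each audio block with the listener position reported by the VR headset.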
Cinematic trailers for VR, audio for games, live performance recording, domes with multi-loudspeaker installations
What would happen if, on a rainy and cloudy day, during a walk along a forest path, you could move to a completely different place thousands of kilometers away? Putting the goggles on would take you into a virtual reality world. You would find yourself on a sunny island in the Pacific Ocean, on the beach, admiring the scenery and walking among the palm trees, listening to the sound of the waves and to colorful parrots screeching over your head.
It sounds unrealistic, but such goals are set by the latest trends in the development of Augmented / Virtual Reality technology (AR / VR). Technology and content for full VR or 6DoF (6 Degrees-of-Freedom) rendered in real time will give the user the opportunity to interact with and navigate through virtual worlds. To experience the feeling of "full immersion" in the virtual world, high-quality images must be accompanied by equally realistic sound. In principle, the environment and the way the user interacts with it can be reflected reliably only if each individual sound source in the virtual audio landscape is delivered to the user as a separate object signal.
What are Six Degrees of Freedom (6DoF)?
"Six degrees of freedom" is a specific parameter count for the number of degrees of freedom an object has in three-dimensional space, such as the real world. It means that there are six parameters or ways that the object can move.
There are many possible uses of 6DoF VR technology. You can imagine exploring a movie set at your own pace. You could stroll between the actors, look at the action from different sides, listen to any conversation, and pay attention only to what interests you. Such technology would provide truly unique experiences.
A wide spectrum of virtual reality applications drives the development of technology in the audio-visual industry. Until now, image-related technologies have been developing much faster, leaving the sound far behind. We have made the first attempts to show that 6DoF for sound is also achievable.
How to record audio in 6DoF?
It's extremely challenging to record high-quality sound from many sources present in the sound scene at the same time. We managed to do this using nine ZYLIA ZM-1 multi-track microphone arrays evenly spaced in the room.
In our experiment, the sound field was captured using two different spatial arrangements of ZYLIA ZM-1 microphones placed within and around the recorded sound scenes. In the first arrangement, nine ZYLIA ZM-1 microphones were placed on a rectangular grid. The second configuration consisted of seven microphones placed on a grid composed of equilateral triangles.
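The two arrangements can be sketched as sets of 2D coordinates. The exact layout and spacing are not given in the text, so the example below makes two assumptions: the nine microphones form a 3 × 3 grid, and the seven microphones form a central point surrounded by a regular hexagon, which is one way of building a grid of equilateral triangles. The function names and the spacing value are hypothetical.

```python
import math


def rectangular_grid(rows: int, cols: int, spacing: float):
    """Microphone positions on a rectangular grid, as (x, y) in metres."""
    return [(c * spacing, r * spacing) for r in range(rows) for c in range(cols)]


def hexagonal_cluster(spacing: float):
    """Seven positions: a centre point plus six points on a regular hexagon.

    Every neighbouring pair is `spacing` apart, so the points tile into
    six equilateral triangles.
    """
    ring = [
        (spacing * math.cos(math.radians(60 * k)),
         spacing * math.sin(math.radians(60 * k)))
        for k in range(6)
    ]
    return [(0.0, 0.0)] + ring


nine_mics = rectangular_grid(rows=3, cols=3, spacing=1.0)  # spacing is hypothetical
seven_mics = hexagonal_cluster(spacing=1.0)
print(len(nine_mics), len(seven_mics))
```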
Fig. Setup of 9 and 7 ZYLIA ZM-1 microphone arrays
Microphone signals were captured using a personal computer running the GNU/Linux operating system. Signals originating from individual ZM-1 arrays were recorded with specially designed software.
We recorded a few takes of a musical performance with instruments such as an Irish bouzouki (a stringed instrument similar to the mandolin), a tabla (Indian drums), acoustic guitars, and a cajon.
Unity and 3D audio
To present some interesting possibilities of using audio recorded with multiple microphone arrays, we have created a Unity project with 7 Ambisonics sources. In this simulated environment, you will find three sound sources (our musicians) represented by bonfires, among which you can move around. Experiencing fluent immersive audio becomes so natural that you can actually feel as if you were inside the scene.
MPEG Standardization Committee