Behind the scenes of the orchestra recording made with 30 Ambisonics microphones. How did we create a virtual stage with navigable audio?
Zylia, in collaboration with the Poznań Philharmonic Orchestra, presented the world's first navigable audio in a live-recorded performance of a large classical orchestra. 34 musicians on stage and 30 ZYLIA 3rd-order Ambisonics microphones allowed us to create a virtual concert hall in which each listener can chart their own audio path and get a genuine being-there sound experience.
ZYLIA 6 Degrees of Freedom Navigable Audio is a solution based on Ambisonics technology that records the entire sound field around and within any performance imaginable. For the listener, this means that while playing back a live-recorded concert they can walk through the audio space freely. For instance, they can approach the stage, or even step onto it to stand next to a musician. At every point, the sound they hear will be slightly different, just as in real life. Right now, it is the only technology of its kind in the world.
The "6 Degrees of Freedom" in the solution's name refers to the six independent ways a listener can move: up and down, left and right, forward and backward (three translations), plus rotating left and right (yaw), tilting forward and backward (pitch), and rolling sideways (roll). In post-production, the exact positions of the microphones placed in the concert hall are mirrored in the virtual space through the ZYLIA software. Once that is done, the listener can create their own audio path by moving along those six degrees of freedom and choose any listening spot they want.
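To make the six parameters concrete, here is a minimal sketch of the pose a 6DoF renderer has to track for the listener. The names and units are ours, not taken from the ZYLIA software:

```python
from dataclasses import dataclass

@dataclass
class ListenerPose:
    """One 6DoF pose: three translations plus three rotations.
    Field names are illustrative, not from the ZYLIA software."""
    x: float      # left/right, metres
    y: float      # forward/backward, metres
    z: float      # up/down, metres
    yaw: float    # rotation left/right, degrees
    pitch: float  # tilt forward/backward, degrees
    roll: float   # roll sideways, degrees

# Example: a listener two metres in front of the stage, head at ear height,
# looking 30 degrees to the left.
pose = ListenerPose(x=0.0, y=-2.0, z=1.7, yaw=30.0, pitch=0.0, roll=0.0)
```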
6DoF sound can also be produced with an object-based approach: placing pre-recorded mono or stereo files in a virtual space and then rendering the paths and reflections of each sound wave in this synthetic environment. Our approach, by contrast, uses multiple Ambisonics microphones, which lets us capture sound at many points in the room simultaneously. The resulting 6DoF sound is therefore composed entirely of audio recorded in a real acoustic environment.
How was it recorded?
* Two MacBook Pros for recording
* A single Linux PC workstation serving as a backup recorder
* 30 ZM-1S mics – 3rd-order Ambisonics microphones with synchronization
* 600 audio channels – 20 channels from each ZM-1S multiplied by 30 units
* 3 hours of recordings and 700 GB of audio data (see the sanity check below)
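As a quick sanity check on those figures, a back-of-envelope calculation works out as follows. The session's sample rate and bit depth were not published, so 48 kHz / 24-bit PCM is our assumption:

```python
# Back-of-envelope check on the 700 GB figure.
# Assumptions (not published): 48 kHz sample rate, 24-bit PCM per channel.
channels = 600            # 30 arrays x 20 channels each
sample_rate = 48_000      # Hz (assumed)
bytes_per_sample = 3      # 24-bit PCM (assumed)

bytes_per_second = channels * sample_rate * bytes_per_sample
gb_per_hour = bytes_per_second * 3600 / 1e9
print(f"{gb_per_hour:.0f} GB per hour")          # ~311 GB per hour

hours_recorded = 700 / gb_per_hour
print(f"700 GB ~ {hours_recorded:.1f} hours")    # ~2.3 hours of capture
```

Under those assumptions, the arrays were actually rolling for roughly 2.3 of the 3 hours, which is plausible for a session with pauses between takes.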
Microphone array placement
The placement of 30 ZM-1S microphones on the stage and in front of it.
To be able to choose the best versions of the performances, the Orchestra played the Overture nine times and the Aria eight times, with three additional overdubs.
In parallel with the audio recording, we captured video to document the event. The film crew placed four static cameras in front of the stage and on the balconies, and one camera operator moved along a pre-planned path on the stage. Additionally, we put two 360-degree cameras among the musicians.
Our chief recording engineer made sure that everything was ready – static cameras, the moving camera operator, 360-degree cameras and recording engineers – and then gave the Conductor a sign to begin the performance. When the LED rings on the 30 arrays turned red, everybody knew that the recording had started.
This large amount of data makes it possible to explore the same moment in endless ways. Recording all 19 takes of the two pieces produced 700 GB of audio. The entire recording and preparation process was also documented on film with several cameras, adding around 650 GB of video. In total, we gathered almost 1.5 TB of data.
Post-processing and preparing data for the ZYLIA 6DoF renderer
First, we had to prepare a 3D model of the stage. The model of the concert hall was redrawn to match its real-life dimensions, and we then placed the microphones and musicians according to the accurate measurements. With that done, the specific parameters of the interpolation algorithm in the ZYLIA 6DoF HOA Renderer had to be set. The next task was the most difficult part of post-production: matching the real camera sequences with the sequences from the VR environment in Unreal Engine. After this painstaking process of aligning the paths of the virtual and real cameras, a connection between Unreal and Wwise was established. This gave us the ability to render the sound along any path defined in Unreal – just as if someone were walking there in VR. Last but not least, we had to synchronize and combine the real and virtual video with the desired audio.
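As an illustration of the "mirroring" step, the sketch below maps surveyed microphone positions from a hall coordinate frame into engine units. The coordinates, frame conventions and helper function are hypothetical; only the idea of reproducing the measured positions 1:1 comes from the text (Unreal Engine, for reference, natively works in centimetres, i.e. 100 units per metre):

```python
import numpy as np

# Hypothetical survey data: microphone positions in metres, in a hall frame
# with its origin at the front-centre of the stage.
measured_positions_m = np.array([
    [-4.0, 2.5, 1.6],   # mic 1: x (left/right), y (depth), z (height)
    [ 0.0, 2.5, 1.6],   # mic 2
    [ 4.0, 2.5, 1.6],   # mic 3
    # ... the remaining arrays follow the survey sheet
])

def hall_to_scene(p_m, units_per_metre=100.0, scene_origin=(0.0, 0.0, 0.0)):
    """Scale hall-frame metres to scene units and shift to the scene origin."""
    return p_m * units_per_metre + np.asarray(scene_origin)

scene_positions = hall_to_scene(measured_positions_m)
print(scene_positions[0])   # -> [-400.  250.  160.]
```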
The outcome of this project is presented in the movie "The Walk Through The Music", in which we enter the music spectacle from the audience's position and move around the artists on the stage.
You can also watch the "Making of" movie for more detail on what the setup looked like.
Zylia is building a future of immersive, fully navigable audio for Virtual Reality by creating an installation of 53 3rd-order Ambisonics microphone arrays
Zylia has introduced a six-degrees-of-freedom (6DoF) multi-level microphone-array installation for navigable, live-recorded audio.
What does this mean? We are working on technology that lets people listen to a concert or live performance from any point in the audio scene. With our technology you can record an audio scene from different points in the space – the center of the stage, the middle of a string quartet, the audience, or backstage. Audio recorded in this way can be combined with virtual reality projections, allowing users to move freely around the space for a natural experience of the audio scene and the possibility of listening to it from different perspectives.
6DoF installation and test setup
The first step of the test recordings was to install nine 3rd-order Ambisonics microphone arrays on a single level and record musicians playing their performance. This approach allowed the listener to move between those nine points and hear the music from different perspectives. However, placing the microphones on a single level limited the audio resolution in the vertical plane.
Since we like challenges, we decided to increase the number of microphones to 53 and build the installation on five different levels. This allowed us to move freely in every direction of the recorded scene for a truly immersive experience. The second idea behind this test setup was to probe the limits of Ambisonics recording for achieving a fully navigable audio scene: by placing the microphone arrays densely within the recorded scene, we obtained a spatial audio image of very high resolution.
We used 53 19-channel mic arrays, which gave us 1007 audio channels recorded simultaneously. The microphones were connected to a USB hub, and the recording was operated from a single laptop.
The audio from each microphone array was converted to 3rd-order Ambisonics using our ZYLIA Ambisonics Converter plugin (this can be done in real time or offline). After the recording, we used our interpolation software, a MaxMSP plugin that generates a 3rd-order Ambisonics sphere at the listener's current position based on the signals from all the microphones. When you put on headphones and a VR headset and move around the space, the algorithm in MaxMSP takes your position and interpolates the 3D sound at that point.
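Zylia has not published the interpolation algorithm itself, so the sketch below is only a minimal stand-in for the idea: blend the 3rd-order Ambisonics signals of the nearest arrays with inverse-distance weights at the listener's position. The function name and weighting scheme are our assumptions:

```python
import numpy as np

def interpolate_hoa(listener_pos, mic_positions, hoa_signals, k=4, eps=1e-6):
    """Blend 3rd-order Ambisonics captures at the listener's position.

    listener_pos : (3,) listener position in metres.
    mic_positions: (M, 3) positions of the M arrays.
    hoa_signals  : (M, 16, N) one block of 16-channel 3rd-order HOA per array.
    Returns a (16, N) interpolated HOA block.
    """
    k = min(k, len(mic_positions))
    d = np.linalg.norm(mic_positions - listener_pos, axis=1)
    nearest = np.argsort(d)[:k]            # blend only the k closest arrays
    w = 1.0 / (d[nearest] + eps)           # inverse-distance weights
    w /= w.sum()
    return np.einsum("m,mcn->cn", w, hoa_signals[nearest])
```

In a real-time renderer, something like this would run per audio block, with the VR headset supplying `listener_pos`, and the resulting HOA stream would then be binauralized for headphones.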
We used 3rd-order Ambisonics microphone arrays. This matters because the higher the order, the more precisely sound can be localized in space around the listener. We are able to recreate the sound with very high spatial resolution, which directly affects the audio quality – an extremely important aspect for listeners.
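For reference, an Ambisonics stream of order N carries (N + 1)² spherical-harmonic channels, so 3rd order means (3 + 1)² = 16 channels – which is what the converter derives from each array's 19 capsule signals.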
With this simple approach, you can record a natural audio scene for your VR/AR productions and use it right away, without a complicated post-production workflow. You can record live events and stream the audio directly to listeners, giving them the freedom to choose their own position in the recorded space in real time for an ultimate immersive audio experience.
Cinematic trailers for VR, audio for games, live performance recordings, domes with multi-loudspeaker installations
What would happen if, on a rainy and cloudy day, during a walk along a forest path, you could move to a completely different place thousands of kilometers away? Putting on the goggles would take you into a virtual reality world: you would find yourself on a sunny island in the Pacific Ocean, standing on the beach, admiring the scenery and walking among the palm trees, listening to the sound of the waves and of colorful parrots screeching over your head.
It sounds unrealistic, but such goals are set by the latest trends in the development of Augmented/Virtual Reality (AR/VR) technology. Technology and content for full VR, or 6DoF (six degrees of freedom), rendered in real time will give users the opportunity to interact with and navigate through virtual worlds. To experience "full immersion" in the virtual world, realistic sound must keep up with the high-quality image. This is why, ideally, each individual sound source in the virtual audio landscape reaches the user as a separate object signal: only then can the rendering reliably reflect both the environment and the way the user interacts with it.
What are Six Degrees of Freedom (6DoF)?
"Six degrees of freedom" is a specific parameter count for the number of degrees of freedom an object has in three-dimensional space, such as the real world. It means that there are six parameters or ways that the object can move.
There are many possible uses for 6DoF VR technology. Imagine exploring a film set at your own pace: you could stroll between the actors, watch the action from different sides, and listen in on whichever conversations interest you. Such technology would provide truly unique experiences.
A wide spectrum of virtual reality applications drives the development of technology in the audio-visual industry. Until now, image-related technologies have developed much faster, leaving sound far behind. We have made the first attempts to show that 6DoF is achievable for sound as well.
How to record audio in 6DoF?
It's extremely challenging to record high-quality sound from many sources present in the sound scene at the same time. We managed to do this using nine ZYLIA ZM-1 multi-track microphone arrays evenly spaced in the room.
In our experiment, the sound field was captured using two different spatial arrangements of ZYLIA ZM-1 microphones placed within and around the recorded sound scenes. In the first arrangement, nine ZYLIA ZM-1 microphones were placed on a rectangular grid; the second configuration consisted of seven microphones placed on a grid composed of equilateral triangles.
Fig. Setup of 9 and 7 ZYLIA ZM-1 microphone arrays
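The text gives the geometry of the two layouts but not their spacing; purely as an illustration, with an assumed inter-microphone distance d (and reading the nine-mic rectangular grid as 3 x 3, which is also our assumption), the coordinates could be generated like this:

```python
import numpy as np

d = 2.0  # assumed distance in metres between neighbouring arrays

# Nine ZM-1s on a rectangular grid, read here as 3 x 3 (x, y positions).
rect_grid = d * np.array([(x, y) for y in range(3) for x in range(3)], float)

# Seven ZM-1s on a grid of equilateral triangles: one centre microphone plus
# six around it at distance d, so each neighbouring triple of microphones
# forms an equilateral triangle.
angles = np.deg2rad(np.arange(0, 360, 60))
tri_grid = np.vstack([[0.0, 0.0],
                      np.column_stack((d * np.cos(angles), d * np.sin(angles)))])
```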
The microphone signals were captured on a personal computer running the GNU/Linux operating system. Signals originating from the individual ZM-1 arrays were recorded with specially designed software.
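That in-house recorder is not publicly available; purely as an illustration of the task it performs, here is how one multichannel array could be captured to disk with the off-the-shelf sounddevice and soundfile Python libraries. The device name, channel count and sample rate are assumptions:

```python
import queue

import sounddevice as sd
import soundfile as sf

DEVICE = "ZYLIA ZM-1"   # hypothetical device name as exposed by the OS
CHANNELS = 19           # one ZM-1 array
SAMPLE_RATE = 48_000    # Hz (assumed)

blocks = queue.Queue()

def callback(indata, frames, time, status):
    """Runs on the audio thread: hand each block to the writer loop."""
    if status:
        print(status)          # report over/underruns
    blocks.put(indata.copy())

with sf.SoundFile("take01_array01.wav", mode="w", samplerate=SAMPLE_RATE,
                  channels=CHANNELS, subtype="PCM_24") as wav, \
     sd.InputStream(device=DEVICE, channels=CHANNELS,
                    samplerate=SAMPLE_RATE, callback=callback):
    print("Recording... press Ctrl+C to stop.")
    try:
        while True:
            wav.write(blocks.get())   # drain blocks to disk off the audio thread
    except KeyboardInterrupt:
        pass
```

A real session would run one such stream per array and still need a cross-device synchronization strategy – in the orchestra recording described earlier, that is what the ZM-1S variant's hardware sync provided.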
We recorded a few takes of musical performances with instruments such as the Irish bouzouki (a stringed instrument similar to the mandolin), the tabla (a pair of Indian hand drums), acoustic guitars and a cajon.
Unity and 3D audio
To demonstrate the possibilities of audio recorded with multiple microphone arrays, we created a Unity project with seven Ambisonics sources. In this simulated environment you will find three sound sources (our musicians) represented by bonfires, among which you can move freely. The immersive audio responds so fluently that you actually feel you are inside the scene.
MPEG Standardization Committee