Audiokinetic Interview: Providing Next-Gen Audio Solutions For PS4 And Xbox One

Audiokinetic is an extremely popular software company based in Montreal, Quebec, that provides audio middleware for just about every AAA game out there. Audiokinetic has been revolutionizing audio authoring for years now and in the process has partnered with some big guns in the industry such as the developers of Unreal Engine, Unity and many more.

With the new consoles already available on the market, GamingBolt decided to get in touch with Mike Drummelsmith from Audiokinetic to learn more about where the company is hoping to head in the future. Check out his responses below.

Rashid Sayed: First of all, can you please tell us a bit about yourself and the humble beginnings of the company?

Mike Drummelsmith: While I’ve personally only been at Audiokinetic for just over a year, the AK history goes back over a decade. It started up in 2003 after some of the founders had worked on gaming projects here in Montreal, and realized that the tools for audio were pretty lacking. Audiokinetic was founded, and the work to research the various requirements of studios all around the industry was begun. Over the next few years, Wwise was born, and the first game with it was shipped out in 2007 (Shadowrun, on Xbox 360). Since then, we’ve been in pretty constant growth, and this past year we shipped out about 70 titles (at my last count). 2015 is going to be even bigger!

Rashid Sayed: Let’s talk about two of your major product offerings, Wwise and a set of plugins. For our readers, can you please let us know the purpose of these two technologies?

Mike Drummelsmith: Wwise is the core technology that everything else currently branches off of. It’s a high-performance, cross-platform audio engine that covers PCs, consoles and mobile development. The other portion of Wwise is the expansive Authoring tool. If the audio engine is the heart of the system, then the Authoring environment is the brain. This is where sound designers and composers integrate their raw audio into the game. They can essentially define out the following core points: What sounds play and when they play. WHY they play (behaviours based on the game rule sets). How they play (effects and mixing). It can get MUCH more complex than that, but that’s what tutorials and practice are for.

The main purpose, overall, is to improve the quality of the audio integration in the game, while reducing the time needed to integrate those sounds in the game. We also strive to reduce the testing and QA needs by allowing the sound designer to debug the audio in the game, without always needing the programmers. That’s the heart of all middleware, really – better results, less time and therefore less money.

Rashid Sayed: What kind of improvements have you done to Wwise for the new consoles i.e. PlayStation 4 and Xbox One?

Mike Drummelsmith: There have been a bunch of under-the-hood developments for the current-gen console family, but some of the specifics I can’t really discuss. We do exploit some of the additional audio hardware on the Xbox One, and we support the special codecs on each platform (ATRAC-9 on PS4, XMA on Xbox One). We’ve also been doing some work for 3D surround audio (and by that I mean not just 5.1/7.1 mixes, but also height channel and headphone virtualization), which will have benefits for VR systems.

Rashid Sayed: What kind of ‘specific’ features does SoundSeed have for PS4, Xbox One and modern PCs?

Mike Drummelsmith: SoundSeed is a real-time audio synthesis engine to generate infinite variability for certain types of sounds (wind and whoosh sounds, impact sounds, etc.), so having more power in the current-gen consoles and modern PCs always helps. We’ve kept it relatively light-weight, though, so it just means that more developers on current-gen will use it, as the CPU requirements for it are relatively small. It’s easily one of our most popular plug-ins.

Rashid Sayed: What kind of debugging tools and user interface does Wwise have to make life easier for the developers?

Mike Drummelsmith: Wwise comes with a built-in profiler system that allows developers to see what sounds are playing during their session, and then if a bug or anomaly is detected, they can ‘travel back in time’ to see what the error was and why it occurred. With this tool, the developer can also see how much CPU and memory is being used by the audio engine, how many voices are playing, and a lot more profiling information that is critical to maintaining performance and keeping a clean sound mix. It’s a powerful tool, and we’re continually working to improve it to keep it completely indispensable.

Rashid Sayed: The PS4 does not have an onboard audio processor but the Xbox One has one. Does this impose any kind of limitations when developing the tool for the PS4 version?

Mike Drummelsmith: Not really. Having extra hardware is nice, since it makes some things ‘free’ from a performance standpoint. However, extra hardware can also add complexity into the mix that makes it more difficult to diagnose issues and maintain full control over your audio pipeline. As soon as you hand off something to a dedicated processor and say ‘do something with this’, you’re pretty much bound to whatever that processor does! The Xbox One and PS4 have each presented their own challenges and benefits, but we’re nicely on top of each (and we have a ton of current-gen games in development, running the gamut from small indies to massive AAA titles).

Rashid Sayed: With the rising costs of game development, how does Wwise offer a better cost-to-performance ratio?

Mike Drummelsmith: Our pricing currently takes two forms. First, for small games with less complex audio, we offer a free ‘Limited Commercial License’. This allows teams with fewer than 200 sound files to use Wwise for free. Because Wwise allows for a lot of real-time audio processing (filters, reverb, etc.) and now includes a synthesizer (Wwise Synth One), those 200 sounds can go really far.

For more complex games, we offer our standard licensing, which is tiered to the production budget of the game (basically, how much you’re budgeting for the game, not counting administrative overhead and marketing). This allows games with smaller budgets to still afford the exact same tool that the AAA-budgeted studios use. And we’re always working on ways to make Wwise affordable to developers of all stripes. Basically, if you’re interested in using Wwise in your game, and you have some worries about the pricing, talk to us – we’re remarkably friendly!

With that in mind, our main ‘competition’ comes not from what most people would consider traditional competitors, but from in-house solutions and built-in solutions. When studios have an in-house solution, it’s easy to think that ‘this is free’. However, there is a cost to maintaining that code, developing features for it (remember, we’ve been building this for over a decade!), keeping up with new platforms and new SDKs, and more. All of this takes time and money. Slowly but surely, a lot of teams are realizing that maintaining their own audio engine either is WAY more expensive than they thought, or simply can’t match the feature-set we offer.

With game engines, it’s a slightly trickier proposition. The tools are obviously built-in, so that makes them generally easy to use. However, often they are incredibly limited. Indie developers we talk to who use engines like Game Maker, Cocos2D and MonoGame pretty much agree that there is not really an ‘audio engine’ there, just a means to play sounds. Mixing, building complex interactive music, real-time effects, and other important features are simply non-existent. Even teams using Unity and Unreal Engine 4 come to us regularly touting how much easier things have been with Wwise versus the built-in systems.

We work hard to integrate Wwise with the major engines, and currently have Unity, Unreal Engine 4 and CryEngine covered (Crytek actually did that integration, so it’s very deeply ingrained now). For other engines, we’re working on how to target them cost-effectively (there are a LOT of engines out there), but for a studio to integrate Wwise with any engine they have source to is not the most complex affair. Many, many of the teams using Wwise are using their own game engines, so we’ve made Wwise as easy as possible to hook into core engine code.

Rashid Sayed: Both the Xbox One and PS4 have similar architecture but the former lacks unified memory. Many developers are apparently facing issues with eSRAM that result in lower resolution in certain games. As someone who has extensive hands-on experience with the Xbox One and its internals, what is your take on this issue and does it affect Wwise tools development in any way?

Mike Drummelsmith: As with any new hardware, there are always funky complexities and quirks to work around. Luckily in this particular situation, it hasn’t affected us. As I mentioned above, having the extra audio processor on the Xbox One has had both benefits and detriments for us, but we’ve worked through them. Last generation, the situation was reversed, and we had to work hard through the complexity of the PS3. Comparatively, this generation has been much easier to work with. From our perspective on the audio side, the two systems are pretty much the same, though we get the secondary output on PS4 (playing sounds through the controller, like in Resogun), which is a pretty neat feature.

Rashid Sayed: A lot has been made out of the PlayStation 4’s GDDR5 memory. It comes packed in with a wide bandwidth and latency does not seem to be an issue. The Xbox One on the other hand has DDR3. Obviously one is faster than the other but in the grand scheme of things, does the speed difference matter for Wwise?

Mike Drummelsmith: For us, it hasn’t really been an issue. We’re just happy that in both cases, we’re pretty much streaming or loading everything either from memory or from the hard disk, rather than a DVD or Blu-ray.

Rashid Sayed: A lot of developers are obviously facing difficulties transitioning from last-gen to next-gen systems. What kind of challenges are these developers facing and how are you helping them so that the transition is smooth? [In terms of products offered by Audiokinetic only].

Mike Drummelsmith: What we’ve seen is a great ‘easing’ on that front. The number of current-gen remasters and cross-generation titles is pretty telling. For our products, teams have found a very easy transition from Gen7 (PS3/X360/Wii) to Gen8 (PS4/Xbox One/Wii U). They can do more with the tools provided, since they have so much more CPU and memory room to work with. We continue to support the older platforms (we currently support around 15 platforms), so we’ve had to make sure that the transition was as seamless as possible.

Rashid Sayed: Sound is something that is dependent on the CPU. Both the PS4 and Xbox One have identical CPUs but the former has a more powerful GPU. Have you been able to use that extra bit to have better playback, compression and performance on the PS4?

Mike Drummelsmith: Audio on GPU (via GPGPU techniques) is something that we’ve been working on for a while. Some things are easier to do on that front (audio decoding, for instance) and some are more complex (calculating reverberation models, for example). As the platform tools become more fleshed out to move more processing over to the GPU, we’ll continually be looking to see how we can allow developers to exploit that. It’s a similar (but far less complex) situation to the early PS3 days, when the tools weren’t quite there for exploiting the SPUs. By mid-generation, developers were doing things that were pretty mind-boggling. I’m anxiously awaiting the 2nd round of games in current gen – the first round has already been pretty great, and it’s only going to get better from here!

Rashid Sayed: Memory was such a big issue last gen. But now with these new consoles [PS4 and Xbox One], there is literally a ton of it which allows developers to do more. From that perspective, how do you ‘now’ work with developers who have a custom request which may be heavy on the memory side?

Mike Drummelsmith: Haven’t had too many issues on that front, as we support streaming solutions alongside loading sounds into memory, and in general the teams are still figuring out how to use the memory they’ve been given in this generation. I’m sure that eventually they’ll be bumping up against their limits, but we’ve already provided them with a lot of tools to handle systems with far lower amounts of memory (PS3, X360, mobile platforms), so they use those tools well already. One thing we’ve had for a while is a dynamic virtual voice system, where you can essentially stop playing and decoding a sound file based on a rule set, while still knowing the position within that sound file so that you can bring it back into the audio mix either at the same time stamp, or seeked forward to make it sound like it never really stopped playing. You don’t want to have an incredibly complex mix playing, since it can rapidly become just noise. These systems let a developer keep their mix clean, while also managing memory and CPU usage.
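The virtual voice idea described above can be illustrated with a minimal Python sketch. This is not Wwise's API or implementation: the `Voice` class, its fields and the volume-threshold rule are all hypothetical, chosen only to show the two return behaviours mentioned in the answer (resume at the saved time stamp, or seek forward as if the sound had never stopped).

```python
from dataclasses import dataclass


@dataclass
class Voice:
    """Hypothetical voice object; not a Wwise type."""
    duration: float                 # length of the sound file, in seconds
    volume: float                   # current volume, 0.0 to 1.0
    seek_on_return: bool = True     # True: seek forward on return; False: resume at saved position
    position: float = 0.0           # playback position, in seconds
    virtual: bool = False           # True while the voice is not decoded or mixed
    virtual_elapsed: float = 0.0    # time spent virtual since it last left the mix

    def update(self, dt: float, threshold: float = 0.05) -> None:
        """Advance the voice by dt seconds, applying an assumed volume-threshold rule."""
        should_be_virtual = self.volume < threshold
        if should_be_virtual and not self.virtual:
            # Leave the mix: stop decoding, but start tracking elapsed time.
            self.virtual = True
            self.virtual_elapsed = 0.0
        elif not should_be_virtual and self.virtual:
            # Return to the mix, optionally skipping the time spent virtual.
            self.virtual = False
            if self.seek_on_return:
                self.position = min(self.position + self.virtual_elapsed, self.duration)
        if self.virtual:
            self.virtual_elapsed += dt   # bookkeeping only, no decoding cost
        else:
            self.position = min(self.position + dt, self.duration)
```

In this sketch a voice that drops below the threshold costs only a counter update per frame, which is the point of the technique: the mix stays clean and CPU is spent only on audible sounds.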

Rashid Sayed: It is indeed impressive that you support everything from mobiles to consoles. What kind of challenges does that bring when developing across such a wide ecosystem?

Mike Drummelsmith: It’s a big challenge since the performance specs are so widely different. However, with the tools we have in place, teams can easily adjust things that have the biggest impact on performance metrics. Using a really expensive real-time effect on console? Tone it down for mobile. Increase file compression or change codecs. Play fewer simultaneous voices. There are lots of things that a developer can tweak to support a variety of performance targets.
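The per-platform tweaks listed above (cheaper effects, heavier compression or a different codec, fewer simultaneous voices) amount to a base configuration with platform overrides. The Python sketch below shows that pattern under stated assumptions: the setting names, platform keys and values are illustrative placeholders, not Wwise settings or recommended numbers.

```python
# Hypothetical baseline audio settings, tuned for a console target.
BASE_SETTINGS = {
    "reverb_quality": "high",     # expensive real-time effect
    "codec": "vorbis",            # placeholder codec choice
    "compression_quality": 0.8,   # higher means larger, better-sounding files
    "max_voices": 64,             # simultaneous voices allowed in the mix
}

# Hypothetical per-platform overrides; only the differences are listed.
PLATFORM_OVERRIDES = {
    "mobile": {
        "reverb_quality": "low",
        "codec": "adpcm",
        "compression_quality": 0.4,
        "max_voices": 24,
    },
}


def settings_for(platform: str) -> dict:
    """Return the baseline settings with any overrides for the platform applied."""
    merged = dict(BASE_SETTINGS)                       # copy so the baseline is untouched
    merged.update(PLATFORM_OVERRIDES.get(platform, {}))
    return merged
```

Keeping overrides as a small diff against one baseline means the same project can target several performance budgets without duplicating the whole configuration.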

Rashid Sayed: What is next for Audiokinetic? What are you guys working on now?

Mike Drummelsmith: From the core technology side of things, we have about 3 or 4 years of work already scoped out. We can’t sit still! Lots of work is currently going into ways to manage more and more complex projects and improve the workflow for studios using Wwise. We’re working on ways to better support a variety of game engines, as well. When we officially support an engine, we then have to make sure the integration is as deep as we can make it (see our current Unity and Unreal Engine 4 integrations for examples of that), that it works on as many platforms as we can target, and that it’s fully QA’d by our team.

This puts a functional limit on how many engines we can support. However, we’re hoping to provide some unofficial assistance and bootstrap integrations for more engines moving forward, to allow more and more teams to take advantage of Wwise. On the licensing side, we’re also cooking up some ideas that will allow teams that can’t currently work easily within our Limited Commercial License or tiered licensing structure to be able to afford Wwise. More on that in the coming months. Finally, we’re working hard on building a community around the Wwise core platform. This is an initiative that will include developers of all sizes, freelance sound designers, 3rd party sound design companies and more. It’s going to be an exciting, and very, very busy year! Thanks for your time!