Will we ever cross over to the other side of the Uncanny Valley? The question has bothered gamers and developers alike since the arrival of the very first 3D games. A couple of years ago, Tim Sweeney was quoted as saying that dynamic, photorealistic environments would need about 40 TFlops of compute power to render in real time.
A pair of overclocked RTX 2080 Ti's in SLI can approach 30 TFlops of compute. While that's still a luxury setup, well out of reach of most people, it suggests that mainstream consumer parts might hit the 40 TFlops mark sooner rather than later, perhaps by the end of the coming decade. Remember that when the eighth console generation began in 2013, the PS4's 1.84 TFlops of compute was considered a big deal. The mobile GPU in 2018's iPad Pro offers graphics performance comparable to the Xbox One's, and the upcoming iPhone 12 will likely bring that level of performance to pocketable phones.
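For a rough sense of where these figures come from, peak FP32 throughput is typically estimated as shader cores × clock speed × 2, since each core can retire one fused multiply-add per cycle. Here's a minimal sketch of that arithmetic; the ~1.75 GHz boost clock for an overclocked card is an assumption, not a spec:

```python
def peak_tflops(shader_cores: int, clock_ghz: float, ops_per_cycle: int = 2) -> float:
    """Theoretical peak FP32 throughput: cores x clock x FLOPs per cycle."""
    return shader_cores * clock_ghz * ops_per_cycle / 1000.0

# RTX 2080 Ti: 4352 CUDA cores; ~1.75 GHz is an assumed overclocked boost clock.
single_card = peak_tflops(4352, 1.75)
print(f"One overclocked 2080 Ti: ~{single_card:.1f} TFlops")
print(f"Two in SLI (ideal scaling): ~{2 * single_card:.1f} TFlops")  # ~30 TFlops

# PS4 (2013): 1152 shader cores at 0.8 GHz -> the oft-quoted 1.84 TFlops.
print(f"PS4: ~{peak_tflops(1152, 0.8):.2f} TFlops")
```

In practice, SLI never scales perfectly, so the combined figure is a theoretical ceiling rather than real delivered performance.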
Ever since the 1980s and the arrival of wireframe games like the original Elite, the community has simultaneously praised advances in real-time graphics and bemoaned just how far they remain from real life. The arrival of dedicated 3D graphics acceleration hardware in the late 90s sped things up significantly: the fifteen years from 1995 to 2010 were marked by giant leap after giant leap in visual fidelity. To put things into perspective, while 2019 visuals are certainly better than the best on offer in 2007, we now appear to be running up against the law of diminishing returns.
What exactly is the Uncanny Valley? It's the term for the phenomenon where, beyond a certain point, near-photorealistic visuals look less pleasing to us than stylized or low-fidelity graphics. There's a reason for this, and it's rooted in deep, instinctual human psychology. Low-fidelity game graphics, and works of art in general, rely on the suspension of disbelief to let people relate to and engage with them. Consider the highly stylized cel-shaded visuals in Telltale's The Walking Dead series. When we see Lee and Clementine, both our conscious and subconscious minds agree that these are not "real" people, but representations of human beings.
To immerse ourselves, we willingly suspend our disbelief and accept these representations as proxies for real people, just as we do when reading a book. Suspension of disbelief lets us identify with videogame characters even when their visual makeup doesn't correspond neatly to the real human form. Up to a point, the more stylized characters are, the easier it is to suspend disbelief and engage with them without hunting for subtle "flaws." This is why 2012's Borderlands 2 has held up so well and still looks great on a 4K display: in our minds, we don't see its Raiders as crude depictions of real humans, but as great-looking comic-book representations.
When games target photorealism, though, a fascinating paradox emerges. As visuals become increasingly sophisticated, it becomes harder and harder for us to suspend disbelief across the board. Our conscious mind doesn't really have a problem; suspension of disbelief operates here just as it does with movies and shows, where we recognize the characters and environments as fictional and engage with them on that basis. However, some characters are "real enough" that our subconscious struggles to slot them into the "artistic representation of humans" category the way it does with more stylized characters. Instead, our subconscious registers them as actual humans, but with some major caveats.
When interacting with real people, we pick up on a lot of subtle, dynamic visual cues: the way they make and break eye contact, the micro-expressions that track the conversation they're having with us, and much more. Current character rendering models incorporate this dynamism only in the most rudimentary of ways, and the way character interactions are framed in most games doesn't offer scope for that kind of interactivity. Take Deus Ex: Mankind Divided, for instance. In the real world, a conversation is a two-way, dynamic interaction.
In Mankind Divided, an NPC says something, then simply waits for you to say something else; their facial animations don't account for the time in between, and the very manner in which you interact with them is artificial. Yet, running at 4K with the settings maxed out and PureHair enabled, these NPCs can look very realistic. And that is exactly the problem. Camouflage has been used as a predation strategy for hundreds of millions of years; certain spider species, for instance, have evolved to closely resemble the ants they prey on. This has led many species to develop a kind of instinctual IFF (identify friend or foe) radar based on subtle cues. A game character who looks almost human but doesn't quite move or emote like one trips that same alarm, and the result is the feeling of unease we associate with Uncanny Valley characters.
With photorealistic rendering, we’ve already reached far into the Uncanny Valley. But these subtle cues that help your subconscious identify who’s human and who’s not require much more effort to simulate. This effort might not be in terms of traditional rendering—it’s not about increasing poly-counts or incorporating ray-tracing. It would be in terms of coding dynamic systems—likely powered by AI and complemented by facial recognition sensors—to enable in-game characters to emote dynamically like real people. Because micro-expressions are so subtle—they can last mere milliseconds and use the smallest of facial muscles—and because they’re so dynamic, they’d require accordingly powerful AI algorithms to enable characters to respond to your expressions in real-time.
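To make the scale of that problem concrete, here is a deliberately simplified, hypothetical sketch of what such a loop might involve. Every name in it (Camera, NPCFace, detect_expression) is invented for illustration; the hard part, the models behind detect_expression and react, is precisely what doesn't yet exist at the required fidelity:

```python
import time

TICK_HZ = 60  # micro-expressions can last well under 100 ms, so sampling must be fast

class Camera:
    """Stub for a player-facing camera feed."""
    def read(self):
        return None  # a real system would return an image frame

class NPCFace:
    """Hypothetical facial-animation rig with per-muscle control."""
    def react(self, expression: str, intensity: float) -> None:
        # A real engine would blend facial poses (e.g. FACS action
        # units) over a few frames rather than print a message.
        print(f"NPC mirrors '{expression}' at intensity {intensity:.2f}")

def detect_expression(frame) -> tuple[str, float]:
    """Stand-in for a trained micro-expression classifier.

    A production system would run a vision model on the frame and
    return an expression label plus a confidence score.
    """
    return ("skeptical", 0.7)  # placeholder output

def dialogue_loop(npc: NPCFace, camera: Camera, ticks: int = 3) -> None:
    for _ in range(ticks):
        expression, confidence = detect_expression(camera.read())
        if confidence > 0.5:
            # React within a single tick instead of idling, frozen,
            # until the player picks the next dialogue option.
            npc.react(expression, confidence)
        time.sleep(1.0 / TICK_HZ)

dialogue_loop(NPCFace(), Camera())
```

Even this toy version makes the latency budget obvious: the classifier, the decision logic, and the animation blend all have to fit inside a few milliseconds per frame, on top of everything else the game is rendering.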
This is uncharted territory for videogames. The facial recognition algorithms we encounter day to day, from Face ID to Snapchat filters, are steadily amassing vast datasets of human facial expressions. Closely related is the nascent field of artificial emotional intelligence (AEI). Truly dynamic character models that can cross the Uncanny Valley would require future systems that pair a mature implementation of AEI with sophisticated facial recognition to respond realistically to human interaction at a subperceptual level of granularity. Suffice it to say, we've got a long way to go before that becomes a reality.
Game graphics today are closer than ever to photorealism. But in some ways, they're as far as ever from being truly lifelike. As graphics continue to evolve, the real challenge is to deliver characters that genuinely talk, emote, and act like humans, not just representations of them.