One of my greatest pet peeves with graphics hardware reviews these days has been the insistence of the media—and the community in general—on using peak compute in teraflops as a point of reference. The situation’s gotten even worse with the release of the RTX 2000 series—we now have a new number, seemingly invented by Nvidia, called “RTX-Ops”: a largely arbitrary figure whose only apparent purpose is to show that the ultra-expensive new hardware is indeed faster than Pascal (in some arcane way, assuming that hybrid raytracing actually becomes an integral part of the graphics pipeline in tomorrow’s games). But before RTX-Ops came along to completely rob hardware comparisons of any credibility whatsoever, there was the original culprit: compute, measured in teraflops.
The origins of this unfortunate usage can easily be traced back to 2013: leaked presentations on Orbis and Durango—codenames for the PS4 and Xbox One—highlighted that the PS4 had a GPU with 1.84 teraflops of power while the Xbox One’s GPU had 1.18 teraflops. There’s nothing that the marketing department loves more than a simple number to make seemingly scientific comparisons (“Our detergent removes stains 5 times better than the competition.”). With console and PC hardware, the reality is nowhere near as simple. And yet—because of the bandwagon effect or just plain laziness—the teraflop has entered the lexicon of the gaming media: the 6 TFLOP Xbox One X; the 1 TFLOP Tegra X1 mobile processor (and we’re completely ignoring the utterly ridiculous “70 RTX-Ops” 2080 Ti here).
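For reference, those headline figures are typically derived from nothing more than shader count and clock speed, with each shader assumed to retire one fused multiply-add (two floating-point operations) per cycle. Here’s a minimal sketch, using commonly cited shader counts and clocks as working assumptions rather than figures pulled from any leak:

```python
def peak_fp32_tflops(shader_count: int, clock_ghz: float) -> float:
    """Peak FP32 compute, assuming each shader retires one fused
    multiply-add (two floating-point ops) per cycle."""
    return shader_count * 2 * clock_ghz / 1000.0

# Commonly cited shader counts and clocks, treated here as assumptions:
print(f"PS4:        {peak_fp32_tflops(1152, 0.800):.2f} TFLOPS")  # ~1.84
print(f"Xbox One X: {peak_fp32_tflops(2560, 1.172):.2f} TFLOPS")  # ~6.00
```

That’s the whole calculation: it says nothing about memory bandwidth, ROPs, drivers, or whether those shaders can actually be kept busy.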
In the console space, teraflop figures are often thrown around in discussions of PS4 Pro and Xbox One X performance, and this can lead to misleading conclusions. While it is true that the Xbox One X has an objectively faster graphics component, that is only part of the story when it comes to comparing the two consoles’ performance.
But what, exactly, is so wrong with using compute performance to compare graphics cards?
To understand this better, we need to look into the reasons why particular GPUs run games well or poorly, and the only truly objective metric here is per-game framerates.
A graphics card isn’t simply made up of compute units. It has a variety of other components, from video memory to texture units to ROPs and more, and each of these is used in different ways, and to different extents, by different games. MSAA, for example, depends in large part on available memory bandwidth. A card like the 3 GB GTX 1050, which has more compute units than the 2 GB GTX 1050 but substantially less memory bandwidth, can actually end up performing worse than its notionally weaker counterpart. On the flip side, a glut of memory bandwidth—512 GB/s, to be precise—doesn’t help the R9 Fury or Fury X, because their ROP count is low relative to their large number of shader units. The imbalance between ROPs and compute units in the Fury series is evident in the fact that, despite giving up 12.5 percent of its compute units, the R9 Fury never performs 12.5 percent worse than the Fury X.
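To put rough numbers on that Fury comparison, here’s a sketch of how peak compute and peak pixel fill rate scale between the two cards; the shader counts, ROP counts, and clocks below are the commonly published reference specs, used purely as illustrative assumptions:

```python
def peak_tflops(shaders: int, clock_ghz: float) -> float:
    # FP32 compute: two ops (one FMA) per shader per cycle
    return shaders * 2 * clock_ghz / 1000.0

def peak_fill_gpixels(rops: int, clock_ghz: float) -> float:
    # Pixel fill rate: one pixel per ROP per cycle
    return rops * clock_ghz

# Commonly published reference specs, treated as assumptions:
# (shader count, ROP count, clock in GHz); both cards use 512 GB/s of HBM
cards = {"R9 Fury X": (4096, 64, 1.050), "R9 Fury": (3584, 64, 1.000)}

for name, (shaders, rops, clk) in cards.items():
    print(f"{name:9s}: {peak_tflops(shaders, clk):.2f} TFLOPS, "
          f"{peak_fill_gpixels(rops, clk):.1f} GPix/s, 512 GB/s")
```

Compute drops noticeably, but fill rate barely moves and bandwidth doesn’t move at all, so in ROP- or bandwidth-limited scenes the two cards end up nearly interchangeable.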
With consoles, the situation is further complicated because the bottleneck is often not in the graphics component at all, but in the CPU. The perfect example of this is Fallout 4. Despite the significant GPU performance gap between the Xbox One and PS4, Fallout 4 actually ran slightly better in places on Xbox One thanks to its CPU clockspeed advantage. The CPU was such a bottleneck that the teraflop difference had no impact on in-game performance. With the PS4 Pro and Xbox One X, the situation is slightly different: the Pro runs the game at 1440p while the Xbox One X runs it at a full 4K. The result is that image quality on the One X is definitely better than on the Pro; performance, however, is shakier, with more frequent drops below 30 FPS than on Sony’s console. This is a pattern that repeats itself in many titles—the One X runs games at a higher resolution while offering lower performance. How is the teraflop gap useful in describing these situations?
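A crude way to see why a GPU gap can evaporate behind a CPU bottleneck, as in the Fallout 4 case above: per frame, the CPU and GPU largely work in parallel, so the framerate is roughly set by whichever stage takes longer. The millisecond figures below are invented for illustration and aren’t measurements from any game:

```python
def fps(cpu_ms: float, gpu_ms: float) -> float:
    """Toy model: a frame takes as long as the slower of the CPU and GPU
    stages (ignoring synchronization, queuing, and so on)."""
    return 1000.0 / max(cpu_ms, gpu_ms)

# Hypothetical frame times for a CPU-bound scene: both GPUs finish well
# inside the CPU's budget, so the faster one buys nothing.
print(f"{fps(cpu_ms=36.0, gpu_ms=22.0):.1f} FPS")  # faster GPU -> ~27.8 FPS
print(f"{fps(cpu_ms=36.0, gpu_ms=30.0):.1f} FPS")  # slower GPU -> ~27.8 FPS
```

Only once the GPU stage grows past the CPU stage, by raising the resolution for instance, does the extra compute start to show up in the framerate.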
Game engines aren’t solely compute-intensive either. Different techniques stress different aspects of GPU hardware, and not all of them are used in every game. For example, AMD cards are known to handle tessellation worse than their Nvidia counterparts. As a result, Gameworks features that lean heavily on tessellation—such as Hairworks—run noticeably worse on AMD hardware than on Nvidia hardware. However, Team Red recoups a large chunk of that deficit in games like The Witcher 3 simply by having those tessellation-heavy effects turned off—to the extent that other advantages of the AMD hardware can propel it past the competition, as is the case with the R9 380 and the GTX 960.
Performance can be limited by bottlenecks in any particular component, not just the compute units. And we haven’t even gotten into the massive impact of driver support. Many Nvidia cards far outperform AMD cards with a similar degree of compute performance (e.g. the RX 480 and the GTX 1070) in gaming, while falling far behind in other workloads like mining, in part due to the extra resources Nvidia puts towards its drivers.
Lastly, the “peak” aspect of peak compute bears looking at. Due to thermal constraints or power efficiency concerns, a particular GPU may not always operate at peak performance. This is especially noticeable in the mobile space: performance degradation is an almost unavoidable part of smartphone gaming, as high-end mobile GPUs lower their clocks and offer more pedestrian performance to cope with the increased heat load. A nominal “peak compute” figure is often quoted, but in real-world scenarios the hardware doesn’t consistently hit that level of performance. In the console space, peak performance is less of a factor: console hardware is deliberately run at lower clocks to guarantee consistent performance without hitting thermal or power limits.
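As a back-of-the-envelope illustration, here’s what a hypothetical mobile GPU delivers over a gaming session once sustained clocks settle below the quoted boost clock; every number below is made up for the sake of the example:

```python
def peak_tflops(shaders: int, clock_ghz: float) -> float:
    return shaders * 2 * clock_ghz / 1000.0  # two FLOPs (one FMA) per cycle

# Hypothetical mobile GPU and session: full boost clock for the first
# five minutes, then a throttled sustained clock for the remaining 25.
shaders, boost_ghz, sustained_ghz = 256, 1.00, 0.70
minutes_at_boost, minutes_total = 5, 30

session_avg = (peak_tflops(shaders, boost_ghz) * minutes_at_boost
               + peak_tflops(shaders, sustained_ghz)
               * (minutes_total - minutes_at_boost)) / minutes_total

print(f"Spec sheet:      {peak_tflops(shaders, boost_ghz):.2f} TFLOPS")  # 0.51
print(f"Session average: {session_avg:.2f} TFLOPS")                     # ~0.38
```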
Teraflops, in general, aren’t a useful figure for determining the relative performance of graphics hardware. In the console space, given the architectural similarities between the machines, they are of some use for arriving at a rough estimate of the performance differential between the Pro and the Xbox One X in GPU-limited situations, when running at the same framerates and visual settings. However, because of the (obvious) desire by manufacturers and developers alike to offer parity in console experiences, it’s rare that everything else remains the same: visual settings may be tweaked, draw distances may be pulled back, checkerboard rendering may be employed, and at the end of the day, the experience on a 4K TV isn’t, say, 50 percent better or worse, as teraflop figures would have you believe.