RDNA 2. The graphics architecture at the heart of AMD’s kick-ass new Radeon RX 6000 graphics cards might sound like a simple iteration upon the original “RDNA” GPUs that came before it, but RDNA 2 (which also powers the next-gen Xbox Series X and PlayStation 5 consoles) is much more than a mere refresh. Significant tweaking has resulted in a stunning 54-percent increase in performance-per-watt over AMD’s last-gen Radeon RX 5000 GPUs. Perhaps more importantly, the Radeon RX 6000 series introduces an innovative new “Infinity Cache” technology that reimagines how memory behaves in graphics cards. Oh, and ray tracing? AMD does that now, too.
Add it all up, and the Radeon RX 6800-series graphics cards debuting today manage to challenge Nvidia’s enthusiast-class gaming flagships for the first time in a long time. Head over to our Radeon RX 6800 and RX 6800 XT review to see what that means in practical terms. This high-level overview of the RDNA 2 architecture will help explain how AMD achieved it all.
RDNA 2 architecture changes
AMD’s engineers approached RDNA 2 with lofty efficiency goals as their guiding lights. The original RDNA architecture provided a 50-percent performance-per-watt increase over its “GCN”-based predecessors, finally matching Nvidia’s vaunted power efficiency, and the company’s executives wanted RDNA 2 to keep that pace. Spoiler alert: They did. It took a lot of hard work though, as well as close collaboration with the Ryzen CPU architecture team, because RDNA 2 is built using the same TSMC 7nm manufacturing process as RDNA 1. A big part of the original RDNA’s efficiency gains came from the node jump from 14nm to 7nm, but RDNA 2’s improvements required more substantial tweaking.
Despite the serious rejiggering, the fundamental RDNA 2 building blocks remain largely similar to RDNA 1’s in broad strokes (aside from the addition of dedicated ray accelerator hardware, which we’ll get to later), only scaled up much further.
AMD stayed modest with last generation’s RDNA 1 products. Its flagship, the Radeon RX 5700 XT, topped out at 40 compute units and 10.3 billion transistors inside its 251mm² die, a surprise considering AMD’s earlier GCN architectures scaled up to 64 CU designs. (We’ll get to why that was later as well.) RDNA 2 blows way past that. The $579 Radeon RX 6800 includes 60 CUs, the $649 Radeon RX 6800 XT ups that to 72 CUs, and the flagship $999 Radeon RX 6900 XT will fully double up last generation’s RX 5700 XT with a whopping 80 CUs inside a huge 519mm² die with over 26 billion transistors. By comparison, the “Ampere” GPU die inside Nvidia’s rival $1,500 GeForce RTX 3090 packs a hair over 28 billion transistors into a much larger 628mm² die.
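To put those die figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the rounded numbers quoted above (taking the RTX 3090 at roughly 28 billion transistors), so the results are approximate. The function name `density_mtr_per_mm2` is made up for this illustration.

```python
# Rounded die figures quoted in the article: (transistor count, die area in mm^2).
# "Over 26 billion" and "a hair over 28 billion" are treated as 26e9 and 28e9 here,
# so the computed densities are slight underestimates.
DIES = {
    "Radeon RX 5700 XT (RDNA 1)": (10.3e9, 251),
    "Radeon RX 6900 XT (RDNA 2)": (26e9, 519),
    "GeForce RTX 3090 (Ampere)": (28e9, 628),
}


def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return transistor density in millions of transistors per square millimeter."""
    return transistors / area_mm2 / 1e6


if __name__ == "__main__":
    for name, (transistors, area) in DIES.items():
        print(f"{name}: {density_mtr_per_mm2(transistors, area):.1f} MTr/mm^2")
```

On these rounded inputs, the big RDNA 2 die comes out denser than both its RDNA 1 predecessor and Nvidia's larger Ampere die, even though RDNA 1 and RDNA 2 share the same 7nm process.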
Swiping a page from AMD’s incredible Ryzen 5000 CPUs, RDNA 2 implements pervasive fine-grain clock gating to allow parts of the GPU to slow down if they aren’t being used, improving power efficiency. RDNA 2 additionally features more robust clock tree splitting and gating (like server CPUs) for the same reason, but more parallelized to hit the higher bandwidths GPUs are capable of. The company’s engineers also “aggressively” rebalanced data pipelines and even redesigned entire data paths, honing the architecture for maximum efficiency. Those optimizations accounted for about a third of the up to 54-percent performance-per-watt increase delivered in the Radeon RX 6800 and 6800 XT (and the whopping 65-percent increase promised for the flagship Radeon RX 6900 XT coming December 8).
Performance-per-watt isn’t all about power efficiency, though, hence the word “performance.” Another third of RDNA 2’s perf-per-watt improvement comes from pushing the pedal to the metal even harder. Once again, AMD’s engineers optimized the microarchitecture, logic, and performance libraries with a focus on speed. The most tangible results of their efforts have to be the insane clock speeds of the Radeon RX 6000 GPUs. AMD’s CPU engineers have spent a long time honing speeds on the 7nm process node by this point, and they shared their expertise with the Radeon team to great effect.
The Radeon RX 6000-series graphics cards push way past the 2GHz barrier. Company representatives were eager to tout the “unprecedented” speeds in conversations with press. They should be. All three high-end offerings (the Radeon RX 6800, 6800 XT, and 6900 XT) have boost clock speeds that surpass a whopping 2.1GHz. The two XT models go all the way up to 2,250MHz. Those are under ideal conditions, but AMD says the XT cards hit 2,015MHz even in gaming workloads, keeping pace with Nvidia’s staggeringly powerful Ampere GPUs, which can boost to roughly 2GHz during gameplay.
AMD couldn’t have hit such fast speeds or achieved its power efficiency goals without the introduction of RDNA 2’s revolutionary Infinity Cache.
RDNA 2 Infinity Cache explained
RDNA 2’s standout feature also swipes a page from processor design: Epyc server processors, in this case. Traditional GPUs include L1 and L2 caches of various sizes. Radeon RX 6000 graphics cards add an “Infinity Cache” that behaves similarly to the “Game Cache” that helps modern Ryzen processors game so much better than earlier models did. Inspired by Epyc server CPUs, Infinity Cache is basically a massive 128MB L3 cache that has been heavily optimized for gaming workloads. It’s four times denser than the L3 SRAM in Epyc processors to help improve power efficiency, too.
Equipping the GPU with such a large, high-speed cache lets it keep most of the working data for any given frame on-die. In many cases this saves the GPU from having to keep sending signals all the way across the package to the 16GB of onboard GDDR6 memory, especially since the cache holds a lot of temporal and spatial data that can be reused in subsequent frames. That makes Infinity Cache much faster and much more power-efficient than simply increasing the bus width to the memory modules.
Sam Naffziger, AMD’s product technology architect, says that even though the Radeon RX 6000 GPUs stick to a modest 256-bit bus, the Infinity Cache helps RDNA 2 deliver vastly more bandwidth-per-watt than traditional GDDR6 equipped with even a humongous 512-bit bus. By comparison, Nvidia’s rival high-end RTX 3080 and 3090 graphics cards utilize wider 320-bit and 384-bit buses, respectively, paired with cutting-edge GDDR6X memory that uses “PAM4” signaling technology, which allows them to send four possible values per cycle, up from the traditional two. That lets GDDR6X move data at twice the rate of GDDR6, but with higher latency and power demands.
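The arithmetic behind those bus-width and signaling claims can be sketched in a few lines of Python. This is a simplified model, not a vendor formula: the function names are made up for illustration, and the 16Gbps per-pin data rate in the example is an assumption that does not come from AMD's briefing as reported here.

```python
import math


def bits_per_symbol(signal_levels: int) -> float:
    """Bits carried by one symbol when the signal can take `signal_levels` values."""
    return math.log2(signal_levels)


def peak_bandwidth_gb_per_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    Each bus bit is one data pin; dividing by 8 converts gigabits to gigabytes.
    """
    return bus_width_bits * gbps_per_pin / 8


if __name__ == "__main__":
    # Two-level signaling carries 1 bit per symbol; four-level PAM4 carries 2.
    print(bits_per_symbol(2), bits_per_symbol(4))

    # ASSUMPTION: 16 Gbps per pin, chosen only to make the example concrete.
    assumed_rate = 16.0
    for width in (256, 512):
        print(f"{width}-bit bus: {peak_bandwidth_gb_per_s(width, assumed_rate):.0f} GB/s")
```

At any fixed per-pin rate, doubling the bus width doubles the theoretical peak. That brute-force route is the one AMD says it avoided because of the power cost.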
The Infinity Cache also helps enable RDNA 2’s sky-high clock speeds. If AMD had tried to force the original RDNA memory subsystem onto RDNA 2, Naffziger said, it would have required a vastly larger memory configuration to avoid starving the GPU for bandwidth. That would have meant upgrading to massive 512-bit buses and more, faster memory, all of which would have sent power demands skyrocketing, a no-go given RDNA 2’s design goals.
The overwhelming bandwidth enabled by Infinity Cache keeps RDNA 2’s CUs amply fed, as you can see in the chart above. When AMD’s engineers disable Infinity Cache in their labs and revert to the standard cache design with 16GB of GDDR6 memory over a 256-bit bus, GPU clock frequencies fall off a cliff.
By keeping so much frame data on die, the Infinity Cache helps the Radeon RX 6800 average 34 percent less latency than the older Radeon RX 5700 XT. When a scene fully “hits” the Infinity Cache, the latency drops further. Naffziger says that AMD’s Infinity Fabric communication technology can scale its speeds up and down to optimize efficiency, ramping up to 550GB/s when the Infinity Cache becomes especially stressed. But even when the GPU needs to access your card’s actual VRAM, latency still improves compared to the last-gen Radeon cards thanks to a general speed increase for Infinity Fabric.
AMD tuned the Infinity Cache in this initial trio of enthusiast-class cards for 4K gaming, which is why it’s configured with an impressive 128MB. Naffziger says the large size lets Infinity Cache achieve a 56 percent “hit rate” across a variety of titles at 4K resolution, and higher hit rates as the resolution scales down. Part of the reason these cards perform better than their Nvidia competition at 1440p gaming is their high Infinity Cache hit rates, AMD’s Laura Smith said.
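A hit rate is easier to reason about as a share of memory traffic. The Python sketch below is a deliberately simplified model, not AMD's methodology. It assumes every request either hits the cache or goes out to GDDR6, and that the external memory is the only bottleneck. Both function names are hypothetical.

```python
def dram_traffic_fraction(hit_rate: float) -> float:
    """Fraction of memory requests that still go out to external GDDR6."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return 1.0 - hit_rate


def bandwidth_amplification(hit_rate: float) -> float:
    """How many requests are served in total for each request that reaches GDDR6.

    ASSUMPTION: the external memory is the sole bottleneck and cache hits are free,
    which real hardware only approximates.
    """
    misses = dram_traffic_fraction(hit_rate)
    if misses == 0.0:
        raise ValueError("a 100% hit rate makes the ratio unbounded")
    return 1.0 / misses


if __name__ == "__main__":
    hit_rate_4k = 0.56  # the 4K hit rate quoted by Naffziger
    print(f"Requests reaching GDDR6: {dram_traffic_fraction(hit_rate_4k):.0%}")
    print(f"Idealized amplification: {bandwidth_amplification(hit_rate_4k):.2f}x")
```

Under those idealized assumptions, a 56 percent hit rate means fewer than half of all requests ever touch the 256-bit bus.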
But Infinity Cache performance doesn’t scale linearly as resolution decreases, Naffziger warned. When you drop down to 1080p, games often become more CPU- or engine-bound than memory-bound. (I wouldn’t be surprised if more affordable Radeon RX 6000 offerings in the future reduced the Infinity Cache’s size because of that.)
Likewise, the Infinity Cache spreads its wings the most in applications that are more memory-bound, though its benefits can be felt even when a game needs to access traditional VRAM more often. Naffziger says in those cases, RDNA 2’s overall memory system behaves roughly on a par with what you’d see if you’d equipped these cards with a 512-bit bus.
Infinity Cache greatly helps with ray tracing too.
Ray tracing with RDNA 2
Yes, AMD’s Radeon GPUs can handle real-time ray tracing now. Nvidia kicked off the ray tracing party by adding dedicated “RT cores” for handling ray tracing to its older RTX 20-series GPUs. Now AMD is joining the fun by adding a single dedicated “ray accelerator” to each RDNA 2 compute unit. That means as you move up the Radeon RX 6000 stack, more powerful graphics cards with more compute units will also be better at ray tracing, as they’ll have more dedicated ray tracing hardware.
As you can see in our Radeon RX 6800 and 6800 XT review, RDNA 2 isn’t quite on a par with Nvidia’s second-gen ray tracing implementation. It still delivers surprisingly good ray tracing performance, achieving very playable frame rates at both 1440p and 1080p resolution. You won’t be able to play games at 4K with the intensive lighting technologies enabled, however, and AMD says it targeted 1440p gaming as its ray tracing goal. By and large, it delivered.
Infinity Cache comes through in the clutch here, too. We delved deeper into how ray tracing works in our original deep-dive of Nvidia’s Turing architecture, where the technology debuted, but basically it works by having dedicated ray tracing hardware calculate how light rays behave, using a method called bounding volume hierarchy (BVH) traversal. Performing that task is very memory-intensive, which is why VRAM demands leap upward when you enable ray tracing in a game.
AMD says it’s able to keep “a very high percentage of the BVH working set” directly in the Infinity Cache, reducing latency and improving overall performance. The ray accelerator handles intersections in the BVH, while RDNA 2 uses standard shader code in the compute units for ray traversal and shading the actual scene.
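For readers who want a feel for that division of labor, here is a minimal, generic BVH sketch in Python. It is not AMD's hardware or driver implementation. The box test uses the textbook "slab" method, and the `Node` layout and function names are hypothetical. Loosely speaking, `ray_hits_box` stands in for the kind of intersection test the article attributes to the ray accelerators, while the loop in `traverse` stands in for the traversal that runs as ordinary shader code.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    """A BVH node: an axis-aligned bounding box plus children or a leaf payload."""
    lo: Vec3
    hi: Vec3
    children: List["Node"] = field(default_factory=list)
    primitive: Optional[str] = None  # set only on leaves


def ray_hits_box(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: intersect the ray's entry/exit intervals on each axis."""
    t_near, t_far = 0.0, math.inf
    for o, d, l, h in zip(origin, direction, lo, hi):
        if d == 0.0:
            # Ray runs parallel to this pair of planes: it must start between them.
            if o < l or o > h:
                return False
            continue
        t0, t1 = (l - o) / d, (h - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def traverse(root: Node, origin: Vec3, direction: Vec3) -> List[str]:
    """Walk the tree, skipping every subtree whose bounding box the ray misses."""
    hits: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            continue
        if node.primitive is not None:
            hits.append(node.primitive)
        else:
            stack.extend(node.children)
    return hits


if __name__ == "__main__":
    left = Node((0, 0, 0), (1, 1, 1), primitive="left")
    right = Node((2, 0, 0), (3, 1, 1), primitive="right")
    root = Node((0, 0, 0), (3, 1, 1), children=[left, right])
    print(traverse(root, (0.5, 0.5, -1.0), (0.0, 0.0, 1.0)))
```

Every node the ray visits has to be fetched from memory, which is why keeping the BVH working set in a large on-die cache matters so much.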
That said, AMD does not have an answer for Nvidia’s Deep Learning Super Sampling (DLSS) technology. Ray tracing is incredibly computationally expensive, and activating it creates a striking performance impact. To counteract the loss in frame rate, DLSS renders games at a lower resolution, then upscales the final image to your game resolution using machine learning to spiff up the image, all powered by Nvidia’s dedicated AI-focused tensor cores.
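The appeal of rendering low and upscaling is easy to see from raw pixel counts. The short Python sketch below only compares standard resolutions. The internal render resolutions that DLSS actually uses in a given game are not stated here, so the pairings are illustrative assumptions, and `pixel_fraction` is a made-up helper name.

```python
from typing import Tuple

Resolution = Tuple[int, int]  # (width, height) in pixels

RES_1080P: Resolution = (1920, 1080)
RES_1440P: Resolution = (2560, 1440)
RES_4K: Resolution = (3840, 2160)


def pixel_fraction(render: Resolution, output: Resolution) -> float:
    """Share of the output image's pixels that are actually rendered before upscaling."""
    return (render[0] * render[1]) / (output[0] * output[1])


if __name__ == "__main__":
    # ASSUMPTION: these render/output pairings are examples, not DLSS's documented modes.
    for label, render in (("1440p", RES_1440P), ("1080p", RES_1080P)):
        print(f"Render at {label}, output at 4K: {pixel_fraction(render, RES_4K):.0%} of the pixels")
```

Shading fewer than half the pixels per frame leaves far more headroom for expensive effects like ray tracing, provided the upscaler can hide the difference.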
Early iterations of DLSS could look like Vaseline smeared on your screen, but the DLSS 2.0 technology rolling out in newer games works like black magic. It’s amazing, and really makes flipping ray tracing on much less painful. The tensor cores also handle “denoising” when ray tracing is on to avoid the gritty look common on older, less advanced ray tracing implementations.
AMD doesn’t include dedicated AI upscaling hardware in RDNA 2. Denoising is handled by the general compute units, and it works really well by my eye, but there’s no DLSS-like feature to claw back lost frames. During its Radeon RX 6000 reveal, AMD teased some sort of DLSS rival dubbed “Super Resolution” as part of its FidelityFX suite of open-source tools without going into detail. Representatives declined to say more, other than to state that Super Resolution will not be available immediately. That said, because AMD’s RDNA 2 powers both next-gen consoles as well, the company hopes its open-source alternative ends up gaining traction with developers when it does arrive. The company’s FidelityFX toolkit also includes a denoiser solution that developers can implement.
DirectX 12 Ultimate features and more
But wait, there’s more. Like Nvidia’s recent RTX-branded GPUs, RDNA 2 is fully DirectX 12 Ultimate-compliant. Microsoft calls DX12 Ultimate “a force multiplier for the entire gaming ecosystem” because it unifies an array of new features (mostly ones introduced in Nvidia’s Turing-based RTX 20-series, but largely ignored by developers) across all modern PC and next-gen Xbox Series X hardware.
That means Radeon RX 6000-series graphics cards also pick up nifty tricks like mesh shading, variable rate shading, and sampler feedback, which we covered in our look at DirectX 12 Ultimate. All the features hold great potential to improve both performance and visual fidelity. AMD optimized various parts of RDNA 2 around them, such as improving the color compression behavior and adding dedicated sampler feedback logic.
AMD’s Radeon GPUs will also support Microsoft’s DirectStorage API when it debuts in 2021 (as will Nvidia’s RTX 30-series). DirectStorage lets your NVMe SSD talk directly to your graphics card’s memory for vastly improved loading and asset-streaming performance. Here’s how DirectStorage aims to kill game-loading times on the PC. It has the potential to be a real game-changer.
Other aspects of RDNA 2 received upgrades as well. The display engine now supports HDMI 2.1, for example. The multimedia engine can handle AV1 decoding for 8K videos and includes a high-quality 8K HEVC encode accelerator, matching advancements found in Nvidia’s Ampere GPUs. 8K is the most niche of niche cases at this point, though, and this is getting long enough.
Be sure to check out our full Radeon RX 6800 and RX 6800 XT review to see how all these RDNA 2 improvements translate into graphics cards you can actually buy. They’re incredible, and they really challenge Nvidia’s high-end gaming offerings for the first time since 2013’s Radeon R9 290X hit the streets. Whatever else you can say about 2020, it’s a great year to be a gamer.