With its GeForce RTX 30 series, NVIDIA has struck hard. It delivers an architecture in line with its predecessors, yet profoundly reworked: improved RT Cores and Tensor Cores, doubled FP32 units, consumption kept under control. After the promises, it’s time for testing. As you might expect, the Ampere architecture built into the GeForce RTX 30 series improves on that of the 20 series. “As it should have been from the start,” some will say. But that would be forgetting that a GPU cannot be designed in a snap, and that some changes simply take time.
Doubled density and the importance of cost control
Could NVIDIA have doubled the FP32 computing units without switching to Samsung’s 8 nm process? No. We can see this very clearly when analyzing the density of the chips, which has almost doubled. AMD keeps the advantage here and should go a little further with its Radeon RX 6000 on 7nm+ (RDNA 2, Navi 2X). We will see.
NVIDIA confirms its choice, for good reason: TSMC’s 7 nm node is too much in demand to deliver volume at a reasonable price, something even Microsoft (which uses it for its next Xbox) has confirmed in recent months. This is also one of the concerns AMD will have to face with its new Ryzen and Radeon products. At NVIDIA, that node is used only for the A100 chip, intended for servers.
One of NVIDIA’s objectives with Ampere is to democratize its technical innovations. Despite the success of the GeForce RTX 20 series, ray tracing, DLSS, Tensor Cores and accelerated 3D rendering have been accessible to only a limited portion of its customers, mostly those buying high-end products.
GeForce RTX 30 series: a much better price/performance ratio
This is partly what explains the aggressive pricing, even on the high-end offer. A Titan RTX that sold for nearly 3,000 euros is now outperformed by an RTX 3090 advertised at 1,549 euros. The same shift applies to the GeForce RTX 3070 (520 euros) and 3080 (720 euros), equivalents of the 2080 (Ti). Here too, the figures speak for themselves, in particular the makeup of each card:
The RTX 3070 is modeled on the 2080, the 3080 on the 2080 Ti, all while being much cheaper than their launch prices, or even their current prices. As we had analyzed, the 3080 is the most interesting model for its TFLOPS-per-euro ratio, which matters, especially to computing enthusiasts.
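To make that ratio concrete, here is a minimal sketch comparing the two cards for which we have both a price quoted in this article and an FP32 throughput measured with AIDA64 (values rounded; the comparison is illustrative, not an exhaustive ranking):

```python
# TFLOPS-per-euro comparison, a rough sketch using prices quoted in this
# article and our AIDA64 FP32 measurements (rounded to 0.1 TFLOPS).
cards = {
    "RTX 2080 Ti": (16.3, 1200),  # (measured FP32 TFLOPS, price in euros)
    "RTX 3080":    (31.7, 720),
}

for name, (tflops, price) in cards.items():
    # 1 TFLOPS = 1000 GFLOPS, so this yields GFLOPS per euro spent.
    print(f"{name}: {1000 * tflops / price:.1f} GFLOPS/euro")
```

At these figures, the 3080 delivers more than three times the raw FP32 throughput per euro of the 2080 Ti at its recent street price.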
There are some subtleties to note in this table, first of all the number of Tensor Cores, revised downwards. Each third-generation Tensor Core is twice as fast as its predecessor, which explains this choice. The RT Cores, meanwhile, are more numerous and have another big advantage: they support concurrent execution. They can be active at the same time as the Tensor Cores, and are therefore more efficient, as we will see later.
Regarding FP64 computing power, NVIDIA is doing as usual. Its diagrams don’t mention dedicated units, but they do exist so that applications keep working. However, there are not as many as in an A100, where the ratio is 1:2 compared to FP32. In GA102/GA104 it is 1:64.
We have indeed measured this with the AIDA64 GPGPU test:
- GeForce RTX 2080:
FP32: 11,706 GFLOPS
FP64: 366 GFLOPS
- GeForce RTX 2080 Ti:
FP32: 16,348 GFLOPS
FP64: 503 GFLOPS
- GeForce RTX 3080:
FP32: 31,698 GFLOPS
FP64: 539 GFLOPS
The information has since been confirmed to us by NVIDIA, which has therefore halved this ratio, no doubt to limit the number of dedicated units while remaining at the same absolute level of FP64 performance. We will not revisit the evolutions of the Ampere architecture any further, as they have already been detailed in previous articles.
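The ratio can be checked directly against the AIDA64 readings above; a small sketch (the numbers are our measurements in GFLOPS, and the measured ratios land close to the theoretical 1:32 for Turing and 1:64 for Ampere, with clock behavior accounting for the small deviations):

```python
# Checking the FP64:FP32 ratio against our AIDA64 GPGPU readings (GFLOPS).
measured = {
    "RTX 2080":    (11_706, 366),  # Turing: 1:32 by design
    "RTX 2080 Ti": (16_348, 503),
    "RTX 3080":    (31_698, 539),  # Ampere GA102: 1:64 by design
}

for card, (fp32, fp64) in measured.items():
    print(f"{card}: FP32/FP64 ratio of about {fp32 / fp64:.0f}:1")
```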
Finally a cooler whose fan stops when idle!
If NVIDIA launched the GeForce 30 series with the RTX 3080, it is no accident. This is its “flagship,” designed both for gamers and for those who want a computing beast. While the RTX 3090 will perform better, it targets specific uses. Here, NVIDIA is touting a card of a different kind.
It is advertised as twice as fast as the RTX 2080 and better than the 2080 Ti, enough to be the first card to play in 4K at 60 frames per second without tricks such as DLSS. Power draw is revised upwards, of course, but kept in check: the TDP goes from 250 to 320 watts (+28%).
NVIDIA nevertheless wanted to keep the card quiet. On the 2080 Ti, reaching that performance level required concessions, in particular a temperature that could climb above 80°C and a fan that, while still reasonable, spun faster than on other models.
This explains the work done on the PCB, shortened to make room for a flow-through fan, a design intended to exhaust part of the heat directly out of the case while relying on overall airflow for the rest. In our tests, even in our compact machine, we encountered no problems.
The GPU stays within the advertised 70-75°C. The fan rotates at just under 1,700 rpm under load and around 1,000 rpm at low load, remaining quiet in all cases. Good news: this is the first time on a Founders Edition that the fan stops at idle, a feature long offered by many partners but missing from NVIDIA’s own cards.
The GPU can climb up to 2 GHz depending on the workload, and now drops to 210 MHz at idle. The minimum consumption on our machine equipped with a Core i9-10850K was just under 50 watts, as with older generations or even competing cards. No surprises here.
Note that the card is heavy, at 1.4 kg. That is a little more than the 2080 Ti or the Radeon VII (approximately 1.3 kg). It occupies only two PCIe slots and has no NVLink connector (now reserved for the RTX 3090). The Founders Edition requires a 12-pin power connector; an adapter is included.
As for video outputs, there are three DisplayPort 1.4a and one HDMI 2.1 at 48 Gb/s (its maximum speed). Up to four screens can be driven simultaneously, depending on the resolution.
The test machine
Our first task with this card was to verify the company’s performance claims. We built a machine with a Core i9-10850K processor, 32 GB of G.Skill 3 GHz memory and an ASRock Z490 Taichi motherboard. Why not a PCIe 4.0 platform? Because the gains are non-existent in games, even for a high-end model. That will probably change with DirectStorage and RTX IO in 2021.
We did some quick tests on a Ryzen 9 3950X and saw no particular deviation. Sometimes the Intel CPU’s higher frequency worked in its favor, so we kept this configuration. The machine ran an up-to-date Windows 10 (May 2020 Update), with current drivers and BIOS/UEFI.
We will not use an AMD graphics card for reference here: both the Radeon RX 5700 XT and the Radeon VII (no longer sold) sit below the GeForce RTX 2080, so including them would not have been of much interest. This shows, in passing, that 16 GB of memory is not enough to put a GPU in the lead.
On the other hand, it will be interesting to pit these Radeons against the future RTX 3070, and to revisit the 3080 once the RX 6000 cards are available for testing in November.
We chose to test the card at 1080p, 1440p and 2160p (4K), a way to track how performance scales and, above all, to spot when games become limited by the CPU rather than the GPU. When high-definition texture packs were offered, we installed them. Ray tracing and DLSS are analyzed separately.
We also spent time breaking down performance in various 3D rendering applications, where the full power of the Ampere architecture can express itself: games mix INT32 and FP32 workloads, which is much less the case in pure compute.
What do the benchmarks say?
Let’s start with a fairly popular benchmark, Unigine Superposition, which offers rendering tests up to 8K in DirectX and OpenGL. The gains are clear and fairly stable: the 3080 does up to 70% better than the 2080 and 33% better than the 2080 Ti on average. In the 4K Optimized test, we exceeded 100 frames per second for the first time.
Let’s continue with 3DMark. We used Time Spy, the most demanding test, in its Classic and Extreme (4K) versions, as well as the VRS, DLSS and DirectX Raytracing feature tests.
The gains are of the same order: 70% over the 2080 and 30% over the 2080 Ti. The second figure is easily explained: while the number of FP32 units has been doubled, some of them also handle INT32 calculations. Above all, units related to 3D rendering, such as ROPs, texture units and memory bandwidth, have not been doubled, a way for NVIDIA to limit the size of its chip. Their growth is in the 25-30% range.
Note one exception here: activating VRS Tier 1 seems to benefit the RTX 3080 far more than the older models, with gains of 101% over the 2080 and 45% over the 2080 Ti. As a reminder, the Radeon RX 6000 will be AMD’s first cards to support VRS and DirectX Raytracing.
In games: +70% over the 2080, up to +40% over the 2080 Ti
Benchmarks remain a bit theoretical, so let’s get down to business with games. The first observation from our tests: NVIDIA’s promise of a card cut out for 4K has been kept. Despite consistently testing at maximum settings, we almost always got results above 60 fps. In some games, we more than double it:
GeForce RTX 3080 Gaming Performance Average
Clearly, this card is not made for a simple 1080p screen: this is where gains over previous generations are lowest, on the order of 35% and 16% on average. We sometimes still find the 70/30% seen in benchmarks, especially in demanding games like Borderlands 3.
As soon as we switch to 1440p, the gaps return to normal, the card expressing itself best in 4K. Here too, Borderlands 3 stands out with gains of up to 94/48%, exceeding 60 fps in 4K for the first time. Impressive, especially when we remember this is a 720-euro card, where the 2080 Ti was still selling for around 1,200 euros in recent weeks.
Note that the Vulkan version of Tom Clancy’s Ghost Recon Breakpoint hit a problem in 4K with both the RTX 2080 and 3080, without us knowing why. No doubt a bug that will have to be corrected. The DirectX version of the game did not have the same problem, so it is the only one we counted in our averages.
Although not very demanding, Rainbow Six Siege also benefits from the move to the 30 series, more markedly from 1440p, where we approach 400 fps. The game is thus ready for the 360 Hz panels NVIDIA and its partners will try to sell us. We reach 245 fps in 4K; in 1080p, however, the CPU caps us at 439 fps.
RTX / DLSS performance: the icing on the cake
As you might expect, these gaps widen with the activation of ray tracing or DLSS. With the RT and Tensor Cores improved and FP32 power revised upwards, compatible games take full advantage, especially those that abandon classic rasterization entirely, like Quake II RTX.
Here, we find a gain of 85 to 90% over the RTX 2080, as with Borderlands in 4K, and 45 to 50% over the 2080 Ti. 4K at 60 fps isn’t there yet, but we’re getting closer. You can already enjoy the game at almost 90 fps in 1440p, which is impressive.
NVIDIA provided us with a specific patch for Wolfenstein: Youngblood, improving performance (especially in 1080p) and enabling asynchronous operation of the RT Cores. We noted the results both for the current version of the game and for this patch, which should be released soon.
Here again, we exceed 60 fps in 4K in all cases without DLSS 2.0. With it, we double that figure, even at the “Quality” rendering level. On average, the gains in 1440p with RT/DLSS are 77/40% compared to the 2080 and 2080 Ti, and 89/43% in 4K.
3D rendering times halved compared to 2080
As you might expect, while the GeForce RTX 3080 is an interesting card in games, going so far as to outperform the best cards of the previous generation by 30 to 40% depending on the case, games are not the use case most favorable to it. To be convinced, just run Blender 2.90 on a basic scene, like bmw27.
Although extremely simple, it shows a gain of 125% over the RTX 2080 and 70% over the 2080 Ti. This render, which takes almost a minute even on the biggest CPUs on the market, needs only 12 seconds here. And this is not the scene showing the biggest gaps; we have retained two others.
The first is barbershop, available among the public Blender demos. The second is a test scene provided by NVIDIA that enables motion blur, better supported by the second-generation RT Cores of the Ampere architecture and already usable via OptiX.
In the first case, the gains rise to 125 and 112% respectively. A scene that took nearly 15 minutes to compute on a GPU now takes only 7! The numbers climb to 191/160% with motion blur.
In most of the other tests, we find gaps of over 100% versus the 2080 and around 60 to 70% versus the 2080 Ti. We took readings in various applications compatible with NVIDIA’s ray tracing acceleration: Redshift, V-Ray, Octane, etc. In SPECviewperf, which does not take advantage of it, the results are more limited, even if in some tests the gains match those noted in games: 70/30%.
In a pure compute tool like Hashcat, used to crack passwords, it all depends on the algorithms and their complexity. An MD5 hash, for example, sees very little gain, even if by pushing the parameters in manual mode we managed to reach 59,759 MH/s, more than the default settings allow (workload 4).
But on PDF, SHA-512crypt or scrypt, the gain over the 2080 approaches 100% at worst, going up to 159%. Compared to the 2080 Ti, the gain ranges from 59 to 135% in these three specific cases.
For video compression, don’t expect any gain: the NVENC engine remains the same. The only gains to hope for in tools like Premiere come from CUDA filters or features using the Tensor Cores, for example. We recorded 69.4 fps for 4K-to-1080p H.265 compression in Handbrake via this engine, as on the RTX 20 series.
Reflex and Broadcast: NVIDIA targets competitive gamers and streamers
As we have mentioned in recent weeks, the arrival of the GeForce RTX 30 series also signals software improvements. Not the driver itself, which is admittedly starting to show its age, but ancillary tools and solutions for developers. In particular Reflex, which will be integrated into Fortnite tomorrow, then into other games.
But also Broadcast, which lets streamers remove their background in real time, cancel ambient noise from their microphone, and have the camera automatically track their face. We will detail these features when they are fully available.
We also appreciate that the overlay is set to be beefed up to become more useful, similar to what AMD already offers. New drivers bringing these improvements will be released shortly. Most of these new features are not reserved for the GeForce 30 series.
But is it more efficient?
The question of the RTX 3080’s consumption occupied us for some time. At 320 watts, it is the manufacturer’s most power-hungry card ever produced for the consumer market, making it impossible to put a like-for-like card opposite it and compare performance levels directly.
Above all, the classic method of reading consumption at the wall socket is a concern. The CPU also plays its part: if it consumes 150 watts out of the 300 watts attributed to the GPU, it skews the efficiency gains noted, not to mention power supply efficiency. But the wall reading has the advantage of being what the consumer actually pays for.
NVIDIA recommends that testers rely on its FrameView tool at a fixed performance level in a game (60 fps, for example) to pit one card against another. It also provides some testers with its PCAT kit (which we have not received) to physically isolate the card’s consumption from the rest of the system.
We therefore opted for an intermediate solution. Readings were taken with a device of our own design (we will come back to it in a future article). Every second during a test, the consumption is read; we derive an average that we relate to the observed performance.
In the first case, this is the built-in benchmark of Borderlands, launched in 4K at the highest quality level. We divide the average consumption by the frame rate obtained, giving the energy in joules (J) needed to render one frame. In the second case, we multiply the render time of the barbershop scene by the average consumption to obtain the energy in Wh needed for the full render.
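The two metrics described above boil down to two one-line formulas; here is a sketch (the numeric values in the example are illustrative, not our actual readings):

```python
# Efficiency metrics used in this review: energy per frame (J) for the
# game benchmark, total energy (Wh) for the Blender render.

def joules_per_frame(avg_power_w: float, avg_fps: float) -> float:
    """Watts divided by frames per second gives joules per frame."""
    return avg_power_w / avg_fps

def render_energy_wh(avg_power_w: float, render_time_s: float) -> float:
    """Watts times seconds gives joules; dividing by 3600 gives Wh."""
    return avg_power_w * render_time_s / 3600.0

# Hypothetical example: a 414 W average draw at 60 fps costs 6.9 J per frame.
print(joules_per_frame(414, 60))
```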
And as we can see, while consumption recorded at the wall is indeed up, by 15 to 40% depending on the case, it is very largely offset by the performance gains. Rendering the Blender scene requires only 38.1 Wh, compared to 67-70 Wh with the RTX 2080 (Ti), a 45% reduction. The same goes for Borderlands: 8.3 and 8.6 joules per frame before, against 6.9 now, a drop of roughly 17 and 20%.
RTX 3080 and Core i9-10850K: is a 620-watt power supply enough?
One of the questions we were asked most about this GeForce RTX 3080 concerned the power supply. NVIDIA recommends 750 watts; some manufacturers push their 850-watt units. As always, there is no one-size-fits-all answer: everything depends on your machine, especially the CPU.
To verify that a 620-watt supply is sufficient, we used one of our older Antec models for all of our tests. Despite the Core i9-10850K, it held up without problem. Consumption recorded at the wall (thus including power supply losses) sometimes climbed to 520 watts, but no further. By deliberately pushing the CPU and GPU, we could reach 620 watts.
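One detail worth keeping in mind when interpreting these figures: a wall reading includes power-supply conversion losses, so the actual DC load on the unit is lower. A back-of-the-envelope sketch (the 90% efficiency figure is an assumption, roughly what a good 80PLUS-certified unit delivers):

```python
# Estimating the DC load on a PSU from a wall-socket reading.
# Assumes ~90% conversion efficiency (an assumption, not a measurement).

def dc_load_w(wall_watts: float, efficiency: float = 0.90) -> float:
    """The PSU only delivers the wall draw minus conversion losses."""
    return wall_watts * efficiency

# A 520 W wall reading means roughly 468 W of actual DC load,
# comfortably within a 620 W unit's rated capacity.
print(dc_load_w(520))
```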
This unit is therefore sufficient, especially with a less demanding CPU. Otherwise, it will hit its limits, particularly if you overclock the CPU and/or GPU. The card itself should also be given good headroom, if only when raising its temperature and/or power limits.
We will revisit the subject once all the tools have been updated. Until then, consider NVIDIA’s recommendation adequate. There is no point in going overboard or spending a fortune; just opt for a quality power supply, if possible 80PLUS certified for good efficiency.
The high-end card some people dreamed of
At 720 euros, the GeForce RTX 3080 is not for all budgets. Some will prefer to wait for the RTX 3070 at 520 euros, which will undoubtedly be better tailored to 1440p, as was the 2080 it replaces, while still enjoying NVIDIA’s advantages, from ray tracing to DLSS via Reflex, Broadcast, etc.
The RTX 3080 is cut out for 4K, for “comfortable” 1440p, particularly with 120 Hz+ screens, and above all for compute. There is a good chance professionals will snap it up for this reason, although some will no doubt wait to see what the RTX 3090 and its 24 GB of GDDR6X have under the hood.
We appreciate the energy-efficiency gain brought by Ampere, the silence of the card and its rather convincing design. Of course, we will have to wait to see what the partners offer, but this Founders Edition is already very well made. Will stocks be sufficient? We will find out tomorrow.
For those who have the budget, getting 40% more performance than an RTX 2080 Ti for 40% less on the final bill is already argument enough to give in. That leaves AMD and its Radeon RX 6000 with a double challenge to convince next November: match this level of performance, and do so at such a competitive price.