At Nvidia’s GPU Technology Conference in 2010, CEO Jen-Hsun Huang made some pretty dramatic claims about his company’s future GPU architecture, code-named Kepler. Huang predicted the chip would be nearly three times more efficient, in terms of FLOPS per watt, than the firm’s prior Fermi architecture. Those improvements, he said, would go “far beyond” the traditional advances chip companies can squeeze out of the move to a newer, smaller fabrication process. The gains would come from changes to the chip’s architecture, design, and software together.

Fast forward to today, and it’s time to see whether Nvidia has hit its mark. The first chip based on the Kepler architecture is hitting the market, aboard a new graphics card called the GeForce GTX 680, and we now have a clear sense of what was involved in the creation of this chip. Although Kepler’s fundamental capabilities are largely unchanged versus the last generation, Nvidia has extensively refined and polished nearly every aspect of this GPU with an eye toward improved power efficiency.

Kepler was developed under the direction of lead architect John Danskin and Sr. VP of GPU engineering Jonah Alben. Danskin and Alben told us their team took a rather different approach to chip development than what’s been common at Nvidia in the past, with much closer collaboration between the different disciplines involved, from the architects to the chip designers to the compiler developers. An idea that seemed brilliant to the architects would be nixed if it didn’t work well in silicon or didn’t serve the shared goal of building a very power-efficient processor.

Although Kepler is, in many ways, the accumulation of many small refinements, Danskin identified the two biggest changes as the revised SM—or streaming multiprocessor, the GPU’s processing “core”—and a vastly improved memory interface. Let’s start by looking at the new SM, which Nvidia calls the SMX, because it gives us the chance to drop a massive block diagram on you. Warm up your scroll wheels for this baby.

To some extent, GPUs are just massive collections of floating-point computing power, and the SM is the locus of that power. The SM is where nearly all of the graphics processing work takes place, from geometry processing to pixel shading and texture sampling. As you can see, Kepler’s SMX is clearly more powerful than past generations, because it’s over 700 pixels tall in block diagram form. Fermi is, like, 520 or so, tops. More notably, the SMX packs a heaping helping of ALUs, which Nvidia has helpfully labeled as “cores.” I’d contend the SM itself is probably the closest analog to a CPU core, so we’ll avoid that terminology. Whatever you call it, though, the new SMX has more raw computing power—192 ALUs versus 32 ALUs in the Fermi SM. According to Alben, about half of the Kepler team was devoted to building the SMX, which is a new design, not a derivative of Fermi’s SM.

The organization of the SMX’s execution units isn’t truly apparent in the diagram above. Although Nvidia likes to talk about them as individual “cores,” the ALUs are actually grouped into execution units of varying widths. In the SMX, there are four 16-ALU-wide vector execution units and four 32-wide units. Each of the four schedulers in the diagram above is associated with one vec16 unit and one vec32 unit. There are eight special function units per scheduler to handle, well, special math functions like transcendentals and interpolation. (Incidentally, the partial use of vec32 units is apparently how the GF114 got to have 48 ALUs in its SM, a detail Alben let slip that we hadn’t realized before.)

Although each of the SMX’s execution units works on multiple data simultaneously according to its width—and we’ve called them vector units as a result—work is scheduled on them according to Nvidia’s customary scheme, in which the elements of a pixel or thread are processed sequentially on a single ALU. (AMD has recently adopted a similar scheduling format in its GCN architecture.) As in the past, Nvidia schedules its work in groups of 32 pixels or threads known as “warps.” Those vec32 units should be able to output a completed warp in each clock cycle, while the vec16 units and SFUs will require multiple clocks to output a warp.
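As a concrete illustration of that issue-rate math, here’s a trivial Python sketch. The cycle counts are simply the warp size divided by the unit width, which is our inference from the description above rather than an official figure.

```python
# Cycles for a 32-thread warp to issue through execution units of
# different widths, per the description above (our inference).
WARP_SIZE = 32

def cycles_per_warp(unit_width):
    """A warp issues across a unit's lanes; narrower units take more clocks."""
    return -(-WARP_SIZE // unit_width)  # ceiling division

for name, width in [("vec32 ALU unit", 32), ("vec16 ALU unit", 16), ("SFU group (8-wide)", 8)]:
    print(f"{name}: {cycles_per_warp(width)} clock(s) per warp")
```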



The increased parallelism in the SMX is a consequence of Nvidia’s decision to seek power efficiency with Kepler. In Fermi and prior designs, Nvidia used deep pipelining to achieve high clock frequencies in its shader cores, which typically ran at twice the speed of the rest of the chip. Alben argues that arrangement made sense from the standpoint of area efficiency—that is, the extra die space dedicated to pipelining was presumably more than offset by the performance gained at twice the clock speed. However, driving a chip at higher frequencies requires increased voltage and power. With Kepler’s focus shifted to power efficiency, the team chose to use shorter pipelines and to expand the unit count, even at the expense of some chip area. That choice simplified the chip’s clocking, as well, since the whole thing now runs at one speed.

Another, more radical change is the elimination of much of the control logic in the SM. The key to many GPU architectures is the scheduling engine, which manages a vast number of threads in flight and keeps all of the parallel execution units as busy as possible. Prior chips like Fermi have used lots of complex logic to decide which warps should run when, logic that takes a lot of space and consumes a lot of power, according to Alben. Kepler has eliminated some of that logic entirely and will rely on the real-time compiler in Nvidia’s driver software to help make scheduling decisions. In the interests of clarity, permit me to quote from Nvidia’s whitepaper on the subject, which summarizes the change nicely:

Both Kepler and Fermi schedulers contain similar hardware units to handle scheduling functions, including, (a) register scoreboarding for long latency operations (texture and load), (b) inter-warp scheduling decisions (e.g., pick the best warp to go next among eligible candidates), and (c) thread block level scheduling (e.g., the GigaThread engine); however, Fermi’s scheduler also contains a complex hardware stage to prevent data hazards in the math datapath itself. A multi-port register scoreboard keeps track of any registers that are not yet ready with valid data, and a dependency checker block analyzes register usage across a multitude of fully decoded warp instructions against the scoreboard, to determine which are eligible to issue.

For Kepler, we realized that since this information is deterministic (the math pipeline latencies are not variable), it is possible for the compiler to determine up front when instructions will be ready to issue, and provide this information in the instruction itself. This allowed us to replace several complex and power-expensive blocks with a simple hardware block that extracts the pre-determined latency information and uses it to mask out warps from eligibility at the inter-warp scheduler stage.

The short story here is that, in Kepler, the constant tug-of-war between control logic and FLOPS has moved decidedly in the direction of more on-chip FLOPS. The big question we have is whether Nvidia’s compiler can truly be effective at keeping the GPU’s execution units busy. Then again, it doesn’t have to be perfect, since Kepler’s increases in peak throughput are sufficient to overcome some loss of utilization efficiency. Also, as you’ll soon see, this setup obviously works pretty well for graphics, a well-known and embarrassingly parallel workload. We are more dubious about this arrangement’s potential for GPU computing, where throughput for a given workload could be highly dependent on compiler tuning. That’s really another story for another chip on another day, though, as we’ll explain shortly.
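To make the whitepaper’s point a bit more concrete, here’s a toy sketch of the idea in Python. This is strictly an illustration of compiler-supplied latencies replacing a hardware scoreboard, not Nvidia’s actual scheduler or instruction format; the warp count, latency, and pick policy are all made up.

```python
# Toy model: each warp carries a compiler-supplied "ready_at" cycle for its
# next instruction instead of being tracked by a hardware scoreboard. The
# scheduler just masks out warps that aren't ready and picks among the rest.
from dataclasses import dataclass

@dataclass
class Warp:
    wid: int
    ready_at: int  # cycle at which the next instruction's operands are ready

def pick_warp(warps, cycle):
    eligible = [w for w in warps if w.ready_at <= cycle]
    # Pick the lowest-numbered eligible warp (a stand-in for a real policy).
    return min(eligible, key=lambda w: w.wid) if eligible else None

warps = [Warp(0, ready_at=0), Warp(1, ready_at=3), Warp(2, ready_at=1)]
MATH_PIPE_LATENCY = 4  # fixed and known to the compiler in this toy model

for cycle in range(8):
    w = pick_warp(warps, cycle)
    if w:
        print(f"cycle {cycle}: issue from warp {w.wid}")
        w.ready_at = cycle + MATH_PIPE_LATENCY  # next instruction ready later
    else:
        print(f"cycle {cycle}: no eligible warp (all masked)")
```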

Now that we’ve looked at the SMX, we dial back the magnification a bit and consider the overall layout of the first chip based on the Kepler architecture, the GK104.

You can see that there are four GPCs, or graphics processing clusters, in the GK104, each nearly a GPU unto itself, with its own rasterization engine. The chip has eight copies of the SMX onboard, for a gut-punching total of 1536 ALUs and 128 texels per clock of texture filtering power.
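To put those per-clock totals in more familiar units, here’s a quick back-of-the-envelope calculation. It assumes the usual convention of counting a fused multiply-add as two FLOPs and borrows the 1006MHz base clock discussed later; the result should land near Nvidia’s quoted peak, but it’s our arithmetic, not an official figure.

```python
# Rough peak-rate arithmetic for the GK104, from the unit counts above.
ALUS_PER_SMX = 192
SMX_COUNT = 8
TEXELS_PER_CLOCK = 128
BASE_CLOCK_GHZ = 1.006          # GTX 680 base clock, from later in the article

total_alus = ALUS_PER_SMX * SMX_COUNT                  # 1536
peak_gflops = total_alus * 2 * BASE_CLOCK_GHZ          # FMA = 2 FLOPs per ALU per clock

print(f"ALUs: {total_alus}")
print(f"Peak single-precision: ~{peak_gflops:.0f} GFLOPS")                     # ~3090
print(f"Peak filtering: ~{TEXELS_PER_CLOCK * BASE_CLOCK_GHZ:.0f} Gtexels/s")   # ~129
```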

The L2 cache shown above is 512KB in total, divided into four 128KB “slices,” each with 128 bits of bandwidth per clock cycle. That adds up to double the per-cycle bandwidth of the GF114 or 30% more than the biggest Fermi, the GF110. The rest of the specifics are in the table below, with the relevant comparisons to other GPUs.

In terms of basic, per-clock rates, the GK104 stacks up reasonably well against today’s best graphics chips. However, if the name “GK104” isn’t enough of a clue for you, have a look at some of the vitals. This chip’s memory interface is only 256 bits wide, all told, and its die size is smaller than the middle-class GF114 chip that powers the GeForce GTX 560 series. The GK104 is also substantially smaller, and built from fewer transistors, than the Tahiti GPU behind AMD’s Radeon HD 7900 series cards. Although the product based on it is called the GeForce GTX 680, the GK104 is not a top-of-the-line, reticle-busting monster. For the Kepler generation, Nvidia has chosen to bring a smaller chip to market first.

Although Nvidia won’t officially confirm it, there is surely a bigger Kepler in the works. The GK104 is obviously more tailored for graphics than GPU computing, and GPU computing is an increasingly important market for Nvidia. The GK104 can handle double-precision floating-point data formats, but it only does so at 1/24th the rate it processes single-precision math, just enough to maintain compatibility. Nvidia has suggested there will be some interesting GPU-computing related announcements during its GTC conference in May, and we expect the details of the bigger Kepler to be revealed at that point. Our best guess is that the GK100, or whatever it’s called, will be a much larger chip, presumably with six 64-bit memory interfaces and 768KB of L2 cache. We wouldn’t be surprised to see its SM exchange those 32-wide execution units for 16-wide units capable of handling double-precision math, leaving it with a total of 128 ALUs per SM. We’d also expect full ECC protection for all local storage and off-chip memory, just like the GF110.

The presence of a larger chip at some point in Nvidia’s future doesn’t mean the GK104 lacks for power. Although it “only” has four 64-bit memory controllers, this chip’s memory interface is probably the most notable change outside of the SMX. As Danskin very carefully put it, “Fermi, our memory wasn’t as fast as it could have been. This is, in fact, as fast as it could be.” The interface still supports GDDR5 memory, but data rates are up from about 4 Gbps in the Fermi products to 6 Gbps in the GeForce GTX 680. As a result, the GTX 680 is able essentially to match the GeForce GTX 580 in total memory bandwidth, at 192 GB/s, despite having a data path that’s a third narrower.
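The bandwidth math is easy enough to check. Here’s a quick sketch using the figures quoted above (the GTX 580’s data rate is “about 4 Gbps,” so its number is approximate):

```python
# Quick check of the bandwidth claim using the data rates quoted above.
def bandwidth_gbs(bus_width_bits, data_rate_gbps):
    return bus_width_bits / 8 * data_rate_gbps   # bytes per transfer x transfers/sec

print("GTX 680 (256-bit @ 6 Gbps): ", bandwidth_gbs(256, 6), "GB/s")   # 192.0
print("GTX 580 (384-bit @ ~4 Gbps):", bandwidth_gbs(384, 4), "GB/s")   # ~192.0
```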

The other novelty in the GK104 is Nvidia’s first PCI Express 3.0-compatible interconnect, which doubles the peak data rate possible for GPU-to-host communication. We don’t expect major performance benefits for graphics workloads from this faster interface, but it could matter in multi-GPU scenarios or for GPU computing applications.

On this page, we intend to explain some of the important new features Nvidia has built into the GK104 or its software stack. However, in the interests of getting this review posted before our deadline, we’ve decided to put in a placeholder, a radically condensed version of the final product. Don’t worry, we’ll fix it later in software—like the R600’s ROPs.

Although the theory is fairly simple, the various implementations of dynamic clocking vary widely in their specifics, which can make them hard to track. Intel’s Turbo Boost is probably the gold standard at present; it uses a network of thermal sensors spread across the die in conjunction with a programmable, on-chip microcontroller that governs Turbo policy. Since it’s a hardware solution with direct inputs from the die, Turbo Boost reacts very quickly to changes in thermal conditions, and its behavior may differ somewhat from chip to chip, since the thermal properties of the chips themselves can vary.

Although distinct from one another in certain ways, both AMD’s Turbo Core (in its CPUs) and PowerTune (in its GPUs) combine on-chip activity counters with pre-production chip testing to establish a profile for each model. In use, power draw for the chip is then estimated based on the activity counters, and clocks are adjusted in response to the expected thermal situation. AMD argues the predictable, deterministic behavior of its DVFS schemes is an admirable trait. The price of that consistency is that it can’t squeeze every last drop of performance out of each individual slab of silicon.

GPU Boost is essentially a first-generation crack at a dynamic clocking feature, and it combines some traits of each of the competing schemes. Fundamentally, the logic is more like the two Turbos than it is like AMD’s PowerTune. With PowerTune, AMD runs its GPUs at a relatively high base frequency, but clock speeds are sometimes throttled back under atypically high GPU utilization. By contrast, GPU Boost starts with a more conservative base clock speed and ranges into higher frequencies when possible.

The inputs for Boost’s decision-making algorithm include power draw, GPU and memory utilization, and GPU temperatures. Most of this information is collected from the GPU itself, but I believe the power use information comes from external circuitry on the GTX 680 board. In fact, Nvidia’s Tom Petersen told us board makers will be required to include this circuitry in order to get the GPU maker’s stamp of approval. The various inputs for Boost are then processed in software, in a portion of the GPU driver, not in an on-chip controller. The combination of software control and external power circuitry is likely responsible for Boost’s relatively high clock-change latency. Stepping up or down in frequency takes about 100 milliseconds, according to Petersen. A tenth of a second is a very long time in the life of a gigahertz-class chip, and Petersen was frank in admitting that this first generation of GPU Boost isn’t everything Nvidia hopes it will become in the future.

Graphics cards with Boost will be sold with a couple of clock speed numbers on the side. The base clock is the lower of the two—1006MHz on the GeForce GTX 680—and represents the lowest operating speed in thermally intensive workloads. Curiously enough, the “boost clock”—which is 1058MHz on the GTX 680—isn’t the maximum speed possible. Instead, it’s “sort of a promise,” according to Petersen, the clock speed at which the GPU should run during typical operation. GPU Boost performance will vary slightly from card to card, based on factors like chip quality, ambient temperatures, and the effectiveness of the cooling solution. GTX 680 owners should expect to see their cards running at the Boost clock frequency as a matter of course, regardless of these factors. Beyond that, GPU Boost will make its best effort to reach even higher clock speeds when feasible, stepping up and down in increments of 13MHz.
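To illustrate how we understand the stepping behavior, here’s a toy model in Python. This is emphatically not Nvidia’s algorithm (Boost also weighs GPU utilization and temperature, and the power limit, clock ceiling, and power readings below are made up), but it captures the basic shape: re-evaluate every ~100 ms and move in 13MHz increments.

```python
# Toy model of GPU Boost stepping: start at the 1006MHz base clock and move in
# 13MHz increments each ~100 ms tick based on estimated board power.
BASE_CLOCK_MHZ = 1006
STEP_MHZ = 13
POWER_LIMIT_W = 195        # hypothetical board power limit
MAX_CLOCK_MHZ = 1110       # hypothetical ceiling for this particular chip

def step_clock(clock_mhz, estimated_power_w):
    """One ~100 ms Boost evaluation: raise the clock with headroom, cut it without."""
    if estimated_power_w > POWER_LIMIT_W and clock_mhz > BASE_CLOCK_MHZ:
        return clock_mhz - STEP_MHZ
    if estimated_power_w < POWER_LIMIT_W and clock_mhz < MAX_CLOCK_MHZ:
        return clock_mhz + STEP_MHZ
    return clock_mhz

clock = BASE_CLOCK_MHZ
for power in [150, 160, 170, 180, 200, 210, 190]:   # fake readings, one per tick
    clock = step_clock(clock, power)
    print(f"power {power}W -> clock {clock}MHz")
```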

Petersen demoed several interesting scenarios to illustrate Boost behavior. In a very power-intensive scene, 3DMark11’s first graphics test, the GTX 680 was forced to remain at its base clock throughout. When playing Battlefield 3, meanwhile, the chip spent most of its time at about 1.1GHz—above both the base and boost levels. In a third application, the classic DX9 graphics demo “rthdribl,” the GTX throttled back to under 1GHz, simply because additional GPU performance wasn’t needed. One spot where Nvidia intends to make use of this throttling capability is in-game menu screens—and we’re happy to see it. Some menu screens can cause power use and fan speeds to shoot skyward as frame rates reach quadruple digits.

Nvidia has taken pains to ensure GPU Boost is compatible with user-driven tweaking and overclocking. A new version of its NVAPI allows third-party software, like EVGA’s slick Precision software, control over key Boost parameters. With Precision, the user may raise the GPU’s maximum power limit by as much as 32% above the default, in order to enable operation at higher clock speeds. Interestingly enough, Petersen said Nvidia doesn’t consider cranking up this slider overclocking, since its GPUs are qualified to work properly at every voltage-and-frequency point along the curve. (Of course, you could exceed the bounds of the PCIe power connector specification by cranking this slider, so it’s not exactly 100% kosher.) True overclocking happens by grabbing hold of a separate slider, the GPU clock offset, which raises the chip’s frequency at a given voltage level. An offset of +200MHz, for instance, raised our GTX 680’s clock speed while running Skyrim from 1110MHz (its usual Boost speed) to 1306MHz. EVGA’s tool allows GPU clock offsets as high as +549MHz and memory clock offsets up to +1000MHz, so users are given quite a bit of leeway for experimentation.

Although GPU Boost is only in its first incarnation, Nvidia has some big ideas about how to take advantage of these dynamic clocking capabilities. For instance, Petersen openly telegraphed the firm’s plans for future versions of Boost to include control over memory speeds, as well as GPU clocks.

More immediately, one feature exposed by EVGA’s Precision utility is frame-rate targeting. Very simply, the user is able to specify his desired frame rate with a slider, and if the game’s performance exceeds that limit, the GPU steps back down the voltage-and-frequency curve in order to conserve power. We were initially skeptical about the usefulness of this feature for one big reason: the very long latency of 100 ms for clock speed adjustments. If the GPU has dialed back its speed because the workload is light and then something changes in the game—say, an explosion that adds a bunch of smoke and particle effects to the mix—ramping the clock back up could take quite a while, causing a perceptible hitch in the action. We think that potential is there, and as a result, we doubt this feature will appeal to twitch gamers and the like. However, in our initial playtesting of this feature, we’ve not noticed any problems. We need to spend more time with it, but Kepler’s frame rate targeting may prove to be useful, even in this generation, so long as its clock speed leeway isn’t too wide. At some point in the future, when the GPU’s DVFS logic is moved into hardware and frequency change delays are measured in much smaller numbers, we expect features like this one to become standard procedure, especially for mobile systems.
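Here’s a minimal sketch of how we imagine frame-rate targeting working, based on Petersen’s description rather than any actual EVGA or Nvidia code; the thresholds and clock limits are hypothetical. The ~100 ms re-clock latency is exactly why a sudden jump in workload could cause a perceptible hitch.

```python
# Minimal frame-rate-targeting sketch: slow down when frames come in well under
# the target frame time, speed back up when they exceed it.
TARGET_FPS = 60
TARGET_FRAME_MS = 1000.0 / TARGET_FPS    # ~16.7 ms

def adjust_clock(clock_mhz, recent_frame_ms, step=13, floor=1006, ceiling=1110):
    if recent_frame_ms < 0.8 * TARGET_FRAME_MS:   # plenty of headroom: slow down
        return max(clock_mhz - step, floor)
    if recent_frame_ms > TARGET_FRAME_MS:         # missing the target: speed back up
        return min(clock_mhz + step, ceiling)
    return clock_mhz

clock = 1110
for frame_ms in [9.0, 9.5, 10.0, 25.0, 24.0, 18.0]:   # made-up frame times
    clock = adjust_clock(clock, frame_ms)
    print(f"frame {frame_ms:5.1f} ms -> clock {clock} MHz")
```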

Now that we’ve looked at the GPU in some detail, let me drop the specs on you for the first card based on the GK104, the GeForce GTX 680.

The GTX 680 has (as far as we know, at least) all of the GK104’s functional units enabled, and it takes that revised memory interface up to 6 GT/s, as advertised. The board’s peak power draw is fairly tame, considering its positioning, but perhaps not considering the class of chip under that cooler.

Multiply the chip’s capabilities by its clock speeds, and you get a sense of how the GTX 680 stacks up to the competition. In most key rates, its theoretical peaks are higher than the Radeon HD 7970’s—and our estimates conservatively use the base clock, not the boost clock, as their basis. The only deficits are in peak shader FLOPS, where the 7970 is faster, and in memory bandwidth, thanks to Tahiti’s 384-bit memory interface.
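For those keeping score at home, here’s roughly how those two deficits fall out of the math. The Radeon HD 7970 figures (2048 ALUs at 925MHz, 384-bit memory at 5.5 GT/s) are the commonly published specs rather than anything we’ve re-measured, so treat the exact ratios as approximate.

```python
# Hedged comparison of the two peaks where the 7970 leads, using the GTX 680's
# base clock and commonly published Radeon HD 7970 specs (approximate).
def peak_tflops(alus, clock_ghz):
    return alus * 2 * clock_ghz / 1000.0        # one FMA per ALU per clock

def bandwidth_gbs(bus_bits, gtps):
    return bus_bits / 8 * gtps

gtx680 = {"tflops": peak_tflops(1536, 1.006), "bw": bandwidth_gbs(256, 6.0)}
hd7970 = {"tflops": peak_tflops(2048, 0.925), "bw": bandwidth_gbs(384, 5.5)}

for key in ("tflops", "bw"):
    print(f"{key}: GTX 680 {gtx680[key]:.1f} vs HD 7970 {hd7970[key]:.1f} "
          f"({gtx680[key] / hd7970[key]:.0%} of the Radeon's peak)")
```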

With that said, you may or may not be pleased to hear that Nvidia has priced the GeForce GTX 680 at $499.99. On one hand, that undercuts the Radeon HD 7970 by 50 bucks and should be a decent deal given its specs. On the other, that’s a lot more than you’d expect to pay for the spiritual successor to the GeForce GTX 560 Ti—and despite its name, the GTX 680 is most definitely that. Simply knowing that fact may create a bit of a pain point for some of us, even if the price is justified based on this card’s performance.

Thanks to its relatively low peak power consumption, the GTX 680 can get away with only two six-pin power inputs. Strangely, Nvidia has staggered those inputs, supposedly to make them easier to access. However, notice that the orientation on the lower input is rotated 180° from the upper one. That means the tabs to release the power plugs are both “inside,” facing each other, which makes them harder to grasp. I don’t know what part of this arrangement is better than the usual side-by-side layout.

The 680’s display outputs are a model of simplicity: two dual-link DVI ports, an HDMI output, and a full-sized DisplayPort connector.

At 10″, the GTX 680 is just over half an inch shorter than its closest competitor, the Radeon HD 7970.

This review marks the debut of our new GPU test rigs, which we’ve already outed here. They’ve performed wonderfully for us, with lower operating noise, higher CPU performance in games, and support for PCI Express 3.0.

Oh, before we move on, please note below that we’ve tested stock-clocked variants of most of the graphics cards involved, including the Radeon HD 7970, 7870, 6970, and 5870 and the GeForce GTX 580 and 680. We agonized over whether to use a Radeon HD 7970 card like the XFX Black Edition, which runs 75MHz faster than AMD’s reference clock. However, we decided to stick with stock clocks for the higher-priced cards this time around. We expect board makers to offer higher-clocked variants of the GTX 680, which we’ll happily compare to higher-clocked 7970s once we get our hands on ’em. Although we’re sure our decision will enrage some AMD fans, we don’t think the XFX Black Edition’s $600 price tag would have looked very good in our value scatter plots, and we just didn’t have time to include multiple speed grades of the same product.

As ever, we did our best to deliver clean benchmark numbers. Tests were run at least three times, and we’ve reported the median result.

Thanks to Intel, Corsair, and Gigabyte for helping to outfit our test rigs with some of the finest hardware available. AMD, Nvidia, and the makers of the various products supplied the graphics cards for testing, as well.

Unless otherwise specified, image quality settings for the graphics cards were left at the control panel defaults. Vertical refresh sync (vsync) was disabled for all tests.

The idle measurements were taken at the Windows desktop with the Aero theme enabled. The cards were tested under load running Skyrim at its Ultra quality settings with FXAA enabled.

You can think of these noise level measurements much like our system power consumption tests, because the entire systems’ noise levels were measured. Of course, noise levels will vary greatly in the real world along with the acoustic properties of the PC enclosure used, whether the enclosure provides adequate cooling to avoid a card’s highest fan speeds, placement of the enclosure in the room, and a whole range of other variables. These results should give a reasonably good picture of comparative fan noise, though.

The tests and methods we employ are generally publicly available and reproducible. If you have questions about our methods, hit our forums to talk with us about them.

We’ll begin with a series of synthetic tests aimed at exposing the true, delivered throughput of the GPUs. In each instance, we’ve included a table with the relevant theoretical rates for each solution, for reference.

The pixel fill rate is, in theory, determined by the speed of the ROP hardware, but this test usually winds up being limited by memory bandwidth long before the ROPs run out of steam. That appears to be the case here. Somewhat surprisingly, the GTX 680 manages to match the Radeon HD 7970 almost exactly, even though the Radeon has substantially more potential memory bandwidth on tap.

Nvidia’s new toy comes out looking very good in terms of texturing capacity, more than doubling the performance of the GeForce GTX 580 in the texture fill and integer filtering tests. Kepler’s full-rate FP16 filtering allows it to outperform the 7970 substantially in the final test. In no case does the GTX 680’s relatively lower memory bandwidth appear to hinder its ability to keep up with the 7970.

Although the GTX 680 has a higher theoretical rasterization rate than the GTX 580, the GK104 GPU has only half as many setup and tessellator units (aka PolyMorph engines) as the GF110. Despite that fact, the GTX 680 achieves twice the tessellation performance of Fermi. The GTX 680 even exceeds that rate in TessMark’s 64X expansion test, where it’s nearly three times the speed of the Radeon HD 7970. We doubt we’ll see a good use of a 64X geometry expansion factor in a game this year, but the Kepler architecture clearly has plenty of headroom here.

Our first look at the performance of Kepler’s re-architected SMX yields some mixed, and intriguing, results. The trouble with many of these tests is that they split so cleanly along architectural or even brand lines. For instance, the 3DMark particles test runs faster on any GeForce than on any Radeon. We’re left a little flummoxed by the fact that the 7970 wins three tests outright, and the GTX 680 wins the other three. What do we make of that, other than to call it even?

Nonetheless, there are clear positives here, such as the GTX 680 taking the top spot in the ShaderToyMark and GPU cloth tests. The GTX 680 improves on the Fermi-based GTX 580’s performance in five of the six tests, sometimes by wide margins. Still, for a card with the same memory bandwidth and ostensibly twice the shader FLOPS, the GTX 680 doesn’t appear to outperform the GTX 580 as comprehensively as one might expect.

This benchmark, built into Civ V, uses DirectCompute to perform compression on a series of textures. Again, this is a nice result from the new GeForce, though the 7970 is a smidge faster in the end.

Here’s where we start to worry. In spite of doing well in our graphics-related shader benchmarks and in the DirectCompute test above, the GTX 680 tanks in LuxMark’s OpenCL-driven ray-tracing test. Even a quad-core CPU is faster! The shame! More notably, the GTX 680 trails the GTX 580 by a mile—and the Radeon HD 7970 by several. Nvidia tells us LuxMark isn’t a target for driver optimization and may never be. We suppose that’s fine, but we’re left wondering just how much Kepler’s compiler-controlled shaders will rely on software tuning in order to achieve good throughput in GPU computing applications. Yes, this is only one test, and no, there aren’t many good OpenCL benchmarks yet. Still, we’re left to wonder.

Then again, we are in the early days for OpenCL support generally, and AMD seems to be very committed to supporting this API. Notice how the Core i7-3820 runs this test faster when using AMD’s APP driver than when using Intel’s own OpenCL ICD. If a brainiac monster like Sandy Bridge-E can benefit that much from AMD’s software tuning over Intel’s own, well, we can’t lay much fault at Kepler’s feet just yet.

Our test run for Skyrim was a lap around the town of Whiterun, starting up high at the castle entrance, descending down the stairs into the main part of town, and then doing a figure-eight around the main drag.

Since these are pretty capable graphics cards, we set the game to its “Ultra” presets, which turns on 4X multisampled antialiasing. We then layered on FXAA post-process anti-aliasing, as well, for the best possible image quality without editing an .ini file.

At this point, you may be wondering what’s going on with the funky plots shown above. Those are the raw data for our snazzy new game benchmarking methods, which focus on the time taken to render each frame rather than a frame rate averaged over a second. For more information on why we’re testing this way, please read this article, which explains almost everything.

If that’s too much work for you, the basic premise is simple enough. The key to creating a smooth animation in a game is to flip from one frame to the next as quickly as possible in continuous fashion. The plots above show the time required to produce each frame of the animation, on each card, in our 90-second Skyrim test run. As you can see, some of the cards struggled here, particularly the GeForce GTX 560 Ti, which was running low on video memory. Those long waits for individual frames, some of them 100 milliseconds (that’s a tenth of a second) or more, produce less-than-fluid action in the game.

Notice that, in dealing with render times for individual frames, longer waits are a bad thing—lower is better, when it comes to latencies. For those who prefer to think in terms of FPS, we’ve provided the handy table at the right, which offers some conversions. See how, in the last plot, frame times are generally lower for the GeForce GTX 680 than for the Radeon HD 7970, and so the GTX 680 produces more total frames? Well, that translates into…

…higher FPS averages for the new GeForce. Quite a bit higher, in this case. Also notice that some of our worst offenders in terms of long frame times, such as the GeForce GTX 560 Ti and the GTX 560 Ti 448, produce seemingly “acceptable” frame rates of 41 and 50 FPS, respectively. We might expect that FPS number to translate into adequate performance, but we know from looking at the plot that’s not the case.

To give us a better sense of the frame latency picture, or the general fluidity of gameplay, we can look at the 99th percentile frame latency—that is, 99% of all frames were rendered during this frame time or less. Once we do that, we can see just how poorly the GTX 560 Ti handles itself here compared to everything else.
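For the spreadsheet-inclined, here’s a small worked example of the two metrics, with made-up frame times. Note how a handful of slow frames barely dents the FPS average but shows up plainly in the 99th percentile.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the value that pct% of frames come in at or under."""
    ranked = sorted(values)
    return ranked[max(0, math.ceil(pct / 100.0 * len(ranked)) - 1)]

# Made-up sample: mostly smooth frames with a few long ones.
frame_times_ms = [16.5] * 95 + [70.0] * 5

avg_fps = 1000.0 * len(frame_times_ms) / sum(frame_times_ms)
p99 = percentile(frame_times_ms, 99)
print(f"average FPS:         {avg_fps:.1f}")                 # looks respectable
print(f"99th pct frame time: {p99:.1f} ms (~{1000.0 / p99:.0f} FPS)")  # exposes the stutter
```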

We’re still experimenting with our new methods, and I’m going to drop a couple of new wrinkles on you here today. We think the 99th percentile latency number is a good one, but since it’s just one point among many, we have some concerns about using it alone to convey the general latency picture. As a bit of an experiment, we’ve decided to expand our look at frame times to cover more points, like so.

This illustrates how close the matchup is between several of the cards, especially our headliners, the Radeon HD 7970 and GeForce GTX 680. Although the GeForce generally produces frames in less time than the Radeon, both are very close to that magic 16.7 ms (60 FPS) mark 95% of the time. Adding in those last few percentage points, that last handful of frames that take longer to render, makes the GTX 680’s advantage nearly vanish.

Our next goal is to focus more closely on the tough parts, places where the GPU’s performance limitations may be contributing to less-than-fluid animation, occasional stuttering, or worse. For that, we add up all of the time each GPU spends working on really long frame times, those above 50 milliseconds or (put another way) below about 20 FPS. We’ve explained our rationale behind this one in more detail right here, if you’re curious or just confused.
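In code form, our reading of that metric looks something like this: we count only the portion of each frame’s render time that lies beyond the 50-ms mark.

```python
# "Time spent beyond 50 ms," as we read it: accumulate only the portion of
# each frame's render time past the threshold.
THRESHOLD_MS = 50.0

def time_spent_beyond(frame_times_ms, threshold_ms=THRESHOLD_MS):
    return sum(t - threshold_ms for t in frame_times_ms if t > threshold_ms)

# 120 ms and 180 ms frames contribute 70 + 130 = 200 ms of "badness."
print(time_spent_beyond([16.5, 16.8, 120.0, 45.0, 180.0]), "ms beyond 50 ms")
```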

Only the two offenders we’ve already identified really spend any significant time working on really long-to-render frames. The rest of the pack (and I’d include the GTX 580 in this group) handles Skyrim at essentially the highest quality settings quite well.

Several factors converged to make us choose these settings. One of our goals in preparing this article was to avoid the crazy scenario we had in our GeForce GTX 560 Ti 448 review, where every card tested could run nearly every game adequately. We wanted to push the fastest cards to their limits, not watch them tie a bunch of other cards for adequacy. So we cranked up the resolution and image quality and, yes, even enabled DirectX 11. We had previously avoided using DX11 with this game because the initial release had serious performance problems on pretty much any video card. A patch has since eliminated the worst problems, and the game is now playable in DX11, so we enabled it.

This choice makes sense for benchmarking ultra high-end graphics cards, I think. I have to say, though, that the increase in image quality with DX11 tessellation, soft shadows, and ambient occlusion isn’t really worth the performance penalty you’ll pay. The image quality differences are hard to see; the performance differences are abundantly obvious. This game looks great and runs very smoothly at 2560×1600 in DX9 mode, even on a $250 graphics card.

The GTX 680 again takes the top spot in the FPS sweeps, but as you can see in the plots above, all of the cards produce some long frame times with regularity. As a result of those higher-latency frames, the GTX 680 ties the 7970 in the 99th percentile frame time metric.

A broader look at the latency picture shows that the GTX 680 generally produces lower-latency frames than the 7970, which is why its FPS average is so high. However, that last 1% gives it trouble.

Lots of trouble, when we look at the time spent on long-latency frames. What happened to the GTX 680? Well, look up at the plots above, and you’ll see that, very early in our test run, there was a frame that took nearly 180 ms to produce—nearly a fifth of a second. As we played the game, we experienced this wait as a brief but total interruption in gameplay. That stutter, plus a few other shorter ones, contributed to the 680’s poor showing here. Turns out we ran into this problem with the GTX 680 in four of our five test runs, each time early in the run and each time lasting about 180 ms. Nvidia tells us the slowdown is the result of a problem with its GPU Boost mechanism that will be fixed in an upcoming driver update.

We tested Battlefield 3 with all of its DX11 goodness cranked up, including the “Ultra” quality settings with both 4X MSAA and the high-quality version of the post-process FXAA. We tested in the “Operation Guillotine” level, for 60 seconds starting at the third checkpoint.

Blessedly, there aren’t many wrinkles at all in BF3 performance from any of the cards. The 99th percentile frame times mirror the FPS averages, and all is well with the world. Even the slow cards are just generally slow and not plagued with excessively spiky, uneven frame times like we saw in Arkham City. This time, the GeForce GTX 680 outperforms the Radeon HD 7970 in every metric we throw at it, although its advantage is incredibly slim in every case.

Our cavalcade of punishing but pretty DirectX 11 games continues with Crysis 2, which we patched with both the DX11 and high-res texture updates.

Notice that we left object image quality at “extreme” rather than “ultra,” in order to avoid the insane over-tessellation of flat surfaces that somehow found its way into the DX11 patch. We tested 90 seconds of gameplay in the level pictured above, where we gunned down several bad guys, making our way up the railroad bridge.

The GTX 680 just trails the 7970 in the FPS average, but its 99th percentile frame time falls behind a couple of other cards, including the Radeon HD 7870. Why? If you look at the plot for the GTX 680, you can see how, in the opening portion of the test run, its frame times range regularly into the 30-millisecond range. That’s probably why its 99th percentile frame time is 32 milliseconds—or, translated, roughly 30 FPS—and therefore nothing to worry about in the grand scheme. The GTX 680 devotes almost no time to really long frames, and its performance is quite acceptable here—just not quite as good as the 7970’s during those opening moments of the test sequence.

We tested Serious Sam 3 at its “Ultra” quality settings, only tweaking it to remove the strange two-megapixel cap on the rendering resolution.

How interesting. Generally, this is one of those games where a particular sort of GPU architecture tends to do well—Radeons, in this case. However, the GeForce GTX 680 is different enough from its siblings that it utterly reverses that trend, effectively tying the Radeon HD 7970.

We’re pretty pleased with the nice, low power consumption numbers our new test rigs are capable of producing at idle. Not bad for quad memory channels, Sandy Bridge Extreme, and an 850W PSU, eh?

Although the entire system’s power draw is part of our measurement, the display is not. The reason we’re testing with the display off is that the new Radeons are capable of going into a special ultra-low-power mode, called ZeroCore power, when the display goes into standby. Most of the chip is turned off, and the GPU cooling fans spin down to a halt. That allows them to save about 12W of power draw on our test system, a feat the GTX 680 can’t match. Still, the 680’s power draw at idle is otherwise comparable to the 7970’s, with only about a watt’s worth of difference between them.

We’re running Skyrim for this test, and here’s where Kepler’s power efficiency becomes readily apparent. When equipped with the Radeon HD 7970, our test rig requires over 40W more power under load than it does when a GeForce GTX 680 is installed. You can see why I’ve said this is the same class of GPU as the GeForce GTX 560 Ti, although its performance is a generation beyond that.

Since we tested power consumption in Skyrim, we can mash that data up with our performance results to create a rough picture of power efficiency. By this measure, the GTX 680 is far and away the most power-efficient performer we’ve tested.

Even though the Radeon HD 7970 can turn off its cooling fan when the display goes into power-save, it doesn’t convey any measurable advantage here. The GTX 680 essentially adds nothing to our system’s total noise levels, which consist almost entirely of noise from the (very quiet) CPU cooler.

Under load, the GTX 680’s cooler performs admirably, maintaining the same GPU temperature as the 7970 while generating substantially less sound pressure. Of course, the GTX 680’s cooler has quite a bit less power (and thus heat) to deal with, but Nvidia has a long tradition of acoustic excellence for its coolers, dating back to at least the GeForce 8800 GTX (though not, you know, NV30.)

We’re not terribly pleased with the fan speed profile AMD has chosen for its stock 7970 cards, which seems to be rather noisy. However, we should note that we’ve seen much better cooling and acoustic performance out of XFX’s Radeon HD 7970 Black Edition, a card with slightly higher clock speeds. It’s a little pricey, but it’s also clearly superior to the reference design.

The scatter plot of power and performance on the previous page has inspired me to try a bit of an experiment. This is just for fun, so feel free to skip ahead if you’d like. I’m just curious to see what we can learn by mashing up some other bits of info with our overall performance data across all of the games we tested.

This one isn’t really fair at all, since we haven’t normalized for the chip fabrication process involved. The three GPUs produced on a 28-nm process are all vastly superior, in terms of performance per area, to their 40-nm counterparts. The difference in size between the GeForce GTX 580 and the Radeon HD 7870, for roughly equivalent performance, is comical. The GTX 680 looks quite good among the three 28-nm chips, with higher performance and a smaller die area than the 7970.

The next few scatters are for the GPU architecture geeks who might be wondering about all of those graphics rates we’re always quoting and measuring. Here’s a look at how the theoretical peak numbers in different categories track with delivered performance in games. What we’re looking for here is a strong or weak correlation; a stronger correlation should give us a nice collection of points roughly forming a diagonal line, or something close to it.

The first couple of plots, with rasterization rate and FLOPS, don’t show us much correlation at all between these properties and in-game performance. The final three begin to fall into line a little bit, with memory bandwidth and ROP rate (or pixel fill) being most strongly correlated, to my eye. Notice that the GeForce GTX 680 is apparently very efficient with its memory bandwidth, well outside of the norm.

These results led me to wonder whether the correlations would grow stronger if we subbed in the results of directed tests instead of theoretical peak numbers. We do have some of that data, so…

ShaderToyMark gives us the strongest correlation, which shouldn’t be too much of a surprise, since it’s the most game-like graphics workload among our directed tests. Otherwise, I’m not sure we can draw too many strong conclusions from these results, other than to say that the GTX 680 sure looks to have an abundance of riches when it comes to FP16 texture filtering.
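For anyone who wants to reproduce the exercise, the correlation itself is just a Pearson r over the per-card numbers. The figures below are placeholders, not our actual data; swap in the real theoretical rates and performance indices from the tables.

```python
# Pearson correlation between a per-card rate and a per-card performance index.
import statistics

def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (statistics.pstdev(xs) * statistics.pstdev(ys) * len(xs))

mem_bandwidth_gbs = [192, 264, 154, 176, 134]   # placeholder per-card figures
perf_index_fps    = [60, 58, 40, 47, 33]        # placeholder overall performance

print(f"r = {pearson(mem_bandwidth_gbs, perf_index_fps):.2f}")
```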

With a tremendous amount of information now under our belts, we can boil things down, almost cruelly, to a few simple results in a final couple of scatter plots. First up is our overall performance index, in terms of average FPS across all of the games we tested, matched against the price of each card. As usual, the most desirable position on these plots is closer to the top left corner, where the performance is higher and the price is lower.

The GeForce GTX 680 is slightly faster and 50 bucks less expensive than the Radeon HD 7970, so it lands in a better position on this first plot. However, if we switch to an arguably superior method of understanding gaming performance and smoothness, our 99th percentile frame time (converted to FPS so the plot reads the same), the results change a bit.

The GTX 680’s few instances of higher frame latencies, such as that apparent GPU Boost issue in Arkham City, move it just a couple of ticks below the Radeon HD 7970 in overall performance. Then again, the GTX 680 costs $50 less, so it’s still a comparable value.

The truth is that, either way you look at it, there is very little performance difference between these two cards, and any difference is probably imperceptible to the average person.

We’ve already established that the GTX 680 is more power efficient than the Radeon HD 7970 (at least when running a game, if not sitting idle), and it’s quieter, too. In fact, there is very little not to like about the GeForce GTX 680. With this GPU, Nvidia catches up to AMD on a whole host of fronts overnight, from power efficiency and performance to process tech and feature set. Nvidia was even able to supply us with working software that uses its H.264 video encoder, something AMD has yet to do for the Radeon HD 7970 and friends. All of those considerations lead us, almost inescapably, to one conclusion: the GeForce GTX 680 has earned itself an Editor’s Choice award for being the most desirable video card in its class.

That honor comes with some small caveats, though. For one, if you require the absolute fastest single-GPU video card available, with price no object, then you’ll probably want to check out some of the higher-clocked versions of the Radeon HD 7970, like XFX’s Black Edition. We figure a slight bump in GPU core and memory clocks ought to put the 7970 solidly over the top versus a stock-clocked GTX 680 like the one we tested—and we don’t yet have any information about what board makers will be doing with the GTX 680. You’ll have to be very finely attuned to bragging rights for any of this to matter, though. Fortunately, AMD and Nvidia are so attuned, and I expect to see higher-clocked variants of both cards hitting the market in the coming weeks in an attempt to establish a clear winner. That should be fun to watch.

Also, the GeForce GTX 680 is a massive generational improvement, extracting roughly twice the performance of the GeForce GTX 560 Ti from a similar class of GPU. Still, we’re a little disappointed Nvidia isn’t passing along more of those gains to consumers in the form of higher performance per dollar, as has happened in the past. Half a grand is a lot to ask for a mid-sized chip on a card with a 256-bit memory interface. We had a similar complaint when AMD introduced the Radeon HD 7970, and at that time, we expressed the hope that competition from Nvidia would drive prices down. Now, we’re having to face the reality that the problem isn’t really lack of competitive fire at the GPU companies, it’s the limited number of 28-nm wafers coming out of TSMC, who makes the chips for both firms. The number of good chips per wafer is likely an issue, too. AMD and Nvidia will probably be able to sell all of the chips they can get at current prices for a while, simply because of supply constraints.

We hate to be a Debbie Downer here, though, so we’ll mention something else. The GTX 680 is part of a family of Kepler-based products, and as the fastest member so far, it’s bound to command a premium. But we expect to see a whole range of new cards and GPUs based on Kepler to be hitting the market in the coming months, almost all of them more affordable than this one. Given the amazing efficiency of the Kepler architecture, we expect really good things to follow—and we’ll be there, making up ridiculous ways to plot the goodness.

GPU Boost sounds interesting. Is the delay always 100 ms, or is that just an “oh, we measured that much once” type of typical number?

Too much focus on the high end. Even at an enthusiast website, I doubt many will bother buying either this one or AMD’s offering.

But the GTX 680 did not usher in a new era in which super high-end performance becomes available to the masses. Hell, even the “mid-end” HD 7870 is 50% more expensive in my country than what I paid for my GTX 560 Ti last year, right after launch (and I should add that even that card hasn’t come down in price much since).

I’ve noticed this as well; prices here in the UK have not come down at all since the launch of the last generation of cards. In fact, if anything, they’ve gone up, and the problem isn’t confined to GPUs. The Core i7-2600K cost around £200 shortly after launch, then the price rose to £250 – £260, and now it’s back down to £220 – what a bargain!

“Intel Core i7-2600K 3.40GHz (Sandybridge) Socket LGA1155 Pro save £15.40 £179.99 £215.99 inc VAT”

I can confirm the part about CPUs; here the i5-2500K is more expensive than when it was launched… insane.

Plus the US destroyed the world’s economy, and they can’t manage to get sane enough again to actually fix it.

Is anybody else worried that they’re not going to fill in the mid-range properly? AMD already have that weird gap where 78xx is still quite expensive with fairly high power draw and 77xx isn’t at all compelling for price/performance. So far nVidia only seem to have 2 Kepler chips about – the mobile 640/650/660 part with 384 shaders and the desktop 680 part with 1536.

I want to see something in the 768–1152 shader ballpark with performance that’s not crippled by poor memory bandwidth, at a reasonable balance of price, performance, and power draw. That would be *real* competition for AMD, because as yet they have nothing decent to fight back with from their current range.

Unfortunately I’m rather worried that won’t happen, because their mobile range is packed out with Fermi in the crucial 675/670 bracket, with that one Kepler part somehow magically filling in for the 660/650/640 area. Extrapolating from how the desktop and mobile lines usually correspond to each other, I don’t see a prospect for a desktop 660 any time soon. Which is a damn shame.

For perspective, I have a 560 in my notebook that I want to replace with something with similar thermals and 25-30% better performance. Should be feasible with 28nm. I await a contender.

It is a bit depressing that Nvidia’s mid-range Kepler competes with AMD’s best Southern Islands product. If AMD had its act together, we could have the 680 named something like 660 and save at least $200.

Both AMD and Nvidia can try and have some huge GPUs made that would crush both Tahiti and Kepler, then they’d list them at 2x the price and you would still buy a GTX 680. Would that make you happier? Note that making a bigger and faster GPU won’t necessarily make the GTX 680 or HD7970 any cheaper.

But I’m hypothesizing that the 680 will turn out like the 8800GT. And the 9800GTX’s die was 25% larger than that of the 8800GT (eventually called 9800GT).

If Big Kepler’s die is about the same size as that of the 7970, that would mean the 680 is almost perfectly analogous to the 8800GT in its day. The only difference would be that Nvidia properly named the 680, as opposed to the 8800GT’s painfully inaccurate moniker.

And once we consider the 680’s lack of innovation in the compute space, the argument for “Big Kepler” becomes even stronger. Compute clients don’t care about midrange parts, they go for the biggest and best they can get. If there is a Big Kepler, the 680’s lackluster compute abilities won’t matter.

And don’t worry about price. The 680 is priced as its name suggests it should be priced. Just don’t be surprised if we see a GTX 685 (or GTX 680+) at the $500 price point in the near future.

I really love how early in the article, very key aspects of the architecture are explained (hardware implementation vs. compiler) and what they mean.

Both of these cards are overpowered for most displays, but it does seem like the 7970 has a slight advantage where it matters – less stutter – and this is only in high-res displays, which most of us probably don’t have and won’t be buying. So in general, the performance of these cards is the same.

Sigh, who knew fabrication of silicon would cost so much these days. But the price might be fair given that silicon scaling might be coming to an end.

Silicon scaling has been “coming to an end” since the days of the P4 Prescott, maybe even earlier than that if you believe everything you read. When we reach the “end” there will be some new tech to circumvent it instead.

Hard disk areal density has been getting close to its theoretical limit for ages, yet we keep marching forwards and must now be at a point where density is 20x higher than when perpendicular recording busted through the last naysayer’s “end is nigh” barrier.

I’ll believe it’s reached the end when things stop getting faster, better and smaller, and if that ever happens, perhaps people can FINALLY STOP WRITING INEFFICIENT, BLOATED CODE and actually start making software advances instead.

Nvidia Investor and all around TR annoyance indeego here. Sticking with my 460@ 1920×1080 (Sorry aspect ratios don’t bug me like they do many here.) There hasn’t been a game that crushes my system that I play. I am about 1-2 years behind the curve though…

To be honest both my work servers, workstations, home laptop, home workstation are all vastly overpowered for their real needs and very little on the technology landscape has excited me lately. I would like to get PCIe SSD put in at work but no real rush.

I’m much more interested in upgrading my LCD to 120Hz and not have it suck, but all the 120Hz monitors I’ve read about have either serious issues, drawbacks, crap I don’t need, or poor resolution.

All 120Hz monitors I’ve seen are 1920×1080 at most. I’m pretty happy with my Samsung S23A700, but there’s room for improvement for sure.

It seems clear now that this was intended to replace the 560 Ti (not the 570/580-based ones either), giving consumers 2560×1600 performance in the $250-ish range. When nVidia was publicly saying, “Wow, we expected more!” at the 7970 launch, they were internally rubbing their hands together in glee because suddenly their card went from being a mid-range value card to a new high-end contender. All because AMD did not bring its A game.

AMD failed. nVidia, being nVidia, saw an opportunity to undercut them just enough to get that as their headline, but not enough to substantially matter in the long run. AMD is unlikely to lower their price because the nVidia card still wears its mid-range heritage on its sleeve (its 2GB of memory), so the 3GB-based 7970 and 7950 can claim the larger memory buffer warrants the higher cost. So AMD will sit up at the high end of cost with the same performance as the 2GB, slightly less expensive 680, while nVidia revels in being slightly less expensive than AMD, yet substantially more expensive than they’d ever dreamed of being able to get away with.

This lets them postpone the bigger, better, badder-ass versions of Kepler that were supposed to make this part look like a chump for six months. Then when AMD finally has something to throw out that’s actually progress performance forward… BAM, Kepler Prime shows up and blows it away. Parts stockpiled for months, paid for by the premium they tossed onto a part that was meant to be a 560 Ti ($250-ish) replacement.

Taking a card that is clearly designed around replacing the Compute-deficient 560 Ti line and then marking it up at twice the last card’s price is quite an achievement. Even more impressive, though, is getting everyone to think it’s a bargain. AMD failed when it got greedy; and nVidia has now continued that greed down the line.

If you think about it, the whole thing makes a lot of sense though. nVidia’s losing money on Tegra, which has yet to really take off in a very PROFITABLE way. AMD’s losing money hand over fist due to the crap that is Bulldozer (casting a dark shadow across all products set to inherit its updated core technology), the now ancient netbook APU they’re still selling mostly unchanged after over a year, and the delays on Trinity.

The scene was set for a milking. Consumers are now cattle and we’re getting milked with both hands, green and red.

This is all assuming that nVidia will “delay” the bigger Kepler; indications are more that it would never have been around in time to fight this particular battle anyway. Not that it matters, because they’re fighting it damn well on their own terms now. Funny how things can work out.

I think the problem here is less that AMD didn’t bring their “A” game, and more that their A game hasn’t been that exciting since the 4/5 series launches.

I really don’t see how AMD failed this time. First of all, NVIDIA came out with ONE graphics card, and for a long time this is it. Everybody is saying this is a 560 Ti replacement. Ok, let me see the high-end Kepler!!! Wait… there’s nothing else for Nvidia to launch. And if you think there are so many people who can afford to buy this, think again (by the way, I’m living in Romania, and here the GTX 680 is more expensive, ~500-550 Euro, so that means $700-750). On the other hand, AMD has almost a full range and can play with prices if they want. Now Nvidia can change their gaming logo to “GTX 680: the way it was meant…”

Stick to the current generation. When Big Kepler GK110 turns up in October, it will have to face off against AMD’s Sea Islands HD 8970 for this Christmas battle. Do not expect AMD to be sitting idle. Sea Islands is expected to be an improved architecture, plus AMD will look to go for a 400+ sq. mm die, because the 28nm process will be well understood and robust by then. Also, Big Kepler GK110 will be a compute-heavy die, because Nvidia is talking about 2.5-3 times Fermi’s (Tesla M2090) DP perf. So a lot of die size goes toward compute features like DP performance, ECC, the wider memory subsystem required for compute apps (384 or 512 bit), and newer features like virtual memory. These will significantly affect performance scaling in gaming scenarios compared to the GTX 680. Moreover, Nvidia needs to look at what specs are manufacturable. The larger the die, the more the leakage and the more difficult the yields. TSMC now has a finished-wafers agreement at 28nm vs. a good-die agreement at 40nm, so Nvidia cannot afford to sell a product with rubbish yields, as was the case with Fermi. Expectations are for a 30% gaming improvement. So hold on to your horses. I still feel Nvidia might come out on top, given they are likely to go for a huge 550 sq. mm die, but until we compare final products we can’t say.

Nvidia, having the benefit of seeing what clocks AMD is using, pushed their card above them… this gives them the nod, and that is fair enough.

The problem for Nvidia is poor scaling during overclocking: while their card will hit 1300+ MHz, bandwidth bottlenecks hurt it noticeably, at which point an AMD card running at 1250MHz can trade blows with the best the GTX 680 has to offer while being closer in power efficiency.

I personally like the bad press AMD is getting, because I hope it’ll force a $100-or-better price adjustment, especially when it looks like AMD’s HD 7970 will beat the GTX 680 in outright performance at 2560×1600 once factory overclocked to 1200MHz, given how well it does at 1920×1080.

I’m going to wait a month and see how it all shakes out. AMD came out early and Nvidia exploited it, but while sort of impressive, Nvidia took advantage of a lowered bar… if AMD decides to compete using price, then FANTASTIC, WOOHOO, drop it $200 if you like and I’ll take two. Or AMD could be smart and bump the official frequencies, let the inherent advantages they have do the work for them, and adjust their price to match Nvidia’s dollar for dollar.

Regarding power consumption, AMD loses by 40 watts, which is a big win for Nvidia, although I personally don’t find it compelling, given both cards are 40+ watts under the previous generation, consuming GTX 570 levels of power… even overclocked, neither card will stress my $50 550W Naxn PSU.

P.S. It’s surprising to see AMD actually have a single-GPU high-end card that can compete favorably with Nvidia, instead of pushing the dual-GPU discussion.

P.P.S. I found the overclocking information at VR-Zone. TechPowerUp did some overclocking too, but they used fewer games and lower resolutions, which don’t stress the video cards, limiting the value.

Both reviews have value, as VR-Zone offers up more data and TechPowerUp offers more info on the difficulties of pushing the GTX 680.

I wish the results used 2560×1600, and I wonder if it’s pressure by an interested party that had those numbers omitted.

Not really impressed. I was expecting a lot more, since Nvidia was talking trash when the 7970 was released.

Please read the RWT article on Kepler. It seems Nvidia has taken a U-turn with its design efforts with Kepler. I don’t think it’s fair to put down the AMD architecture, and you’ll see why once you read that article. It also gives a good explanation of why the Kepler core was weak in the OpenCL compute benchmarks. I don’t think AMD left anything on the table with Tahiti; it is a more balanced design than Cypress, but it does lose to Kepler in graphics-heavy workloads due to that trade-off.

I agree with this; the RWT article is very enlightening in explaining how Nvidia managed to achieve what they did. There seem to be a lot of reviews saying how “amazing” Kepler is, when in fact it’s a disappointing low-cost, high-margin product that’s marketed as something high end, which it clearly isn’t, due to the massive simplification of the GPU’s compute capabilities.

A lot of people also seem to be claiming that Kepler has good performance per watt and per mm², which is true, but completely unsurprising. This is simply the 28nm equivalent of an HD 5870; of course it’s going to have high FLOPS throughput and good power efficiency, but we all know that in real-world scenarios, when the workload requires more general, dynamic loads and the drivers have not been optimised for those applications, the efficiency of Kepler will not be so good.

Despite all this, it’s hard to argue against the direction Nvidia has chosen, given the state of games today. They’re simply not taking advantage of the general compute capabilities of GPUs, and it doesn’t look like they will be in the near future. Suffice it to say, though, I do wish GPUs could be used for more than just graphics and act as more of a co-processor to the CPU; it sort of feels like we’re going backwards again.

The GTX 680 comes in at least 35% ahead of Nvidia’s former single-GPU card king of the past year and a half, the GTX 580, which cost $500+ and still resides well over the $400 mark. For any company to release a card with the much higher performance the GTX 680 exhibits, not only at a lower price than their current champion but far below the competing design that it also beats, is pure fantasy. Claiming the GTX 680 is a mid-range product at a top price merely outlines the enormous lead Nvidia holds over not just its last iteration but also its competition, AMD. AMD has lost miserably, has nothing to compete with the “top end” Nvidia product, hasn’t matched the current GTX 680, yet charges more. Your fantasy attack on Nvidia outlines your immense bias, and in reality it is a humiliation for AMD, whose top card costs more, performs less, and is not their mid-range product but the best they have to offer. If Nvidia released the top card you claim makes the GTX 680 a mid-range product (which beats the top end of AMD), they would have to charge at least $850 for it currently. Facing reality is something AMD fanboys should do quickly, instead of stirring the pot with twisted lies and falsely impudent “opinions”.

He does have one valid point, though with the wrong numbers. The GTX 680 does in fact reliably beat the previous green champion, the GTX 580, in terms of FPS, but only by 20% or so, not 35%. The rest of his “logic” breaks down from there.

Are you guys reading the same review I was? Reading these comments before the actual review led me to expect a total Nvidia blowout, but that just isn’t the case at all. The 99th percentile frame times are slightly in the Radeon’s favour (which is the more important number, right?), and the traditional FPS numbers are very, very close. The Reds win on idle power consumption, by quite a bit under “display off,” while the load numbers favour Green by a margin.

The two cards are very close performers; the only thing that really drags the Radeons down is their price. Quite frankly, though, they were overpriced to begin with, and after seeing how fast the 680s have sold out, and all the rumours of TSMC manufacturing issues, I highly doubt you’ll be able to get the 680 at launch prices for very long anyway. Maybe you’ll be lucky to get it at all…

Cool, I hope you get it