NVIDIA GT200 and GT300
GT200 was based upon G80, though not without significant improvements. The
total number of shader pipelines was increased from 128 to 240, gathered into
10 clusters with 3 subclusters each. Like G92, it featured 8 texture load/store
units per cluster and supported PCI Express v2.0. Like G80, it relied upon NVIO
for output interfaces. Every register unit of GT200 operated with a double-size
register file (64KB), which improved performance with many threads and long
shaders. Since there are 3 subclusters per cluster, the 1st level texture cache
grew from 16KB to 24KB per cluster (240KB in total). Although the total number
of texture filtering units didn't change, their performance was also improved
significantly. Last but not least, there were 8 memory channels of 64 bits each
(512 bits in total), which implied 8 raster partitions with 4 ROPs each (32
ROPs in total). Such a wide memory interface was first introduced in AMD/ATI
R600, though the subsequent top performance graphics processors by AMD/ATI
featured less complicated 256-bit memory designs.
GT200 was supposed to enter the market in November of 2007, but it actually
appeared only in June of 2008. Nevertheless, it was a very impressive design
counting 1.4 billion transistors, with an awesome die size of 576mm² on a 65nm
TSMC process. Needless to say, it was very expensive to manufacture, so a 55nm
die shrink called GT200b or GT206 appeared in January of 2009 with a die size
of 470mm². It was also a bit late: the original schedule had mentioned August
of 2008. A 40nm version called GT200c or GT212 was never produced. Neither
GT200 nor any of its family members supported the complete DirectX 10.1 feature
set (Shader Model 4.1). The company had issues with other 40nm designs too,
even though they were not nearly as complicated as GT200. In particular, GT214
was sent back for another development cycle and was eventually released as
GT215, while GT216 and GT218 were delayed several times. It seems obvious that
NVIDIA has real problems with 40nm, but instead of solving them as soon as
possible they have placed a large bet on GT300, another monstrous design. A big
mistake? Time will tell. In general, GT200 based cards turned out to be
expensive, power hungry devices (GeForce GTX 280 was advertised with a 650USD
initial target price and 236W TDP), and they reached the market about half a
year late. So far, GT300 seems to follow the bad luck of GT200.
There isn't much to say about GT300 (also advertised as GF100) when it comes to
architecture and technology, as these things are kept pretty much confidential,
but some information has surfaced. There are 512 pipelines gathered into 4
clusters, now called graphics processing clusters, and every such cluster is
subdivided into 4 subclusters, still called streaming multiprocessors. So,
there are 128 pipelines per cluster and 32 pipelines per subcluster. Every
subcluster is equipped with 2 warp schedulers, 2 dispatch units, 4 special
function units, 16 load/store units, 4 texture units, a register unit with a
large 128KB register file and so on. There are 64KB of local memory per
subcluster which may be user configured as 16KB of 1st level cache memory
(hardware managed) and 48KB of shared memory (software managed) or vice versa.
As has been mentioned before, G80 and GT200 have 16KB of shared memory per
subcluster and no true 1st level cache memory. In general, a single cluster of
GT300 is more advanced than one of either G80 or GT200. It has been announced
that GT300 will have 768KB of 2nd level cache memory; to be precise, there will
be 128KB of such cache per memory channel. G80 and GT200 also have 2nd level
cache memory, 32KB and 64KB per memory channel respectively, but keep in mind
that their shader pipelines cannot make any use of it. GT300 can grant access
to the 2nd level cache to both texture units and shader pipelines in read/write
mode. As for the memory interface, the first rumours mentioned that it was
going to be 512 bits wide, but now we can be sure that there will be 6 memory
channels of 64 bits each (384 bits in total), like in G80. The primary memory
type for GT300 will be GDDR5 SDRAM, as opposed to G80 and GT200 which relied
upon GDDR3 SDRAM. While the ECC implementation for the register file and caches
is going to be regular SEC/DED, NVIDIA have developed a proprietary ECC
algorithm for memory protection: there will be no additional data lines or
memory chips installed for this purpose, as checksums will be stored in
reserved portions of regular video memory. GT300 will support the IEEE 754-2008
standard for floating point calculations instead of the older IEEE 754-1985,
though there isn't much difference. GT300 also seems to have hardware
tessellation logic as a part of the PolyMorph engine. What's even more
interesting, there are expected to be as many tessellation units as
subclusters. Finally, GT300 will make use of NVIO just like G80 and GT200.
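
The unit counts quoted above for GT300 are easy to cross-check with a few
multiplications. The sketch below is only an illustration in Python: the
dictionary keys are names made up for this example, and the figures are the
pre-release numbers from the text, not confirmed specifications.

```python
# Pre-release GT300 (GF100) figures as quoted in the text.
# All key names are hypothetical and exist only for this illustration.
GT300 = {
    "clusters": 4,                  # graphics processing clusters
    "subclusters_per_cluster": 4,   # streaming multiprocessors per cluster
    "pipelines_per_subcluster": 32,
    "memory_channels": 6,
    "bits_per_channel": 64,
    "l2_kb_per_channel": 128,
}


def totals(chip):
    """Return (shader pipelines, memory bus width in bits, L2 cache in KB)."""
    subclusters = chip["clusters"] * chip["subclusters_per_cluster"]
    pipelines = subclusters * chip["pipelines_per_subcluster"]
    bus_bits = chip["memory_channels"] * chip["bits_per_channel"]
    l2_kb = chip["memory_channels"] * chip["l2_kb_per_channel"]
    return pipelines, bus_bits, l2_kb


print(totals(GT300))  # (512, 384, 768)
```

The three results match the 512 pipelines, the 384-bit bus and the 768KB of
2nd level cache mentioned in the text.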
GT300 is expected to consist of 3 billion transistors with a die size of
530mm² on a 40nm TSMC process. Considering that a single 300mm wafer at TSMC
costs 5000USD to 6000USD for this process, and given the die size above, GT300
is going to be much more expensive than a 55nm GT200b. If we additionally
consider the resources spent on development of GT300 and the Fermi
architecture, as well as low manufacturing yields and delays to the market,
then NVIDIA has a tough job generating any reasonable profit out of this
project. As for the release schedule, GT300 was originally planned to be
supplied in quantity to OEMs in Q3 2009. It was moved to Q4 2009 after some
serious design and manufacturing issues which are kept strictly confidential.
In any case, GT300 has totally failed to hit the Christmas and New Year sales,
and the release has been postponed once again to Q1 2010. The latest rumours
say that it's going to happen in March of 2010, so let's see.
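
To get a feeling for what the wafer price and die size mean per chip, here is
a rough Python sketch. It uses a common first-order approximation for the
number of whole dies on a round wafer, which is my choice for this example and
not something the sources above state. The function names are made up, and the
yield values in the loop are arbitrary placeholders, since real yields are not
public.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """First-order estimate of whole dies on a round wafer.

    The first term is wafer area divided by die area; the second term
    subtracts the partial dies lost along the wafer edge.
    """
    radius = wafer_diameter_mm / 2.0
    whole = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(whole - edge_loss)


def cost_per_good_die(wafer_price_usd, dies, yield_fraction):
    """Wafer price spread over the dies that actually work."""
    return wafer_price_usd / (dies * yield_fraction)


gt300_dies = gross_dies_per_wafer(300, 530)   # 40nm GT300, 530mm² die
gt200b_dies = gross_dies_per_wafer(300, 470)  # 55nm GT200b, 470mm² die
print(gt300_dies, gt200b_dies)

# Hypothetical yields, for illustration only (not real data).
for assumed_yield in (1.0, 0.5, 0.25):
    for wafer_price in (5000, 6000):
        cost = cost_per_good_die(wafer_price, gt300_dies, assumed_yield)
        print(f"yield {assumed_yield:.2f}, wafer {wafer_price}USD: "
              f"{cost:.0f}USD per good GT300 die")
```

Whatever the real yield is, the cost per working chip grows in inverse
proportion to it, which is why poor yields hurt so much on a die this large.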
Conclusions
It's difficult to make any statements about the performance of GT300. As a
matter of fact, it depends mostly upon the strengths and weaknesses of the
Fermi architecture as well as the real clock speed of the GT300 shader domain.
The latter is expected to be between 1.5GHz and 2.0GHz, though probably closer
to the lower limit than to the higher one. Anyway, let's suppose that actual
performance will be between 1600 and 3000 gigaFLOPS in single precision
floating point or between 800 and 1500 gigaFLOPS in double precision. That's
very impressive, because the GT200 based Tesla C1060 with its shader domain at
1.3GHz can do 933 gigaFLOPS single precision or 78 gigaFLOPS double precision.
The primary competitor, the 40nm RV870 (Cypress) based Radeon HD5870 by
AMD/ATI, delivers 2720 gigaFLOPS single precision or 544 gigaFLOPS double
precision at 850MHz. The current AMD/ATI top performance product for scientific
calculations, the 55nm RV790 based FireStream 9270, can do 1200 gigaFLOPS
single precision or 240 gigaFLOPS double precision at 750MHz. It seems apparent
that the next RV870 based FireStream will be at least two times faster than
model 9270. It is also apparent that future GT300 based products won't gain any
serious advantage over RV870 based products in single precision performance but
will prevail significantly in double precision. The primary conclusion is that
when it comes to computer gaming, GT300 and RV870 will be pretty much equal in
terms of performance, but GT300 will be preferred for scientific calculations.
Actual prices, power consumption, support quality and so on may adjust the
decision here and there, as is usually the case.
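
The theoretical peak numbers above all follow the same pattern: pipelines
times clock speed times floating point operations per pipeline per clock. The
Python sketch below reproduces them under a few assumptions of mine that the
text does not state: 1.296GHz as the exact C1060 shader clock behind the
rounded 1.3GHz, 3 operations per clock for GT200, 2 operations per clock for
the AMD/ATI chips, and effective pipeline counts of 1600 for RV870 and 800 for
RV790. The function name is made up for this example.

```python
def peak_gflops(pipelines, clock_ghz, flops_per_clock=2):
    """Theoretical peak: every pipeline retires flops_per_clock each cycle."""
    return pipelines * clock_ghz * flops_per_clock


cards = [
    # (name, pipelines, shader clock in GHz, flops per pipeline per clock)
    ("Tesla C1060 (GT200)", 240, 1.296, 3),      # clock and 3 flops: assumed
    ("Radeon HD5870 (RV870)", 1600, 0.850, 2),   # 1600 effective: assumed
    ("FireStream 9270 (RV790)", 800, 0.750, 2),  # 800 effective: assumed
]
for name, pipelines, ghz, flops in cards:
    print(f"{name}: {peak_gflops(pipelines, ghz, flops):.0f} gigaFLOPS")

# GT300 guesses over the expected 1.5GHz to 2.0GHz shader clock range.
for ghz in (1.5, 2.0):
    for flops in (2, 3):
        print(f"GT300 at {ghz}GHz, {flops} flops per clock: "
              f"{peak_gflops(512, ghz, flops):.0f} gigaFLOPS")
```

With 2 operations per clock, 512 pipelines give roughly 1500 to 2000
gigaFLOPS; with 3 operations per clock, roughly 2300 to 3100. The supposed
range of 1600 to 3000 gigaFLOPS seems to span both assumptions.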
(continued; 28-Mar-2010)
So, GT300 (or GF100) officially hit the market on the 26th of March. There are
two cards released by NVIDIA through their partners, GeForce GTX470 and GeForce
GTX480. The most interesting thing is that both of them are based upon GT300
with some units disabled. It seems NVIDIA has faced really poor manufacturing
yields, but it was hardly possible for them to delay their Fermi based products
any further, so they have made the decision. Well, it's the first time NVIDIA
has released a top performance product with masked units. It's a very
unpopular move which may cost NVIDIA some reputation. It seems they simply had
no other choice: something is better than nothing at all. In terms of
competition, GeForce GTX470 is supposed to be an alternative to Radeon HD5850,
and GeForce GTX480 is going to hurt Radeon HD5870 sales. See the table below
for the cards' specifications. Note that the execution units of NVIDIA and
AMD/ATI graphics processors cannot be compared by raw numbers due to their very
different architectures, so both real and approximate effective numbers are
shown for AMD/ATI products.
| | NVIDIA GeForce GTX470 | NVIDIA GeForce GTX480 | AMD/ATI Radeon HD5850 | AMD/ATI Radeon HD5870 |
| Graphics processor | GT300 (GF100) | GT300 (GF100) | RV870 | RV870 |
| Clock speed (core logic) | 607MHz | 700MHz | 725MHz | 850MHz |
| Clock speed (shader pipelines) | 1215MHz | 1400MHz | 725MHz | 850MHz |
| TMUs | 56 | 60 | 18 (72 effective) | 20 (80 effective) |
| ROPs | 40 | 48 | 8 (32 effective) | 8 (32 effective) |
| Shader pipelines | 448 | 480 | 288 (1440 effective) | 320 (1600 effective) |
| Clock speed (memory) [1] | 3350MHz | 3700MHz | 4000MHz | 4800MHz |
| Memory bus width | 320-bit | 384-bit | 256-bit | 256-bit |
| Memory size | 1280MB | 1536MB | 1024MB / 2048MB | 1024MB / 2048MB |
| Memory type | GDDR5 SDRAM | GDDR5 SDRAM | GDDR5 SDRAM | GDDR5 SDRAM |
| TDP [2] | 215W | 250W | 151W | 188W |
| Idle power consumption | ~50W | ~50W | ~30W | ~30W |
| MSRP | 350USD | 500USD | 300USD | 400USD |
[1] effective data transfer speed of GDDR5 SDRAM;
[2] real world peak power consumption may be higher.
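
The memory clock and bus width rows of the table can be combined into peak
memory bandwidth: bus width in bytes times the effective data transfer speed.
Here is a small Python sketch of that arithmetic. The function name is made up
for this example, and the results are theoretical peaks, not measured
throughput.

```python
def bandwidth_gb_per_s(bus_width_bits, effective_mhz):
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz / 1000


cards = [
    # (name, memory bus width in bits, effective memory clock in MHz)
    ("GeForce GTX470", 320, 3350),
    ("GeForce GTX480", 384, 3700),
    ("Radeon HD5850", 256, 4000),
    ("Radeon HD5870", 256, 4800),
]
for name, bits, mhz in cards:
    print(f"{name}: {bandwidth_gb_per_s(bits, mhz):.1f} GB/s")
```

The wider bus lets both GeForce cards come out ahead of their Radeon
counterparts in peak bandwidth despite the slower memory clocks.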
As you may have guessed already, GeForce GTX470 is powered by GT300 with 2
subclusters disabled, which means minus 64 shader pipelines and 8 TMUs. The
memory bus is only 320 bits wide, hence one 64-bit memory controller is
disabled together with 8 ROPs and 128KB of 2nd level cache memory. Considering
the low clock speeds, especially the 1.2GHz shader domain, this video card
isn't going to fly sky high. On the other hand, NVIDIA will be able to satisfy
market demand for GeForce GTX470 cards even with poor manufacturing yields.
GT300 chips for GTX480 come with 1 subcluster disabled. That's not good, but
there are other things to worry about. First of all, a shader domain clocked at
1.4GHz isn't what most people, including myself, expected from this top
performance product. Although I wasn't too optimistic, I expected it to cross
the 1.5GHz boundary at least. Another important issue is power consumption. I'm
not sure what to make of the 250W TDP reported by NVIDIA. There is some
information that real world peak power consumption of GTX480 is 300W to 320W,
and the card gets very hot even when running at default clock speeds: its core
temperatures are consistently well over 90°C. Finally, here comes the money
question. Radeon HD5870 is priced at 400USD for a 1GB version, it has been
available on the market for 6 months, it's less power hungry and it's about as
fast as GeForce GTX480 (plus or minus 10% here and there doesn't make much
difference), while the latter is priced at 500USD. Frankly speaking, it doesn't
make any sense.
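
How the disabled units translate into the numbers from the table can be
sketched in a few lines of Python. The per-unit figures are derived from the
text (32 pipelines and 4 TMUs per subcluster; 8 ROPs, 64 bits of bus width and
128KB of 2nd level cache per memory channel), while the function and key names
are made up for this example.

```python
FULL_SUBCLUSTERS = 16      # 4 clusters with 4 subclusters each
FULL_MEMORY_CHANNELS = 6   # 6 channels of 64 bits each


def cut_down(disabled_subclusters, disabled_memory_channels):
    """Unit counts of a GT300 with some parts masked off."""
    subclusters = FULL_SUBCLUSTERS - disabled_subclusters
    channels = FULL_MEMORY_CHANNELS - disabled_memory_channels
    return {
        "pipelines": subclusters * 32,
        "tmus": subclusters * 4,
        "rops": channels * 8,
        "bus_bits": channels * 64,
        "l2_kb": channels * 128,
    }


print("GTX470:", cut_down(2, 1))
print("GTX480:", cut_down(1, 0))
```

GTX470 comes out at 448 pipelines, 56 TMUs, 40 ROPs and a 320-bit bus, and
GTX480 at 480 pipelines, 60 TMUs, 48 ROPs and a 384-bit bus.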
And one more thing. The only serious advantage GT300 could have over RV870 is
outstanding double precision floating point performance. However, that would
make GT300 based GeForce cards highly competitive against GT300 based Tesla
cards in some markets. Keep in mind that superior double precision performance
is of little to no importance when playing computer games, encoding or decoding
video streams, etc. So, NVIDIA have reduced the double precision floating point
performance of GT300 based GeForce cards by a factor of 4, probably through a
software lock (136 gigaFLOPS for GTX470 and 168 gigaFLOPS for GTX480). It's
unclear whether this is a temporary solution or not, but currently GeForce
GTX470 and GTX480 are much slower in double precision than Radeon HD5850 and
HD5870 respectively. Those who need superior double precision performance are
kindly advised by NVIDIA to purchase Tesla C2050 or C2070 cards. The first one
comes with 3GB of 384-bit 3600MHz GDDR5 SDRAM (2.625GB available with ECC
enabled), the second one with 6GB of 384-bit 4000MHz GDDR5 SDRAM (5.250GB
available with ECC enabled). Both of them are powered by GT300 with 2
subclusters disabled (minus 64 shader pipelines). These cards can do double
precision floating point calculations at 560 and 628 gigaFLOPS respectively,
and single precision floating point calculations at 1120 and 1256 gigaFLOPS
respectively. NVIDIA wants 2500USD for C2050 and 4000USD for C2070.
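
The double precision figures for the GeForce cards can be reproduced with a
short Python sketch. It assumes 2 floating point operations per pipeline per
clock, uncapped double precision at half the single precision rate (which is
what the Tesla numbers above suggest) and the 4 times reduction mentioned in
the text. For the Radeon cards it assumes double precision at one fifth of
single precision, the ratio implied by 2720 and 544 gigaFLOPS for HD5870. All
function names are made up for this example.

```python
def sp_gflops(pipelines, clock_ghz):
    """Single precision peak, assuming 2 flops per pipeline per clock."""
    return pipelines * clock_ghz * 2


def geforce_dp_gflops(pipelines, shader_ghz, cap_divisor=4):
    """Half-rate double precision, then divided by the GeForce cap."""
    return sp_gflops(pipelines, shader_ghz) / 2 / cap_divisor


def radeon_dp_gflops(effective_pipelines, clock_ghz):
    """Double precision at one fifth of single precision (assumed ratio)."""
    return sp_gflops(effective_pipelines, clock_ghz) / 5


print(f"GTX470: {geforce_dp_gflops(448, 1.215):.0f} gigaFLOPS")
print(f"GTX480: {geforce_dp_gflops(480, 1.400):.0f} gigaFLOPS")
print(f"HD5850: {radeon_dp_gflops(1440, 0.725):.0f} gigaFLOPS")
print(f"HD5870: {radeon_dp_gflops(1600, 0.850):.0f} gigaFLOPS")
```

Under these assumptions the capped GeForce cards land at about 136 and 168
gigaFLOPS, against roughly 418 and 544 gigaFLOPS for the two Radeon cards.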