CPU vs GPU - Other Technologies
Re: CPU vs GPU
Skybuck Flying wrote:
> Well
>
> I just spent some time reading on about differences and similarities between
> CPU's and GPU's and how they work together =D
>
> Performing graphics requires assloads of bandwidth ! Like many GByte/sec.
>
you have NO idea.
> Current CPU's can only do 2 GB/sec which is too slow.
>
in most cases, though, this is more than enough. there are
memory-intensive applications that need faster memory to keep from
stalling the CPU, but this is mostly a limitation of the front-side bus
and memory modules. the faster and bigger memory gets, the more
expensive it is.
> GPU's are becoming more generic. GPU's could store data inside their own
> memory. GPU's in the future can also have conditional jumps/program flow
> control, etc.
>
> So GPU's are staring to look more and more like CPU's.
>
inasmuch as GPUs now perform per-pixel calculations on texture
and video information. GPUs now also perform a lot of vector processing
work, which helps immensely in such tasks as model animation and the like.
> It seems like Intel and AMD are a little bit falling behind when it comes to
> bandwidth with the main memory.
>
> Maybe that's a result of Memory wars Like Vram, Dram, DDR, DRRII,
> Rambit(?) and god knows what =D
>
no. those are different memory technologies that evolved to keep up
with the growing demand for faster memory. keep in mind CPUs are only
10x as fast as memory now, and that gap is shrinking FAST. (due to
increasing popularity of DDR and the introduction of DDR-II, which, by
the way, is very fast.)
> Though intel and amd probably have a little bit more experience with making
> generic cpu's are maybe lot's of people have left and joined GPU's lol.
>
> Or maybe amd and intel people are getting old and are going to retire soon
> <- braindrain 
>
i haven't the slightest clue how you got this inane idea. AMD and intel
aren't going anywhere except UP. CPUs are where the operating system
runs, and they are where programs run. everything in a computer,
including its peripherals, depends on the CPU and its support hardware.
software, which depends on the CPU, has the capability to offload work to
specialized hardware - which ATi, nVidia, Creative, 3D Labs, et al.
manufacture. work can even be sent to the sound card for
post-processing and special effects! (this is audio processing that is
impractical to do on the CPU in real time alongside everything else.)
> However AMD and INTEL have always done their best to keep things
> COMPATIBLE... and that is where ATI and NVidia fail horribly it seems.
>
ATi and nVidia are competing to set standards. what they are trying to
do is not create hardware that will work like other hardware. they are
trying to make hardware that is not only better in performance, but
better architecturally with more features and capabilities. all thanks
to the demands of the gaming industry.
you're not making an "apples-to-apples" comparison. you're comparing
two unrelated industries. things don't work that way.
nVidia and ATi write driver software to control their cards and pass
information from software to hardware. their *DRIVERS* provide two
interfaces to the hardware: DirectX and OpenGL. that is all the
standardisation nVidia, ATi, et al. need. you can write an OpenGL game
and expect it to work on both brands of cards, for example.
> My TNT2 was only 5 years old and it now can't play some games lol =D
>
that's because it doesn't have the capabilities that games expect these
days. games expect the hardware to handle all of the texturing,
lighting, and transformations, and rarely do it in software.
> There is something called Riva Tuner... it has NVxx emulation... maybe that
> works with my TNT2... I haven't tried it yet.
>
if memory serves, it activates a driver option. that emulation is in
nVidia's drivers. it's hidden and disabled by default but was made
available for developer testing.
> The greatest asset of GPU's is probably that they deliver a whole graphical
> architecture with it... though opengl and directx have that as well... these
> gpu stuff explain how to do like vertex and pixel shading and all the other
> stuff around that level.
>
again you've got it backwards. the GPU *PROVIDES* the graphical
services *THROUGH* OpenGL and DirectX. but in addition to this, OpenGL
and DirectX themselves provide a /separate/ software-only
implementation. that is a requirement of both standards. anything that
is not done in hardware must be done in software. *BY THE DRIVER.*
in addition, vertex shading is a convenience. nothing more. but it
happens to be a *FAST* convenience.
pixel shading is *only* practical in hardware. it is a technology that
requires so much computational power that, in real time, ONLY a GPU can
provide it. it would take entirely too much CPU work to do all that.
> Though games still have to make sure to reduce the number of triangles that
> need to be drawn... with bsp's, view frustum clipping, backface culling,
> portal engines, and other things. Those can still be done fastest with
> cpu's, since gpu's dont support/have it.
>
you again have it wrong.
game engines use those techniques to reduce the amount of data that they
send to the GPU. vertices still have to be sent to the GPU before they
are drawn - and sent again for each frame. to speed THAT process up,
game engines use culling techniques so they have less data to send. not
so the GPU has less work to do.
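To make one of the culling techniques named in the quoted post concrete, here is a minimal sketch of backface culling done on the CPU before triangles are submitted. It assumes counter-clockwise winding for front faces and a single eye position; the function names are made up for illustration and do not come from any engine or driver.

```python
def is_back_facing(v0, v1, v2, eye):
    """True if the triangle faces away from the eye (CCW winding assumed)."""
    # two edge vectors of the triangle
    e1 = [v1[i] - v0[i] for i in range(3)]
    e2 = [v2[i] - v0[i] for i in range(3)]
    # face normal = e1 x e2
    n = [e1[1] * e2[2] - e1[2] * e2[1],
         e1[2] * e2[0] - e1[0] * e2[2],
         e1[0] * e2[1] - e1[1] * e2[0]]
    # vector from the triangle to the eye
    to_eye = [eye[i] - v0[i] for i in range(3)]
    # a non-positive dot product means the normal points away from the eye
    return sum(n[i] * to_eye[i] for i in range(3)) <= 0.0


def cull(triangles, eye):
    """Keep only the triangles that face the eye, so less data is sent on."""
    return [t for t in triangles if not is_back_facing(*t, eye)]
```

Whatever survives `cull` is what would be handed to the graphics API; the rest never leaves main memory.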
> So my estimate would be:
>
> 1024x768x4 bytes * 70 Hz = 220.200.960 bytes per second = exactly 210
> MB/sec
>
> So as long as a programmer can simply draw to a frame buffer and have it
> flipped to the graphics card this will work out just nicely...
>
that's the way it was done before GPUs had many capabilities - like an
old TNT or the like.
but in those days, it was ALL done one pixel at a time.
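The frame-buffer estimate in the quoted post can be checked with a few lines of arithmetic. This is only a sketch of that one calculation, assuming 4 bytes per pixel and one full-screen copy per refresh; the function name is illustrative.

```python
def framebuffer_bytes_per_second(width, height, bytes_per_pixel, refresh_hz):
    """Bytes moved per second when one full frame is copied every refresh."""
    return width * height * bytes_per_pixel * refresh_hz


rate = framebuffer_bytes_per_second(1024, 768, 4, 70)
print(rate)                  # 220200960
print(rate / (1024 * 1024))  # 210.0
```

It covers only the final copy of finished pixels, not the work of producing them.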
> So far no need for XX GB/Sec.
>
> Ofcourse the triangles still have to be drawn....
>
for a triangle to be drawn, first the vertices must be transformed.
this involves a good deal of trigonometry to map a 3D point to a dot on
the screen. second, the vertices are connected and the area is filled
with a color. or:
1. vertices transformed (remember "T&L"?)
2. area bounded among the vertices
3. texture processing is performed (if necessary. this is done by pixel
shaders now, so it's all on the GPU. without pixel shaders, this is
done entirely on the CPU.) this includes lighting, coloring (if
applicable), or rendering (in the case of environment maps).
4. the texture map is transformed to be mapped to the polygon.
5. the texture map is then drawn on the polygon.
in an even more extreme case, we deal with the Z-buffer (or the DirectX
W-buffer), vertex transformations (with vertex shaders), screen
clipping, and screen post-processing (like glow, fades, blurring,
anti-aliasing, etc.)
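As a rough illustration of the first step (mapping a 3D point to a dot on the screen), here is a minimal perspective projection sketch. It assumes the point is already in camera space with +z pointing into the screen, and `focal_length` is a made-up parameter in pixels. Real pipelines use 4x4 matrices and clipping, which are left out here.

```python
def project(point, focal_length, width, height):
    """Project a camera-space point onto a width x height screen.

    Returns (x, y) in pixels, or None if the point is at or behind the camera.
    """
    x, y, z = point
    if z <= 0:
        return None
    # perspective divide: farther points land closer to the screen centre
    sx = width / 2 + focal_length * x / z
    # screen y grows downwards, so flip the sign
    sy = height / 2 - focal_length * y / z
    return (sx, sy)
```

For example, `project((1, 1, 10), 500, 1024, 768)` lands 50 pixels right of and 50 pixels above the screen centre.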
> Take a beast like Doom III.
>
> How many triangles does it have at any given time...
>
> Thx to bsp's, (possible portal engines), view frustum clipping, etc...
>
> Doom III will only need to drawn maybe 4000 to maybe 10000 triangles for any
> given time. ( It could be more... I'll find out how many triangles later )
>
> Maybe even more than that...
>
as you found out very quickly, it draws a LOT of polygons. how many did
Doom use? there were about 7 or 8 on the screen, on average. more for
complex maps. the monsters? single-polygon "imposters" (to use current
terminology).
> But I am beginning to see where the problem is.
>
> Suppose a player is 'zoomed' in or standing close to a wall...
>
> Then it doesn't really matter how many triangles have to be drawn....
>
> Even if only 2 triangles have to be drawn... the problem is as follows:
>
> All the pixels inside the triangles have to be interpolated...
>
> And apperently even interpolated pixels have to be shaded etc...
>
> Which makes me wonder if these shading calculations can be interpolated...
> maybe that would be faster 
>
> But that's probably not possible otherwise it would already exist ?! Or
> somebody has to come up with a smart way to interpolate the shading etc for
> the pixels
>
during the texture shading step, that's irrelevant. most textures are
square - 256x256, 512x512, 1024x1024, etc. (there are even 3D textures,
but i won't go into that.) when those textures are shaded for lighting,
all of those pixels must be processed before the texture can be used.
pixel shaders change that and enable us to do those calculations at
run-time.
i won't explain how, since i'm not entirely sure. i haven't had the
chance to use them yet. they may be screen-based or texture-based. i
don't know. maybe both. i'll find out one of these days.
> So now the problem is that:
>
> 1024x768 pixels have to be shaded... = 786.432 pixels !
>
> That's a lot of pixels to shade !
>
really. you figured this out. how nice. imagine doing that on the CPU.
on second thought, go back up and re-read what i said about pixel
shaders being done in software.
> There are only 2 normals needed I think... for each triangle... and maybe
> with some smart code... each pixel can now have it's own normal.. or maybe
> each pixel needs it's own normal... how does bump mapping work at this point
> ?
>
normals are only really required for texture orientation and lighting.
with software lighting, the texture is usually processed and then sent
to the GPU before the level begins. some games still do that. some use
vertex lighting which gives a polygon lighting based on the strength of
light at each vertex - rather than at each pixel within the polygon. as
you can imagine, that's quite ugly.
hardware lighting (implied by "T&L") gives that job to the GPU. it
allows the GPU to perform lighting calculations accurately across a
polygon while the CPU focuses on more important things. (you might see
this called "dynamic lighting".)
pixel shaders allow for even more amazing dynamic effects with lights in
real-time.
now, about bump-mapping. well, it is what its name implies. it is a
single-color texture map that represents height data. it is processed
based on the position of lights relative to the polygon it's mapped to.
it adds a lot to the realism of a game.
there are numerous algorithms for this, each with its advantages and
disadvantages. nVidia introduced something cool with one of the TNTs
(or maybe GeForce. my history is rusty.) called "register combiners".
this allows developers to do lots of fancy texture tricks like bump
mapping on the GPU.
the basic idea is that light levels are calculated based on whether the
bump map is rising or falling in the direction of the light. if you
want to know more, there are a lot of tutorials out there.
> In any case let's assume the code has to work with 24 bytes for a normal.
> (x,y,z in 64 bit floating point ).
>
it's not. 32-bit floats are more commonly used because of the speed
advantage. CPUs are quite slow when it comes to floating-point math.
(compared to GPUs or plain integer math.)
> The color is also in r,g,b,a in 64 bit floating point another 32 bytes for
> color.
>
> Maybe some other color has to be mixed together I ll give it another 32
> bytes...
>
> Well maybe some other things so let's round it at 100 bytes per pixel 
>
now you're way off. the actual data involves 16 bytes per vertex (four
32-bit floats; the fourth is usually 1.0 for a position), with usually 3
vertices per polygon, and a plane normal (another 4-part vector,
sometimes not stored at all), with texture coordinates. that's sent to
the GPU in a display list.
the GPU performs all remaining calculations itself and /creates/ pixel
data that it places in the frame buffer. the frame buffer is then sent
to the monitor by way of a RAMDAC.
<snip misconceived calculation>
>
> So that's roughly 5.1 GB/sec that has to move through any processor just to
> do my insane lighting per pixel 
>
assuming a situation that doesn't exist in game development on current CPUs.
> Ofcourse doom III or my insane game... uses a million fricking verteces (3d
> points) plus some more stuff.
>
> vertex x,y,z,
> vertex normal x,y,z
> vertex color r,g,b,a
>
> So let's say another insane 100 bytes per vertex.
>
let's not.
> 1 Million verteces * 100 bytes * 70 hz = 7.000.000.000
>
> Which is rougly another 7 GB/sec for rotating, translating, storing the
> verteces etc.
>
no. you're still assuming the video card stores vertices with 64-bit
precision internally. it doesn't. 32-bit is more common. 16-bit and
24-bit is also used on the GPU itself to varying degrees.
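To put numbers on the precision point, this sketch compares the quoted worst case (100 bytes per vertex) with a 16-byte position made of four 32-bit floats. It assumes, as the quoted post does, that every vertex is re-sent every frame; the helper name is illustrative.

```python
import struct

# size of one xyz triple in 64-bit doubles vs. an xyzw position in 32-bit floats
DOUBLE_XYZ_BYTES = struct.calcsize("<3d")   # 24
FLOAT_XYZW_BYTES = struct.calcsize("<4f")   # 16


def stream_rate(vertex_count, bytes_per_vertex, frames_per_second):
    """Bytes per second if every vertex is re-sent every frame."""
    return vertex_count * bytes_per_vertex * frames_per_second


print(stream_rate(1_000_000, 100, 70))               # 7000000000
print(stream_rate(1_000_000, FLOAT_XYZW_BYTES, 70))  # 1120000000
```

Even under that pessimistic assumption, the smaller vertex format cuts the traffic by more than a factor of six.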
> So that's a lot of data moving through any processor/memory !
>
it would be, but it's not.
> I still think that if AMD or Intel is smart... though will increase the
> bandwidth with main memory... so it reaches the Terabyte age 
>
you're misleading yourself now. it's not Intel or AMD's responsibility.
> And I think these graphic cards will stop existing just like windows
> graphic accelerator cards stopped existing...
>
they still exist. what do you think your TNT2 does? it accelerates
graphics. on windows. imagine that.
> And then things will be back to normal =D
>
> Just do everything via software on a generic processor <- must easier I hope
> =D
>
you still seem terribly confused about what even is stored in GPU memory.
the vertex data is actually quite small and isn't stored in video memory
for very long.
the reason video cards usually come with so much memory is for TEXTURE,
FRAME BUFFER, and AUXILIARY BUFFER storage.
the frame buffer is what you see on the screen.
the Z buffer is the buffer that keeps track of each pixel's distance from
the screen. this is so nearer polygons correctly hide farther ones
without the polygons having to be sorted first.
auxiliary buffers can be a lot of things: stencil buffers are used for
shadow volume techniques, for example.
textures take up the majority of that storage space. at 4 bytes per
pixel, a single 256x256 texture takes up 262,144 bytes (256KB). a 512x512
texture takes up 1MB. 1024x1024 is 4MB. textures as large as 4096x4096
are possible (though not common) - that's 64MB.
and what of 3D textures? let's take a 64x64x64 texture. small, right?
that's 1MB all on its own.
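The texture sizes above all come from the same multiplication. A minimal sketch, assuming uncompressed textures at 4 bytes per texel and no mipmaps (both are assumptions; the function name is illustrative):

```python
def texture_bytes(width, height, depth=1, bytes_per_texel=4):
    """Storage for an uncompressed 2D (depth=1) or 3D texture, no mipmaps."""
    return width * height * depth * bytes_per_texel


MB = 1024 * 1024
print(texture_bytes(256, 256))            # 262144
print(texture_bytes(512, 512) // MB)      # 1
print(texture_bytes(1024, 1024) // MB)    # 4
print(texture_bytes(4096, 4096) // MB)    # 64
print(texture_bytes(64, 64, 64) // MB)    # 1
```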
so how big is the frame buffer? well, if there's only one, that's just
fine. but DirectX supports triple-buffering and OpenGL supports
double-buffering. that means 2 or 3 frames are stored at once. they
are flipped to the screen through the RAMDAC.
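The cost of keeping two or three frames around follows the same pattern. This sketch assumes the 1024x768 resolution from the quoted post and 4 bytes per pixel, and ignores Z and stencil buffers; the function name is illustrative.

```python
def swap_chain_bytes(width, height, buffer_count, bytes_per_pixel=4):
    """Memory for buffer_count colour buffers of the same size."""
    return width * height * bytes_per_pixel * buffer_count


for count in (1, 2, 3):
    # 1 -> 3145728, 2 -> 6291456, 3 -> 9437184
    print(count, swap_chain_bytes(1024, 768, count))
```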
and not only must the GPU store and keep track of all that data, but it
PROCESSES IT in real-time with each frame.
your proposal requires we go back to the 200+ instructions per pixel
that games once required. do you expect us to go back to mode 13h where
that kind of computation is still feasible with the same kind of graphic
quality we have now?
for us as developers, the GPU is a Godsend. it has saved us from doing
a lot of work ourselves. it has allowed us to expand our engines to the
point where we can do in real-time what once required vast
super-computer clusters to do over the course of MONTHS. the GeForce 4
(as i recall) rendered Final Fantasy: The Spirits Within at 0.4 frames
per second. it took several MONTHS to render the final movie on CPUs.
one final time: CPUs are general purpose. they are meant to do a lot
of things equally well. GPUs are specialized. they are meant to do one
thing and do it damned well. drivers for those GPUs are written to make
developers' lives easier, and let developers do what is otherwise impossible.
now, i'll close because lightning may strike the power grid at any
moment after the rain passes.
--
-- Charles Banas
-
-
Re: CPU vs GPU
> the basic idea is that light levels are calculated based on wether the
> bump map is rising or falling in the direction fo the light. if you
> want to know more, there are a lot of tutorials out there.
The bumpmapping is commonly done using dot3 or dot4, which for unit-length
vectors returns the cosine of the angle between two vectors.. namely the
surface normal and the vector from the surface to the light source. What is
stored in the normal map are the normal vectors for the surface. It is
common practice nowadays to compute the normal map from higher precision
geometry, which is then discarded, and the resulting normal map is mapped
onto a lower resolution trimesh.
Just want to clarify this bit so that it is more obvious that the sourcedata
from which the normal map is generated is not input to the GPU dp3/dp4.
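Here is a minimal CPU-side sketch of the dot3 idea, for illustration only. It assumes the common convention of packing each normal component into an 8-bit colour channel as (n + 1) / 2, which the post does not spell out, and all function names are made up.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def decode_normal(rgb):
    """Unpack an 8-bit RGB texel into a unit normal (assumed packing)."""
    return normalize(tuple(c / 255.0 * 2.0 - 1.0 for c in rgb))


def dot3_diffuse(normal, to_light):
    """Cosine of the angle between normal and light vector, clamped at 0."""
    n = normalize(normal)
    l = normalize(to_light)
    return max(0.0, sum(a * b for a, b in zip(n, l)))
```

A texel like `(128, 128, 255)` decodes to a normal pointing almost straight out of the surface, and `dot3_diffuse` then gives full brightness when the light is directly above it.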
To make matters more interesting, the normal map can be in tangent or model
coordinates. In model coordinates it is more "straightforward" as the light
emitters/sources can be transformed to model coordinates and then the
interpolated normal map samples can be used directly for lighting
computation. This approach lends itself poorly to animated data, say, a
skinned character, because when a trimesh is skinned the vertices move
relative to the model coordinate system, so a normal map stored in model
coordinates becomes inaccurate. The solution is to store the normal map in
"tangent space", which is the space defined by a 3x3 transformation matrix
at each vertex of the primitives where the normal map is, uh, mapped. Often
only two basis vectors are stored because it is possible to synthesize the
third with a cross product of the other two.
*phew*, so the input for the GPU is actually:
for ps:
- texture coordinate for normal map
- sampler for normal map
- light vector in tangent space
- tangent space transformation
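The tangent space step can be sketched like this: rebuild the third basis vector with a cross product, then express the light vector in that basis. It assumes the stored tangent and normal are unit length and perpendicular, and the handedness (normal x tangent) is an arbitrary choice for the example. The names are illustrative.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def to_tangent_space(light_vec, tangent, normal):
    """Express a model-space light vector in the per-vertex tangent basis."""
    # only tangent and normal are stored; the third axis is synthesized
    bitangent = cross(normal, tangent)
    return (dot(light_vec, tangent),
            dot(light_vec, bitangent),
            dot(light_vec, normal))
```

A light vector pointing along the vertex normal always comes out as (0, 0, 1), which is the "straight up" direction that a tangent space normal map expects.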
A while ago I played around with an optimization to the default way: I
never stored the light vector in tangent space, but rather folded the light
transformation into the tangent space transformation.. yessir, it worked -
but problem: it was more hassle than it was worth, because this way needed
multiple tangent space transformations, one for each lightsource, and
generally it was more efficient (no shit!) to pass on the light vector (and
other information like attenuations) separately instead. Just want to warn
anyone off this shitty approach. It is pretty useless, but I wanted to
document my error, the point is that when in doubt, try! In that sense the
original poster IMHO is close to the truth..
A lot of design choices are based on information and experience, but
sometimes just trying shit is the only way to "click" new information into
the existing framework of experience.. if That Guy is interested in GPU
programming (why else would he hang around here asking these questions?) I
recommend he tries some GPU programming, it's easy & fun.. and you get
impressive results really easily.
MUCH easier than in the software rendering days, much... and things work a
hell of a lot better "out of the box" nowadays than just 4-5 years ago! The
biggest problem is getting started.. maybe he'll ask about that.. if not.. I
assume he is already doing whatever he likes to do..
-
-
Re: CPU vs GPU
"Stephen H. Westin" <westin*nospam@graphics.cornell.edu> wrote in message
news:s0isevt3j5.fsf@diesel.graphics.cornell.edu...
> "Skybuck Flying" <nospam@hotmail.com> writes:
>
> > Hi,
> >
> > Today's CPUs are in the 3000 to 4000 MHZ clock speed range.
> >
> > Today's GPU's are in the 250 to 600 MHZ clock speed range.
> >
> > I am wondering if GPU's are holding back game graphics performance because
> > their clock speed is so low ?
>
> Au contraire. GPU's may run at lower clock speeds, but they compute
> far faster than CPU's. And that speed is increasing much faster than
> for CPU's. Because of this, many are working on ways to implement
> traditional CPU computations on GPU's. See, for example,
> <http://www.gpgpu.org/>. The advantage of the GPU is that it doesn't
> have to implement the traditional von Neumann model of a single
> processor executing sequentially from a large memory that can all be
> accessed at the same speed and that also contains data.
Are you suggesting or anybody else... that we just give up on cpu's and
start using gpu's ?
And just throw away 30 years of assembler/compiler/development environments
and software libraries ?
Than rather increase cpu performance so that all this 30 year old proven and
tested stuff can still be used and even run faster ? =D
Bye,
Skybuck.
-
Re: CPU vs GPU
"Skybuck Flying" <nospam@hotmail.com> writes:
> "Stephen H. Westin" <westin*nospam@graphics.cornell.edu> wrote in message
> news:s0isevt3j5.fsf@diesel.graphics.cornell.edu...
> > "Skybuck Flying" <nospam@hotmail.com> writes:
> >
> > > Hi,
> > >
> > > Today's CPUs are in the 3000 to 4000 MHZ clock speed range.
> > >
> > > Today's GPU's are in the 250 to 600 MHZ clock speed range.
> > >
> > > I am wondering if GPU's are holding back game graphics performance because
> > > their clock speed is so low ?
> >
> > Au contraire. GPU's may run at lower clock speeds, but they compute
> > far faster than CPU's. And that speed is increasing much faster than
> > for CPU's. Because of this, many are working on ways to implement
> > traditional CPU computations on GPU's. See, for example,
> > <http://www.gpgpu.org/>. The advantage of the GPU is that it doesn't
> > have to implement the traditional von Neumann model of a single
> > processor executing sequentially from a large memory that can all be
> > accessed at the same speed and that also contains data.
>
> Are you suggesting or anybody else... that we just give up on cpu's and
> start using gpu's ?
No. But you were suggesting giving up on GPU's because you thought
they were slower, and developing more slowly, than CPU's. I was giving
concrete evidence that the opposite is true: GPU's are faster, and
developing faster, so people have an incentive to go the other
direction.
> And just throw away 30 years of assembler/compiler/development environments
> and software libraries ?
That would be over 50 years, actually. And GPU's use compilers,
too. An alumnus from this program visited a couple of weeks ago and
showed us nVidia's IDE for their just-announced chip.
> Than rather increase cpu performance so that all this 30 year old proven and
> tested stuff can still be used and even run faster ? =D
Whoops, I got caught again, troller.
--
-Stephen H. Westin
Any information or opinions in this message are mine: they do not
represent the position of Cornell University or any of its sponsors.
-
Re: CPU vs GPU
Skybuck Flying wrote:
> Are you suggesting or anybody else... that we just give up on cpu's and
> start using gpu's ?
Yes - iff (if and only if) you have a problem to compute, that fits well
to a GPU.
In former times you had two major options, if you had a problem to compute:
1) you could have taken general purpose CPUs or
2) build an individual ASIC (e.g. neural network processors, systolic
arrays...)
Option 1 is a mass-product, relatively cheap and flexible (it can be reused
for a different problem once yours is solved).
Option 2 gives you a highly optimized (and therefore faster) solution,
but that may be incapable of solving different problems.
Today GPUs are a mass-product, but they have structures that are better
optimized for a specific problem than a CPU. So they _may_ be a
compromise between the two mentioned options.
> And just throw away 30 years of assembler/compiler/development environments
> and software libraries ?
Is your world only black and white?
Ralf
-
Re: CPU vs GPU
"Stephen H. Westin" <westin*nospam@graphics.cornell.edu> wrote in message
news:s0y8nprcqa.fsf@diesel.graphics.cornell.edu...
> "Skybuck Flying" <nospam@hotmail.com> writes:
>
> > "Stephen H. Westin" <westin*nospam@graphics.cornell.edu> wrote in
message
> > news:s0isevt3j5.fsf@diesel.graphics.cornell.edu...
> > > "Skybuck Flying" <nospam@hotmail.com> writes:
> > >
> > > > Hi,
> > > >
> > > > Today's CPUs are in the 3000 to 4000 MHZ clock speed range.
> > > >
> > > > Today's GPU's are in the 250 to 600 MHZ clock speed range.
> > > >
> > > > I am wondering if GPU's are holding back game graphics performance because
> > > > their clock speed is so low ?
> > >
> > > Au contraire. GPU's may run at lower clock speeds, but they compute
> > > far faster than CPU's. And that speed is increasing much faster than
> > > for CPU's. Because of this, many are working on ways to implement
> > > traditional CPU computations on GPU's. See, for example,
> > > <http://www.gpgpu.org/>. The advantage of the GPU is that it doesn't
> > > have to implement the traditional von Neumann model of a single
> > > processor executing sequentially from a large memory that can all be
> > > accessed at the same speed and that also contains data.
> >
> > Are you suggesting or anybody else... that we just give up on cpu's and
> > start using gpu's ?
>
> No. But you were suggesting giving up on GPU's because you thought
> they were slower, and developing more slowly, than CPU's. I was giving
> concrete evidence that the opposite is true: GPU's are faster, and
> developing faster, so people have an incentive to go the other
> direction.
>
> > > And just throw away 30 years of assembler/compiler/development environments
> > > and software libraries ?
>
> That would be over 50 years, actually. And GPU's use compilers,
> too. An alumnus from this program visited a couple of weeks ago and
> showed us nVidia's IDE for their just-announced chip.
Seems like re-inventing the wheel to me dude =D
How long is it going to take before programming for gpu's face the same
problems as cpu programmers...
out of range errors, buffer overflows, exceptions etc... complexity
management.
Seems like the same old shit all over again =D
And it's (ughhhhly) C only

Where is my delphi compiler for gpu's ? 
I would say 10% trolling and 90% I have a point
:P =D
>
> > > Than rather increase cpu performance so that all this 30 year old proven and
> > > tested stuff can still be used and even run faster ? =D
>
> Whoops, I got caught again, troller.
>
> --
> -Stephen H. Westin
> Any information or opinions in this message are mine: they do not
> represent the position of Cornell University or any of its sponsors.