Modern Unification

Welcome to the modern era. All of the examples in this book are designed on and for this era of hardware, though some of them could run on older ones with some alteration. The release of the GeForce 8000 series cards in 2006, followed by the Radeon HD 2000 series, represented unification in more ways than one.

With the prior generations, fragment hardware had certain platform-specific peculiarities. While the API kinks were mostly ironed out with the development of proper shading languages, there were still differences in hardware behavior. While four dependent texture accesses were sufficient for most applications, naive use of shading languages could get you into trouble on ATI hardware.

With this generation, neither vendor offered any real difference in functionality. There are still differences between the hardware lines, certainly in terms of performance. But the functionality differences have never been more blurred than they were with this revision.

Another form of unification was that both NVIDIA and ATI moved to a unified shader architecture. In all prior generations, fragment shaders and vertex shaders were fundamentally different hardware. Even when they started doing the same kinds of things, such as accessing textures, they used different physical hardware to do so. This led to some inefficiencies.

Deferred rendering probably gives the most explicit illustration of the problem. The first pass, the creation of the g-buffers, is a very vertex-shader-intensive activity. While the fragment shader can be somewhat complex, doing several texture fetches to compute various material parameters, the vertex shader is where much of the real work is done. Lots of vertices come through the shader, and if there are any complex transformations, they will happen here.

The second pass is a very fragment-shader-intensive pass. Each light layer consists of exactly four vertices, vertices that can be provided directly in clip space. From then on, it is the fragment shader that does the work, performing all of the complex lighting calculations necessary for the various rendering techniques. Four vertices generate literally millions of fragments, depending on the rendering resolution.

In prior hardware generations, the first pass would leave fragment shaders going to waste, as they would process fragments faster than the vertex shaders could deliver triangles. In the second pass, the reverse happens, only more so: after just four vertex shader executions, all of that vertex hardware would sit completely idle. All of those parallel computational units would go to waste.
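The scale of that imbalance is easy to see with some back-of-the-envelope arithmetic. This sketch assumes a 1920×1080 rendering resolution, which is an illustrative figure, not one from the text:

```python
# Workload of a deferred lighting pass: a full-screen light layer
# is drawn as a single quad of 4 clip-space vertices, but every
# covered pixel generates a fragment.
width, height = 1920, 1080    # assumed rendering resolution
vertices = 4                  # one full-screen quad in clip space
fragments = width * height    # one fragment per covered pixel

print(fragments)              # 2073600
print(fragments // vertices)  # 518400 fragments per vertex
```

With roughly half a million fragments per vertex, dedicated vertex units would sit idle for essentially the entire pass; a unified architecture can instead assign nearly all of its computational units to fragment work.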

Both NVIDIA and ATI devised hardware that decoupled the computational elements from any particular kind of computation. All shader hardware could be used for vertex, fragment, or geometry shaders (new in this generation), and the allocation could be changed on demand, based on the resource load. This makes deferred rendering in particular much more efficient; the second pass is able to use almost all of the available shader resources for lighting operations.

This unified shader approach also means that every shader stage has essentially the same capabilities. The standard maximum texture count is 16, which is plenty for doing just about anything. This limit applies equally to all shader stages, so vertex shaders have the same number of textures available as fragment shaders.

This smoothed out a great many things. Shaders gained quite a few new features. Uniform buffers became available. Shaders could perform computations directly on integer values. Unlike every generation before, all of these features were parceled out to all types of shaders equally.

Along with unified shaders came a long list of various and sundry improvements to non-shader hardware, and various other limitations were expanded as well.