Architecture
The speed of the pipeline is determined by its slowest stage, called the bottleneck. The faster stages are said to be starved, because they must wait for the bottleneck stage to finish.
The real-time rendering pipeline can be coarsely divided into four main stages: application, geometry processing, rasterization, and pixel processing.
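As a minimal sketch of the bottleneck idea, the snippet below picks the slowest stage from a table of per-stage rates and estimates how long the other stages sit idle. The stage rates are made-up illustrative numbers, and `find_bottleneck` and `idle_fraction` are hypothetical helper names, not part of any graphics API.

```python
# Hypothetical per-stage throughputs in frames per second (illustrative only).
stage_fps = {
    "application": 240.0,
    "geometry processing": 180.0,
    "rasterization": 300.0,
    "pixel processing": 90.0,
}


def find_bottleneck(rates):
    """Return (name, rate) of the slowest stage, which caps the whole pipeline."""
    name = min(rates, key=rates.get)
    return name, rates[name]


def idle_fraction(rates):
    """Fraction of time each stage waits when the bottleneck sets the pace."""
    _, limit = find_bottleneck(rates)
    return {name: 1.0 - limit / rate for name, rate in rates.items()}


bottleneck, pipeline_fps = find_bottleneck(stage_fps)
print(bottleneck, pipeline_fps)
print(idle_fraction(stage_fps))
```

Under these assumed numbers the pipeline runs at the pixel processing rate, and every other stage spends part of each frame waiting.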
Application Stage
This stage typically runs on the CPU.
However, some application work can be performed by the GPU, using a separate mode called a compute shader. This mode treats the GPU as a highly parallel general processor, ignoring its special functionality meant specifically for rendering graphics.
Some of the tasks traditionally performed on the CPU include collision detection, global acceleration algorithms, animation, physics simulation, and many others, depending on the type of application.
The application stage is also the place to take care of input from other sources, such as the keyboard, the mouse, or a head-mounted display.
Acceleration algorithms, such as particular culling algorithms (Chapter 19), are also implemented here, along with whatever else the rest of the pipeline cannot handle.
Output: Rendering Primitives, i.e., points, lines, and triangles, that might eventually end up on the screen.
Geometry Processing
Running on GPUs.
The geometry processing stage on the GPU is responsible for most of the per-triangle and per-vertex operations.
This stage is further divided into the following functional stages: vertex shading, projection, optional vertex processing, clipping, and screen mapping.
Vertex Shading
Two main tasks:
Computing the position of a vertex, typically by applying the combined model, view, and projection (MVP) transform.
Evaluating whatever the programmer may like to have as vertex output data (such as a normal and texture coordinates).
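The position computation in the first task can be sketched in plain Python as a product of 4 × 4 matrices applied to a homogeneous vertex. This is only an illustration: it assumes column vectors (so the combined matrix is projection × view × model), uses simple translations for the model and view matrices, and leaves the projection as an identity placeholder. The helper names `mat_mul`, `transform`, and `translation` are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Apply a 4x4 matrix to a homogeneous column vector [x, y, z, w]."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    """Build a 4x4 translation matrix."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


IDENTITY = translation(0.0, 0.0, 0.0)

model = translation(2.0, 0.0, 0.0)   # place the object in the world
view = translation(0.0, 0.0, -5.0)   # camera sits 5 units back along +z
projection = IDENTITY                # placeholder; a real projection goes here

# With column vectors the model matrix is applied first, so it is rightmost.
mvp = mat_mul(projection, mat_mul(view, model))

clip_position = transform(mvp, [1.0, 1.0, 0.0, 1.0])
print(clip_position)
```

In a real vertex shader the same product is usually computed once per draw call and only the final matrix-vector multiply runs per vertex.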
The operation of determining the effect of a light on a material is known as shading. Shading may be performed on a model’s vertices or during per-pixel processing.
A variety of material data can be stored at each vertex, such as the point’s location, a normal, or a color.
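To make the idea of shading from per-vertex data concrete, here is a minimal sketch that evaluates a Lambertian (diffuse) term from a vertex normal and a direction to the light. The Lambertian model is an assumption chosen for brevity; the notes do not prescribe a particular shading equation, and `normalize` and `lambert` are hypothetical helper names.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(normal, to_light, albedo, light_color):
    """Diffuse color: albedo * light * max(0, n . l), per color channel."""
    n = normalize(normal)
    l = normalize(to_light)
    n_dot_l = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(a * c * n_dot_l for a, c in zip(albedo, light_color))


# Surface facing the light directly: full diffuse contribution.
print(lambert((0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (1.0, 0.5, 0.25), (1.0, 1.0, 1.0)))
# Surface facing away from the light: the clamp yields black.
print(lambert((0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.5, 0.25), (1.0, 1.0, 1.0)))
```

The same function could be evaluated per vertex or per pixel; only the source of the normal changes.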
Projection
Projection is expressed as a matrix.
Orthographic (parallel) projection
Transforms a rectangular box into a unit cube; parallel lines remain parallel after the transform.
Perspective projection
The view volume is a perspective frustum (a truncated pyramid).
The projected result is then expressed in normalized device coordinates (NDC).
Both orthographic and perspective transforms can be constructed with 4 × 4 matrices.
They are called projections because after display, the z-coordinate is not stored in the image generated but is stored in a z-buffer. In this way, the models are projected from three to two dimensions.
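The two kinds of projection matrix can be sketched as follows. This is one possible convention, not the only one: it assumes an OpenGL-style setup where the camera looks down the negative z-axis, `near` and `far` are positive distances, and the target volume spans [-1, 1] on every axis. The function names are hypothetical.

```python
import math


def orthographic(left, right, bottom, top, near, far):
    """Map an axis-aligned box to the cube [-1, 1]^3 (scale plus translate)."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def perspective(fov_y_radians, aspect, near, far):
    """Map a view frustum to the cube [-1, 1]^3 after division by w."""
    f = 1.0 / math.tan(fov_y_radians / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def project(matrix, point):
    """Apply a projection matrix to a 3D point and divide by w."""
    x, y, z = point
    v = (x, y, z, 1.0)
    cx, cy, cz, cw = (sum(row[k] * v[k] for k in range(4)) for row in matrix)
    return (cx / cw, cy / cw, cz / cw)


ortho = orthographic(-2.0, 2.0, -1.0, 1.0, 1.0, 10.0)
persp = perspective(math.radians(90.0), 1.0, 1.0, 10.0)

print(project(ortho, (2.0, 1.0, -1.0)))    # a corner of the box on the near plane
print(project(persp, (0.0, 0.0, -10.0)))   # a point on the far plane
```

Note that the orthographic matrix leaves w at 1, so the division has no effect there, while the perspective matrix copies -z into w and so shrinks distant points.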
Optional Vertex Processing
Their use depends both on the capabilities of the hardware—not all GPUs have them—and the desires of the programmer.
Tessellation
The camera for the scene can be used to determine how many triangles are generated: many when the patch is close, few when it is far away.
Geometry Shader
It can be used for particle generation.
Imagine simulating a fireworks explosion. Each fireball could be represented by a point, a single vertex. The geometry shader can take each point and turn it into a square (made of two triangles) that faces the viewer and covers several pixels, so providing a more convincing primitive for us to shade.
Stream Output
It can be used for particle simulations.
This stage lets us use the GPU as a geometry engine. Instead of sending our processed vertices down the rest of the pipeline to be rendered to the screen, at this point we can optionally output these to an array for further processing. These data can be used by the CPU, or the GPU itself, in a later pass.
These three stages are performed in this order—tessellation, geometry shading, and stream output—and each is optional.
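The fireworks example from the geometry shader description can be sketched on the CPU as a function that expands one point into a viewer-facing square made of two triangles. This is an illustration of the idea only, not shader code: it assumes the point is already in view space, where the camera looks along the z-axis, so offsetting in x and y keeps the square facing the viewer. `expand_point_to_quad` is a hypothetical name.

```python
def expand_point_to_quad(center, half_size):
    """Turn one view-space point into two triangles forming a square."""
    cx, cy, cz = center
    bottom_left = (cx - half_size, cy - half_size, cz)
    bottom_right = (cx + half_size, cy - half_size, cz)
    top_right = (cx + half_size, cy + half_size, cz)
    top_left = (cx - half_size, cy + half_size, cz)
    # Two triangles sharing the bottom-left to top-right diagonal.
    return [
        (bottom_left, bottom_right, top_right),
        (bottom_left, top_right, top_left),
    ]


fireball = (0.0, 0.0, -5.0)
for triangle in expand_point_to_quad(fireball, 0.5):
    print(triangle)
```

One input vertex becomes six output vertices, which is the kind of primitive amplification a geometry shader makes possible.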
Clipping
Only the primitives wholly or partially inside the view volume need to be passed on to the subsequent stages.
Primitives that are partially inside the view volume require clipping. New vertices are generated where the edges of the primitive intersect the boundary of the view volume.
The advantage of performing the view transformation and projection before clipping is that it makes the clipping problem consistent; primitives are always clipped against the unit cube.
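As a sketch of clipping against a fixed volume, the function below clips a line segment against an axis-aligned cube using Liang-Barsky style parametric clipping. The choice of algorithm and the cube extent of [-1, 1] on each axis are assumptions made for illustration; GPUs clip in hardware and may do so differently. `clip_segment_to_cube` is a hypothetical name.

```python
def clip_segment_to_cube(p0, p1, lo=-1.0, hi=1.0):
    """Clip segment p0-p1 to the cube [lo, hi]^3.

    Returns the clipped (start, end) points, or None if nothing is inside.
    """
    t_enter, t_exit = 0.0, 1.0
    for axis in range(3):
        d = p1[axis] - p0[axis]
        # One (p, q) pair per bounding plane on this axis: lower, then upper.
        for p, q in ((-d, p0[axis] - lo), (d, hi - p0[axis])):
            if p == 0.0:
                if q < 0.0:
                    return None          # parallel to the plane and outside it
            else:
                t = q / p
                if p < 0.0:
                    t_enter = max(t_enter, t)   # segment enters the volume
                else:
                    t_exit = min(t_exit, t)     # segment leaves the volume
        if t_enter > t_exit:
            return None

    def point_at(t):
        return tuple(a + t * (b - a) for a, b in zip(p0, p1))

    return point_at(t_enter), point_at(t_exit)


print(clip_segment_to_cube((0.0, 0.0, 0.0), (2.0, 0.0, 0.0)))
print(clip_segment_to_cube((2.0, 2.0, 2.0), (3.0, 3.0, 3.0)))
```

Because the volume is always the same cube, the clipping code needs no knowledge of the camera or projection that produced the coordinates.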
Finally, perspective division (division by w) is performed, which places the resulting triangles’ positions into three-dimensional normalized device coordinates.
Screen Mapping
The coordinates are still three-dimensional when entering this stage.
The x- and y-coordinates of each primitive are transformed to form screen coordinates. Screen coordinates together with the z-coordinates are also called window coordinates.
The last step in the geometry stage is to convert from NDC to window coordinates.
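Perspective division followed by screen mapping can be sketched as two small functions. The conventions here are assumptions: NDC spans [-1, 1] on each axis, the window origin is at the lower-left corner, and depth is remapped to [0, 1]. Other APIs use different ranges and origins. The function names are hypothetical.

```python
def perspective_divide(clip):
    """Convert homogeneous clip coordinates (x, y, z, w) to NDC."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


def ndc_to_window(ndc, width, height, depth_near=0.0, depth_far=1.0):
    """Map NDC in [-1, 1] to window coordinates for a width x height window."""
    x, y, z = ndc
    window_x = (x + 1.0) * 0.5 * width
    window_y = (y + 1.0) * 0.5 * height
    window_z = depth_near + (z + 1.0) * 0.5 * (depth_far - depth_near)
    return (window_x, window_y, window_z)


ndc = perspective_divide((2.0, -2.0, 0.0, 4.0))
print(ndc)
print(ndc_to_window(ndc, 1280, 720))
```

The x- and y-values are scaled to the window size, while z is carried along so that later stages can use it for depth testing.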
Rasterization
Rasterization, also called scan conversion, is the conversion from two-dimensional vertices in screen space, each with a z-value (depth value) and various shading information associated with it, into pixels on the screen.
Triangle Setup
In this stage the differentials, edge equations, and other data for the triangle are computed. Fixed-function hardware is used for this task.
Triangle Traversal
Finding which samples or pixels are inside a triangle is often called triangle traversal.
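Triangle setup and traversal can be illustrated together with edge functions: setup defines one edge function per triangle side, and traversal evaluates them at each pixel center inside the bounding box of the triangle. This is a simplified software sketch, assuming one sample at the center of each pixel and no tie-breaking fill rule for shared edges, which real hardware does apply. The names `edge` and `rasterize_triangle` are hypothetical.

```python
import math


def edge(a, b, p):
    """Signed area test: positive if p is to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_triangle(v0, v1, v2):
    """Return the (x, y) pixels whose centers lie inside the triangle."""
    area = edge(v0, v1, v2)
    if area == 0:
        return []                         # degenerate triangle covers nothing
    sign = 1.0 if area > 0 else -1.0      # accept either winding order

    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    min_x, max_x = int(math.floor(min(xs))), int(math.ceil(max(xs)))
    min_y, max_y = int(math.floor(min(ys))), int(math.ceil(max(ys)))

    covered = []
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            p = (x + 0.5, y + 0.5)        # sample at the pixel center
            w0 = sign * edge(v1, v2, p)
            w1 = sign * edge(v2, v0, p)
            w2 = sign * edge(v0, v1, p)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                covered.append((x, y))
    return covered


pixels = rasterize_triangle((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
print(len(pixels), pixels)
```

The three edge values, once normalized by the triangle area, are also the barycentric weights used to interpolate depth and shading data across the triangle.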
Pixel Processing
At this point, all the pixels that are considered inside a triangle or other primitive have been found as a consequence of the combination of all the previous stages.
Pixel processing is the stage where per-pixel or per-sample computations and operations are performed on pixels or samples that are inside a primitive.
In computer graphics, a sample is an intersection of a channel and a pixel.
Pixel Shading
The end product is a color value for each fragment. This stage is executed by programmable GPU cores: the programmer supplies a program for the fragment shader, which can contain any desired computations.
Texturing is employed here.
Merging
This stage combines the fragment color produced by the pixel shading stage with the color currently stored in the color buffer.
Unlike the shading stage, the GPU subunit that performs this stage is typically not fully programmable. However, it is highly configurable, enabling various effects.
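One common way to configure this combination is alpha blending with the "over" operator, sketched below. This is only one of many possible blend configurations and is an assumption for illustration; `blend_over` is a hypothetical name, and colors are treated as plain RGB tuples in [0, 1].

```python
def blend_over(src_rgb, src_alpha, dst_rgb):
    """Blend a fragment color over the stored color using its alpha."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))


stored = (0.0, 0.0, 1.0)          # blue already in the color buffer
fragment = (1.0, 0.0, 0.0)        # incoming red fragment

print(blend_over(fragment, 0.5, stored))   # half-transparent red over blue
print(blend_over(fragment, 1.0, stored))   # opaque fragment replaces the color
```

The blend equation is selected through pipeline state rather than written as a program, which is what "configurable but not programmable" means in practice.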
This stage is also responsible for resolving visibility.
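Visibility for opaque geometry is typically resolved with a depth test against a z-buffer, as noted under Depth Test. A minimal sketch follows, assuming smaller depth values are closer to the camera and the buffer is cleared to the farthest depth of 1.0. The `FrameBuffer` class here is a hypothetical toy, not a real API.

```python
class FrameBuffer:
    """Toy color buffer plus z-buffer for opaque fragments."""

    def __init__(self, width, height, clear_color=(0.0, 0.0, 0.0), far_depth=1.0):
        self.color = [[clear_color] * width for _ in range(height)]
        self.depth = [[far_depth] * width for _ in range(height)]

    def write_fragment(self, x, y, depth, color):
        """Keep the fragment only if it is closer than what is stored."""
        if depth < self.depth[y][x]:
            self.depth[y][x] = depth
            self.color[y][x] = color
            return True
        return False


fb = FrameBuffer(2, 2)
red, blue, green = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)

print(fb.write_fragment(0, 0, 0.5, red))    # closer than the cleared depth
print(fb.write_fragment(0, 0, 0.8, blue))   # farther than red, so discarded
print(fb.write_fragment(0, 0, 0.2, green))  # closest so far, so it wins
print(fb.color[0][0], fb.depth[0][0])
```

Because only the closest fragment survives, opaque primitives can be submitted in any order and still produce the same image.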
Alpha Test
Stencil Buffer
Frame Buffer
It generally consists of all the buffers on a system.
Depth Test
It is done with the z-buffer algorithm. However, the z-buffer algorithm cannot be used for partially transparent primitives. These must be rendered after all opaque primitives, and in back-to-front order, or using a separate order-independent algorithm (Section 5.5). Transparency is one of the major weaknesses of the basic z-buffer.
Double Buffer