Optimizing fragment shaders

Depending on the underlying GPU architecture, the GPU can execute many rendering stages, such as vertex processing, fragment processing, and memory transfers, in parallel for each draw call. A draw call does not complete until the GPU finishes processing all of its fragments. If the fragment shader executes more slowly than the vertex shader or the other stages, those stages must wait for the fragment shader execution to complete.

You can optimize fragment shaders by:

Decreasing the precision of a fragment shader

If fragment shading is a performance bottleneck, you can decrease the precision of the shader from full 32-bit floating point to 16-bit floating point. Many GPUs execute 16-bit operations at double the rate of 32-bit operations.

To decrease the precision of a fragment shader:

  1. In the Library, select Resource Files > Shaders and open the fragment shader whose precision you want to decrease.

    The Shader Source Editor opens.

  2. Declare variables with the appropriate precision qualifier:

    | Precision qualifier | Range | Data type | Examples of use |
    | --- | --- | --- | --- |
    | `lowp` | (-2, 2) | Fixed point with 8 fractional bits. Modern GPUs do not have fixed-point shader units and map this to 16-bit floating point. When you are not sure which precision to choose, prefer `mediump` over `lowp`. | Colors in the RGB data range [0..1] and intensities in the range [0..1]. |
    | `mediump` | (-2^15, 2^15) | 16-bit floating point with 1 sign bit, 5 exponent bits, and 10 mantissa bits. Some GPUs map this to 32-bit floating point. | Colors, normals, texture coordinates, positions in object space. |
    | `highp` | (-2^128, 2^128) | 32-bit floating point with 1 sign bit, 8 exponent bits, and 23 mantissa bits. | Matrices, large textures, positions in world space. |

    For example, for a fragment shader, use this code:

    uniform sampler2D Texture;
    // lowp and mediump often double the performance compared to highp.
    uniform lowp float BlendIntensity;
    varying mediump vec2 vTexCoord;
    
    void main()
    {
    // Set the default precision of float in this scope to lowp.
        precision lowp float;
    
        vec4 color = texture2D(Texture, vTexCoord);
        gl_FragColor.rgba = color.rgba * BlendIntensity;
    }
    

    Do not use this code:

    // Default to highp for everything.
    precision highp float;
    uniform sampler2D Texture;
    uniform float BlendIntensity;
    varying vec2 vTexCoord;
    
    void main()
    {
    
        vec4 color = texture2D(Texture, vTexCoord);
        gl_FragColor.rgba = color.rgba * BlendIntensity;
    }
    

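Note that support for `highp` in fragment shaders is optional in OpenGL ES 2.0. When a shader must also run on GPUs without `highp` fragment support, you can guard the default precision with the `GL_FRAGMENT_PRECISION_HIGH` preprocessor macro, which the compiler defines when `highp` is available. A minimal sketch:

```glsl
// Fall back to mediump on GPUs whose fragment shaders do not support highp.
#ifdef GL_FRAGMENT_PRECISION_HIGH
precision highp float;
#else
precision mediump float;
#endif

uniform sampler2D Texture;
uniform lowp float BlendIntensity;
varying mediump vec2 vTexCoord;

void main()
{
    lowp vec4 color = texture2D(Texture, vTexCoord);
    gl_FragColor = color * BlendIntensity;
}
```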
Moving calculations from pixel to vertex

Use the vertex shader to calculate values that stay constant across a primitive, or that can be interpolated from vertex to vertex without losing too much quality, such as many lighting calculations. This pays off because the number of vertices is usually much smaller than the number of fragments, except with very dense geometry. However, on tile-based deferred mobile GPUs the hardware writes varyings to memory, so it can be cheaper to recalculate simple operations in the fragment shader than to pass their results as varyings.

For example, for a vertex shader use this code:

attribute vec3 kzPosition;
attribute vec2 kzTextureCoordinate0;
uniform highp mat4 kzProjectionCameraWorldMatrix;
uniform mediump float kzTime;

varying mediump vec2 vTexCoord;
varying lowp vec4 vColor;

void main()
{
    precision mediump float;
    // The trigonometric operation runs only once for each vertex: for example,
    // for a quad, 3 * 2 = 6 times (2 triangles with 3 vertices each).
    vColor = vec4(sin(kzTime));
    gl_Position = kzProjectionCameraWorldMatrix * vec4(kzPosition.xyz, 1.0);
}

For example, for a fragment shader use this code:

varying lowp vec4 vColor;

void main()
{
    precision lowp float;

    // For each written fragment, only the interpolated value is assigned at
    // matching precision (lowp -> lowp). This should take no more than one
    // cycle on most GPUs.
    gl_FragColor.rgba = vColor;
}

For example, do not use this code for a vertex shader:

attribute vec3 kzPosition;
uniform highp mat4 kzProjectionCameraWorldMatrix;

void main()
{
    precision mediump float;
    // The vertex shader outputs only the position and leaves the calculation
    // to the fragment shader, which is inefficient when the number of
    // fragments exceeds the number of vertices.
    gl_Position = kzProjectionCameraWorldMatrix * vec4(kzPosition.xyz, 1.0);
}

For example, do not use this code for a fragment shader:

uniform mediump float kzTime;

void main()
{
    precision lowp float;

    // For each written fragment, the additional trigonometric function sin()
    // is executed. Trigonometric functions are expensive: depending on the GPU,
    // several cycles per fragment. The result is the same as when the vertex
    // shader stores it to a varying, but at a much higher cost.
    gl_FragColor.rgba = vec4(sin(kzTime));
}
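
The varying-cost tradeoff described above (on tile-based mobile GPUs the hardware writes varyings to memory) means that a trivial derivation, such as a scaled texture coordinate, can be cheaper to recompute per fragment than to pass as an extra varying. A sketch of this, with a hypothetical `detailCoord` and scale factor:

```glsl
// Derive the value in the fragment shader instead of adding a second varying.
// One multiply per fragment is usually cheaper than interpolating and storing
// an extra varying on tile-based GPUs.
uniform sampler2D Texture;
varying mediump vec2 vTexCoord;

void main()
{
    // Recomputed here; the alternative would be an extra varying
    // written by the vertex shader.
    mediump vec2 detailCoord = vTexCoord * 8.0;  // hypothetical scale
    gl_FragColor = texture2D(Texture, detailCoord);
}
```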