Optimizing fragment shaders

Depending on the underlying GPU architecture, the GPU can execute several stages of rendering, such as vertex processing, fragment processing, and memory reads, in parallel for each draw call. A draw call completes only after all of its fragments are processed. If the fragment shader executes more slowly than the vertex shader or other stages, those stages have to wait for the fragment shader to finish.

You can optimize fragment shaders by:

Decreasing the precision of a fragment shader

If fragment shading is a performance bottleneck, decreasing the precision so that an operation takes one cycle instead of two can halve the time the GPU spends on fragment shading.

To decrease the precision of a fragment shader:

  1. In the Library select Resource Files > Shaders and open the fragment shader whose precision you want to decrease.
    The Shader Source Editor window opens.
  2. Use the precision qualifier that matches the value range of the data:
    • lowp for data such as colors (RGB data in the range [0..1]) and intensities in the range [0..1], but not, for example, for texture coordinates, which need higher precision. lowp supports the range [-2..2] with 8 bits of fractional precision.
    • mediump for most rendering calculations, but not for matrices, which need higher precision because their floating point values are relatively small.
    • highp for values that need an accurate representation in 3D rendering, including matrices.

    For example, for a fragment shader use this code:
    uniform sampler2D Texture;
    uniform lowp float BlendIntensity;
    varying mediump vec2 vTexCoord;
    
    void main()
    {
        precision lowp float;
    
        vec4 color = texture2D(Texture, vTexCoord);
        gl_FragColor.rgba = color.rgba * BlendIntensity;
    }

    Do not use this code:
    uniform sampler2D Texture;
    uniform mediump float BlendIntensity;  // Compared to lowp, mediump can double the number of cycles on some GPUs
    varying mediump vec2 vTexCoord;
    
    void main()
    {
        precision mediump float;  // Compared to lowp, mediump can double the number of cycles on some GPUs
    
        vec4 color = texture2D(Texture, vTexCoord);
        gl_FragColor.rgba = color.rgba * BlendIntensity;
    }
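
The qualifiers listed in step 2 can also be set for each variable instead of through a single default. The following fragment shader is a minimal sketch of that approach, not a definitive implementation. TintColor is a hypothetical uniform added only for illustration, and the sketch assumes that all color data stays in the range [0..1].

```glsl
// Default precision for floats that do not carry their own qualifier.
precision mediump float;

uniform sampler2D Texture;
uniform lowp float BlendIntensity;  // Intensity in [0..1]: lowp is enough
uniform lowp vec4 TintColor;        // Hypothetical uniform: color in [0..1]
varying mediump vec2 vTexCoord;     // Texture coordinates need more than lowp

void main()
{
    // Color data stays in lowp from the texture sample to the multiplication.
    lowp vec4 color = texture2D(Texture, vTexCoord);
    gl_FragColor = color * TintColor * BlendIntensity;
}
```

Whether the lower precision saves cycles depends on the GPU, so measure the result on the target device.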

Moving calculations from the fragment shader to the vertex shader

Use the vertex shader to calculate values that stay constant and need to be calculated only a few times. Do the same for lighting calculations whose results can be interpolated from one vertex to another without losing too much quality. Except with very dense geometry, a draw call usually covers far fewer vertices than fragments, so a calculation in the vertex shader runs far fewer times.

For example, for a vertex shader use this code:

attribute vec3 kzPosition;
attribute vec2 kzTextureCoordinate0;
uniform highp mat4 kzProjectionCameraWorldMatrix;
uniform mediump float kzTime;

varying mediump vec2 vTexCoord;
varying lowp vec4 vColor;

void main()
{
    precision mediump float;
    // The trigonometric operation is performed only once for each vertex.
    // For example, for a quad 3 * 2 times (2 triangles with 3 vertices each).
    vColor = vec4(sin(kzTime));
    gl_Position = kzProjectionCameraWorldMatrix * vec4(kzPosition.xyz, 1.0);
}

For example, for a fragment shader use this code:

varying lowp vec4 vColor;

void main()
{
    precision lowp float;

    // For each written fragment, only the interpolated value is assigned
    // at the same precision (lowp -> lowp). This should not take longer than
    // one cycle on most GPUs.
    gl_FragColor.rgba = vColor;
}

For example, do not use this code for a vertex shader:

attribute vec3 kzPosition;
uniform highp mat4 kzProjectionCameraWorldMatrix;

void main()
{
    precision mediump float;
    // The vertex shader outputs only the position and leaves the calculation to the
    // fragment shader, which is not a good idea when the number of fragments exceeds the number of vertices.
    gl_Position = kzProjectionCameraWorldMatrix * vec4(kzPosition.xyz, 1.0);
}

For example, do not use this code for a fragment shader:

uniform mediump float kzTime;

void main()
{
    precision lowp float;

    // For each written fragment, the additional trigonometric function sin()
    // is executed. Trigonometric functions are expensive: depending on the GPU,
    // they take several cycles per fragment. The visual result is the same
    // as when the vertex shader stores the result in a varying.
    gl_FragColor.rgba = vec4(sin(kzTime));
}
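
The lighting calculations mentioned at the start of this section can be moved in the same way. The following vertex shader is a minimal sketch of per-vertex diffuse lighting under stated assumptions: kzNormal and kzNormalMatrix are assumed to provide the vertex normal and the normal matrix, and LightDirection and LightColor are hypothetical uniforms introduced only for this example. With this vertex shader, the fragment shader only needs to write the interpolated vColor to gl_FragColor.

```glsl
attribute vec3 kzPosition;
attribute vec3 kzNormal;               // Assumed name of the normal attribute
uniform highp mat4 kzProjectionCameraWorldMatrix;
uniform highp mat4 kzNormalMatrix;     // Assumed name of the normal matrix
uniform mediump vec3 LightDirection;   // Hypothetical: normalized direction from the surface toward the light
uniform lowp vec3 LightColor;          // Hypothetical: light color in [0..1]

varying lowp vec4 vColor;

void main()
{
    precision mediump float;

    // The diffuse term is calculated once for each vertex and then
    // interpolated across the triangle for the fragment shader.
    vec3 normal = normalize((kzNormalMatrix * vec4(kzNormal, 0.0)).xyz);
    float diffuse = max(dot(normal, LightDirection), 0.0);
    vColor = vec4(LightColor * diffuse, 1.0);

    gl_Position = kzProjectionCameraWorldMatrix * vec4(kzPosition.xyz, 1.0);
}
```

Per-vertex lighting loses detail on coarse meshes, for example in specular highlights, so this trade-off suits mainly diffuse lighting and geometry where the interpolation error is not visible.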

See also

Reducing shader switches

Using binary shaders

Loading resources in parallel

Shaders best practices

Troubleshooting the performance of your application

Best practices