Tile-Based Deferred Rendering
Written by Caroline Begbie & Marius Horga

Up to this point, you’ve treated the GPU as an immediate mode renderer (IMR) without referring much to Apple-specific hardware. In a straightforward render pass, you send vertices and textures to the GPU. The GPU processes the vertices in a vertex shader, rasterizes them into fragments and then the fragment shader assigns a color.

Immediate mode pipeline

When you have multiple render passes, the GPU uses system memory to transfer resources between them.

Immediate mode using system memory

Starting with the A7 64-bit mobile chip, Apple began transitioning to a tile-based deferred rendering (TBDR) architecture. With the arrival of Apple silicon on Macs, this transition is complete.

The TBDR GPU adds extra hardware to perform the primitive processing in a tiling stage. This process breaks up the screen into tiles and assigns the geometry from the vertex stage to a tile. It then forwards each tile to the rasterizer. Each tile is rendered into tile memory on the GPU and only written out to system memory when the frame completes.

TBDR pipeline
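
As a rough mental model of the tiling stage, the following sketch bins a triangle into screen tiles using its bounding box. It's a conceptual illustration in plain Swift, not a description of how the GPU hardware works. The Point and TileRange types, the tilesCovered function and the 32-pixel tile size are all assumptions made for this example.

```swift
// Conceptual sketch only: real tiling hardware is far more sophisticated.
struct Point {
  var x: Double
  var y: Double
}

struct TileRange: Equatable {
  var columns: ClosedRange<Int>
  var rows: ClosedRange<Int>
}

// Returns the tiles overlapped by the triangle's bounding box,
// or nil when the bounding box lies entirely off screen.
func tilesCovered(
  by triangle: [Point],
  screenWidth: Int,
  screenHeight: Int,
  tileSize: Int = 32  // hypothetical tile size in pixels
) -> TileRange? {
  let xs = triangle.map { $0.x }
  let ys = triangle.map { $0.y }
  guard
    let minX = xs.min(), let maxX = xs.max(),
    let minY = ys.min(), let maxY = ys.max(),
    maxX >= 0, maxY >= 0,
    minX < Double(screenWidth), minY < Double(screenHeight)
  else { return nil }

  // Clamp the bounding box to the screen, then convert pixels to tile indices.
  let firstColumn = max(0, Int(minX)) / tileSize
  let lastColumn = min(screenWidth - 1, Int(maxX)) / tileSize
  let firstRow = max(0, Int(minY)) / tileSize
  let lastRow = min(screenHeight - 1, Int(maxY)) / tileSize
  return TileRange(columns: firstColumn...lastColumn, rows: firstRow...lastRow)
}
```

With these assumptions, a triangle with corners at (10, 10), (40, 20) and (20, 70) on a 128 × 128 screen lands in tile columns 0...1 and tile rows 0...2.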

Programmable Blending

Instead of writing a texture in one pass and reading it in the next, tile memory enables programmable blending: a fragment function can read color attachment textures directly within a single pass.

Programmable blending with memoryless textures

The G-buffer doesn’t have to transfer the temporary textures to system memory anymore. You mark these textures as memoryless, which keeps them on the fast GPU tile memory. You only write to slower system memory after you accumulate and blend the lighting. This speeds up rendering because you use less bandwidth.

Tiled Deferred Rendering

Confusingly, tiled deferred rendering can refer to the deferred rendering (or shading) technique as well as to a GPU architecture. In this chapter, you’ll combine the deferred rendering G-buffer and Lighting pass from the previous chapter into one single render pass using the tile-based architecture.

The Starter Project

➤ In Xcode, open the starter project for this chapter.

The starter app

GPU frame capture

Starter app render passes

1. Making the Textures Memoryless

➤ Open TiledDeferredRenderPass.swift. In resize(view:size:), change the storage mode for all four textures from storageMode: .private to:

storageMode: .memoryless
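
For reference, here's a minimal sketch of what a memoryless texture descriptor looks like when built directly with the Metal API. The makeMemorylessDescriptor function is hypothetical and isn't part of the starter project. It also assumes a deployment target and GPU that support memoryless storage, and that the texture is only ever used as a render target.

```swift
import Metal

// Hypothetical helper: describes a G-buffer texture that lives only in tile memory.
func makeMemorylessDescriptor(
  width: Int,
  height: Int,
  pixelFormat: MTLPixelFormat
) -> MTLTextureDescriptor {
  let descriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: pixelFormat,
    width: width,
    height: height,
    mipmapped: false)
  // Assumption: the texture is only used as a render pass attachment.
  descriptor.usage = [.renderTarget]
  // No system memory backs this texture. Its contents exist only during the render pass.
  descriptor.storageMode = .memoryless
  return descriptor
}
```

You would then pass a descriptor like this to the device's makeTexture(descriptor:) method to create the texture itself.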

2. Changing the Store Action

➤ Stay in TiledDeferredRenderPass.swift. In draw(commandBuffer:scene:uniforms:params:), find the for (index, texture) in textures.enumerated() loop and change attachment?.storeAction = .store to:

attachment?.storeAction = .dontCare
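
The idea behind this step can be sketched in isolation as follows. A memoryless texture has nothing to store back to system memory, so the attachment's store action becomes .dontCare. The discardTransientAttachments function and the attachment indices in the usage note are assumptions for illustration and don't come from the project.

```swift
import Metal

// Hypothetical helper: marks color attachments as transient, so they are
// cleared at the start of the pass and never written back to system memory.
func discardTransientAttachments(
  in descriptor: MTLRenderPassDescriptor,
  indices: ClosedRange<Int>
) {
  for index in indices {
    let attachment = descriptor.colorAttachments[index]
    attachment?.loadAction = .clear
    attachment?.storeAction = .dontCare
  }
}
```

For example, calling discardTransientAttachments(in: descriptor, indices: 1...3) would leave attachment 0 untouched and mark attachments 1 to 3 as transient.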

3. Removing the Fragment Textures

➤ In drawSunLight(renderEncoder:scene:params:), remove:

  index: BaseColor.index)
  index: NormalTexture.index)
  index: NormalTexture.index + 1)

4. Creating the New Fragment Functions

➤ Still in TiledDeferredRenderPass.swift, in init(view:), change the three pipeline state objects’ tiled: false parameters to:

tiled: true

texture2d<float> albedoTexture [[texture(BaseColor)]],
texture2d<float> normalTexture [[texture(NormalTexture)]],
texture2d<float> positionTexture [[texture(NormalTexture + 1)]]

GBufferOut gBuffer

uint2 coord = uint2(in.position.xy);
float4 albedo = albedoTexture.read(coord);
float3 normal = normalTexture.read(coord).xyz;
float3 position = positionTexture.read(coord).xyz;

float4 albedo = gBuffer.albedo;
float3 normal = gBuffer.normal.xyz;
float3 position = gBuffer.position.xyz;

texture2d<float> normalTexture [[texture(NormalTexture)]],
texture2d<float> positionTexture [[texture(NormalTexture + 1)]],

GBufferOut gBuffer

uint2 coords = uint2(in.position.xy);
float3 normal = normalTexture.read(coords).xyz;
float3 position = positionTexture.read(coords).xyz;

float3 normal = gBuffer.normal.xyz;
float3 position = gBuffer.position.xyz;

Render pass descriptor color attachments

5. Combining the Two Render Passes

➤ Open TiledDeferredRenderPass.swift. In draw(commandBuffer:scene:uniforms:params:), change let descriptor = MTLRenderPassDescriptor() to:

let descriptor = viewCurrentRenderPassDescriptor

// MARK: Lighting pass
// Set up Lighting descriptor
guard let renderEncoder =
    descriptor: viewCurrentRenderPassDescriptor) else {

6. Updating the Pipeline States

➤ Open Pipelines.swift. Add this code to createSunLightPSO(colorPixelFormat:tiled:) and createPointLightPSO(colorPixelFormat:tiled:) after setting colorAttachments[0].pixelFormat:

if tiled {
if tiled {
    = colorPixelFormat

A single render pass

The final render

The final frame capture

pointLights = Self.createPointLights(
  count: 40,
  min: [-6, 0.1, -6],
  max: [6, 1, 6])

Stencil Tests

The last step in completing your deferred rendering is to fix the sky. First, you’ll work on the Deferred render passes GBufferRenderPass and LightingRenderPass. Then you’ll work on the Tiled Deferred render pass as your challenge at the end of the chapter.

Stencil testing

A stencil texture

Stencil Test Configuration

To render, a fragment must pass both the depth test and the stencil test that you configure.

1. The Comparison Function

When the rasterizer performs a stencil test, it compares a reference value with the value in the stencil texture using a comparison function. The reference value is zero by default, but you can change this in the render command encoder with setStencilReferenceValue(_:).
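
To make the comparison concrete, here's a small model of it in plain Swift. It isn't Metal API code: the StencilCompare enum and the stencilTestPasses function are stand-ins invented for this sketch, and only a few comparison functions are modeled. The reference value sits on the left-hand side of each comparison.

```swift
// Simplified stand-in for a few comparison functions (not the Metal enum).
enum StencilCompare {
  case never, always, equal, notEqual, less, greater
}

// Compares the encoder's reference value against the value stored in the
// stencil texture for the current fragment. Masks are ignored in this sketch.
func stencilTestPasses(
  reference: UInt8,
  stored: UInt8,
  compare: StencilCompare
) -> Bool {
  switch compare {
  case .never: return false
  case .always: return true
  case .equal: return reference == stored
  case .notEqual: return reference != stored
  case .less: return reference < stored
  case .greater: return reference > stored
  }
}
```

With the default reference value of zero, an equal comparison passes only where the stencil texture still holds zero, and a notEqual comparison passes everywhere else.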

2. The Stencil Operation

Next, you set the stencil operations to perform on the stencil buffer. There are three possible results to configure:
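
Judging by the descriptor properties used later in this chapter (stencilFailureOperation, depthFailureOperation and depthStencilPassOperation), the three results are: the stencil test fails, the stencil test passes but the depth test fails, and both tests pass. The following plain Swift model sketches how one operation gets chosen per fragment. The StencilOperation enum, the StencilRules struct and the updatedStencil function are assumptions for illustration, and only a handful of operations are modeled.

```swift
// Simplified stand-in for a few stencil operations (not the Metal enum).
enum StencilOperation {
  case keep, zero, replace, incrementClamp, decrementClamp
}

// One operation for each of the three possible test results.
struct StencilRules {
  var stencilFailure: StencilOperation = .keep
  var depthFailure: StencilOperation = .keep
  var depthStencilPass: StencilOperation = .keep
}

// Picks the operation that matches the test results and applies it
// to the value currently stored in the stencil buffer.
func updatedStencil(
  stored: UInt8,
  reference: UInt8,
  stencilPassed: Bool,
  depthPassed: Bool,
  rules: StencilRules
) -> UInt8 {
  let operation: StencilOperation
  if !stencilPassed {
    operation = rules.stencilFailure
  } else if !depthPassed {
    operation = rules.depthFailure
  } else {
    operation = rules.depthStencilPass
  }
  switch operation {
  case .keep: return stored
  case .zero: return 0
  case .replace: return reference
  case .incrementClamp: return stored == UInt8.max ? stored : stored + 1
  case .decrementClamp: return stored == 0 ? stored : stored - 1
  }
}
```

For example, with depthStencilPass set to incrementClamp and the other two rules left at keep, a fragment that passes both tests bumps the stored value from 0 to 1, while a fragment that fails the depth test leaves it at 0.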

3. The Read and Write Mask

There’s one more wrinkle. You can specify a read mask and a write mask. By default, these masks are 255, or 11111111 in binary. Combining a bit with 1 using a bitwise AND leaves the bit unchanged, so the default masks have no effect.
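
The bit arithmetic behind the masks can be sketched in plain Swift. The two functions below are illustrative stand-ins rather than Metal API. They assume the read mask is applied with a bitwise AND before comparison and that the write mask selects which stored bits an operation may change.

```swift
// Applies the read mask to a value before it takes part in the comparison.
func maskedForComparison(_ value: UInt8, readMask: UInt8 = 0b1111_1111) -> UInt8 {
  return value & readMask
}

// Writes newValue into the stencil buffer, changing only the bits
// that are set in the write mask.
func writeStencil(
  _ newValue: UInt8,
  over stored: UInt8,
  writeMask: UInt8 = 0b1111_1111
) -> UInt8 {
  return (stored & ~writeMask) | (newValue & writeMask)
}
```

With the default masks, values pass through unchanged. With a mask of 00001111, only the lower four bits are compared or written.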

Create the Stencil Texture

The stencil texture buffer is an extra 8-bit buffer attached to the depth texture buffer. You optionally configure it when you configure the depth buffer.

if !tiled {
    = .depth32Float_stencil8
    = .depth32Float_stencil8
depthTexture = Self.makeTexture(
  size: size,
  pixelFormat: .depth32Float_stencil8,
  label: "Depth and Stencil Texture")
descriptor?.stencilAttachment.texture = depthTexture
descriptor?.stencilAttachment.storeAction = .store

New stencil texture

Configure the Stencil Operation

➤ Open GBufferRenderPass.swift, and add this new method:

static func buildDepthStencilState() -> MTLDepthStencilState? {
  let descriptor = MTLDepthStencilDescriptor()
  descriptor.depthCompareFunction = .less
  descriptor.isDepthWriteEnabled = true
  return Renderer.device.makeDepthStencilState(
    descriptor: descriptor)
}

let frontFaceStencil = MTLStencilDescriptor()
frontFaceStencil.stencilCompareFunction = .always
frontFaceStencil.stencilFailureOperation = .keep
frontFaceStencil.depthFailureOperation = .keep
frontFaceStencil.depthStencilPassOperation = .incrementClamp
descriptor.frontFaceStencil = frontFaceStencil

The ground is rendered in front of the trees and sometimes fails the depth test

models = [treefir1, treefir2, treefir3, train, ground]
models = [ground, treefir1, treefir2, treefir3, train]
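
One way to see why the draw order affects the stencil values is to model a single pixel in plain Swift. The sketch assumes the configuration shown earlier: a depth comparison of less with depth writes enabled, a stencil comparison that always passes, incrementClamp when both tests pass and keep when the depth test fails. The Pixel type, the stencilValue function and the depth values are hypothetical.

```swift
// One pixel's worth of depth and stencil state, cleared to far depth and zero.
struct Pixel {
  var depth: Double = 1.0
  var stencil: UInt8 = 0
}

// Draws fragments into a single pixel in the given order and returns the
// final stencil value.
func stencilValue(afterDrawing fragmentDepths: [Double]) -> UInt8 {
  var pixel = Pixel()
  for fragmentDepth in fragmentDepths {
    // Depth test fails: the .keep operation leaves the stencil value alone.
    guard fragmentDepth < pixel.depth else { continue }
    // Both tests pass: write depth and increment the stencil, clamping at 255.
    pixel.depth = fragmentDepth
    if pixel.stencil < UInt8.max {
      pixel.stencil += 1
    }
  }
  return pixel.stencil
}
```

Suppose a tree fragment at depth 0.3 and a ground fragment at depth 0.6 cover the same pixel. Drawing the tree first leaves the stencil value at 1, because the ground fragment fails the depth test. Drawing the ground first results in 2, because both fragments pass.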

Ground renders first

1. Passing in the Depth/Stencil Texture

➤ Open LightingRenderPass.swift, and add a new texture property to LightingRenderPass:

weak var stencilTexture: MTLTexture?

descriptor?.stencilAttachment.texture = stencilTexture

lightingRenderPass.stencilTexture = gBufferRenderPass.depthTexture

2. Setting Up the Render Pass Descriptor

➤ Open LightingRenderPass.swift. At the top of draw(commandBuffer:scene:uniforms:params:), add:

descriptor?.depthAttachment.texture = stencilTexture
descriptor?.stencilAttachment.loadAction = .load
descriptor?.depthAttachment.loadAction = .dontCare

3. Changing the Pipeline State Objects

➤ Open Pipelines.swift.

if !tiled {
    = .depth32Float_stencil8
    = .depth32Float_stencil8

Stencil texture in frame capture

Masking the Sky

When you render the quad in LightingRenderPass, you want to bypass all fragments that are zero in the stencil buffer.

let frontFaceStencil = MTLStencilDescriptor()
frontFaceStencil.stencilCompareFunction = .equal
frontFaceStencil.stencilFailureOperation = .keep
frontFaceStencil.depthFailureOperation = .keep
frontFaceStencil.depthStencilPassOperation = .keep
descriptor.frontFaceStencil = frontFaceStencil

A deliberate mistake

frontFaceStencil.stencilCompareFunction = .notEqual
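
To see the difference between the two comparison functions, here's a plain Swift sketch that applies each one to a row of stencil values, using the default reference value of zero. The litFragments function and the sample values are made up for illustration. Zero stands for sky, and non-zero values stand for pixels where models rendered.

```swift
// Returns, for each stencil value, whether the lighting fragment would pass
// the stencil test against the given reference value.
func litFragments(
  stencil: [UInt8],
  reference: UInt8 = 0,
  passWhenEqual: Bool
) -> [Bool] {
  return stencil.map { stored in
    passWhenEqual ? reference == stored : reference != stored
  }
}

// Hypothetical row of stencil values: sky, sky, model, model, sky.
let stencilRow: [UInt8] = [0, 0, 1, 2, 0]
let litWithEqual = litFragments(stencil: stencilRow, passWhenEqual: true)
let litWithNotEqual = litFragments(stencil: stencilRow, passWhenEqual: false)
```

With the equal comparison, only the sky pixels pass, which is the opposite of what you want. With notEqual, only the pixels covered by models pass, so the lighting skips the sky.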

Clear blue skies


You fixed the sky for your Deferred Rendering pass. Your challenge is now to fix it in the Tiled Deferred render pass. Here’s a hint: just follow the steps for the Deferred render pass. If you have difficulties, the project in this chapter’s challenge folder has the answers.

Key Points

  • Tile-based deferred rendering takes advantage of the tile memory built into Apple GPUs.
  • Keeping data in tile memory rather than transferring to system memory is much more efficient and uses less power.
  • Mark textures as memoryless to keep them in tile memory.
  • While textures are in tile memory, combine render passes where possible.
  • Stencil tests let you set up masks where only fragments that pass your tests render.
  • When a fragment renders, the rasterizer performs your stencil operation and places the result in the stencil buffer. With this stencil buffer, you control which parts of your image render.

Where to Go From Here?

Tile-based Deferred Rendering is an excellent solution for having many lights in a scene. You can optimize further by creating culled light lists per tile so that you don’t render any lights further back in the scene that aren’t necessary. Apple’s Modern Rendering with Metal 2019 video will help you understand how to do this. The video also points out when to use various rendering technologies.

© 2022 Razeware LLC
