Those who have been following this blog know that I wrote an article about implementing a Light Pre-Pass renderer last year. Since then I have made numerous improvements and fixes as I have tried the system over different PC configurations.
This time around I will be including those changes into the implementation, and also releasing sample code for educational use. If you feel you have learned enough from the previous article, then feel free to skip this. For those new to the topic, please use this article instead of the older one.
This article will focus on the theory, and will attempt to be as API neutral as possible. I will be making some references to XNA SurfaceFormats, as this article is aimed more towards XNA developers, but for reference here are the matching Direct3D formats:
|SurfaceFormat.Single||D3DFMT_R32F|
|SurfaceFormat.HalfSingle||D3DFMT_R16F|
|SurfaceFormat.Bgra1010102||D3DFMT_A2R10G10B10|
|SurfaceFormat.Color||D3DFMT_A8R8G8B8|
See the sample for an XNA implementation.
For a number of years now, the concept of Deferred Rendering has been a hot topic in games. The ability to have a large number of lights, with a low cost per light, is an attractive option. However, deferred rendering suffers from the inability to render transparent objects at the same time as opaque objects. A few “solutions” have appeared in recent years that approach deferred rendering in a different way to allow for some level of transparency; however, none have solved the problem completely, and all require extra work in some form.
There are a few different types of Deferred Rendering, however the two key forms are Deferred Shading, and Deferred Lighting. These differ in exactly which stage of rendering is deferred.
Deferred Shading defers the entire shading and lighting process, handling geometry rendering once, and then using an Uber Shader to light and shade the objects. One of the key negatives here is that you are limited in your material selection, although there are a few alternative solutions that can help. Another negative is the “Fat Framebuffer” that this technique requires. As Deferred Shading relies on Multiple Render Targets (MRTs) to render the geometry once and store all the important details, we are hit with both a memory cost and a DirectX 9 limitation that prevents us from applying multi-sampling (AA) to MRTs.
Negatives aside, Deferred Shading allows us to get the benefit of many lights, and still only render the scene once, which can be good for performance in scenes where the triangle count is high, and you do not need a diverse set of materials.
Deferred Lighting on the other hand defers only the lighting step, and requires an extra pass over the geometry to apply the details. The benefit here is that you can apply a unique material to each of your objects, whilst retaining the ability to have many lights. This technique also makes use of MRTs, however it only uses 2 render targets during the Depth + Normals stage, so if you can afford another pass over geometry, you can split those into two passes and gain the ability to run code on older hardware that does not support MRTs.
Deferred Lighting Overview
Deferred Lighting consists of 3 key stages:
- Render the Depth and World Space Normals of the Geometry
- Render the lighting term using the Depth + Normals generated previously
- Render the Geometry again, this time with unique materials, making use of the lighting term above.
Remember that if you retain your hardware depth stencil, you get a performance increase from the early-z tests that modern hardware provides.
One key optimisation that is used in Deferred Shading is to render the lights using bounding volumes, e.g. a sphere for a point light. This prevents overdraw in areas that are not lit.
I will be following the steps mentioned above, with the aforementioned optimisation, for the rest of this article.
Stage 1: Depth + Normals
The key requirement for any lighting calculation is fragment position, and normal. This means that if we want to determine the contribution a light makes to a particular pixel, we need to know where in world space that pixel is, and the normal at that pixel.
As we are deferring the lighting calculations, we need to store this information in a format we can use later. To store a position, we could go the naive route and store one component per channel, i.e. R=X, G=Y and B=Z. However, in a standard 32-bit render target this gives us only 8 bits of precision per channel, while each component of the position is a full 32-bit floating point number, so the loss of precision would be severe.
Instead we store the depth of the pixel and later reconstruct the position using the depth, and information from the camera.
Unfortunately, as we are working with XNA, we are restricted to D3D9 technology, and even then cannot use the FOURCC driver hacks that modern GPUs provide. This means that we have to write out the depth buffer to a separate render target instead of making use of the depth buffer the graphics card always creates.
So begin by preparing two render targets, the same size as the backbuffer you are going to draw to in the end. Make the depth target use SurfaceFormat.Single, and the Normals target use SurfaceFormat.Bgra1010102.
Of course, when implementing this in a game for Windows, you should always verify that the SurfaceFormats are supported. For the depth buffer you can fall back to SurfaceFormat.HalfSingle if Single is not supported, and the Normal buffer can fall back to SurfaceFormat.Color if the 1010102 format is not supported. You can even take the depth buffer back to Color if you absolutely need to; note, however, that you will then need extra packing code to spread the depth across the full 32 bits of the buffer.
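As a sketch, the target creation with fallbacks might look like this in XNA 3.1 (`SupportsFormat` is a hypothetical helper wrapping the adapter capability query; verify the exact `CheckDeviceFormat` signature against your XNA version):

```csharp
// Hypothetical helper: can the hardware render to this format?
bool SupportsFormat(SurfaceFormat format)
{
    return GraphicsAdapter.DefaultAdapter.CheckDeviceFormat(
        DeviceType.Hardware, SurfaceFormat.Color,
        TextureUsage.None, QueryUsages.None,
        ResourceType.RenderTarget, format);
}

// Fall back as described above if the preferred formats are unavailable.
SurfaceFormat depthFormat = SupportsFormat(SurfaceFormat.Single)
    ? SurfaceFormat.Single
    : SurfaceFormat.HalfSingle;

SurfaceFormat normalFormat = SupportsFormat(SurfaceFormat.Bgra1010102)
    ? SurfaceFormat.Bgra1010102
    : SurfaceFormat.Color;

// Both targets match the backbuffer dimensions.
int width  = device.PresentationParameters.BackBufferWidth;
int height = device.PresentationParameters.BackBufferHeight;

RenderTarget2D depthRT  = new RenderTarget2D(device, width, height, 1, depthFormat);
RenderTarget2D normalRT = new RenderTarget2D(device, width, height, 1, normalFormat);
```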
After that, we need to render the visible scene using a special shader that writes the Depth and Normal information to our render targets.
There are two key things to note here:
- If you decide to subtract the true depth from 1.0f, ensure you reverse that later on.
- You need to remap the Normal from [-1, 1] to [0, 1] so that it stores correctly in the render target
To accomplish #2, you simply need to use the following equation:
output.Normal = (( input.Normal + 1.0f ) / 2.0f )
Later on we will return this value to the [-1, 1] range by using the reverse of the above equation.
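A minimal sketch of what the Depth + Normals pixel shader might look like in HLSL, assuming the vertex shader passes along a world space normal and the post-projection z and w values:

```hlsl
struct PSInput
{
    float3 Normal : TEXCOORD0;   // world space normal from the vertex shader
    float2 Depth  : TEXCOORD1;   // x = projected z, y = projected w
};

struct PSOutput
{
    float4 Depth  : COLOR0;      // SurfaceFormat.Single target
    float4 Normal : COLOR1;      // SurfaceFormat.Bgra1010102 target
};

PSOutput DepthNormalsPS(PSInput input)
{
    PSOutput output;

    // Store the post-projection depth (z / w) in the single-channel target.
    output.Depth = float4(input.Depth.x / input.Depth.y, 0, 0, 0);

    // Remap the normal from [-1, 1] to [0, 1] before writing it out.
    float3 n = normalize(input.Normal);
    output.Normal = float4((n + 1.0f) / 2.0f, 0);

    return output;
}
```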
Stage 2: Lighting
Now that we have the Depth and Normal for each pixel, we can calculate the lighting from every light that touches each pixel. If you plan to include some form of shadow mapping, this is where you would handle rendering the shadow maps, and make use of them when lighting the object.
You can approach lighting in two ways:
- Render every type of light as a fullscreen quad.
- Render each light as an appropriate light volume [Hargreaves04]
We will focus on #2, which allows us to reduce the number of pixels drawn and improve performance dramatically.
Although I will only be implementing a Directional Light in the sample, you can use the following primitives to render each type of light:
|Ambient Light||Fullscreen Quad|
|Directional Light||Fullscreen Quad|
|Point Light||Sphere|
|Spot Light||Cone or Box|
As you can see, the two light types that do not have a position or volume are still rendered as fullscreen quads, whilst the other two use approximations of their light volumes to prevent overdraw.
In the case of the ambient light, you may choose to implement it as an extra term in your material shader instead; however, I chose to implement it as a light object so that I may alter, combine, or disable it at will.
Here you need to set a new render target, which will hold our Light Buffer. Once that is done, render each visible light volume using a special shader based on the light type.
This is where the magic happens. Inside these special shaders, we reconstruct the world space position of the pixel from the depth value, and using the position and the normal, calculate the lighting term as you would during forward rendering.
There are a couple of ways you can reconstruct the position from the depth value. The most common and obvious way would be to use the X/Y value of the pixel in Post-Projection Space, combined with the depth as Z, and multiply that by the inverse of the ViewProjection matrix. This will give you the position of the pixel in World space, which you can use.
There are other techniques, some of which end up cheaper than the matrix multiply mentioned above; there are several articles available online covering most (if not all) of the reconstruction techniques you can use.
In the sample I chose to use the Frustum Ray method, which is a cheaper way to get the position, but requires you to do some processing on the CPU to get the rays ready. In the sample this is calculated every time the camera parameters are set, however realistically you only need to do this any time your bounding frustum changes or moves.
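As a sketch, the inverse ViewProjection method might look like this in HLSL (`InvViewProjection` and the depth sampler are assumed to be set by the application; `uv` is the screen space texture coordinate of the pixel):

```hlsl
float4x4 InvViewProjection;        // set by the application each frame
sampler DepthSampler : register(s0);

float3 ReconstructWorldPosition(float2 uv)
{
    // Read the depth we stored during the Depth + Normals stage.
    float depth = tex2D(DepthSampler, uv).r;

    // Rebuild the post-projection position: x/y in [-1, 1], y flipped.
    float4 position;
    position.x = uv.x * 2.0f - 1.0f;
    position.y = -(uv.y * 2.0f - 1.0f);
    position.z = depth;
    position.w = 1.0f;

    // Transform back to world space and apply the perspective divide.
    position = mul(position, InvViewProjection);
    return position.xyz / position.w;
}
```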
Our light buffer, which uses SurfaceFormat.Color, holds four 8-bit channels. The original and (to my knowledge) recommended layout (from Wolfgang Engel) is to store the N.L * Color terms in the RGB channels, and the specular term in the alpha channel. This leaves you with the following layout:
|A||R.V (or N.H for Blinn-Phong)|
|R||N.L * Red|
|G||N.L * Green|
|B||N.L * Blue|
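The layout above can be produced by a small shader function like the following sketch (Blinn-Phong; the world space position is assumed to have already been reconstructed from depth as described earlier, and the normal decoded back to the [-1, 1] range):

```hlsl
// Compute one directional light's contribution to the light buffer.
// worldPos   : reconstructed from the depth buffer
// normal     : decoded from the normal buffer, normalized
// lightDir   : normalized direction the light travels, world space
// lightColor : the light's colour
// cameraPos  : world space camera position
float4 ShadeDirectional(float3 worldPos, float3 normal,
                        float3 lightDir, float3 lightColor,
                        float3 cameraPos)
{
    float3 toLight = -lightDir;
    float nDotL = saturate(dot(normal, toLight));

    // Blinn-Phong half vector for the specular term.
    float3 toCamera = normalize(cameraPos - worldPos);
    float3 halfVec  = normalize(toLight + toCamera);
    float  nDotH    = saturate(dot(normal, halfVec));

    // RGB = N.L * colour, A = N.H, matching the table above.
    return float4(nDotL * lightColor, nDotH);
}
```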
At this point you can also take into account the shadow map for each light, and use it to decide whether the light should illuminate the pixel or leave it untouched.
Now when rendering the lights, we need to set a couple of render states to ensure things go smoothly.
First of all, we need to ensure that depth testing is disabled. This has to happen because we may need to render lights behind or inside other lights, and if depth testing is enabled, the graphics card will reject the contribution from some of those lights via the early-z feature.
Since we are using geometry to render the lighting, we need to handle the case when the camera is inside the light volume. Here we change the CullMode render state so that it either culls interior faces when the camera is outside the object, or it culls exterior faces when the camera is inside the object. Normally this would be set to Counter-Clockwise, which remains our default when outside the object.
Simply detect when the camera is inside the object and set the CullMode to Clockwise to handle the other case.
Another issue you may notice with this is when the camera is entering a light volume, especially the point light sphere. At this point, part of the screen is inside the object, and the rest is outside, so we need to set CullMode to None here to handle both cases without creating visual glitches.
Finally, we need to enable alpha blending and set the blend mode to Add; this allows multiple lights that illuminate the same pixel to combine correctly (see Phong Shading).
Do not forget to re-enable the Depth Buffer and revert the CullMode to Counter-Clockwise before continuing, otherwise you might encounter hard to debug issues when rendering the materials.
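In XNA 3.1 the render state changes for the lighting pass might look like the following sketch (`cameraInsideVolume` is a hypothetical flag computed by the application for each light):

```csharp
// Sketch of the render state setup for the lighting pass (XNA 3.1).
RenderState rs = device.RenderState;

rs.DepthBufferEnable = false;           // lights must not be depth-rejected
rs.AlphaBlendEnable  = true;            // additive blend accumulates lights
rs.SourceBlend       = Blend.One;
rs.DestinationBlend  = Blend.One;

// Cull interior faces when outside the volume, exterior faces when
// inside it. (Use CullMode.None while the camera straddles the volume.)
rs.CullMode = cameraInsideVolume
    ? CullMode.CullClockwiseFace
    : CullMode.CullCounterClockwiseFace;

// ... draw each visible light volume here ...

// Restore the defaults before the material pass.
rs.DepthBufferEnable = true;
rs.AlphaBlendEnable  = false;
rs.CullMode          = CullMode.CullCounterClockwiseFace;
```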
Stage 3: Materials
Once the light buffer has been filled with all of the visible lights, you can move on to rendering the final image using each object’s material. First you need to make sure that all of your Render States are back to normal after the changes made during the Lighting stage.
To render the materials, you need to re-render every object that was drawn during the Depth + Normals stage, however this time around each object can use its own shader to draw itself, and can take the Light buffer as a texture, which can then be used as the lighting term when shading.
There are numerous effects that can be used to shade an object. In the sample I will only provide a Blinn-Phong shader, however as long as you understand the basic concepts of lighting in modern graphics, you can adapt the lighting values into any other technique that can use them.
This is where I find the big benefits of this technique appear. By allowing the object to choose its own shader/material, you can regain the diverse range of materials that forward rendering allows, providing your artists with plenty of possibilities. Not only that, but you do not have to manage a monster Uber Shader, balancing it against potentially restrictive instruction/texture count limits. (Depending on your Shader Model)
The sample will show how to implement a Blinn-Phong lighting model, with a textured object.
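A material pixel shader consuming the light buffer might look like the following sketch (the light buffer texture coordinate is assumed to be derived from the pixel's screen position in the vertex shader, and `SpecularColor`/`SpecularPower` are hypothetical per-material parameters):

```hlsl
sampler DiffuseSampler     : register(s0);  // the object's own texture
sampler LightBufferSampler : register(s1);

float3 SpecularColor;   // hypothetical per-material parameters
float  SpecularPower;

float4 MaterialPS(float2 texCoord : TEXCOORD0,
                  float2 screenUV : TEXCOORD1) : COLOR0
{
    // RGB holds N.L * light colour, alpha holds the N.H specular term.
    float4 light = tex2D(LightBufferSampler, screenUV);

    float3 albedo = tex2D(DiffuseSampler, texCoord).rgb;

    // Diffuse: modulate the texture by the accumulated light colour.
    float3 diffuse = albedo * light.rgb;

    // Specular: raise the stored N.H term to the material's shininess.
    float3 specular = SpecularColor * pow(light.a, SpecularPower);

    return float4(diffuse + specular, 1.0f);
}
```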
On the other hand, the big negative of this technique appears here as well. You now have to re-render every object a second time. Depending on your scene, this could be a reasonable performance hit. So ensure you weigh up both options when choosing your renderer for your game, and consider what you will need when choosing.
Transparency and Anti-Aliasing have traditionally been a major problem when it comes to Deferred Rendering. The need to render a Depth buffer has prevented the ability to do any reasonable form of transparency aside from the basic Invisible/Opaque states. There are a few solutions to this issue, although none solve it perfectly.
- Render your transparent objects using traditional forward rendering, with a limited number of lights, and composite the transparency scene into your deferred scene.
- Use stippling to alternate between layers on a high resolution render target. This is used in the recent articles on Inferred Rendering.
- Use Order Independent Transparency/Depth Peeling, which can be fairly expensive. (See DX SDK Feb 2010 Samples)
I would recommend using option #1 and constructing your scenes so that the only transparent objects are particles, which can be lit with a small number of lights with little to no visual impact.
Anti-Aliasing has also been a problem for Deferred Renderers, primarily because DirectX 9 does not allow for AA with multiple render targets. DirectX 10 allows this, and fixes the issue, however since my target here is XNA, this is not possible.
Traditionally the AA issue was solved by using an edge detection filter plus a blur to soften the edges of objects; this is relatively cheap and can be implemented as a post-processing effect.
However, for those interested in taking advantage of the AA modes of modern graphics cards, with very crisp edges, LPP provides a way to have your cake and eat it too (with a small cost).
As there are only 2 Render Targets needed during the Depth + Normals pass, and one at a time after that, you can split that pass up and render the scene an extra time, this time with AA enabled for each step. Of course if you have high scene complexity this might be extremely expensive and not worth it, but the option is there, and it is up to you and the needs of your game.
You now have a backbuffer (or Render Target) filled with your shaded scene, ready for further post processing, or presentation to the user. By using this technique you gain the ability to have thousands of lights in your scene at once, which allows level designers and artists to really let loose with their creativity.
Having this sheer number of lights also opens up other possibilities for lighting, including fake indirect lighting, a technique that has not been easy to do dynamically. You can assign point lights to every particle in your scene, and really make your worlds much more vibrant.
Also remember that you have a Depth and Normal buffer available for free (you needed it for LPP anyway) for use in post processing, which allows you to add other techniques like SSAO to your game.
Deferred Rendering seems to be the “cool thing” in game graphics today, and with the Compute Shader/OpenCL becoming readily available, some of these techniques can be adapted to make use of the general purpose capabilities of modern graphics cards.
Whilst there are plenty of benefits, there are also negatives to using this technique, and you really need to think about what you want out of your renderer before making your final decision. There have been plenty of debates in the graphics world about whether Deferred Rendering is better than Forward Rendering, and I am not going to go into one of those today.
For more information about Deferred Lighting/Light Pre Pass, you can check out the following books/websites:
- Section 8, Chapter 5; ShaderX7, Wolfgang Engel, Course Technology [ISBN: 1584505982]
- http://www.bungie.net/images/Inside/publications/siggraph/Engel/LightPrePass.ppt (SIGGRAPH 2009)
I have written a sample implementation of the Deferred Lighting/Light Pre Pass renderer using XNA 3.1. Feel free to learn from this implementation, and if you have any questions, just ask in the comments.
The code provided in the sample is for Academic Use Only, and cannot be copied into a Commercial program.
This sample shows off 6 directional lights around a single model taken from the DirectX SDK. Feel free to add more lights as you please. I have only implemented directional lights for this sample, to keep things simple. I will provide information on implementing other lights soon; in the meantime, feel free to look at my previous (now obsolete) article for a point light implementation, which should be similar.
You can download the sample here.
Sample Controls
1 : Show Depth Buffer
2 : Show Normal Buffer
3 : Show Lights Buffer
4 : Show Material Buffer
W : Move Forward
S : Move Backward
A : Strafe Left
D : Strafe Right
Left Arrow : Rotate Left
Right Arrow : Rotate Right