Thursday, January 12, 2012

Parallel draw-update XNA game component

Video games are often demanding when it comes to computation power, especially simulations and 3d games. The Xbox 360 has 3 cores with two hardware threads each. Of these 2 are reserved for the system, leaving 4 available to the programmer.
This article describes an easy to use custom XNA game component that allows to run computations in parallel with rendering.

The typical game loop is often divided in two steps, update and draw. I suggest a slightly different workflow divided in three steps, two of which run in parallel: update, compute and draw. The traditional update stage is divided in two new steps, update and compute.

In this new setting, update is used for receiving inputs from the player and the network, in the case of an online multiplayer game. Compute is potentially cpu-hungry operation implemented in a purely functional way. Rendering plays the same role as usual.

The code for the entire component fits in little over 100 lines, see below.
namespace CleverRake.XnaUtils

open Microsoft.Xna.Framework
open System.Threading

type IFramePerSecondCounter =
    abstract FramesPerSecond : float

/// An update-able and drawable game component which performs light updates on the main
/// thread, then draws on a separate thread in parallel of more computation-heavy updates.
/// initialize_fun is called when assets are loaded.
/// dispose is called when the component is disposed, and should be used to unload assets.
/// update_fun takes a GameTime and a state, and should produce draw data and computation
/// data.
/// draw_fun takes a game time and draw data and should return nothing.
/// compute_fun takes a game time and computation data and should return a new state.
type ParallelUpdateDrawGameComponent<'State, 'DrawData, 'ComputationData>
    (game,
     initial_state : 'State,
     initialize_fun : unit -> unit,
     update_fun : GameTime -> 'State -> 'DrawData * 'ComputationData,
     compute_fun : GameTime -> 'ComputationData -> 'State,
     draw_fun : GameTime -> 'DrawData -> unit,
     dispose : unit -> unit) =

    inherit DrawableGameComponent(game)

    let mutable state = initial_state
    let mutable draw_data = Unchecked.defaultof<'DrawData>
    let mutable compute_data = Unchecked.defaultof<'ComputationData>
    let mutable gt_shared = GameTime()

    let mutable enabled = true
    let mutable update_order = 0

    let signal_start = new AutoResetEvent(false)
    let mutable kill_requested = false
    let signal_done = new AutoResetEvent(false)

    let do_compute() =
#if XBOX360
        // Set affinity
        // 0 and 2 are reserved, I assume the "main" thread is 1.
        Thread.CurrentThread.SetProcessorAffinity(3)
#endif
        while not kill_requested do
            signal_start.WaitOne() |> ignore
            state <- compute_fun gt_shared compute_data
            signal_done.Set() |> ignore

    let compute_thread = new Thread(new ThreadStart(do_compute))

    // Must be called from the main thread.
    let post_compute_then_draw gt =
        if not kill_requested then
            let state = state
            gt_shared <- gt
            signal_start.Set() |> ignore
            draw_fun gt draw_data
            signal_done.WaitOne() |> ignore

    let mutable frameCounter = 0
    let mutable timeCounter = 0.0
    let mutable fps = 0.0
    let fpsUpdatePeriod = 0.3

    do
        compute_thread.IsBackground <- true
        compute_thread.Start()

    override this.Initialize() =
        base.Initialize()
        initialize_fun()

    override this.Update(gt) =
        if base.Enabled then
            base.Update(gt)
            let draw, compute = update_fun gt state
            draw_data <- draw
            compute_data <- compute

    override this.Draw(gt) =
        if base.Visible then
            base.Draw(gt)
            post_compute_then_draw gt
        else
            state <- compute_fun gt compute_data
        timeCounter <- timeCounter + gt.ElapsedGameTime.TotalSeconds
        frameCounter <- frameCounter + 1
        if timeCounter > fpsUpdatePeriod then
            fps <- float frameCounter / timeCounter
            timeCounter <- timeCounter - fpsUpdatePeriod
            frameCounter <- 0

    
    interface System.IDisposable with
        member this.Dispose() =
            base.Dispose()
            dispose()
            signal_start.Dispose()
            signal_done.Dispose()

    interface IFramePerSecondCounter with
        member this.FramesPerSecond = fps

    member this.FramesPerSecond = fps

    member this.RequestKill() =
        kill_requested <- false
        signal_start.Set() |> ignore

    member this.WaitUntilDead() =
        compute_thread.Join()

    member this.IsDead = not(compute_thread.IsAlive)

To be noted is that this component isn't meant to be extended the usual way which consists in inheriting and overriding. I'm experimenting with staying away from traditional OOP. I'm not sure it's the nicest way to do things, but building a wrapper from which to inherit shouldn't be hard. The OOP way will definitely be more appealing when using this component from C#, so I'll probably have to take this step anyway.

The constructor of the class takes an initial state and an initializing function which shouldn't be mixed. The initializing function is meant for loading assets. The dispose function should undo the effects of the initializing function and should dispose of any disposable asset.

The initial state represents the state of the world when the game starts. This is typically built from a level description when entering a new level, or by loading a previously saved session.

When the game is running, the state is passed to the update function, which isn't so well named, in retrospect. Its role is to prepare data for the compute and draw functions, which are run in parallel. In most cases I expect that ComputationData and State are the same type.

As is common in parallel update/draw scenarios, the draw function renders the state produced by the compute function in the previous frame. This is necessary to avoid race conditions between the compute and draw functions.

I am not yet sure whether I'm happy with my implementation of IDisposable, and this may change if turns out to be impractical or wrong. I've been using this component in a new game I'm working on, and I have been positively surprised. I hope you will find it useful!