Sunday, March 29, 2009

Structs that look like records

I've spent some time trying to improve my asteroids clone in F# for the XBox360. I present the first results in this post.

But first, some non-technical considerations. Microsoft has now made available to XNA creators (i.e. game programmers) the download statistics of their games. A vast majority of creators seem disappointed, but I think that's more a sign of too high expectations than a failure of the platform itself.
One of the creators reported that full game downloads went down when Microsoft raised the time limit in trial versions from 4 minutes to 8 minutes. Players have the ability to download a trial version of an XBox Live Community Game before buying the full version. Trial versions typically have some features disabled (this is up to the game creator) and a mandatory time limit (which cannot be controlled by game creators).
What happened there, apparently, is that the game is played in short sessions shorter than 8 minutes at a time, making the free trial version sufficiently appealing that gamers did not feel a need to buy the full version.

Asteroids is not typically a game you play for long periods at a time, so I expect my game may suffer the same problem.

Back to the technical side of things. I found a way to make structs look like records. Here is the declaration:


type State =
struct
val props : Properties
val pos : Vector3
val speed : Vector3
val impulses : Vector3
val force : Vector3

new (props, pos, speed, impulses, force) =
{ props = props ;
pos = pos ;
speed = speed ;
impulses = impulses ;
force = force }

static member inline Default (props, ?pos, ?speed, ?impulses, ?force) =
State(
props,
pos = defaultArg pos Vector3.Zero,
speed = defaultArg speed Vector3.Zero,
impulses = defaultArg impulses Vector3.Zero,
force = defaultArg force Vector3.Zero)

member inline x.Clone (?pos, ?speed, ?impulses, ?force) =
new State (
x.props,
pos = defaultArg pos x.pos,
speed = defaultArg speed x.speed,
impulses = defaultArg impulses x.impulses,
force = defaultArg force x.force )


... and here is how it's used:


member x.ApplyCenterForce (v : Vector3) =
let force = x.force + v
x.Clone (force = force)


This is very similar to the old code:


let applyCenterForce (v : Vector3) (x : State) =
let force = x.force + v
{ x with force = force }


Looking back at the first snipplet, notice how I declared "x.Clone":


static member inline Default (props, ?pos, ?speed, ?impulses, ?force) =

The question marks before the arguments mean that these arguments are optional. If the caller does not provide them, they automatically get the "None" value.

new State (
x.props,
pos = defaultArg pos x.pos,
speed = defaultArg speed x.speed,
impulses = defaultArg impulses x.impulses,
force = defaultArg force x.force )

"defaultArg" is a convenience F# standard function which returns the first value unless it's "None", in which case the second value is returned.

The function was declared "inline" to avoid creating lots of "Option" objects at call sites. For instance, the disassembled version of "ApplyCenterForce" looks like this:

public HeavyObject.State ApplyCenterForce(Vector3 v)
{
return new HeavyObject.State(this._props, this._pos, this._speed, this._impulses, this._force + v);
}


Compare with the version without "inline":


public HeavyObject.State ApplyCenterForce(Vector3 v)
{
Vector3 vector = this._force + v;
return this.Clone(null, null, null, new Option(vector));
}


As the entire point with structs is to avoid allocating objects on the heap, having an "Option" created for each call to ApplyCenterForce is not quite acceptable.

It's not clear that using structs is a clear win in all situations. Performance on the PC is negatively affected, as garbage collection is almost free there, whereas copying structs when passing them as parameters to functions isn't. Performance on the XBox360, however, is improved, as garbage collection is kept under control.

Wednesday, March 25, 2009

First obstacles and solutions

I've got forward with my asteroids clone on the XBox 360 using XNA, and I started noticing a few issues, not all technical.

First, a 3d space is mostly empty. I did not realize the probability of being hit by an asteroid is pretty low. In order to make the game somewhat interesting, I need a good 1k asteroids in a 100m x 100m x 100m cube. I doubt any player will have the patience and perseverance to shoot each of these 1000 asteroids. I have ideas about solving that problem, but I'm not sure which one to pick yet.

Regarding performance, the game initially ran fine with 100 asteroids, but as I say above, I will need 10 times that amount. The initial bottleneck was collision detection, which was quadratic. I solved this on the PC using space partitioning with an octree.


#light

open Microsoft.Xna.Framework
open System.Collections.Generic

type NodeData<'a> =
{ bbox : BoundingBox ;
sub_nodes : Node<'a> [] }

and Node<'a> =
| Leaf of BoundingBox * 'a []
| Node of NodeData<'a>


let mkBBox (p1 : Vector3) (p2 : Vector3) =
let x1, x2 =
if p1.X < p2.X then (p1.X, p2.X) else (p2.X, p1.X)
let y1, y2 =
if p1.Y < p2.Y then (p1.Y, p2.Y) else (p2.Y, p1.Y)
let z1, z2 =
if p1.Z < p2.Z then (p1.Z, p2.Z) else (p2.Z, p1.Z)

BoundingBox (Vector3 (x1, y1, z1), Vector3 (x2, y2, z2))


let map_parallel (func : 'a -> 'b) (items : 'a[]) =
let results = Array.zero_create (items.Length)
let count = ref items.Length
use notify = new System.Threading.AutoResetEvent(false)

items
|> Array.iteri (
fun i item ->
System.Threading.ThreadPool.QueueUserWorkItem (
fun _ ->
let res = func item
results.[i] <- res
let c = System.Threading.Interlocked.Decrement count
if c = 0 then notify.Set() |> ignore
) |> ignore
)
notify.WaitOne() |> ignore
results


let rec partition (count_limit) (intersect) (items : 'a []) (bbox : BoundingBox) multi_threaded =
if items.Length < count_limit
then
(bbox, items) |> Leaf
else
let in_bbox =
items
|> Array.filter (fun item -> intersect item bbox)

let center = 0.5f * (bbox.Min + bbox.Max)
let pts = bbox.GetCorners ()

let sub_nodes =
pts
|> (if multi_threaded then map_parallel else Array.map) (
fun pt ->
let sub_box = mkBBox center pt
partition count_limit intersect in_bbox sub_box false
)

{ bbox = bbox ;
sub_nodes = sub_nodes }
|> Node


let rec getVisible (view : BoundingFrustum) addVisible (partition : Node<'a>) =
match partition with
| Leaf (bbox, items) ->
if bbox.Intersects (view)
then
items |> Array.iter addVisible
| Node data ->
if data.bbox.Intersects (view)
then
data.sub_nodes
|> Array.iter (getVisible view addVisible)


It's pretty rough code, but there are some interesting bits anyway. In "partition", the function creating the octree, I use multithreading in the top call in an attempt to take advantage of the multiple cores. My timings on the PC show however little global benefit in using multithreading, but I don't know how much "partition" contributes to the overall execution time. It's too early to draw any conclusion, but at least performance is not negatively affected.

Note that I reimplemented map_parallel (which I had introduced in an earlier Tutorial about concurrency) without using asynchronous computations. It's obviously inferior to the older version as it does not handle exceptions, but it has the advantage of actually working on both the PC and the XBox360. The older version causes an "error code 4" on the XBox. I don't know why, I'll try to report that on hubfs with a smaller test case as soon as I get time.

The octree made the PC version run nicely with 1000 asteroids and more, but the XBox360 is still running very slowly, which makes me think that I have finally hit the limitations of its garbage collector. Note that I have not verified that, not being very comfortable with the development kit yet.

Currently, each asteroid is represented using a record, and records get translated to classes by the F# compiler. The records are stored in an array. I wonder if it would work better with structs instead, the idea being that an array of structs should be seen as one object, whereas an array of classes is composed of 1001 objects in this case. As the performance of garbage collection is dependent on the number of objects, but not so much on their size, an array of struct should offer better performance.

The sad thing is that I really liked records, I wish F# allowed the [<Struct>] attribute on records.

Saturday, March 14, 2009

F# game running on the console

I have just succeeded to write and run my first game on my XBox360. Here is proof of it:



The game is composed of an F# library and a top-level C# application. The game, a 3d clone of Asteroids, is still very far from being complete. For instance controls require a keyboard, and the code is still riddled with bugs. Never the less, the first attempt to deploy on the XBox was successful, which is quite encouraging. See Grammerjack's old blog for details on each step of the deployment process.

The F# code is written in a pure functional way, which I fear is not going to work out in the end. Such code relies heavily on the garbage collector, and the current version of the XBox360 .NET environment is known to be weak on this point.

I wonder if it's possible to modify the .NET code emitted by the F# compiler to replace instantiations of new objects by references to items in pre-allocated arrays. This would make it possible to keep purity in the F# source code while maintaining decent performance.
Right now, I feel that if I was to write code relying heavily on in-place modifications, I would rather do that in C# than in F#.