Friday, May 18, 2012

Recursive descent parser using active patterns

Today's post isn't specifically about games, but about parsing, which I find is a recurring task in many programming tasks, including game-related tasks.

In F#, the most popular methods for writing parsers are FParsec and fslex/fsyacc. Although parser generators are very useful, I'm always a bit reluctant to depend on third-party software.

In the past I have worked on a development tool for safety-critical systems, which was itself subject to some of the limitations of software for safety-critical systems. In particular, the use of lex and yacc was not accepted.

I fell back on a technique suitable for hand-written parsers, namely recursive descent parsers. I was positively surprised by the inherent simplicity of the method and the clarity of code that resulted.

I was first exposed to the idea of using active patterns for parsing when I read a blog post by Jon Harrop.
Active patterns is a feature specific to F#, and the F# wikibook has a chapter dedicated to them. I'll be using parameterized partial active patterns.

We start by defining a type for the input. The integer denotes the number of lines read so far, the list of strings is the input split at line-breaks.
open System.Text.RegularExpressions

type Stream = Stream of int * string list

A pattern to detect the end of the input, when the list of strings is empty.
let (|EOF|_|) = function
    | Stream(_, []) -> Some()
    | _ -> None

A pattern to detect end-of-lines, by recognizing list of strings which start with the empty string.
let (|EOL|_|) = function
    | Stream(n, "" :: rest) ->
        Some (Stream(n + 1, rest))
    | _ -> None
This pattern eats white space.
let (|OneWS|_|) = function
    | Stream(n, line :: rest) ->
        let trimmed = line.TrimStart()
        if trimmed <> line then
            Some (Stream(n, trimmed :: rest))
        else
            None
    | _ -> None
A convenient pattern to eat sequences of white space and newlines. Note the use of the rec keyword, which allows to refer to the pattern itself in its implementation.
let rec (|WS|_|) = function
    | OneWS (WS s)
    | EOL (WS s) -> Some s
    | s -> Some s
We could also have attempted another variant, where WS uses itself for the first part, which would implement a left-recursive grammar. Unfortunately, this pattern would be useless, as all it would do is raise stack-overflow exceptions.
let rec (|WS1|_|) = function
    | WS1 (OneWS s) -> Some s
    | s -> Some s
My variant of the Regex pattern differs a bit from the one in the wikibook, to avoid re-building Regex objects, which has a non-neglectable cost.
let (|Regex|_|) (regex : Regex) = function
    | Stream(n, line :: rest) ->
        let m = regex.Match(line)
        if m.Success then
            let lineRest = line.Substring(m.Length)
            let values =
                [ for gr in m.Groups -> gr.Value ]
            Some (values, Stream(n, lineRest :: rest))
        else None
    | _ -> None
A sample pattern to illustrate the use of the Regex pattern.
let personRegex = Regex("NAME: (\w+) AGE: (\d+)")
let (|Person|_|) = function
    | WS (Regex personRegex ([_ ; name ; age], s)) -> Some ((name, age), s)
    | _ -> None
A pattern to parse data of a large number of persons. I could have used recursion at the level of the pattern itself, similarly to WS. However, such a pattern would not be tail-recursive, and would cause stack overflows for large numbers of entries.
let (|Persons|_|) s =
    let rec work s persons =
        match s with
        | Person (p, s) -> work s (p :: persons)
        | _ -> (persons |> List.rev, s)
    let persons, s = work s []
    Some (persons, s)
Finally, some utility code to initialize streams, generate one million lines to parse, followed by the parsing itself.
let streamOfLines lines =
    Stream(0, lines |> List.ofSeq)

let streamOfFile path =
    Stream(0, System.IO.File.ReadAllLines(path) |> List.ofArray)

let stream =
    seq { for n in 0..1000000 do
            yield sprintf "  NAME: Anonymous AGE: %d              " (n % 99) }
    |> streamOfLines

#time "on"
match stream with
| Persons (p, WS EOF) -> printfn "Success %A" p
| _ -> printfn "Failed"
#time "off"
Parsing takes about 9 seconds on my PC (18s if I recreate the Regex object repeatedly). I think that's very decent performance.
In a future post, I'll describe how to parse Boolean formulas and expand a bit on the power and performance of this method.

Tuesday, May 1, 2012

Problems with portable libraries and object-oriented APIs

I've got along well with inheritance for a long time, but this was not peace. Inheritance was preparing, in the dark, cowardly, waiting to byte me. And it just did that!

In my last post, I presented a method to detach libraries from external assemblies and let applications take responsibility for the "linking".

The key was to use interfaces to shadow the API of the external assemblies. Sadly, it falls apart when the API being shadowed, say XNA, relies on inheritance to connect to the user's code.

Take GameComponent, for example. Users are expected to define a type that inherits from GameComponent, overriding a number of methods (Initialize, Update and Draw, typically).

It would be nice if one could declare that XNA provides a type GameComponent with abstract methods (the syntax is made up, that's not valid ML or F#):
signature Xna =
  type GameTime
  type GameComponent with
    abstract new : unit -> unit
    abstract Initialize : unit -> unit
    abstract Update : GameTime -> unit
    abstract Draw : GameTime -> unit

Using the method described in the previous post, this would become in F#:
type XnaImplementation<'GameTime, 'GameComponent>() =
  ...

Then F# code could maybe do something like that:
type MyGameComponent<'GameTime, 'GameComponent>(xnaOps : XnaImplementation<'GameTime, 'GameComponent>) =
  inherit 'GameComponent()
  ...

That is however not valid F# code, as inheriting from generic types is not allowed.
I have a solution for that in my code, but I'm not very happy. It involves a new interface IGameComponent that the API provider has to wrap inside XNA's GameComponent.
Throw into the lot that XNA's GameComponent provides some functionality (properties Visible and Enabled) that MyGameComponent needs to be able to access, and you need to throw another wrapper into the picture. Not pretty at all.

I've been considering whether I should take a different approach and create a so-called reference assembly for XNA. In theory, it shouldn't be hard. Extract the type definitions and methods that are needed from the actual libraries, put them into a reference assembly, and be done with it. Unfortunately, there does not seem to be a tool to do that available to the public. Microsoft should seriously consider including such a tool with VS11, it would allow users to add the little bits that were not portable enough for the portable .NET core, but that are nevertheless available on the platforms that users are targeting.

There are days when I really miss SDL. A simple library that lets itself be called, but doesn't call you. Lesson of the day: object-oriented APIs without proper tooling are painful.