Using the Task Parallel Library to Enforce Render-Thread Invocation

**amadmonk** · 12-13-2010

A pretty common problem when writing in-process bots is this:

1. I want to do some long-running work that will bog down the render thread, so I want to put it in another thread (or a background thread, or whatever). There are a lot of examples of this kind of work (local pathfinding, status spamming to a network connection, etc.).

2. However, some or all of this work can only be run safely in the render thread. This is because WoW stores some important constructs in thread-local storage (TLS), which means that unless you poke this TLS info into every other thread you create, you risk crashing the game any time you touch certain pieces of information.

It is definitely possible to solve this problem by setting the TLS values in the worker thread and being very, very careful with your thread synchronization. Even then, you run the risk of crashes due to game state being inconsistent with this architecture (for instance, accessing the same Lua state object on multiple threads at the same time is NEVER going to be very safe, and that's just one example). Until now, the only really 100% "safe" solution has been to always just run in the render thread.

Microsoft has been pushing code parallelism a lot lately, with things like parallel LINQ (PLINQ) stealing much of the spotlight. However, a bit of code that's standard in .NET 4.0 is the Task Parallel Library (System.Threading.Tasks). Essentially this gives you, among other things, a seamless and easy way to package a unit of work up into a "task" and ship it off to... somewhere else... to be run, and then fetch the results locally. In WoW, you can use a little bit of glue code to ship all of your render thread work to the render thread, but make it look like it runs locally!

Enough talk. Here's the code snippet:

Code:

public readonly Queue<Task> Work = new Queue<Task>();

public T RunOnRenderThread<T>(Func<T> function)
{
    var task = new Task<T>(function);
    RunOnRenderThread(task);
    return task.Result;
}

public void RunOnRenderThread(Action action)
{
    RunOnRenderThread(new Task(action));
}

public void RunOnRenderThread(Task task)
{
    // if we're already on the render thread, just run...
    if (GetCurrentThreadId() == _renderThreadId)
    {
        task.RunSynchronously();
    }
    else
    {
        // otherwise, queue up a task to do the work
        lock (Work)
            Work.Enqueue(task);

        // and wait for it
        task.Wait();
    }
}

Now you're building up a queue of Task objects to execute in the render thread. So in the render thread's EndScene method, you need something like this:

Code:

...

if (Work.Count > 0)
    lock (Work)
        while (Work.Count > 0)
        {
            var task = Work.Dequeue();
            task.RunSynchronously();
        }

...

And that (along with tracking the render thread ID so that the RunOnRenderThread method knows which thread is the render thread) is all you need. Note that we favor the render thread in the lock contention: each EndScene drains the entire queue, whereas each RunOnRenderThread call only gets to enqueue one item inside the lock. This is for two reasons. First, the render thread is more important. Second, if you use this code a lot, it's easy to forget that you are essentially "gating" your worker thread to run as fast as the render thread does: one call per frame!
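The snippets above use _renderThreadId and GetCurrentThreadId() without showing where they come from. GetCurrentThreadId is presumably the Win32 call reached through P/Invoke. Here is a minimal sketch of the tracking part that uses the managed thread ID as a portable stand-in. The class and member names (RenderThread, CaptureIfNeeded, IsCurrent) are hypothetical and not from the original code.

```csharp
using System.Threading;

// Hypothetical helper: remembers which thread is the render thread.
public static class RenderThread
{
    // -1 means "not captured yet".
    private static int _renderThreadId = -1;

    // Call this at the top of the EndScene hook. Only the first call
    // has any effect; later calls leave the stored ID untouched.
    public static void CaptureIfNeeded()
    {
        Interlocked.CompareExchange(
            ref _renderThreadId,
            Thread.CurrentThread.ManagedThreadId,
            -1);
    }

    // True only when the caller is the captured render thread.
    public static bool IsCurrent
    {
        get
        {
            // CompareExchange with identical values is just an atomic read.
            int id = Interlocked.CompareExchange(ref _renderThreadId, -1, -1);
            return Thread.CurrentThread.ManagedThreadId == id;
        }
    }
}
```

With something like this in place, the check in RunOnRenderThread becomes a test of RenderThread.IsCurrent.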

There are ways to get around the one-call-per-frame limitation. One of the simplest is to package more work into a single task. These are just Actions or Funcs, after all, so you can just write an anonymous delegate to execute a batch of them at a time. This is what I do in my code, and it's pretty effective.
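As a rough illustration of the batching idea (not the author's actual code), here is one way to fold several delegates into a single one, so that one queued task does all of the work in the same frame. The names TaskBatching, Batch and BatchResults are made up for this sketch.

```csharp
using System;

// Hypothetical helpers for packing several calls into one delegate.
public static class TaskBatching
{
    // One Action that runs every supplied action in order.
    public static Action Batch(params Action[] actions)
    {
        return () =>
        {
            foreach (var action in actions)
                action();
        };
    }

    // One Func that runs every supplied function in order and
    // returns the results in the same order.
    public static Func<T[]> BatchResults<T>(params Func<T>[] functions)
    {
        return () =>
        {
            var results = new T[functions.Length];
            for (int i = 0; i < functions.Length; i++)
                results[i] = functions[i]();
            return results;
        };
    }
}

// Usage with the RunOnRenderThread method from the post
// (target is a hypothetical second object):
//   var positions = RunOnRenderThread(
//       TaskBatching.BatchResults(() => player.GetPosition(),
//                                 () => target.GetPosition()));
```

Both calls then cost a single frame of waiting instead of two.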

Here's a sample usage:

Code:

...
var myPosition = RunOnRenderThread(() => player.GetPosition());
...

And that's it... now the virtual GetPosition call will always automatically happen on the right thread. If you're on the render thread, this will just loop back and execute synchronously. If you're NOT on the render thread, it will queue up a work item, wait for it to be completed, and then shuttle the result back to the calling thread.

Note that this isn't magic: you could have done this exact same thing with a queue of Actions or Funcs or whatever. However, the TPL makes the code much, much cleaner and simpler. You don't have to write any of the waiting/synchronization code yourself (each task potentially has a wait while we compute the results, plus you have to associate queued results with queued tasks). So, while TPL doesn't technically (in this case) allow you to do anything you couldn't do before, it makes the code so much cleaner that there's no real reason NOT to implement this in your bots; even if you don't use it very often due to the "render thread gating" effect, it can save you some grief.
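For comparison, here is roughly what a single hand-rolled work item might look like without the TPL. This is only a sketch to show the bookkeeping that Task<T> otherwise does for you; WorkItem<T> and its members are hypothetical names.

```csharp
using System;
using System.Threading;

// Hypothetical hand-rolled stand-in for Task<T>: carries the delegate,
// its result, any exception, and a signal for the waiting caller.
public sealed class WorkItem<T>
{
    private readonly Func<T> _function;
    private readonly ManualResetEventSlim _done = new ManualResetEventSlim(false);
    private T _result;
    private Exception _error;

    public WorkItem(Func<T> function)
    {
        _function = function;
    }

    // Called on the render thread when the item is dequeued.
    public void Execute()
    {
        try { _result = _function(); }
        catch (Exception ex) { _error = ex; }
        finally { _done.Set(); } // always release the waiting caller
    }

    // Called on the worker thread; blocks until Execute has finished.
    public T WaitForResult()
    {
        _done.Wait();
        if (_error != null)
            throw new InvalidOperationException("Work item failed", _error);
        return _result;
    }
}
```

You would then queue WorkItem<T> objects instead of Task objects and call Execute in EndScene. It works, but it is exactly the plumbing the TPL version lets you skip.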

Edit: forgot to mention that this technique is infinitely extensible; if you want to define some methods that can invoke in a parallel thread, with the simple requirement that the TLS data is set, you can extend this design pattern so that you use a custom Task object that has an "invocation policy." This policy can then determine (a) if it's okay to run this task synchronously, or (b) if not, how do we run it in the "correct" invocation context? Using this, you can have all kinds of interesting arbitrary call patterns, including RPC over the network (but figuring out how to serialize an Action or Func is something that I'll leave for advanced readers). You can also use this technique to enforce a memoization policy (which can dramatically improve your FPS since you really don't need "live" values for every frame for most values). Of course things can get weird if you have "chained" policies like a memoization policy and a cross-thread invocation policy.
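To make the memoization idea a bit more concrete, here is one possible shape for it: a wrapper that re-evaluates a function at most once per time window and otherwise hands back the cached value. This is a guess at the general pattern, not the author's implementation; Memo and WithMaxAge are hypothetical names.

```csharp
using System;
using System.Diagnostics;

// Hypothetical memoization wrapper with a maximum cache age.
public static class Memo
{
    public static Func<T> WithMaxAge<T>(Func<T> function, TimeSpan maxAge)
    {
        var gate = new object();
        var clock = Stopwatch.StartNew();
        bool hasValue = false;
        T cached = default(T);
        TimeSpan stamp = TimeSpan.Zero;

        return () =>
        {
            lock (gate)
            {
                if (hasValue && clock.Elapsed - stamp <= maxAge)
                    return cached; // still fresh enough
            }

            // Evaluate OUTSIDE the lock: the wrapped function may block
            // until the render thread's next frame, and holding a lock
            // across that wait invites a deadlock with the render thread.
            // The price is that two callers may occasionally both refresh.
            T fresh = function();

            lock (gate)
            {
                cached = fresh;
                stamp = clock.Elapsed;
                hasValue = true;
            }
            return fresh;
        };
    }
}

// Usage with the RunOnRenderThread method from the post:
//   var getPosition = Memo.WithMaxAge(
//       () => RunOnRenderThread(() => player.GetPosition()),
//       TimeSpan.FromMilliseconds(100));
```

Chaining it this way (memoization outside, cross-thread invocation inside) means a cache hit never touches the render thread at all.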

Edit2: yes, astute readers will realize that setting an invocation policy on generic Task objects begins to approach AOP. But since I really loathe all of the aspect-weaving tools I've come across, I'll leave the full AOP implementation up to you.

**Robske** · 12-13-2010

Very nice, thank you amadmonk - I was looking for something like this.

**XTZGZoReX** · 12-13-2010

Good approach; exactly what the TPL was made for.

I should mention, though, that Task objects are relatively expensive; don't overuse them.

**dook123** · 12-13-2010

Lots of things are "expensive" depending on "relative" conditions. I once was told OOP overhead was expensive and to write everything in assembly.

**amadmonk** · 12-13-2010

I'm instantiating, invoking, and GC'ing on the order of 100-500 Task objects/frame without seeing a frame rate hit.

They may be expensive, but apparently not expensive enough to matter.

**adaephon** · 12-14-2010

Nice work, looks interesting. .NET 4 also has thread-safe collections, so it might be worth checking out System.Collections.Concurrent.ConcurrentQueue instead of a normal queue + locks.

**XTZGZoReX** · 12-14-2010

Originally Posted by adaephon

Nice work, looks interesting. .NET 4 also has thread-safe collections, so it might be worth checking out System.Collections.Concurrent.ConcurrentQueue instead of a normal queue + locks.

Absolutely. It uses CAS (compare-and-swap) instead of locks for most operations.
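A sketch of what that swap might look like, for anyone who wants to try it. This is an untested-in-game adaptation of the original snippet, not something posted by the thread's author; the class name and the BindToCurrentThread and Pump methods are hypothetical, and the managed thread ID stands in for the original's GetCurrentThreadId().

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical dispatcher: same idea as the original, but with a
// ConcurrentQueue so neither side takes an explicit lock.
public sealed class RenderThreadDispatcher
{
    private readonly ConcurrentQueue<Task> _work = new ConcurrentQueue<Task>();
    private volatile int _renderThreadId = -1;

    // Call once from the render thread (e.g. the first EndScene call).
    public void BindToCurrentThread()
    {
        _renderThreadId = Thread.CurrentThread.ManagedThreadId;
    }

    public T RunOnRenderThread<T>(Func<T> function)
    {
        var task = new Task<T>(function);
        RunOnRenderThread(task);
        return task.Result;
    }

    public void RunOnRenderThread(Action action)
    {
        RunOnRenderThread(new Task(action));
    }

    public void RunOnRenderThread(Task task)
    {
        if (Thread.CurrentThread.ManagedThreadId == _renderThreadId)
        {
            // Already on the render thread: just run.
            task.RunSynchronously();
        }
        else
        {
            // Otherwise queue it and block until the render thread runs it.
            _work.Enqueue(task);
            task.Wait();
        }
    }

    // Call from EndScene: drains everything queued so far.
    public void Pump()
    {
        Task task;
        while (_work.TryDequeue(out task))
            task.RunSynchronously();
    }
}
```

One behavioral difference from the locked version: workers can keep enqueuing while Pump is draining, so a busy worker can get more than one call into the same frame.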

Originally Posted by amadmonk

I'm instantiating, invoking, and GC'ing on the order of 100-500 Task objects/frame without seeing a frame rate hit.

They may be expensive, but apparently not expensive enough to matter.

OK, when you're in that range you should be just fine. The stuff I deal with is in the 100.000-200.000 range, so yeah... I had to resort to a Task-less actor model based on coroutines because the Task objects were thrashing the GC.

Originally Posted by dook123

Lots of things are "expensive" depending on "relative" conditions. I once was told OOP overhead was expensive and to write everything in assembly.

I'm not an unreasonable person; I do realize what real overhead in the real world is. But that's completely irrelevant here.

**amadmonk** · 12-14-2010

Originally Posted by XTZGZoReX

OK, when you're in that range you should be just fine. The stuff I deal with is in the 100.000-200.000 range, so yeah... I had to resort to a Task-less actor model based on coroutines because the Task objects were thrashing the GC.

Ah yes, most of my local bot logic (it's distributed) is in Lua, so there's not a need for quite so many cross-thread invocations. Since I get simple property gets (etc.) for "free" in Lua, I don't have to do thousands (or more) of checks from a remote thread.
