Saturday, September 21, 2013

CLR Concurrency: Preventing Torn Reads Without Locks

The CLR's value types are incredibly useful for reducing the memory usage of programs, but they have a severe limitation in concurrent scenarios: structs larger than the native word size of a given machine can suffer from torn reads, where a reader observes part of an old value and part of a new one.

Most typical applications won't encounter this because their concurrent accesses, both reads and writes, are protected by locks. However, there are some scenarios where locks just aren't viable for various reasons. For instance, if a few shared variables are read an order of magnitude more often than they're written, then roughly 90% of lock acquisitions are by readers contending with each other without performing any updates.

In principle, the lock is really there to permit only one writer at a time to modify the variable. This means we could use some other signalling mechanism to notify readers that a write is taking place, or has taken place. Ideally, this mechanism shouldn't cause contention among readers, thus permitting more read parallelism.

The typical abstraction for this is a reader-writer lock (rwlock). Basically, a number of concurrent readers are permitted to access a resource protected by the rwlock, and writers have to request access and wait until all readers are done. Then the writer proceeds, and all readers must wait until the writer is finished. Unfortunately, rwlocks aren't all they're cracked up to be. Most of the problems stem from the fact that any amount of shared state will inherently limit scalability, but part of the problem is also because readers are performing writes, and thus they are introducing unnecessary contention.
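On the CLR, this pattern is typically expressed with ReaderWriterLockSlim. A minimal sketch, using a hypothetical shared decimal (a 16-byte struct whose reads could otherwise tear) as the protected resource:

```csharp
using System.Threading;

// Minimal sketch of the rwlock pattern via the CLR's ReaderWriterLockSlim.
// The shared decimal field is illustrative, not from Sasa.
static class RwLockExample
{
    static readonly ReaderWriterLockSlim rwlock = new ReaderWriterLockSlim();
    static decimal shared;

    public static decimal Read()
    {
        // note: even "read-only" access updates the lock's internal shared state
        rwlock.EnterReadLock();
        try { return shared; }
        finally { rwlock.ExitReadLock(); }
    }

    public static void Write(decimal value)
    {
        // blocks until all current readers exit; new readers then wait on us
        rwlock.EnterWriteLock();
        try { shared = value; }
        finally { rwlock.ExitWriteLock(); }
    }
}
```

Even here, every EnterReadLock/ExitReadLock pair performs interlocked updates on the lock's internal state, which is exactly the reader-side contention that the version-number scheme described next avoids.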

Turns out, it's possible to solve this contention issue using only an additional piece of data: a version number. Writers still coordinate via a standard lock, and when a writer enters the critical section, it increments a version number and executes a memory barrier. The version number is now odd, which indicates that a write is in progress. Then the writer writes to the variable, executes another barrier, then increments the version number again. The version number is now even, indicating that no write is in progress:

public static void Write<T>(
    ref T location,
    ref T value,
    ref int version,
    object writeLock)
{
    lock(writeLock)
    {
        ++version;              // ++version odd: write in progress
        Thread.MemoryBarrier(); // ensure increment complete before update
        location = value;
        Thread.MemoryBarrier(); // ensure update complete before increment
        ++version;              // ++version even: write complete
    }
}

Readers instead only consult the version number to check whether a write is in progress, or whether a write has transpired since we first started the read. We read the version number into a local called 'old', and then spin until the version number is even, which indicates that no write is in progress. Then we read the value from 'location' into a local.

However, a write could have occurred since we exited the loop checking for an odd version number. So we read the version number again and compare it against 'old'. If it differs, that means a write occurred while we were reading, possibly corrupting our read. In that case, we abort and retry the whole process from the beginning. If the version number matches 'old', then the value we read is correct, and we can safely return it [1]:

public static T Read<T>(ref T location, ref int version)
{
    T x;
    int old;
    do
    {
        // loop until version is even = no write in progress
        do
        {
            old = version;
            if (0 == (old & 0x01)) break; // even version: no write in progress
            Thread.MemoryBarrier();       // force a fresh read(version) next pass
        } while (true);
        Thread.MemoryBarrier();           // read(version) before read(location)
        x = location;                     // read value from location
        Thread.MemoryBarrier();           // read(location) before re-reading version
    } while (version != old);
    // if version after read == old, no concurrent write
    return x;
}

So we've achieved our goal: we have fully concurrent reads that don't contend for resources and don't block writers. I've just added these two atomic read/write functions, and some useful variants, to Sasa's Atomics class.
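Putting the two functions together, here's a self-contained sketch. The Atomics class name follows Sasa's, but the Guid field and the stress harness are my own illustration; the reader's first barrier sits between the version read and the value read, per the standard seqlock ordering:

```csharp
using System;
using System.Threading;

// The Write/Read pair from above, wrapped in an Atomics class for the demo.
static class Atomics
{
    public static void Write<T>(ref T location, ref T value, ref int version, object writeLock)
    {
        lock (writeLock)
        {
            ++version;              // odd: write in progress
            Thread.MemoryBarrier(); // ensure increment completes before update
            location = value;
            Thread.MemoryBarrier(); // ensure update completes before increment
            ++version;              // even: write complete
        }
    }

    public static T Read<T>(ref T location, ref int version)
    {
        T x;
        int old;
        do
        {
            // spin until version is even = no write in progress
            do { old = version; } while (0 != (old & 0x01));
            Thread.MemoryBarrier(); // read(version) before read(location)
            x = location;
            Thread.MemoryBarrier(); // read(location) before re-reading version
        } while (version != old);
        return x;
    }
}

class Demo
{
    static Guid location;   // 16 bytes: reads could tear without protection
    static int version;
    static readonly object writeLock = new object();

    public static void Main()
    {
        var a = Guid.NewGuid();
        var b = Guid.NewGuid();
        var writer = new Thread(() =>
        {
            for (int i = 0; i < 100000; ++i)
            {
                var v = (i & 1) == 0 ? a : b;
                Atomics.Write(ref location, ref v, ref version, writeLock);
            }
        });
        writer.Start();
        for (int i = 0; i < 100000; ++i)
        {
            var g = Atomics.Read(ref location, ref version);
            // a torn read would yield a Guid equal to none of the written values
            if (g != a && g != b && g != Guid.Empty)
                throw new Exception("torn read detected");
        }
        writer.Join();
    }
}
```

Every read that validates against the version number is guaranteed to be one of the values actually written, never a mix of two.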

Turns out this approach isn't new, and Duffy covered a similar approach in a later blog post which I only came across after writing this piece. He rightly points out that these are the sort of techniques employed in software transactional memory (STM), whereby we do as much as possible optimistically, then validate at the end that nothing untoward happened (like a write when we weren't expecting it). If the unexpected happens, we "abort" and retry. As pointed out in the comments to that post, this is essentially the seqlock locking mechanism as used in the Linux kernel.

I don't anticipate using this too often, but I wouldn't have included it in Sasa if it didn't have an important application within Sasa itself. Sasa.Reactive is the assembly that provides reactive variables that can be concurrently read, but only updated by a single writer. I designed the atomic read/write functions above while refactoring the Sasa.Reactive implementation to be more robust, yet more conservative in resource use. These atomic read/write functions allow concurrent readers to safely obtain a snapshot of a reactive variable's value, without resorting to boxing the value in a reference type just to make reads atomic.

Hopefully others will find it useful as well. If anyone spots a problem in the above algorithm please do let me know! The CLR provides a relatively strong memory model, so I'm pretty sure I can eliminate one memory barrier in the atomic write function, but I'd love to have some other input. Concurrency is hard after all!

Edit: Brian Gideon helpfully ran some tests of his own that support the correctness of these operations, and their performance benefits over both locking, and interlocked operations.

[1] Technically, the version number could have been incremented so many times that it wrapped around until it matches the saved 'old' value, but IMO it's exceedingly unlikely that 2^32 writes occurred while we were trying to read a single variable.

2 comments:

Cory said...

Using Volatile.Read instead of Thread.MemoryBarrier should be significantly more scalable and give the same results.

Sandro Magi said...

Probably. Unfortunately, the Volatile class is a new addition, and only supports .NET 4.5. Sasa targets older frameworks as well.
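On .NET 4.5+, Cory's suggestion might look like the following sketch. Note the hedge: Volatile.Read's acquire semantics order the version read before the value read, but an acquire load does not prevent the earlier read of location from moving past it, so a full barrier is still needed before the re-check on weakly ordered hardware.

```csharp
using System.Threading;

// Hypothetical .NET 4.5+ reader variant: Volatile.Read where acquire
// semantics suffice, a full barrier where they don't.
static class VolatileAtomics
{
    public static T Read<T>(ref T location, ref int version)
    {
        T x;
        int old;
        do
        {
            // acquire load: later reads can't be reordered before this one,
            // and the loop always re-reads version from memory
            do
            {
                old = Volatile.Read(ref version);
            } while (0 != (old & 0x01)); // odd: write in progress
            x = location;
            // the acquire semantics of the Volatile.Read below don't keep the
            // read of location above it, so a full fence is still required here
            Thread.MemoryBarrier();
        } while (Volatile.Read(ref version) != old);
        return x;
    }
}
```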