Wednesday, January 21, 2015

NHibernate: Associations with Composite Primary Keys as Part of a Composite Primary Key

NHibernate is a pretty useful tool, but occasionally it's not documented in a way that makes its flexibility evident. Composite keys are a particularly difficult area in this regard, as evidenced by the numerous articles on the topic.

Most of the existing articles cover the basics simply enough, but there is one uncommon corner case I have yet to see explained anywhere: a composite primary key in which one of the key properties is itself an association with a composite key. This is probably pretty uncommon and there are ways around it, hence the lack of examples, but as a testament to NHibernate's flexibility, it's possible! Here's the example in code, listing only the primary key properties:

public class MotorType
{
  public Horsepower Horsepower { get; protected set; }
  public VoltageType VoltageType { get; protected set; }
}
public class Motor
{
  public MotorType MotorType { get; protected set; }
  public Efficiency Efficiency { get; protected set; }
}

The tables look like this:

MotorType
  HorsepowerId | VoltageTypeId

Motor
  HorsepowerId | VoltageTypeId | EfficiencyId

Most people would probably map this with the Motor class having three primary key properties, one for each column, plus an additional many-to-one association referencing MotorType, but that shouldn't actually be necessary. Composite primary keys are possible, and a primary key can contain an association, so it should be possible, in principle, for that association to itself have a composite key.
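
For reference, that more common workaround would look roughly like this — a sketch only: the raw columns become key properties, and the association is mapped read-only over the same columns:

<composite-id>
  <key-property name="HorsepowerId" column="HorsepowerId" />
  <key-property name="VoltageTypeId" column="VoltageTypeId" />
  <key-property name="EfficiencyId" column="EfficiencyId" />
</composite-id>
<many-to-one name="MotorType" insert="false" update="false">
  <column name="HorsepowerId" />
  <column name="VoltageTypeId" />
</many-to-one>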

And here's how it's done for the Motors table:

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2">
  <class name="Motor, ExampleProject" table="Motors">
    <composite-id>
      <key-many-to-one name="MotorType">
        <column name="HorsepowerId" />
        <column name="VoltageTypeId" />
      </key-many-to-one>
      <key-property name="Efficiency" column="EfficiencyId" />
    </composite-id>
  </class>
</hibernate-mapping>

The MotorType class is mapped with a simple composite primary key:

<?xml version="1.0" encoding="utf-8" ?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2">
<class name="MotorType, ExampleProject" table="MotorTypes">
    <composite-id>
      <key-property name="Horsepower" column="HorsepowerId" />
      <key-property name="VoltageType" column="VoltageTypeId" />
    </composite-id>
</class>
</hibernate-mapping>
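
With these mappings, the entity instance itself acts as the identifier, since the composite-id isn't mapped to a separate component class. A lookup then reads something like the following sketch, which assumes an open ISession and hypothetical constructors (NHibernate also requires classes used as composite ids to override Equals and GetHashCode, elided here):

var motorType = session.Get<MotorType>(new MotorType(horsepower, voltageType));
var motor = session.Get<Motor>(new Motor(motorType, efficiency));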

Saturday, October 11, 2014

Generalized Multicast Delegates in Pure C#

.NET's delegates are a powerful and convenient abstraction that has been available since the first versions of .NET. They encapsulate a method pointer and the object the method belongs to in a single callable "function object". Multicast delegates are an extension of plain delegates, in that they encompass multiple single delegates. Invoking a multicast delegate invokes every encapsulated delegate, in the order they were added. Multicast delegates are key to the event handling patterns that have been a core part of the CLR nearly since its inception.

If you're curious about virtual machines or CLR internals, you've perhaps wondered how multicast delegates actually work. These are the key properties of multicast delegates:

  1. They encapsulate a list of delegates of the same delegate type.
  2. Adding a delegate returns a new multicast delegate containing the new addition at the end of the list (the original is unchanged).
  3. Removing a delegate returns a new multicast delegate without the specified delegate (the original is unchanged).
  4. Invoking a multicast delegate invokes each encapsulated delegate, in order, with the given parameters.
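
For example, all four properties are directly observable with the built-in delegates:

Action a = () => Console.Write("a");
Action b = () => Console.Write("b");
var ab = a + b;     // a new multicast delegate; 'a' and 'b' are unchanged
ab();               // prints "ab": each delegate runs in addition order
var justA = ab - b; // removal also returns a new delegate
justA();            // prints "a"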

These properties describe a straightforward immutable queue of delegates, so if we just wanted something that works, we could use an immutable queue like Sasa.Fifo<T>. However, the CLR's multicast delegates are insanely fast to invoke by comparison. Here's a simple timing result:

    = System.MulticastDelegate =
    Build           =       1,390,819
    Run             =          53,762

    = Delegate Immutable Queue =
    Build           =         507,871
    Run             =         527,292

This builds up a multicast delegate of 20 items about 200,000 times, then executes that delegate 200,000 times. Building multicast delegates is quite slow on the CLR, but invoking them is an order of magnitude faster than naively traversing a queue and invoking each delegate. Presumably, invoking delegates happens more often than adding and removing delegates, so better invocation speed is a good tradeoff. So how do we achieve it?

First, let's consider the fastest possible implementation of invocation: arrays. If we preallocate an appropriately sized array of delegate slots and invoke each entry in a for loop, invocation can't get much cheaper. Here's a minimal sketch of the timed loop, assuming the Action[] was built in advance:
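
static void InvokeAll(Action[] delegates)
{
    // a tight loop over contiguous memory: nothing here but the
    // delegate calls themselves
    for (var i = 0; i < delegates.Length; ++i)
        delegates[i]();
}

With that loop, we get these timings: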

    = Delegate Array =
    Build           =          71,554
    Run             =          54,022

Exactly equal to multicast delegates, to within reasonable experimental error. So now we know we need some data structure that approximates a queue, but uses contiguous memory for fast traversals during delegate invocation. We can simulate an immutable array queue using Sasa's array combinators (see Append and Remove):

public struct MulticastSimple<T>
    where T : class
{
    internal T[] items;

    public MulticastSimple<T> Add(T item)
    {
        return new MulticastSimple<T>
        {
            items = items == null ? new[] { item } : items.Append(item),
        };
    }

    public MulticastSimple<T> Remove(T item)
    {
        if (items != null)
        {
            var i = Array.IndexOf(items, item);
            if (i >= 0) return new MulticastSimple<T>
            {
                items = items.Length == 1 ? null : items.Remove(i)
            };
        }
        return this;
    }
}
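
A hypothetical usage sketch, highlighting the value semantics: every Add and Remove returns a new MulticastSimple and leaves the original untouched:

Action first = () => Console.WriteLine("first");
var m0 = default(MulticastSimple<Action>);
var m1 = m0.Add(first);
var m2 = m1.Add(() => Console.WriteLine("second"));
var m3 = m2.Remove(first); // m2 still contains both delegates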

It's pretty simple, but let's see how it compares with the built-in multicast:

    = Multicast Delegate =
    Build           =       1,334,914
    Run             =          57,646

    = MulticastSimple =
    Build           =         860,529
    Run             =          58,011

Pretty good! Building MulticastSimple is much faster, and invocation is just as fast as the built-in multicast delegates. Remember that these numbers are for 20 delegates though, and given the cost of adding delegates is O(n), this won't scale very well to large numbers of delegates. Let's see what the numbers look like for 200 delegates iterated 20,000 times:

    = System.MulticastDelegate =
    Build           =       1,324,381
    Run             =          45,683

    = MulticastSimple =
    Build           =       2,545,824
    Run             =          45,497

MulticastSimple is already more expensive than multicast delegates, so now let's see 2,000 delegates iterated 2,000 times:

    = System.MulticastDelegate =
    Build           =       1,324,422
    Run             =          45,293

    = MulticastSimple =
    Build           =      21,115,099
    Run             =          51,515

And here we see the limitations of the naive approach. The cost of generating multicast delegates appears constant regardless of the number of delegates, but the cost for MulticastSimple scales linearly with the number of delegates. However, even 20 delegates in a multicast delegate is uncommon, so optimizing for the rare cases probably isn't worth the effort.

A Scalable Multicast up to M

Fortunately, there is a simple extension of the above design which will efficiently support up to M delegates, where M is some configurable number. Basically, efficient delegate addition needs a single small array, and efficient traversal needs an array as well, but there's no need for these two arrays to be the same:

public struct Multicast<T>
    where T : class
{
    internal T[][] items;
    internal T[] end;

    public Multicast<T> Add(T item)
    {
        // no pending chunk yet: start one
        if (end == null) return new Multicast<T> { items = items, end = new[] { item } };
        // the pending chunk has room: append to it
        if (end.Length < 16) return new Multicast<T>
        {
            items = items, end = end.Append(item)
        };
        // the pending chunk is full: move it into 'items' and start a new one
        return new Multicast<T>
        {
            items = items == null ? new T[][] { end } : items.Append(end),
            end = new[] { item },
        };
    }

    public Multicast<T> Remove(T item)
    {
        // check the pending chunk first
        if (end != null)
        {
            var i = Array.IndexOf(end, item);
            if (i >= 0) return new Multicast<T>
            {
                items = items, end = end.Length == 1 ? null : end.Remove(i)
            };
        }
        // otherwise search the overflow chunks
        if (items != null)
        {
            for (int i = 0; i < items.Length; ++i)
            {
                var j = Array.IndexOf(items[i], item);
                if (j >= 0) return new Multicast<T>
                {
                    items = items[i].Length == 1
                          ? items.Remove(i)
                          : items.Set(i, items[i].Remove(j)),
                    end = end
                };
            }
        }
        return this;
    }

    public T[][] Items { get { return items; } }
    public T[] End { get { return end; } }

    public static Multicast<T> operator +(Multicast<T> lhs, T rhs)
    {
        return lhs.Add(rhs);
    }

    public static Multicast<T> operator -(Multicast<T> lhs, T rhs)
    {
        return lhs.Remove(rhs);
    }
}
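
Thanks to the operators, a usage sketch reads much like built-in delegate combination (Invoke here is the extension method defined in the next section):

var m = default(Multicast<Action>);
m += () => Console.WriteLine("first");
m += () => Console.WriteLine("second");
m.Invoke(); // runs both delegates, in addition order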

So we keep a small array of up to 16 items for efficient appends (field 'end'), and when that array fills up, it overflows into the nested array (field 'items') and the 'end' array is restarted. Appends thus have the same low constant factor overheads as MulticastSimple, but scale linearly at a factor of n/16 instead of just n. This is still fundamentally O(n), but the constant factor overheads of built-in multicast delegate appends are so high that they only beat out this implementation at around 10,000 delegates:

    = System.MulticastDelegate =
    Build           =       1,479,262
    Run             =          44,318

    = Multicast<T> =
    Build           =       1,778,939
    Run             =          58,599

A multicast delegate with 10,000 entries is more than any reasonable program will ever need. At 200 delegates:

    = System.MulticastDelegate =
    Build           =       1,445,167
    Run             =          44,934

    = Multicast<T> =
    Build           =         949,040
    Run             =          57,367

At 20 delegates:

    = System.MulticastDelegate =
    Build           =       1,411,269
    Run             =          55,544

    = Multicast<T> =
    Build           =         853,145
    Run             =          57,036

And probably the most likely case, 4 delegates:

    = System.MulticastDelegate =
    Build           =         979,093
    Run             =          80,781

    = Multicast<T> =
    Build           =         632,594
    Run             =          77,277

Multicast Invocation

Normally the C# compiler produces the code needed to invoke any delegate type; we can achieve something similar with T4 templates, reusing the built-in System.Action overloads. First, let's look at the basic invocation for System.Action:

public static void Invoke(this Multicast<Action> m)
{
    for (var i = 0; m.items != null && i < m.items.Length; ++i)
    {
        var x = m.items[i];
        for (var j = 0; j < x.Length; ++j)
        {
            x[j]();
        }
    }
    for (var i = 0; m.end != null && i < m.end.Length; ++i)
      m.end[i]();
}

Pretty straightforward: just iterate over each nested array and invoke the delegates. The invocation for an action with a parameter just threads that parameter through each call:

public static void Invoke<T0>(this Multicast<Action<T0>> m, T0 arg0)
{
    for (var i = 0; m.items != null && i < m.items.Length; ++i)
    {
        var x = m.items[i];
        for (var j = 0; j < x.Length; ++j)
        {
            x[j](arg0);
        }
    }
    for (var i = 0; m.end != null && i < m.end.Length; ++i)
        m.end[i](arg0);
}

And so on for all System.Action overloads up to 16 type arguments. We can automate generating these overloads using T4 templates.
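
A template along these lines could emit the overloads — just a sketch of the idea, not the exact template shipped with the code:

<#@ template language="C#" #>
<#@ import namespace="System.Linq" #>
<# for (var n = 1; n <= 16; ++n) {
   var targs = string.Join(", ", Enumerable.Range(0, n).Select(i => "T" + i));
   var prms  = string.Join(", ", Enumerable.Range(0, n).Select(i => "T" + i + " arg" + i));
   var args  = string.Join(", ", Enumerable.Range(0, n).Select(i => "arg" + i));
#>
public static void Invoke<<#= targs #>>(this Multicast<Action<<#= targs #>>> m, <#= prms #>)
{
    // same two loops as above, passing (<#= args #>) to each delegate
}
<# } #>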

The Semantics of Multicast System.Func

Multicast delegates on the CLR only have proper semantics for delegates that return no value. If you combine multiple delegates that return a value, then only the value from the last delegate is ever returned. But in principle, executing a list of delegates should generate a list of results. And the advantage of pure C# multicast delegates is that implementing this semantics is trivial!

public static IEnumerable<T> Invoke<T>(this Multicast<Func<T>> m)
{
    for (var i = 0; m.items != null && i < m.items.Length; ++i)
    {
        var x = m.items[i];
        for (var j = 0; j < x.Length; ++j)
        {
            yield return x[j]();
        }
    }
    for (var i = 0; m.end != null && i < m.end.Length; ++i)
        yield return m.end[i]();
}

This is basically the same C# code as Action invocations, but we now exploit C#'s generator syntax to easily return lists of values while executing all the encapsulated delegates. Like System.Action, we can generate all the necessary overloads for System.Func using T4 templates.

Summary

We haven't reproduced precisely the internal multicast implementation, but we have something incredibly simple that's just as fast for all foreseeable programs, and it's in safe C# to boot. Furthermore, we've extended multicast delegates to include a more coherent and complete semantics for delegates that return values, all in about 150 lines of code.

You can download the full source code for this post here, including MulticastSimple and Multicast with T4 templates for the latter.

Thursday, September 25, 2014

Sasa v0.14.0 Released

A quick release, fixing one bug and adding some performance improvements and some new collection types. Here's the changelog:

 * added a faster and more compact hash array mapped trie under Sasa.Collections.Trie to replace Sasa.Collections.Tree
 * added Sasa.Collections.FingerTree
 * added purity attributes throughout Sasa (for .NET 4)
 * expanded various test suites and documentation
 * added a persistent vector, Sasa.Collections.Vector
 * factored out Atomics.BeginRead/BeginWrite read/write protocol so it can be used in more flexible scenarios
 * Sasa.Dynamics.Type.MaybeMutable is now computed eagerly
 * added an efficient, mutable union-find data structure
 * fixed: Uris.UriDecode now handles the incorrect but common UTF16 encoding
 * moved Sasa.LockSet under Sasa.Concurrency.LockSet

The Sasa.Collections.Tree type is no longer available, and is replaced by the faster and more appropriately named Sasa.Collections.Trie. The original tree was actually a trie, and I didn't want it confused with a future search tree type which provides different operations. The FingerTree is an experimental addition, as is Sasa.Collections.Vector which provides a fast immutable array.

Edit: the online docs for the new release are also available here.

Friday, July 18, 2014

Sasa v0.13.0 Released

The latest Sasa release fixes a few bugs with MIME parsing, and adds a few new concurrency features. Here is the online documentation, or a downloadable CHM file from Sourceforge is available alongside the binaries. The binaries are also available via nuget, of course. Here's the changelog:

  • added Sasa.Concurrency.RWLock, a truly slim concurrent read/write lock
  • switched Sasa.Dynamics.PIC to use RWLock
  • switched Sasa.Dynamics.PIC to rely only on arrays for performance reasons
  • Mail message parsing now doesn't use ad-hoc means to extract a body from the attachments
  • added Sasa.Changeable<T> which encapsulates all INotifyPropertyChanged and INotifyPropertyChanging logic with no space overhead
  • fixed a MIME HTML parsing bug
  • fixed regex lexing
  • added more efficient Enums class exposing static properties for various enum properties
  • alternate views inside multipart/related no longer incorrectly dropped
  • added well-behaved standards conforming URI encode/decode to Sasa.Uris
  • added overload to customize string comparison type when tokenizing

Nothing too Earth-shattering. While I generally deplore reinventing the wheel, I found the URL encoding/decoding functions provided by System.Uri and in System.Web to be too inconsistent for my purposes in Clavis. The encode/decode functions in Sasa.Uris now work on StringBuilder, so they are more efficient, and they fully conform to the latest RFC.

The RWLock was covered here before, so no need to detail that. The PIC uses internal tables which are now protected by RWLock.

The only other really new feature is the Sasa.Changeable<T> type, which encapsulates the logic implementing INotifyPropertyChanging and INotifyPropertyChanged:

public struct Changeable<T>
{
  public T Value { get; private set; }

  public bool Update(string name, T value,
                    PropertyChangingEventHandler onChanging,
                    PropertyChangedEventHandler onChanged)
  {
    if (EqualityComparer<T>.Default.Equals(Value, value)) return false;
    onChanging.Raise(name, new PropertyChangingEventArgs(name));
    Value = value;
    onChanged.Raise(name, new PropertyChangedEventArgs(name));
    return true;
  }
}

So instead of repeating this logic in every property, you simply declare the field to be a Changeable<T> and call Update with references to the appropriate event handlers:

public class Foo : INotifyPropertyChanging, INotifyPropertyChanged
{
  Changeable<int> foo;
  PropertyChangingEventHandler onChanging;
  PropertyChangedEventHandler onChanged;

  public PropertyChangingEventHandler PropertyChanging
  {
    add { Sasa.Events.Add(ref onChanging, value); }
    remove { Sasa.Events.Remove(ref onChanging, value); }
  }

  public PropertyChangedEventHandler PropertyChanged
  {
    add { Sasa.Events.Add(ref onChanged, value); }
    remove { Sasa.Events.Remove(ref onChanged, value); }
  }

  public int Foo
  {
    get { return foo.Value; }
    set { foo.Update("Foo", value, PropertyChanging, PropertyChanged); }
  }
}

If the value differs from the current value, then the events will be raised. The Update method returns true if the value was updated, and false otherwise so you can implement your own change logic as well.
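
For example, a property could refresh derived state only when something actually changed — a hypothetical sketch, where RecomputeTotals stands in for whatever follow-up logic you need:

public int Foo
{
  get { return foo.Value; }
  set
  {
    if (foo.Update("Foo", value, onChanging, onChanged))
      RecomputeTotals(); // hypothetical: runs only on a real change
  }
}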

Note that the handlers can be null if you're not implementing both INotifyPropertyChanging and INotifyPropertyChanged.

Sunday, April 20, 2014

Immutable Sasa.Collections.Tree vs. System.Collections.Dictionary vs. C5 HashDictionary

I've previously posted about Sasa's hash-array mapped trie, but I never posted any benchmarks. I recently came across this post on Stackoverflow which provided a decent basic benchmark between .NET's default Dictionary<TKey, TValue>, the C5 collection's hash dictionary, F#'s immutable map, and .NET's new immutable collections.

I slightly modified the file to remove the benchmarks against the F# map and the new immutable collections, since I'm still using VS 2010, and I added a simple warmup phase to ensure the methods have all been JIT compiled and the GC has run, to avoid introducing noise:

static void Warmup()
{
    var x = Tree.Make<string, object>();
    var y = new C5.HashDictionary<string, object>();
    var z = new Dictionary<string, object>();
    z.Add("foo", "bar");
    for (var i = 0; i < 100; ++i)
    {
        x = x.Add("foo" + i, "bar");
        y.Add("foo" + i, "bar");
        z.Add("foo" + i, "bar");
        var tmp1 = x["foo" + i];
        var tmp2 = y["foo" + i];
        var tmp3 = z["foo" + i];
    }
    x = default(Tree<string, object>);
    y = null;
    z = null;
    GC.Collect();
}

The results are still somewhat representative. This is a sample of an average output, where "Imm" is Sasa's immutable HAMT:

# - 100
SCGD -          0 MS -         25 Ticks
C5   -          0 MS -        887 Ticks
Imm  -          0 MS -        387 Ticks

# - 1000
SCGD -          0 MS -        257 Ticks
C5   -          0 MS -        294 Ticks
Imm  -          0 MS -        368 Ticks

# - 10000
SCGD -          1 MS -       4084 Ticks
C5   -          1 MS -       5182 Ticks
Imm  -          1 MS -       5436 Ticks

# - 100000
SCGD -         28 MS -      85742 Ticks
C5   -         32 MS -      99280 Ticks
Imm  -         32 MS -      97720 Ticks

Observations:

  1. C5's standard deviation was somewhat wider than both Sasa's HAMT and SCGD, so its performance seems slightly less predictable
  2. Sasa's immutable HAMT appears to perform within 5% of the mutable C5 collection at all collection sizes
  3. Sasa's immutable HAMT appears to perform within 15% of the mutable SCGD for large collections, where the hash table operates with higher load factors
  4. Small collections requiring a small load factor clearly advantage the mutable SCGD by up to an order of magnitude, an advantage not shared by C5 for some reason (possibly they maintain a higher load factor)
  5. C5's terrible performance on very small collections of 100 items was consistent on every test run, again possibly because they maintain a high load factor before resizing
  6. Sasa's HAMT takes just as much time to load 1000 items as it takes to load 100 items; this was consistent across every test run, and it's not clear why

Finally, while not exactly apples-to-apples, Sasa's HAMT is easily 3-4× faster than F#'s map given the numbers cited in the above Stackoverflow post. F# still has an advantage for very small collections though. Sasa's HAMT also appears to be at least 2× faster than the new immutable collections.

Also keep in mind that this benchmark only tests lookup performance. F#'s map would have an advantage over Sasa's HAMT in load performance because the HAMT does not yet include a "bulk-load" operation, which the F# map does appear to support.

Tuesday, April 1, 2014

A Truly Slim Read/Write Lock in C#

It's pretty well known that the CLR's ReaderWriterLock and ReaderWriterLockSlim have unappealing performance characteristics. Each class also encapsulates significant state, which precludes its use in fine-grained concurrency across large collections of objects.

Enter Sasa.Concurrency.RWLock in the core Sasa assembly. This is the most lightweight R/W lock I could come up with, particularly in terms of resources used. It's a struct that encapsulates a simple integer that stores the number of readers and a flag indicating whether a writer is active.

The interface is similar to ReaderWriterLockSlim, although there are a few differences which are needed to keep the encapsulated state so small:

public struct RWLock
{
  // this field is the only state needed by RWLock 
  private int flags;

  public void EnterReadLock();
  public void ExitReadLock();
  public bool TryEnterReadLock();

  public void EnterWriteLock(object sync);
  public bool TryEnterWriteLock(object sync);
  public void ExitWriteLock(object sync);
}

Conceptually, EnterWriteLock calls Monitor.Enter(sync), which ensures that only a single writer acquires the write lock. It then sets the write bit in the "flags" state, and loops yielding its time slice until all read locks are released.

EnterReadLock also loops yielding its time slice until the write flag is cleared, and then it uses Interlocked.Increment to acquire a read lock, and Interlocked.Decrement to release the read lock.

The TryEnterReadLock and TryEnterWriteLock provide non-blocking semantics, so there is no looping. If the lock on 'sync' cannot be acquired, or the write flag is set, TryEnterWriteLock and TryEnterReadLock respectively return false immediately. They never block or loop under any circumstances.
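
To make the protocol concrete, here's a rough sketch of the two blocking entry points. This is illustrative only, not Sasa's actual code; it assumes the write flag lives in the high bit of 'flags', and elides the volatile reads a real implementation needs:

const int WRITER = unchecked((int)0x80000000); // high bit: writer active

public void EnterWriteLock(object sync)
{
    Monitor.Enter(sync); // serializes writers against each other
    // publish the write flag so no new readers can enter
    int x;
    do { x = flags; } while (Interlocked.CompareExchange(ref flags, x | WRITER, x) != x);
    // yield until the outstanding readers drain
    while ((flags & ~WRITER) != 0) Thread.Yield();
}

public void EnterReadLock()
{
    for (;;)
    {
        // wait out any active writer
        while ((flags & WRITER) != 0) Thread.Yield();
        // optimistically register as a reader, backing out if a
        // writer slipped in between the check and the increment
        if ((Interlocked.Increment(ref flags) & WRITER) == 0) return;
        Interlocked.Decrement(ref flags);
    }
}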

The RWLock implementation is about 150 lines of heavily commented code, so it's easily digestible for anyone who's interested in the specifics. There are also some rules to abide by when using RWLock:

  1. The same 'sync' object must be passed to all write lock calls on a given RWLock. Obviously if you use a different object, more than one writer can proceed. Different objects can be used for different RWLocks of course.
  2. Recursive write locks are forbidden and will throw LockRecursionException. Recursive read locks are permitted.
  3. You cannot acquire a read lock inside a write lock, or a write lock inside a read lock. If you do, your program will immediately deadlock.

Unlike the base class libraries, none of my concurrency abstractions accept timeout parameters. Timeouts hide concurrency bugs and introduce pervasive non-determinism, which is partly why concurrent programs are traditionally hard to debug. Timeouts should be rare, and specified separately at a higher level than these low-level concurrency primitives.

Friday, February 14, 2014

Clavis 1.0.0-alpha2 Released

Stan Drapkin, of SecurityDriven.NET fame, was nice enough to provide a little feedback on the Clavis implementation. He pointed out a possible issue with parameter names not being properly URL-encoded, which this release fixes. I also applied a few minor optimizations so generating URLs should be a little faster. I've been using Clavis in production for a couple of years now, so it's fairly stable and user-friendly.

The Clavis wiki and issue tracker are available here.