Sunday, November 10, 2013

Degenerate Implicits in C#

Scala and Haskell both have very useful type-directed concepts for resolving implicit function parameters, called "implicits" and "type classes" respectively. To illustrate the impact of this, consider a generic sorting function like Enumerable.OrderBy. It must define two overloads, one that takes a custom comparer, and one that does not and so should use a default comparison for type TKey:

public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IComparer<TKey> comparer
)
public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector
)

While it's simple to define the second overload as calling the first with Comparer<TKey>.Default for the custom comparer, in general, for N overridable parameters, you would need to define something like factorial(N) overloads for all possible permutations of defaults and overridable parameters. This can quickly become unwieldy. Ideally, we could define only a single function with optional parameters and sane default values.

C# 4.0 introduced a much more limited form of implicits called "default values" to address part of this problem. These implicit values are limited to only primitive types like Int32 and String, so they aren't useful for general resolution of implicit values, but they were a good start.

Fortunately, C# has a few other features that, when combined with default values, allow us to add something like the more flexible implicit values as found in Scala: reified generics and value types. Basically, we define the implicit parameter to be some sort of value type, whose default value is always defined, and this value type resolves to either the default instance, or to a custom instance if one is provided. For instance, we can define a single OrderBy overload like so:

public static IOrderedEnumerable<TSource> OrderBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    CompareImplicit<IComparer<TKey>> comparer =
      default(CompareImplicit<IComparer<TKey>>)
)

Where CompareImplicit can be defined something like this:

public struct CompareImplicit<T>
{
    IComparer<T> cmp;
    static IComparer<T> def = Comparer<T>.Default;
    public CompareImplicit(IComparer<T> cmp) { this.cmp = cmp; }
    public IComparer<T> Resolve()
    {
        return cmp ?? def;
    }
    public int Compare(T left, T right)
    {
        return Resolve().Compare(left, right);
    }
}

You can then write any sort of instance resolution for the IComparer<T> value you like, and still allow callers to specify optional overrides. For instance, here's an example of using the built-in comparison and overriding it with a "reverse" comparer:

public static class Example
{
    sealed class Reverse : IComparer<int>
    {
        public int Compare(int left, int right)
        {
            return right.CompareTo(left);
        }
    }
    public static IEnumerable<T> OrderBy<T>(
      IEnumerable<T> source,
      CompareImplicit<T> cmp = default(CompareImplicit<T>))
    {
        return source.OrderBy(x => x, cmp.Resolve());
    }
    public static void Main(string[] args)
    {
        var orig = new[] { 1, 66, 4, 32 };
        var inorder = OrderBy(orig);
        var revorder = OrderBy(orig, new CompareImplicit<int>(new Reverse()));
        Console.WriteLine("in order: {0}", inorder.Aggregate("", (seed, x) => seed + x + ", "));
        Console.WriteLine("rev order: {0}", revorder.Aggregate("", (seed, x) => seed + x + ", "));
        Console.ReadLine();
        // prints:
        // in order: 1, 4, 32, 66,
        // rev order: 66, 32, 4, 1
    }
}

I reused the Enumerable.OrderBy overload for clarity, but in principle, the OrderBy function would normally just call Resolve() to access the IComparer instance to use locally. We can define similar implicit arguments to use for resolving instances of IEqualityComparer, and so on, but there's perhaps a more uniform, reusable approach using only a single struct for all such implicits:

public struct Implicit<T>
{
    public readonly T Instance;
    public Implicit(T instance)
    {
        this.Instance = instance;
    }
    public T Resolve(T defaultInstance)
    {
        return Instance != null ? Instance : defaultInstance;
    }
    public static implicit operator Implicit<T>(T instance)
    {
        return new Implicit<T>(instance);
    }
}

With a little more work for the library writer, this allows us to use a single struct for all implicit parameters, and the receiving function will simply need to provide the default instance if no override is specified. For instance, here's OrderBy rewritten using this type:

public static class Example
{
    public static IEnumerable<T> OrderBy<T>(
      IEnumerable<T> source,
      Implicit<IComparer<T>> cmp = default(Implicit<IComparer<T>>))
    {
        return source.OrderBy(x => x, cmp.Instance ?? Comparer<T>.Default);
    }
}

In order to avoid hard-coding the source of the default implicit in every function, it's better to use dependency injection. This will will decouple defining the default value for implicits from the uses, thus allowing more flexibility:

public static class Example
{
    public static IEnumerable<T> OrderBy<T>(
      IEnumerable<T> source,
      Implicit<IComparer<T>> cmp = default(Implicit<IComparer<T>>))
    {
        return source.OrderBy(x => x, cmp.Instance ?? IoC.Resolve<IComparer<T>>());
    }
}

Technically, you don't even need the Implicit type in parameter list, as all reference types can be null be default. However, the presence of Implicit is good documentation implying that some sort of implicit instance resolution is taking place, and that this process can be overridden.

Finally, there are still some unfortunate C# warts in this approach. For instance, C# does not allow implicit conversions to or from interface types, so you will always have to explicitly construct the CompareImplicit or Implicit wrapper around a custom instance, even though Implicit defines an implicit conversion. This is annoying, but hardly the only time C# has restricted expressiveness for flimsy aesthetic reasons. I would also define an extension method to simplify lifting a custom value into an Implicit wrapper:

public static class Implicits
{
    public static Implicit<T> AsImplicit<T>(this T instance)
    {
        return instance;
    }
}

The previous code example will now look like this:

public static class Example
{
    ...

    public static void Main(string[] args)
    {
        var orig = new[] { 1, 66, 4, 32 };
        var inorder = OrderBy(orig);
        var revorder = OrderBy(orig, new Reverse().AsImplicit());
        Console.WriteLine("in order: {0}", inorder.Aggregate("", (seed, x) => seed + x + ", "));
        Console.WriteLine("rev order: {0}", revorder.Aggregate("", (seed, x) => seed + x + ", "));
        Console.ReadLine();
        // prints:
        // in order: 1, 4, 32, 66,
        // rev order: 66, 32, 4, 1
    }
}

In my opinion, not too bad given C#'s inherent limitations.

2 comments:

svick said...

This looks like too much work just to avoid using null. The following code will work just as well, and doesn't need any custom structs:

public static IEnumerable<T> OrderBy<T>(
IEnumerable<T> source,
IComparer<T> cmp = null)
{
return source.OrderBy(x => x, cmp ?? Comparer<T>.Default);
}

Sandro Magi said...

I noted that in my post, and I addressed why the extra notation is useful: because it's good documentation to show that a parameter is implicitly resolved to some default that isn't visible. If I just provided you with the following function:

public void DoSomething(Foo x = null);

What is the intent and meaning of x = null above?