Sunday, May 5, 2013

Sasa.Operators<T> Overhaul - Now With More Generic Operator Goodness

Sasa.Operators<T> was covered in a previous post, and was useful in its own right, but was still somewhat limited in the operators it could expose. Since it abstracted only over a single type parameter T, it exposed those operators defined only on T. For instance, addition has signature "T add(T, T)", negation is "T negate(T)", and so on.

But not all operators are defined on only a single type. For instance, System.Decimal provides addition operators that work on integers, both signed and unsigned. Sasa.Operators<T> couldn't handle that. It can now.

Sasa.Operators is now a fully generic operator framework, exposing static generic class types Operators<T> as before, Operators<T0, T1> which exposes operators defined over two type parameters, like equals whose signature is "bool equals(T0, T1)", and finally, Operators<T0,T1,T2> for operators defined over three possible types, like addition "T2 add(T0, T1)". Furthermore, all overloadable operators are now accessible, including True/False and explicit and implicit conversions.

The delegates accessible via the above Operators classes are also now more efficient, and don't rely on LINQ expression compilation. This was all possible due to a new function I introduced under Sasa.Func called "Operator". It takes a delegate signature as a type parameter, and a Sasa.Operator value designating the operator type, and it searches for that static operator method that matches the delegate signature needed. Then it creates a direct delegate to that method of the given signature, so invocation via Operators is simply a direct virtual dispatch into a static method.

The only exception is for primitive types, like Int32 and Int64, because they don't provide static method operators. When the arguments are all primitives and no operator method is available, a small stub function is dynamically generated that implements the operation in efficient bytecode. Can't get much faster than this.

All of this was precipitated by my implementation of a LINQ expression interpreter in the Sasa.Linq assembly. You can now easily evaluate almost any LINQ expression with a single call:

var z = CLR.Eval(() => 3 * 2.0 + 1); // z = 7.0
var x = CLR.Eval(() => new Func<int, int>(z => z + 1)(3)); // x=4
...

This is in alpha status obviously, but it passes a few tests, and more will come. Obviously LINQ expressions already have compilation built-in, but sometimes dynamic compilation either isn't available, or is too costly to perform. For instance, consider the case of compiling a LINQ to SQL query. You don't want to dynamically generate code every time you want to simplify a LINQ expression. An interpreter is the right choice here due to its much smaller overhead.

Saturday, May 4, 2013

Circumventing C#'s Type Constraint Limitations

It's well known that C# doesn't permit certain types to appear as constraints, like System.Enum or System.Delegate, and that C# further prevents sealed classes from appearing as constraints. However, the example in the above article serves to point to an interesting circumvention of C#'s limitations on constraints.

Suppose we wish to provide a statically typed enum interface, similar to what I provide in my Sasa class library, but without having to resort to IL rewriting as I do. We can implement this by exploiting the fact that type constraints are inherited. Consider the following general base class:

public abstract class Constrained<TArg0, TReturn>
{
    public abstract T1 Apply<T0, T1>(T0 arg0)
        where T0 : TArg0
        where T1 : TReturn;
}

Notice how all the type parameters at the class level are fully abstract, so the C# compiler can't complain about using Enum or Delegate at this level. The class then constrains the type parameters at the abstract method level based on the class-level type constraints. An statically type-safe enum parser is then simply:

public class EnumParser : Constrained<string, Enum>
{
    public override T1 Apply<T0, T1>(T0 arg0)
    {
        return (T1)Enum.Parse(typeof(T1), arg0);
    }
}
...
var uint16 = new EnumParser().Apply<TypeCode, string>("UInt16");

Despite the fact that the method type parameters T0 and T1 look fully abstract, they inherit the constraints of T0 : string and T1 : Enum from the base class, and so the C# compiler knows they have the correct types. Of course, Enum.Parse returns System.Object, so we have to cast to return the correct enum type.

Since EnumParser has no state, you can cache it in a static field:

public static class Enums
{
  public static readonly EnumParser Parse = new EnumParser();
}
...
var uint16 = Enums.Parse.Do<TypeCode, string>("UInt16");

You can also define more specialized base classes other than Constrained to eliminate the need for two type arguments on Apply:

public abstract class ParseConstrained<TReturn>
{
    public abstract T Apply<T>(string arg0)
        where T : TReturn;
}
public class EnumParser : ParseConstrained<Enum>
{
    public override T Apply<T>(string arg0)
    {
        return (T)Enum.Parse(typeof(T), arg0);
    }
}
...
var uint16 = Enums.Parse.Do<TypeCode>("UInt16");

I'm currently exploring whether it's possible to define a truly reusable set of base classes like Constrained. There are two general problems here:

  1. A general set of base classes won't have type constraints between method parameters, ie. like T0 : T1, which are sometimes needed.
  2. Covariant type safety really requires that the constraint on the return type be TReturn : T1, not the contravariant constraint I specified in Constrained. You could do this with type constraints extension methods over Constrained, but you have to specify a lot of type parameters.

Ultimately, covariance/contravariance will probably partition the set of constrained function abstractions. For instance, a set of reusable function abstractions with type constraints would look something like this:

// covariant function types
public abstract class Covariant<TReturn>
{
    public abstract TReturn Apply();
}
public abstract class Covariant<TArg0, TReturn>
{
    public abstract TReturn Apply<T0>(T0 arg0)
        where T0 : TArg0;
}
public abstract class Covariant<TArg0, TArg1, TReturn>
{
    public abstract TReturn Apply<T0, T1>(T0 arg0, T1 arg1)
        where T0 : TArg0
        where T1 : TArg1;
}

// contravariant function types
public abstract class Contravariant<TReturn>
{
    public abstract T Apply<T>()
        where T : TReturn;
}
public abstract class Contravariant<TArg0, TReturn>
{
    public abstract T1 Apply<T0, T1>(T0 arg0)
        where T0 : TArg0
        where T1 :TReturn;
}
public abstract class Contravariant<TArg0, TArg1, TReturn>
{
    public abstract T2 Apply<T0, T1, T2>(T0 arg0, T1 arg1)
        where T0 : TArg0
        where T1 : TArg1
        where T2 : TReturn;
}

But even this isn't quite enough to properly type System.Enum.GetValues. GetValues is covariant but its method type argument has class-level type parameter dependency:

public abstract class TypeIndexedCovariant<TIndex, TReturn>
{
    public abstract TReturn Apply<T0>()
        where T0 : TIndex
        where TReturn : IEnumerable<T0>;
}

Of course, this isn't legal C# because you can't specify a class-type constraint on a method type parameter. I'm afraid these scenarios will require custom base class implementations for the near future. So we can implement a special base class to type System.Enum.GetValues like so:

public abstract class IndexedCovariant<TIndex>
{
    public abstract IEnumerable<T0> Apply<T0>()
        where T0 : TIndex;
}
class EnumValues : IndexedCovariant<Enum>
{
    public override IEnumerable<T0> Apply<T0>()
    {
        return Enum.GetValues(typeof(T0)) as T0[];
    }
}

In summary, a rewrite of my Sasa.Enums library in pure C# with Enum type constraints and no IL rewriting would look something like the following, assuming the above implementations for values and parsing:

public class EnumNames : IndexedCovariant<Enum, IEnumerable<string>>
{
    public override IEnumerable<string> Apply<T0>()
    {
        return Enum.GetNames(typeof(T0));
    }
}
// type-safe enum operations
public static class Enums
{
    public static readonly EnumParser Parse;
    public static readonly EnumValues Values;
    public static readonly EnumNames Names;
}
...
var names = Enums.Names.Apply<TypeCode>();
var values = Enums.Values.Apply<TypeCode>();
TypeCode parsed = Enums.Parse.Apply<TypeCode>("UInt16");

Not quite as pretty as the current approach in Sasa, but I could see a language on top of the CLR exploiting these sorts of base classes to support first-class functions with first-class polymorphism, ie. where the function argument constraints are just System.Object.

Edit: it's come to my attention that my explanations sometimes suck, so here's a TLDR version I posted to reddit:

The post was about exploring some options to enforce type constraints that C# normally forbids you from specifying, like constraints on Enum and Delegate in a way the C# compiler accepts and checks for you. This is done by encoding functions as objects and lifting the type constraints to class level type parameters, and enforcing method parameters to be subtypes of those parameters. Subclassing such an abstract type lets you specify Enum and Delegate constraints at the class level, and the method parameter constraints are inherited from the base class. Later in the post was a brief discussion of a few reusable abstractions for covariant and contravariant "function objects" of this form.
Hopefully that clarifies a little!

Friday, April 26, 2013

Sasa.Linq.Enumerables - Extensions on IEnumerable<T>

This is the sixteenth post in my ongoing series covering the abstractions in Sasa. Previous posts:

In my opinion, the release of .NET 3.5 and LINQ to Objects was one of the greatest productivity and safety enhancements to .NET since its inception. It enabled the concise expression of highly generic algorithms depending solely on the simple notion of an iterable stream, IEnumerable<T>. You could then specify semantics for duplicates, ordering, and grouping constraints over streams with a nice query syntax or a simple first-class function syntax.

The advent of extension methods allowed such extensions to be packaged into a neat, consistent framework under System.Linq.Enumerable. Sasa.Linq.Enumerables extends the functionality for IEnumerable<T> even further.

Sasa.Linq.Enumerables.Append

Sasa.Linq.Enumerables.Append is an extension method on IEnumerable<T> to add a single element to the end of a sequence:

IEnumerable<int> source = new[] { 2, 99, 8 };
IEnumerable<int> sourcePlusOne = source.Append(-10);
foreach (var x in sourcePlusOne)
{
    Console.WriteLine(x);
}
// output:
// 2
// 99
// 8
// -10

Sasa.Linq.Enumerables.Apply

Sasa.Linq.Enumerables.Apply is an extension on IEnumerable<T> that applies an Action<T> to every value as the stream is being iterated:

IEnumerable<int> source = new[] { 2, 99, 8 }
                          .Apply(Console.Write);
foreach (var x in source)
{
    Console.WriteLine();
}
// output:
// 2
// 99
// 8

Sasa.Linq.Enumerables.CompareTo

Sasa.Linq.Enumerables.CompareTo is an extension on IEnumerable<T> that performs an element-wise comparison between two streams in the same vein as the IComparable<T> interface:

IEnumerable<int> largest = new[] { 3, 99, 8 };
IEnumerable<int> larger  = new[] { 2, 99, 8 };
IEnumerable<int> smaller = new[] { 2, 98, 8 };

Console.WriteLine(largest.CompareTo(larger));
Console.WriteLine(larger.CompareTo(smaller));
Console.WriteLine(smaller.CompareTo(largest));
Console.WriteLine(smaller.CompareTo(smaller));
// output:
// 1
// 1
// -1
// 0

Sasa.Linq.Enumerables.Consume

Sasa.Linq.Enumerables.Consume is an extension on IEnumerable<T> that consumes a whole stream in one go:

static IEnumerable<int> Foo()
{
    for (int i = 0; i < 3; ++i)
    {
        Console.WriteLine(i);
        yield return i;
    }
}
...
Foo().Consume(); // consume stream and return when done

// output:
// 0
// 1
// 2

Sasa.Linq.Enumerables.CopyTo

Sasa.Linq.Enumerables.CopyTo is an extension on IEnumerable<T> that copies elements from the stream into a pre-allocated array:

IEnumerable<int> source = new[] { 3, 99, 8, -5, 12, -999 };
int[] buffer = new int[3];
source.CopyTo(buffer, 0);
foreach (var x in buffer)
{
    Console.Write("{0}, ", x);
}
// output:
// 3, 99, 8,

Sasa.Linq.Enumerables.Difference

Sasa.Linq.Enumerables.Difference is an extension on IEnumerable<T> that produces a stream of Change<T> values indicating the changes needed to transform one stream into another:

IEnumerable<int> first = new[] { 3, 99, 8, -5, 12, -9 };
IEnumerable<int> second = new[] { 3, 99, -5, 12, 66, 234, -9 };
IEnumerable<Change<int>> diff = first.Difference(second);
foreach (var x in diff)
{
    Console.WriteLine(x);
}
// output:
// -1:99
// +1:99,8
// +5:66, 234

Basically, the 'diff' stream in the above sample describes the changes you have to make to 'first' in order to transform it into the stream 'second'. This is fully described by the output which specifies a subtraction at position 1 of the value 99, an addition at position 1 of values 99 and 8, and the addition of two values at position 5 of 66 and 234.

Sasa.Linq.Enumerables.Do

Sasa.Linq.Enumerables.Do is an extension on IEnumerable<T> that fully consumes a stream while applying a function to each element:

IEnumerable<int> source = new[] { 2, 99, 8 };
source.Do(Console.WriteLine);
// output:
// 2
// 99
// 8

Sasa.Linq.Enumerables.Flatten

Sasa.Linq.Enumerables.Flatten is an extension on IEnumerable<T> that flattens a nested sequence:

IEnumerable<IEnumerable<int>> source = new int[][]
{
    new[] { 2, 99 },
    new[] { 8 }
};
IEnumerable<int> flat = source.Flatten();
flat.Do(Console.WriteLine);
// output:
// 2
// 99
// 8

This is functionally equivalent to:

IEnumerable<int> flat = source.SelectMany(x => x);

Except the C# compiler won't have to create so many delegates in all the various assemblies that flatten sequences.

Sasa.Linq.Enumerables.Format

Sasa.Linq.Enumerables.Format is a set of extension method overloads on IEnumerable<T> that make it easy to construct a string from a sequence. Basically, you provide a sequence and a value separator, like ",", and Format will output each value in the sequence spaced with the separator:

IEnumerable<int> source = new[] { 2, 99, 8 };
string formatted = source.Format(", ");
Console.WriteLine(formatted);
// output:
// 2, 99, 8

Sasa.Linq.Enumerables.Generate

Sasa.Linq.Enumerables.Generate is a static method used to construct sequences of values by taking a seed value and a generator function:

Func<int, Option<int>> step = x => x < 3 ? x + 1:
                                           Option<T>.None;
IEnumerable<int> source = Enumerables.Generate(0, step);
Console.WriteLine(source.Format(", "));
// output:
// 0, 1, 2, 3

The generator function returns an Option<T>, so terminating the generator requires simply returning Option<T>.None.

Sasa.Linq.Enumerables.Push

Sasa.Linq.Enumerables.Push is an extension on IEnumerable<T> that pushes an element to the front of an enumerable sequence:

IEnumerable<int> source = new[] { 2, 99, 8 };
Console.WriteLine(source.Push(-1).Format(","));
// output:
// -1, 2, 99, 8

Sasa.Linq.Enumerables.ReplaceElementAt

Sasa.Linq.Enumerables.ReplaceElementAt is an extension on IEnumerable<T> that replaces the element at a specified index with a new value:

IEnumerable<int> source = new[] { 2, 99, 8 }
                         .ReplaceElementAt(index: 1, value: -1);
Console.WriteLine(source.Format(","));
// output:
// 2, -1, 8

Sasa.Linq.Enumerables.Transpose

Sasa.Linq.Enumerables.Transpose is an extension on a nested sequence, IEnumerable<IEnumerable<T>> that transposes the rows and columns:

IEnumerable<IEnumerable<int>> source = new int[][]
{
    new[] { 2, 99 },
    new[] { 8, -10 }
};
// transpose the rows and columns
foreach (var row in source.Transpose())
{
    foreach (var column in row)
    {
        Console.Write("{0}, ", column);
    }
    Console.WriteLine();
}
flat.Do(Console.WriteLine);
// output:
// 2, 8
// 99, -10

Note that this method will handle jagged sequences, but transposing twice will not necessarily recover the same sequence as the original input. All the jagged entries will be pushed to the last row.

Sasa.Linq.Enumerables.Zip

Sasa.Linq.Enumerables.Zip is a set of extension method overloads on IEnumerable<T> that combines streams together element-wise, returning a stream of Sasa's tuples. This is known as "zipping" streams, analogously to the operation of a physical zipper where the teeth pair up one after another:

IEnumerable<int> source1 = new[] { 2, 99 };
IEnumerable<int> source2 = new[] { 8, -10 };
IEnumerable<Pair<int, int>> zipped = source1.Zip(source2);

Console.WriteLine(zipped.Format("\r\n"));
// output:
// (2, 8)
// (99, -10)

There is a zip overload for each of Sasa's tuple types, so you can zip up to 4 streams together.

Sasa.Linq.Enumerables.ZipWith

Sasa.Linq.Enumerables.ZipWith is a set of extension method overloads on IEnumerable<T> that combines the elements of streams together element-wise using a client-specified function, returning a stream of values. This is known as "zipping" streams, analogously to the operation of a physical zipper where the teeth pair up one after another:

IEnumerable<int> source1 = new[] { 2, 99 };
IEnumerable<int> source2 = new[] { 8, -10 };
IEnumerable<Pair<int, int>> zipped =
    source1.ZipWith(source2, (x,y) => Tuples.Create(x,y));

Console.WriteLine(zipped.Format("\r\n"));
// output:
// (2, 8)
// (99, -10)

Like the Enumerables.Zip extension, ZipWith is defined for zipping up to 4 streams.

Thursday, April 25, 2013

Sasa.IO.Streams - Convenient Stream Extensions

This is the sixteenth post in my ongoing series covering the abstractions in Sasa. Previous posts:

Streams are a pervasive component of .NET I/O, but the standard stream interface is missing a few convenient extensions, if only needed for testing and debugging purposes. Sasa.IO.Streams provides a few simple extension methods to simplify working with streams.

Sasa.IO.Streams.CopyTo

Sasa.IO.Streams.CopyTo is a set of extension methods to copy data from one stream to another:

var source = new MemoryStream(Encoding.ASCII.GetBytes("Hello world!"));
var target = new MemoryStream();
source.CopyTo(target);
Console.WriteLine(Encoding.ASCII.GetString(target.ToArray()));
// output:
// Hello world!

There is also an overload explicitly specifying the number of bytes to copy.

Sasa.IO.Streams.ToArray

Sasa.IO.Streams.ToArray dumps the contents of any stream into a byte[]:

// write contents to file
using (var fs = File.OpenWrite("foo.txt"))
{
    fs.Write(Encoding.ASCII.GetBytes("Hello world!"));
}
// read contents from file in one go
byte[] data;
using (var fs = File.OpenRead("foo.txt"))
{
    data = fs.ToArray();
}
Console.WriteLine(Encoding.ASCII.GetString(data));
// output:
// Hello world!

Wednesday, April 24, 2013

Sasa.IO.FilePath - Easy and Safe Path Manipulations

This is the fifteenth post in my ongoing series covering the abstractions in Sasa. Previous posts:

One persistent difficulty in dealing with IO in .NET is path handling. .NET exposes a platform's directory separator characters, but really this sort of thing should be automated. Furthermore, paths are considered simple strings and so concatenating fragments could leave you with a path string that goes up and down directories with no clear final result, ie. resolving the final path string is left to the OS.

This means that you can't easily reason about the constructed paths without consulting the OS, which is a relatively expensive operation. Furthermore, Path.Combine has a number of corner cases that make constructing paths non-compositional, requiring numerous argument validations to ensure a correct result.

Enter Sasa.IO.FilePath. It's a simple struct that encapsulates an underlying path string, so FilePath operations are just as efficient as your current path handling. FilePath fully resolves directory change operations where possible, so the final path string designates the path the OS will actually look up. Null or empty strings are considered references to the current directory, ie. ".". There is also a "jail" operation which ensures that a provided path cannot escape a particularly sub-directory, just like the OS-level chroot jail.

I first posted about this abstraction back in 2009, but the name and implementation has changed a little since then.

Sasa.IO.FilePath Constructor

The constructor takes an arbitrary path string, resolves all the inner directory change operations, and returns a final path:

var foo = "foo/../..";
var path1 = new FilePath(foo);
var path2 = new FilePath("bar/" + foo);
Console.WriteLine(path1);
Console.WriteLine(path2);
Console.WriteLine("root" / path2); // FilePath composition using /
// output:
// ..
// .
// root

Sasa.IO.FilePath.Combine

The Combine method exposes the same semantics as System.IO.Path.Combine, but on FilePath instances:

var foo = new FilePath("foo/../..");
var bar = new FilePath("/bar");
Console.WriteLine(FilePath.Combine(foo, bar));
Console.WriteLine(FilePath.Combine(bar, foo));
// output:
// ..\bar
// .

Sasa.IO.FilePath.IsParentOf

The IsParentOf method compares two paths to see if one is a subset of the other:

var foobar = new FilePath("foo/bar");
var bar = new FilePath("bar");
var foo = new FilePath("foo");
Console.WriteLine(bar.IsParentOf(foobar));
Console.WriteLine(foo.IsParentOf(foobar));
Console.WriteLine(foobar.IsParentOf(foo));
// output:
// false
// true
// false

Sasa.IO.FilePath.Jail

The Jail methods provide a chroot-jail operation on file paths, to ensure a provided path string cannot escape a certain sub-directory:

var escape = new FilePath("../..");
var bar = new FilePath("/bar");
Console.WriteLine(FilePath.Jail(bar, escape));
// output:
// bar

Sasa.IO.FilePath.Resolve

The real workhorse behind FilePath, Sasa.IO.FilePath.Resolve resolves all path change operations in a string and returns a simplified path string. In case you want to stick with raw string paths, you can use this method to perform path simplification:

var path1 = "foo/../..";
var path2 = "bar/" + path1;
Console.WriteLine(FilePath.Resolve(path1));
Console.WriteLine(FilePath.Resolve(path2));
Console.WriteLine(FilePath.Resolve("root/" + path2));
// output:
// ..
// 
// root

Sasa.IO.FilePath./ Operator

The division operator on file paths is a convenient shorthand for composing paths, and a shorthand for FilePath.Combine:

var foo = "foo/../..";
var path1 = new FilePath(foo);
var path2 = "bar" / path1;
Console.WriteLine(path1);
Console.WriteLine(path2);
Console.WriteLine("root" / path2);
// output:
// ..
// .
// root

Sasa.IO.FilePath's Interfaces

  • IEnumerable<string>: the sequence of path components making up a full path.
  • IEquatable<FilePath>: compare two paths for equality
  • IComparable<FilePath>: order two paths alphanumerically

Tuesday, April 23, 2013

Sasa.Operators<T> - Generic Arithmetic and Logical Operators

This is the fourteenth post in my ongoing series covering the abstractions in Sasa. Previous posts:

When writing generic code, it's pretty common to yearn for some generic arithmetic interface so you don't have to copy-paste the same function over and over to provide overloads for all the CLR's numeric types. Omitting a standard arithmetic interface was one of .NET's biggest mistakes in my opinion.

Enter Sasa.Operators<T>, which is a static class library that exposes the delegates for the standard mathematical and logical operators on any given type T. T must contain operator overloads for the corresponding operation to be available. If T does not contain operator overloads for a given operation, the corresponding delegate for that operation will be null, so you just need to check for their presence as a precondition.

Sasa.Operators<T>.Add

The standard addition operation, Sasa.Operators<T>.Add, is straightforward, taking two T arguments and returning their addition:

var x = Operators<int>.Add(1, 2);
Console.WriteLine(x);
// output:
// 3

Sasa.Operators<T>.And

The Sasa.Operators<T>.And operation takes two T arguments, and returns a T argument which is the bitwise/logical 'and' of the two arguments:

var x = Operators<int>.And(1, 3);
Console.WriteLine(x);
// output:
// 1

Sasa.Operators<T>.Decrement

The Sasa.Operators<T>.Decrement operation takes a T argument, and returns a "decremented" T argument:

var x = Operators<double>.Decrement(1.5);
Console.WriteLine(x);
// output:
// 0.5

Sasa.Operators<T>.Divide

The Sasa.Operators<T>.Divide operation two T arguments and returns a T argument corresponding to their division:

var x = Operators<int>.Divide(6, 2);
Console.WriteLine(x);
// output:
// 3

Sasa.Operators<T>.Increment

The Sasa.Operators<T>.Increment operation a T argument and returns an "incremented" T:

var x = Operators<int>.Increment(98);
Console.WriteLine(x);
// output:
// 99

Sasa.Operators<T>.LeftShift

The Sasa.Operators<T>.LeftShift operation a T argument, followed by an Int32 argument, and returns a T "left-shifted" by the Int32:

var x = Operators<int>.LeftShift(1, 3);
Console.WriteLine(x);
// output:
// 8

Sasa.Operators<T>.Modulo

The Sasa.Operators<T>.Modulo operation takes two T arguments, and returns the modulus/remainder of dividing the two arguments:

var x = Operators<int>.Modulo(9, 2);
Console.WriteLine(x);
// output:
// 1

Sasa.Operators<T>.Multiply

The Sasa.Operators<T>.Multiply operation takes two T arguments, and returns their multiplication:

var x = Operators<int>.Multiply(9, 2);
Console.WriteLine(x);
// output:
// 18

Sasa.Operators<T>.Negate

The Sasa.Operators<T>.Negate operation takes a T argument, and returns its negation:

var x = Operators<int>.Negate(9);
Console.WriteLine(x);
// output:
// -9

Sasa.Operators<T>.Not

The Sasa.Operators<T>.Not operation takes a T argument, and returns logical/bitwise not of the value:

// given twos-complement, not(-1) == 0
var x = Operators<int>.Not(-1); 
Console.WriteLine(x);
// output:
// 0

Sasa.Operators<T>.Or

The Sasa.Operators<T>.Or operation takes two T arguments, and returns their logical/bitwise-or:

var x = Operators<int>.Or(1, 2);
Console.WriteLine(x);
// output:
// 3

Sasa.Operators<T>.RightShift

The Sasa.Operators<T>.RightShift operation a T argument, followed by an Int32 argument, and returns a T "right-shifted" by the Int32:

var x = Operators<int>.RightShift(8, 3);
Console.WriteLine(x);
// output:
// 1

Sasa.Operators<T>.Subtract

The Sasa.Operators<T>.Subtract operation takes two T arguments, and returns their subtraction:

var x = Operators<int>.Subtract(1, 3);
Console.WriteLine(x);
// output:
// -2

Sasa.Operators<T>.Xor

The Sasa.Operators<T>.Xor operation takes two T arguments, and returns their logical/bitwise exclusive-or:

var x = Operators<int>.Xor(1, 3);
Console.WriteLine(x);
// output:
// 2

Most of the functions in Sasa that are defined on generic numeric types utilize this interface to avoid code duplication. For instance, Sasa.Numbers covered in a previous post, defines the UpTo extension method like so:

public static IEnumerable<T> UpTo<T>(this T start, T end, T step)
    where T : IComparable<T>
{
    if (start.CompareTo(end) > 0)
      throw new ArgumentOutOfRangeException("start", "Must be less than or equal to parameter 'end'.");
    if (step.CompareTo(default(T)) <= 0)
      throw new ArgumentOutOfRangeException("step", "Must be greater than 0.");
    if (Operators<T>.Add == null)
      throw new ArgumentException("T must provide an addition operator.");
    for (var i = start;
         i.CompareTo(end) < 0;
         i = Operators<T>.Add(i, step))
    {
        yield return i;
    }
}

Saturday, April 20, 2013

Fast Deep Copies in C# using Sasa.Dynamics

I've briefly discussed Sasa.Dynamics in my first announcement for the upcoming Sasa v0.9.4 release. There was some confusion about what Sasa.Dynamics really is though, since the post didn't go into much detail, or wasn't explained clearly enough. In short, Sasa.Dynamics is a framework for type-safe, blazingly fast reflection.

It's useful for writing all sorts of algorithms that compute results based on the structure of types. For instance, as mentioned in the above article, Sasa.Dynamics ships with two such algorithms by default: an immutable type test, and a deep copier. Both of these algorithms are fully generic and apply to any type, and they're faster than anything you can do with standard reflection. Here's how you perform a deep copy on an object:

var copy = Type<T>.Copy(originalValue);

If you don't know anything about the type you're copying, you can just supply T = object. The more specific the static type information you provide to the copier, the faster it will be. As a small proof, here are a few benchmarks I just added to Sasa's test suite:

= Type: Int32[] =
Binary formatter:            626608 ticks
Type<object>.Copy:           152474 ticks
Type<Int32[]>.Copy:          128127 ticks

= Type: List<Int32> =
Binary formatter:           1209692 ticks
Type<object>.Copy:           767455 ticks
Type<List`1>.Copy:           739828 ticks

= Type: Bar =
Binary formatter:           1066745 ticks
Type<object>.Copy:           549021 ticks
Type<Bar>.Copy:              500552 ticks

= List<Lazy<Int32>> =
Binary formatter:          22065369 ticks
Type<object>.Copy:         21339976 ticks
Type<List`1>.Copy:         21090640 ticks

= Type: Int32 =
Binary formatter:            428022 ticks
Type<object>.Copy:             7050 ticks
Type<Int32>.Copy:              1435 ticks

= Type: String =
Binary formatter:            222462 ticks
Type<object>.Copy:             4895 ticks
Type<String>.Copy:             1547 ticks

= Type: String[] =
Binary formatter:          25201559 ticks
Type<object>.Copy:           234653 ticks
Type<String[]>.Copy:         185214 ticks

You can clearly see that Type<T>.Copy is sometimes many orders of magnitude faster than a roundtrip through BinaryFormatter. The benchmarks for immutable types will be particularly fast, as you can see from entries for String and Int32. This is because immutability of a type is automatically detected (via Type<T>.MaybeMutable), and such values are never copied, just returned as-is.

Of course, it's not exactly hard to beat framework serialization, but what's notable here is that I'm doing it with fully general reflection and very little optimization effort. The more type information you can provide to Type<T>, the faster it is. For instance, there's a 3x-5x difference in Int32 and String copying when we provide the statically known type T rather than object.

I've also barely scratched the surface of the optimizations that can be done. The structural induction inherent to Sasa.Dynamics means I can also easily build a trace of the operations performed for a given type T, generate a function to do it all in one step, and cache that somewhere for when I need to rerun it. Basically, it would be a small tracing JIT for any reflection-based operation.

There are still a few cases I haven't fully optimized for deep copies, but it seems to work pretty well so far.

Sasa's binary serializer is also based on Sasa.Dynamics, but that hasn't received any review or optimization effort, so while it passes many correctness tests, the performance isn't all that great. Still, if you're interested to see what a serializer based on structural induction looks like, it's a good reference.