Wednesday, August 29, 2012

Managed Data for .NET

Ensō is an interesting new language being developed by Alex Loh, William R. Cook, and Tijs van der Storm. The overarching goal is to significantly raise the level of abstraction, partly via declarative data models.

They recently published a paper on this subject for Onwards! 2012 titled Managed Data: Modular Strategies for Data Abstraction. Instead of having programmers define concrete classes, managed data requires the programmer to define a schema describing the data model: the set of fields and their types. Actual implementations of this schema are provided by "data managers", which interpret the schema and add custom behaviour. This is conceptually similar to aspect-oriented programming, but with a safer, more principled foundation.

A data manager can implement any sort of field-like behaviour. The paper describes a few basic variants:

  • BasicRecord: implements a simple record with getters and setters.
  • LockableRecord: implements locking on a record, rendering it immutable.
  • InitRecord: implements field initialization on records.
  • ObserverRecord: implements the observer pattern, notifying listeners of any field changes.
  • DataflowRecord: registers field dependencies and recalculates dependent fields when the fields they depend on change.
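To give a rough feel for what one of these data managers provides, here's ObserverRecord-style behaviour hand-rolled in C#: field access goes through the record, which notifies listeners of any change. This is illustrative only; the paper's implementations are interpreted Ruby, and the class and member names here are invented.

```csharp
using System;
using System.Collections.Generic;

// A hand-rolled sketch of ObserverRecord-style behaviour: field access
// is mediated by the record, which notifies listeners of any change.
public class ObservedRecord
{
    readonly Dictionary<string, object> fields =
        new Dictionary<string, object>();

    // listeners receive the field name and its new value
    public event Action<string, object> FieldChanged;

    public object Get(string name)
    {
        object value;
        fields.TryGetValue(name, out value);
        return value;
    }

    public void Set(string name, object value)
    {
        fields[name] = value;
        var handler = FieldChanged;
        if (handler != null) handler(name, value);
    }
}
```

The point of managed data is that behaviour like this is written once, in the data manager, rather than repeated in every class.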

Managed Data for .NET

The core idea of managed data requires two basic concepts: a declarative means of describing the schema, and a means of interpreting that schema to add behaviour. .NET interfaces are a natural way to specify simple declarative schemas completely divorced from any implementation. The following interface can be seen as the IFoo schema, containing an immutable integer field and a mutable string field:

// the schema for a data object
public interface IFoo
{
  int Bar { get; }
  string Fooz { get; set; }
}

Data managers then generate concrete instances of IFoo with the desired behaviour. To fit this into a typed framework, I had to reorganize the concepts a little from what appears in the paper:

// creates data instances with custom behaviour
public sealed class DataManager
{
  // create an instance of interface type T
  public T Create<T>();
}

I have a single DataManager type which analyzes the interface T and generates an instance with all the same properties as found in T. The DataManager constructor accepts an instance of ISchemaCompiler, which is where the actual magic happens:

public interface ISchemaCompiler
{
  // next compiler in the chain
  ISchemaCompiler Next { get; set; }
  // a new type is being defined
  void Type(TypeBuilder type);
  // a new property is being defined
  void Property(TypeBuilder type, PropertyBuilder property);
  // a new setter is being defined
  void Setter(PropertyBuilder prop, MethodBuilder setter,
              ILGenerator il);
  // a new getter is being defined
  void Getter(PropertyBuilder prop, MethodBuilder getter,
              ILGenerator il);
}

So DataManager creates a dynamic type implementing an interface, and it calls into the ISchemaCompiler chain while it's generating the various properties. The schema compilers can then output IL to customize the behaviour of the various property getters and setters.
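To make the chaining concrete, here's a minimal pass-through compiler that forwards every callback to the next link without emitting any IL of its own. This is a sketch: PassThroughRecord is a hypothetical name, not one of the compilers in the library, and the ISchemaCompiler interface from above is repeated so the example stands alone.

```csharp
using System.Reflection.Emit;

// the ISchemaCompiler interface from above, repeated for completeness
public interface ISchemaCompiler
{
    ISchemaCompiler Next { get; set; }
    void Type(TypeBuilder type);
    void Property(TypeBuilder type, PropertyBuilder property);
    void Setter(PropertyBuilder prop, MethodBuilder setter, ILGenerator il);
    void Getter(PropertyBuilder prop, MethodBuilder getter, ILGenerator il);
}

// A do-nothing link in the compiler chain: forwards each callback
// to the next compiler, emitting no IL of its own.
public class PassThroughRecord : ISchemaCompiler
{
    public ISchemaCompiler Next { get; set; }

    public void Type(TypeBuilder type)
    {
        if (Next != null) Next.Type(type);
    }

    public void Property(TypeBuilder type, PropertyBuilder property)
    {
        if (Next != null) Next.Property(type, property);
    }

    public void Setter(PropertyBuilder prop, MethodBuilder setter,
                       ILGenerator il)
    {
        if (Next != null) Next.Setter(prop, setter, il);
    }

    public void Getter(PropertyBuilder prop, MethodBuilder getter,
                       ILGenerator il)
    {
        if (Next != null) Next.Getter(prop, getter, il);
    }
}
```

A real compiler follows the same forwarding pattern, but also emits IL before or after the call into Next.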

You'll note, however, that the IFoo schema has an immutable property, Bar. We can specify an initializer for this property using the Schema object that the DataManager uses:

var schema = new Schema();
schema.Type<IFoo>()
      .Default(x => x.Bar, x => 4);

This declares that the Bar property maps to a constant value of 4. It need not be a constant of course, since the initializer is an arbitrary delegate.
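Since the initializer is an arbitrary delegate, a default can just as easily be computed when the instance is created. For example (a fragment against the Schema API shown above; the computed value is purely illustrative):

```csharp
// a computed default: the delegate runs at instance creation time
schema.Type<IFoo>()
      .Default(x => x.Bar, x => DateTime.Now.Year - 2000);
```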

The following schema compilers are implemented and tested:

  • BasicRecord: implements the backing fields for the properties.
  • LockableRecord: unlike the paper's lockable record, this version actually calls Monitor.Enter and Monitor.Exit for use in concurrent scenarios.
  • NotifyChangedRecord: implements INotifyPropertyChanged on all properties.
  • ChangesOnlyRecord: only assigns the field if the value differs.
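To make the composed behaviour concrete, here's roughly what a setter produced by BasicRecord plus ChangesOnlyRecord amounts to when written by hand. This is a sketch: the real setter is generated IL, and the class name here is invented.

```csharp
using System.Collections.Generic;

// What a BasicRecord + ChangesOnlyRecord setter roughly compiles to,
// written by hand for illustration.
public class FooByHand
{
    string fooz; // backing field, as provided by BasicRecord

    public string Fooz
    {
        get { return fooz; }
        set
        {
            // ChangesOnlyRecord: skip the assignment, and any downstream
            // behaviour such as change notification, when nothing changed
            if (!EqualityComparer<string>.Default.Equals(fooz, value))
            {
                fooz = value;
            }
        }
    }
}
```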

Developing programs with managed data then consists only of defining interfaces describing your business model and letting the DataManager provide the instances. This is obviously also excellent for mocking and unit testing purposes, so it's a win all around.

Here's a simple test program that demonstrates the use of managed data via the composition of ChangesOnlyRecord, NotifyChangedRecord and BasicRecord:

var schema = new Schema();
schema.Type<IFoo>()
      .Default(x => x.Bar, x => 4);
// construct the data manager by composing schema compilers
var record = new BasicRecord();
var dm = new DataManager(schema, new ChangesOnlyRecord
{
    Record = record,
    Next = new NotifyChangedRecord { Next = record }
});
// create instance of IFoo
var y = dm.Create<IFoo>();
var inotify = y as INotifyPropertyChanged;
var bar = y.Bar;
var fooz = y.Fooz;
int count = 0;
Assert(bar == 4);
Assert(fooz == null);
// register notification for Fooz changes
inotify.PropertyChanged += (o, e) =>
{
    if (e.PropertyName == "Fooz")
    {
        fooz = y.Fooz;
        count++;
    }
};
// trigger change notification
y.Fooz = "Hello World!";
Assert(fooz == "Hello World!");
Assert(count == 1);
// no change notification since value unchanged
y.Fooz = "Hello World!";
Assert(count == 1);
// trigger second change notification
y.Fooz = "empty";
Assert(fooz == "empty");
Assert(count == 2);

Closing Thoughts

You can download the current implementation here, but note that it's still an alpha preview. I'll probably eventually integrate this with my Sasa framework under Sasa.Data, together with a few more elaborate data managers, for instance one that uses an SQL server as a backend. Say goodbye to NHibernate mapping files and LINQ attributes: just let the data manager create and manage your tables!

4 comments:

Muigai Mwaura said...

How about adding support for IDataErrorInfo in generated properties e.g. you could configure a property like so


schema.Type<Product>()
.Validate(x => x.Price > 0, "Price must be greater than 0")


and have the builder generate the validation code?

Sandro Magi said...

Good suggestion. I was thinking of some way to add contracts to each property. Something like what you have:

schema.Type<Product>()
.Requires(x => x.Price > 0, "Price must be greater than 0.");

schema.Type<Product>()
.Invariant(x => x.ItemNo != null, "Product must have an item#.");

So preconditions, postconditions and invariants. I'm not sure exactly how I'm going to do it, but it seems like a necessary extension.

Will Cook said...

Part of the beauty of the Ruby implementation of Managed Data is that everything is interpreted. This allows interpreters to be wrapped using inheritance to create new aspects. I'm curious if you are using lots of code generation, and if so whether you find it easy to extend/wrap the code generators with new functionality.

Sandro Magi said...

I could have gone the interpreted route for the backing store as well, but I explicitly chose code gen for performance reasons. Only the ISchemaCompiler interface would need to change to accommodate interpreters: you'd basically just have to change the ILGenerator parameter to an IDictionary of objects.

As for complexity, it isn't much, assuming you're familiar with code gen on the CLR. Most of the code gen happens in the shared DataManager class. For example, here's the code gen required to add INotifyPropertyChanged behaviour to a setter:

Next.Setter(prop, setter, il);
// raise the property changed event by calling into Events.Raise
il.Emit(OpCodes.Ldarg_0);                // load 'this'
il.Emit(OpCodes.Ldfld, propertyChanged); // load the PropertyChanged delegate field
il.Emit(OpCodes.Ldarg_0);                // load 'this' again as the event sender
il.Emit(OpCodes.Ldstr, prop.Name);       // load the property name
il.Emit(OpCodes.Newobj, ctorEventArgs);  // new PropertyChangedEventArgs(name)
il.Emit(OpCodes.Call, raise);            // call Events.Raise with handler, sender, args


The getter requires a comparable amount of code. This was also the more complicated of the schema compilers. The rest of the code is just standard boilerplate to deal with the CLR's reflection and code gen abstractions.