Skip to main content

Posts

Showing posts from May, 2010

CLR: Verification for Runtime Code Generation

The CLR's lightweight code generation via DynamicMethod is pretty useful, but it's sometimes difficult to debug the generated code and ensure that it verifies. In order to verify generated code, you must save the dynamic assembly to disk and run the peverify.exe tool on it, but DynamicMethod does not have any means to do so. In order to save the assembly, there's a more laborious process of creating dynamic assemblies, modules and types, and then finally adding a method to said type. This is further complicated by the fact that a MethodBuilder and DynamicMethod don't share any common interfaces or base types for generating IL, despite both of them supporting a GetILGenerator() method . This difficulty in switching between saved codegen and pure runtime codegen led me to add a CodeGen class to Sasa, which can generate code for either case based on a bool parameter. Since no common interface is available for code generation, it also accepts a delegate to which it dispa

Peirce's Criterion: command-line tool

I've written a simple command-line tool filter out statistical outliers using the rigourous Peirce's Criterion . The algorithm has been available in Sasa for awhile, and will be in the forthcoming v0.9.3 release. I've also packaged the command-line tool binary for running Peirce's Criterion over multi-column CSV files (LGPL source available here ).

The Cost of Type.GetType()

Most framework-style software spends an appreciable amount of time dynamically loading code. Some of this code is executed quite frequently. I've recently been working on a web framework where URLs map to type names and methods, so I've been digging into these sort of patterns a great deal lately. The canonical means to map a type name to a System.Type instance is via System.Type.GetType(string). In a framework which performs a significant number of these lookups, it's not clear what sort of performance characteristics one can expect from this static framework function. Here's the source for a simple test pitting Type.GetType() against a cache backed by a Dictionary<string, Type>. All tests were run on a Core 2 Duo 2.2 GHz, .NET CLR 3.5, and all numbers indicate the elapsed CPU ticks. Type.GetType() Dictionary<string, Type> 6236070640 51351056 6236193856 51440360 6237466224 51463192 6238210488 51583336 6240645816 51599480 6242089400 51687448 6244450392 5171

LINQ Transpose Extension Method

I had recent need for a transpose operation which could swap the columns and rows of a nested IEnumerable sequence, it's simple enough to express in LINQ but after a quick search, all the solutions posted online are rather ugly. Here's a concise and elegant version expressed using LINQ query syntax: /// <summary> /// Swaps the rows and columns of a nested sequence. /// </summary> /// <typeparam name="T">The type of elements in the sequence.</typeparam> /// <param name="source">The source sequence.</param> /// <returns>A sequence whose rows and columns are swapped.</returns> public static IEnumerable<IEnumerable<T>> Transpose<T>( this IEnumerable<IEnumerable<T>> source) { return from row in source from col in row.Select( (x, i) => new KeyValuePair<int, T>(i, x)) group col.Value by col.Key into c select c as IEnume

Asymmetries in CIL

Most abstractions of interest have a natural dual, for instance, IEnumerable and IObservable, induction and co-induction, algebra and co-algebra, objects and algebraic data types, message-passing and pattern matching, etc. Programs are more concise and simpler when using the proper abstraction, be that some abstraction X or its dual. For instance, reactive programs written using the pull-based processing semantics of IEnumerable are far more unwieldy than those using the natural push-based semantics of IObsevable. As a further example, symbolic problems are more naturally expressed via pattern matching than via message-passing. This implies that any system is most useful when we ensure that every abstraction provided is accompanied by its dual. This also applies to virtual machine instruction sets like CIL, as I recently discovered while refining my safe reflection abstraction for the CLR. A CIL instruction stream embeds some necessary metadata required for the CLR's correct opera

CIL Verification and Safety

I've lamented here and elsewhere some unfortunate inconveniences and asymmetries in the CLR -- for example, we have nullable structs but lack non-nullable reference types, an issue I address in my Sasa class library . I've recently completed some Sasa abstractions for safe reflection, and an IL rewriter based on Mono.Cecil which allows C# source code to specify type constraints that are supported by the CLR but unnecessarily restricted in C#. In the process, I came across another unjustified decision regarding verification: the jmp instruction . The jmp instruction strikes me as potentially incredibly useful for alternative dispatch techniques, and yet I recently discovered that it's classified as unverifiable. This seems very odd, since the instruction is fully statically typed, and I can't think of a way its use could corrupt the VM. In short, the instruction performs a control transfer to a named method with a signature matching exactly the current method's si