
Mono.Cecil vs Obfuscation: FIGHT

July 18, 2010

A few days ago, I wrote (I will never say “blogged”) about my encounter with an obfuscated .NET assembly. Well, last night, I decided to have a bit of fun with it. I’ve identified which obfuscator was used (with a view to trying it on some test assemblies), but I got frustrated when I hit one of those “register here to try an evaluation” forms (or, “give us your email so our salespeople can stalk you”), so I’m working entirely off that single test case.

Now, .NET comes with a powerful code introspection tool, System.Reflection. Reflection is great for binding variables, enumerating properties and doing all that kind of semi-seedy stuff you tend to want. It is not, however, great at editing code. In fact, it is so un-great as to not let you do it at all. Reflection works only on loaded assemblies and is really a read-only metadata library: the closest it gets to the IL is handing you a method body as a raw byte array. If you want to work with the instructions themselves, or go around renaming things, you’re going to need a more powerful tool. Enter Mono.Cecil.
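To make that limitation concrete, here is a small sketch using only System.Reflection. The most it offers for a method body is `MethodBody.GetILAsByteArray()`: opcodes and metadata tokens packed into bytes, with no decoded instructions and no way to save an edited body. The `SampleTarget` and `ReflectionIlDemo` types are hypothetical, there only to give Reflection something to look at.

```csharp
using System;
using System.Reflection;

// Hypothetical target type, present only so there is a method body to inspect.
public static class SampleTarget
{
    public static int AddOne(int x)
    {
        return x + 1;
    }
}

public static class ReflectionIlDemo
{
    // Fetches the IL of SampleTarget.AddOne through System.Reflection.
    public static byte[] GetRawIl()
    {
        MethodInfo method = typeof(SampleTarget).GetMethod("AddOne");
        MethodBody body = method.GetMethodBody();
        // Raw bytes only: Reflection gives no instruction objects here, and
        // offers no way to edit this body and write the assembly back out.
        return body.GetILAsByteArray();
    }

    // Hex dump of the bytes, e.g. for printing to the console.
    public static string Dump()
    {
        return BitConverter.ToString(GetRawIl());
    }
}
```

Decoding those bytes into something you could meaningfully rewrite is exactly the job Cecil does for you.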

Cecil can load assemblies; edit, add or remove types, methods, attributes, you name it; and then save the result back out without complaint. It’s an absolutely amazing library, and because it’s written entirely in C#, it works under Mono or as an ordinary reference in a .NET project. It’s one of those incredible libraries that is not only almost ridiculously, insultingly straightforward to use, but also enormously powerful. So, I fired up Visual Studio 2010 (with which I have an interesting relationship at best; I think maybe Microsoft hired some Apple people to help them make VS2010) and compiled the Mono libraries from the latest source dump. Thank Christ we’ve moved on a bit: they actually had a project file and a .sln, so I didn’t even need to funroll any loops. I then made a new C# console app, added Cecil and Cecil.Rocks (apparently the Cecil people suffer from nerd humour), and managed to write a deobfuscator in about an hour and a half.

It’s not perfect, but it translated a good 90% of the functions without any issues, and the ones that Reflector still complains about look like perfectly valid CIL to me. I suppose Reflector’s decompiler still has some flaws, and I wouldn’t be at all surprised if the obfuscation process got rid of common patterns that it looks for. Non-programmer readers, consider yourselves warned: the gory details start here.

protected override void OnKeyDown(KeyEventArgs e)
{
    if (base.Width < 15)
    {
        e.Handled = true;
    }
    base.OnKeyDown(e);
}

Recognize this? That's the function we were looking at the last time I messed around with this assembly. It looks so simple now, doesn't it? Since being deobfuscated, the DLL is 30% smaller, and this method has gone from 29 CIL instructions to 11. They're all like that. I'd be surprised if obfuscated assemblies didn't perform worse than their cleaner counterparts, even after the JIT compiler has peephole-optimized them to bits. So, what did I do to produce such delicious results? First, I did the easy thing: renaming all the private API from blob characters to non-descriptive but at least readable symbols, like "PrivateMethod2" and "m_Boolean16". To do this, simply load up the assembly and iterate over the types:

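static int Main(string[] args)
{
    Mono.Cecil.ModuleDefinition md = Mono.Cecil.ModuleDefinition.ReadModule("Target.dll");
    // Pass 1: rename the types, so that pass 2 can name fields after them.
    foreach (Mono.Cecil.TypeDefinition t in md.Types)
    {
        DeobfuscateTypeNames(t);
    }
    // Pass 2: the rest of the clean-up for each type.
    foreach (Mono.Cecil.TypeDefinition t in md.Types)
    {
        DeobfuscateType(t);
    }
    md.Write("Result.dll");
    return 0; // Main must be declared int to return an exit code
}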

There are a few tiny pitfalls with this. First, the code above doesn't pick up nested types: I have to iterate over the NestedTypes collection of each type to fix them up. It also picks up a magical type called "<Module>". That's the module's global type, which holds any module-level fields and methods; assembly-level attributes and the strong-name information live on the AssemblyDefinition instead. As I said, this whole process is ridiculously easy. Renaming types, methods and fields was just a case of looking for single-character names outside the A-Z/a-z range. For types, I named them "Private" + class/struct/enum + a number; for nested types, I used a slightly different name; and for fields (once all the types had been renamed, hence the two-stage process), I used "m_" followed by the field's type name and a number. Renaming is simply a matter of setting the Name property on a TypeDefinition, MethodDefinition, PropertyDefinition or FieldDefinition. Inside the module, everything refers to those definitions via metadata tokens, so you don't need to fix up any references (and since I only touched private names, nothing outside the assembly cares).
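The renaming rules above can be sketched without Cecil at all. In this version, `TypeNode`, `FieldNode` and `Renamer` are hypothetical stand-ins for Cecil's TypeDefinition and FieldDefinition, reduced to the members the sketch needs; the naming scheme follows the description above, and the exact prefixes for nested types are an assumption.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for Cecil's TypeDefinition (not the real API).
public class TypeNode
{
    public string Name;
    public bool IsEnum;
    public bool IsValueType;
    public List<TypeNode> NestedTypes = new List<TypeNode>();
    public List<FieldNode> Fields = new List<FieldNode>();
}

// Hypothetical stand-in for Cecil's FieldDefinition.
public class FieldNode
{
    public string Name;
    public TypeNode FieldType; // the field's type, so renamed types show through
}

public class Renamer
{
    private int _typeCounter;
    private int _fieldCounter;

    // The heuristic from the post: a single character outside A-Z/a-z.
    public static bool IsObfuscatedName(string name)
    {
        if (name.Length != 1)
            return false;
        char c = name[0];
        return !((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'));
    }

    // Pass 1: rename types, recursing into NestedTypes, which a plain loop
    // over the module's top-level types never visits.
    public void RenameTypes(TypeNode type, bool isNested)
    {
        if (IsObfuscatedName(type.Name))
        {
            // Check IsEnum first: enums are value types too.
            string kind = type.IsEnum ? "Enum" : type.IsValueType ? "Struct" : "Class";
            type.Name = (isNested ? "PrivateNested" : "Private") + kind + (++_typeCounter);
        }
        foreach (TypeNode nested in type.NestedTypes)
            RenameTypes(nested, true);
    }

    // Pass 2: rename fields, which only works once every type has a readable name.
    public void RenameFields(TypeNode type)
    {
        foreach (FieldNode field in type.Fields)
        {
            if (IsObfuscatedName(field.Name))
                field.Name = "m_" + field.FieldType.Name + (++_fieldCounter);
        }
        foreach (TypeNode nested in type.NestedTypes)
            RenameFields(nested);
    }
}
```

Running `RenameTypes` over every top-level type before any call to `RenameFields` is what makes the two-stage process work: a field's new name is built from its type's new name.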

My first foray into the actual method information was to hook up private fields to public properties - if you have a property of this form:

public virtual string Name
{
     get
     {
         return this.m_String5;
     }
}

It's fairly safe to assume that m_String5 should really be called m_Name. I didn't want to get too fancy with this, just fancy enough to pick up the simple case of returning a member field, but the code could obviously be made to deal with returning a field or a default, or other more complicated getter constructs:

    if (!IsObfuscatedName(property.Name) && property.GetMethod != null && property.GetMethod.HasBody)
    {
        Mono.Cecil.FieldDefinition possibleField = null;
        for (int i = 0; i < property.GetMethod.Body.Instructions.Count - 1; i++)
        {
            if (property.GetMethod.Body.Instructions[i].OpCode.Code == Code.Ldfld &&
                property.GetMethod.Body.Instructions[i + 1].OpCode.Code == Code.Ret)
            {
                Mono.Cecil.FieldDefinition potential = property.GetMethod.Body.Instructions[i].Operand as Mono.Cecil.FieldDefinition;
                if (property.DeclaringType.Fields.Contains(potential))
                {
                    possibleField = potential;
                    break;
                }
            }
        }
        if (possibleField != null)
        {
            if (IsObfuscatedName(possibleField.Name))
            {
                possibleField.Name = "m_" + property.Name;
            }
        }
    }

That code only renames a field if the property name was already sensible, and it first checks that the field belongs to the property's declaring type (the IL matching is simple enough that it could otherwise pick up a load of a field on some other type). Like I said, it's almost insultingly simple. With the names made readable, I moved on to fixing the control flow. I identified two tricks in the obfuscated assembly: a bit of dead code that produces an impossible if clause, and the injection of wild goose chases in the form of annoying switches.

So, let's take the first one as an example. What we are looking for is of the following form:

    L_0001: ldc.i4.1           // Pushes the integer constant 1
    L_0002: br.s L_0005        // Jumps to the conditional
    L_0003: ldc.i4.0           // Pushes the integer constant 0 (never reached)
    L_0004: br.s L_0005        // Jumps to the conditional (never reached)
    L_0005: brfalse.s L_0006   // Pops the value; jumps if it is zero
    L_0006: br.s L_0007        // Jumps to the next statement
    L_0007:                    // Code continues
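To see why that if clause is impossible, it helps to trace the stack. The sketch below is a toy interpreter for just the opcodes in that listing; `Op` and `DeadBranchDemo` are hypothetical names, and array indices stand in for the L_000x labels. It is a model of the listing, not a CIL decoder.

```csharp
using System;
using System.Collections.Generic;

// A toy opcode: a mnemonic plus an optional branch target (an index into the listing).
public sealed class Op
{
    public string Code { get; }
    public int Target { get; }

    public Op(string code, int target = -1)
    {
        Code = code;
        Target = target;
    }
}

public static class DeadBranchDemo
{
    // Indices 0..6 stand for L_0001..L_0007.
    public static readonly Op[] Listing =
    {
        new Op("ldc.i4.1"),     // 0: push 1
        new Op("br.s", 4),      // 1: jump to the conditional
        new Op("ldc.i4.0"),     // 2: push 0
        new Op("br.s", 4),      // 3: jump to the conditional
        new Op("brfalse.s", 5), // 4: pop; jump to 5 if zero, else fall through to 5
        new Op("br.s", 6),      // 5: jump to the real code
        new Op("end"),          // 6: code continues
    };

    // Executes the toy listing and returns the indices that actually ran.
    public static List<int> Run()
    {
        var executed = new List<int>();
        var stack = new Stack<int>();
        int pc = 0;
        while (true)
        {
            executed.Add(pc);
            Op op = Listing[pc];
            switch (op.Code)
            {
                case "ldc.i4.0": stack.Push(0); pc++; break;
                case "ldc.i4.1": stack.Push(1); pc++; break;
                case "br.s": pc = op.Target; break;
                case "brfalse.s": pc = stack.Pop() == 0 ? op.Target : pc + 1; break;
                case "end": return executed;
                default: throw new InvalidOperationException("Unknown op " + op.Code);
            }
        }
    }
}
```

The trace visits 0, 1, 4, 5, 6: the constant 1 means brfalse.s never takes its branch, its target is the very next instruction anyway, and the ldc.i4.0 path never runs at all.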

It's a simple matter of iterating through the instructions in every method and searching for that pattern. We can then remove the instructions. One thing to bear in mind is that any branches (including switch jump tables) that point at the removed instructions have to be redirected, and any try/catch blocks or exception handlers that start or end on them need updating as well. Here's a simple version of the code which performs this operation:

    // Requires "using Mono.Cecil.Cil;" at the top of the file
    // (Instruction, Code, FlowControl, ExceptionHandler).

    // Removes the "ldc.i4.1 / br / ldc.i4.0 / br / brfalse.s" dead construct.
    private static void RemoveIllegalConstruct(Mono.Cecil.MethodDefinition method)
    {
        if (!method.HasBody)
            return;
        for (int i = 0; i < method.Body.Instructions.Count - 5; i++)
        {
            if (method.Body.Instructions[i].OpCode.Code == Code.Ldc_I4_1 &&
                method.Body.Instructions[i + 1].OpCode.FlowControl == FlowControl.Branch &&
                method.Body.Instructions[i + 1].Operand == method.Body.Instructions[i + 4] &&
                method.Body.Instructions[i + 2].OpCode.Code == Code.Ldc_I4_0 &&
                method.Body.Instructions[i + 3].OpCode.FlowControl == FlowControl.Branch &&
                method.Body.Instructions[i + 3].Operand == method.Body.Instructions[i + 4] &&
                method.Body.Instructions[i + 4].OpCode.Code == Code.Brfalse_S)
            {
                // Whatever jumped to the start of the construct now jumps to the
                // instruction after it. Assumes nothing jumps into its middle.
                UpdateInstructionReferences(method, method.Body.Instructions[i], method.Body.Instructions[i + 5]);
                for (int j = 0; j < 5; j++)
                    method.Body.Instructions.RemoveAt(i);
                i--; // re-check this index, which now holds the following code
            }
        }
    }

    private static void UpdateInstructionReferences(Mono.Cecil.MethodDefinition method, Instruction oldTarget, Instruction newTarget)
    {
        for (int j = 0; j < method.Body.Instructions.Count; j++)
        {
            Instruction ins = method.Body.Instructions[j];
            if ((ins.OpCode.FlowControl == FlowControl.Branch ||
                ins.OpCode.FlowControl == FlowControl.Cond_Branch) &&
                ins.Operand == oldTarget)
                ins.Operand = newTarget;
            // switch is also Cond_Branch, but its operand is an Instruction[] jump table.
            Instruction[] table = ins.Operand as Instruction[];
            if (table != null)
                for (int k = 0; k < table.Length; k++)
                    if (table[k] == oldTarget)
                        table[k] = newTarget;
        }
        foreach (ExceptionHandler v in method.Body.ExceptionHandlers)
        {
            if (v.FilterEnd == oldTarget)
                v.FilterEnd = newTarget;
            if (v.FilterStart == oldTarget)
                v.FilterStart = newTarget;
            if (v.HandlerEnd == oldTarget)
                v.HandlerEnd = newTarget;
            if (v.HandlerStart == oldTarget)
                v.HandlerStart = newTarget;
            if (v.TryEnd == oldTarget)
                v.TryEnd = newTarget;
            if (v.TryStart == oldTarget)
                v.TryStart = newTarget;
        }
    }

Like I said, it's obnoxiously easy. Don't be scared by the Instruction class in Cecil: there are a lot of properties on it, but they're really just convenient views of the underlying data. Operand holds a reference to whatever the instruction points at (a method, a field, another instruction, an array of instructions for a switch, or even a type), and OpCode.FlowControl tells you what kind of flow control operation the instruction performs. Once again, everything is linked up as discrete objects, so aside from branches and exception handlers that reference instructions we delete, we don't need to do any fix-up at all: byte offsets are recalculated when the module is written. It's amazing.
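Here is a toy model of why object-linked instructions make deletion cheap. Branches hold references to instruction objects rather than byte offsets, so removing instructions only means retargeting references into the removed range, and offsets can be recomputed afterwards. `Insn` and `RemovalDemo` are hypothetical simplifications, not Cecil's classes. Unlike the code above, this variant redirects references to *any* removed instruction, which is reasonable for this particular construct because every path through it ends at the same place (assuming nothing outside jumps into its middle with a value already on the stack).

```csharp
using System.Collections.Generic;

// Hypothetical, simplified instruction (not Cecil's Instruction class).
public sealed class Insn
{
    public string Code;
    public int Size;     // encoded size in bytes
    public Insn Target;  // branch target, held as an object reference
    public int Offset;   // derived data, recomputed after editing

    public Insn(string code, int size, Insn target = null)
    {
        Code = code;
        Size = size;
        Target = target;
    }
}

public static class RemovalDemo
{
    // Removes body[start .. start + count - 1] and redirects every branch that
    // pointed at any removed instruction to `replacement`.
    public static void RemoveRange(List<Insn> body, int start, int count, Insn replacement)
    {
        var removed = new HashSet<Insn>(body.GetRange(start, count)); // reference equality
        body.RemoveRange(start, count);
        foreach (Insn insn in body)
        {
            if (insn.Target != null && removed.Contains(insn.Target))
                insn.Target = replacement;
        }
    }

    // Branches never store offsets, so offsets are simply recomputed from sizes.
    public static void RecomputeOffsets(List<Insn> body)
    {
        int offset = 0;
        foreach (Insn insn in body)
        {
            insn.Offset = offset;
            offset += insn.Size;
        }
    }
}
```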

Lastly, one thing you can't really get around. The obfuscated assembly is signed with a strong-name key, which we can't reproduce for our haxed-up version. We can register the modified assembly to skip strong-name verification on a particular computer (the SDK's sn.exe -Vr option does this), but the original signature is lost. That means any obfuscator tricks that use the key for encrypting/decrypting strings, for example, will suddenly fail to work with our modified DLL; but if all you're after is the decompiled code, you'll be fine.

I would post the source code for the deobfuscator, but it's probably grounds for locking me up under the DMCA. Anyway, I've covered all the basics that should allow anyone to write something like this for themselves. I leave you with a puzzle: why is this code impossible to decompile? Here's a hint: I have no idea.

.method public hidebysig specialname instance void set_Comment(string 'value') cil managed
{
    .maxstack 2
    .locals init (
        [0] int32 num)
    L_0000: br.s L_0002
    L_0002: ldarg.0
    L_0003: ldarg.1
    L_0004: brfalse.s L_0025
    L_0006: ldarg.1
    L_0007: br.s L_0016
    L_0009: ldarg.0
    L_000a: ldfld class Internal.Entity Internal::m_Parent
    L_000f: callvirt instance void [System.Windows.Forms]System.Windows.Forms.Control::Invalidate()
    L_0014: br.s L_002c
    L_0016: stfld string Internal::m_Comment
    L_001b: ldarg.0
    L_001c: ldfld class Internal.Entity Internal::m_Parent
    L_0021: brfalse.s L_002c
    L_0023: br.s L_0009
    L_0025: ldstr ""
    L_002a: br.s L_0016
    L_002c: ret
}

If you want to give it a go, check out the ECMA-335 specification's description of the CIL instruction set (Partition III), and drop me a comment when you solve it!


4 Comments
  1. John BURDEN

    Cecil is great and very advanced. Take a look @ Reflexil, which is a nice Reflector plugin (and internally uses Cecil to perform assembly transformations).

    https://sourceforge.net/projects/reflexil/

    John.

    • Reflexil is also great. I’ve used it more times than I’d probably like to admit, generally to fix bugs in other peoples’ code. You’d be amazed how bad some of the developer tools we get given are. My main issues with Reflexil are that the replace-all-with-code function doesn’t let you use private methods or classes (I really think they need to set up proxy objects and hook it up after compiling), and the instruction editor panel is a bit difficult to use, especially if you want to do anything semi-complex. I did however use it in my initial attempt at deobfuscating the code, by deleting the illegal operation.

  2. Niiiice. Very cool to see Cecil used to de-obfuscate assemblies. Modern obfuscators have made an art of writing nonsensical assemblies (metadata-wise) to annoy the hell out of me.

    • Yeah. I probably wouldn’t have spent nearly as long looking at the assembly if it wasn’t obfuscated.
