
Reverse Engineering a Binary Serialized File

Recently I had a project to build a completely new version of a product, and it was desirable that the new version be able to read the files produced by the current and older versions. There were two major issues with this: firstly, the files were clearly in a binary format when opened in a plain text viewer, and secondly, I had no access to the original code base. I did know that the original application was written in .NET, that the files could simply be copied from one location to another and remain usable, and that they were not a ‘database’. I also had a copy of the original executable and DLL binaries. These are all important facts, because reverse engineering some random binary file format is difficult to say the least: the more you know about how and why it was created, the better your chances. But if you know it was produced by .NET and you have the assemblies that produced it, then you have another huge advantage: Reflector. If it weren’t for Reflector, it wouldn’t have been feasible to reverse engineer these binary files.

The first step was to run the executable and get it set up to work on my machine, then I produced a couple of sample files.

Next I used Reflector to take a look at the original code. While Reflector is relatively simple to use, it can disassemble to many languages (C#, VB.NET, F#, Delphi and others) and it can optimise for various versions of the .NET Framework, so there ends up being a fair-sized matrix of options. I’m more comfortable with VB and I was fairly sure the app had been written to target .NET 2.0, so I went with those options. The code base was relatively small without being trivial, but it didn’t take too long to find a couple of areas where IO was done, and it became clear that the standard binary serializer was being used. From here I exported the code from Reflector, opened it in Visual Studio 2010 and tried to compile. I say tried because compilation failed with some reference errors that were easily remedied, but then it failed again because one of the classes had two methods with the same name. Well, according to case-insensitive VB they had the same name. In VB, FooBar(), fooBar() and foobar() are all the same thing, whereas in C# they are three completely separate methods. So I ran through the export steps in Reflector again using the C# option, opened the result in Visual Studio and managed to build the solution successfully.
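To make the naming clash concrete, here is a minimal sketch of the kind of class that trips up a VB export. The class and method names are hypothetical and are not taken from the original application.

```csharp
using System;

// Hypothetical class: three methods whose names differ only by case.
// This is legal C#, but the same declarations collide in case-insensitive VB.
public class CaseDemo
{
    public string FooBar() { return "FooBar"; }
    public string fooBar() { return "fooBar"; }
    public string foobar() { return "foobar"; }
}

public static class Program
{
    public static void Main()
    {
        var demo = new CaseDemo();
        // Each call resolves to a different method in C#.
        Console.WriteLine(demo.FooBar());
        Console.WriteLine(demo.fooBar());
        Console.WriteLine(demo.foobar());
    }
}
```

If a decompiled export fails to build for this reason, switching the decompiler's output language to C# is usually less work than renaming members by hand.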

From here I was able to identify the top level object that was being serialized, but it appeared to have a few dependencies, and those dependencies had dependencies, and so on. So I added a new Class Diagram to the project, added ALL the classes, found the top level class and then arranged the diagram to show the top level object and all its dependencies. This was a little time consuming (it took 30-45 minutes), but as I was arranging I was also familiarising myself with the class names as well as the property and field names. In the end I think it was time well spent, and at the end of it I was able to identify all the classes that would have been serialized as a result of the top level object being serialized.
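I did this by hand with the class diagram, but the same dependency walk can be approximated with reflection. The following is only a sketch, assuming the .NET Framework; `SerializationGraph`, `Root`, `Branch`, `Leaf` and `Unused` are hypothetical names. It follows declared field types only, so a field typed as `object` or as a base class will hide whatever concrete types end up in it at runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class SerializationGraph
{
    // Collects the types in root's assembly that are reachable through
    // instance fields (public and private), base classes, arrays and generic arguments.
    public static HashSet<Type> Collect(Type root)
    {
        var seen = new HashSet<Type>();
        var pending = new Stack<Type>();
        pending.Push(root);
        const BindingFlags flags = BindingFlags.Instance | BindingFlags.Public |
                                   BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
        while (pending.Count > 0)
        {
            Type type = pending.Pop();
            if (type == null) continue;

            // Unwrap arrays and generic arguments such as List<Leaf>.
            if (type.IsArray) { pending.Push(type.GetElementType()); continue; }
            if (type.IsGenericType)
                foreach (Type arg in type.GetGenericArguments()) pending.Push(arg);

            // Ignore framework types and anything already visited.
            if (type.Assembly != root.Assembly || !seen.Add(type)) continue;

            // Private fields are only reported on the declaring type,
            // so the base class is queued separately.
            pending.Push(type.BaseType);
            foreach (FieldInfo field in type.GetFields(flags))
                if (!field.IsNotSerialized) pending.Push(field.FieldType);
        }
        return seen;
    }
}

// Hypothetical sample graph.
[Serializable] public class Leaf { private int value; }
public class Unused { }
[Serializable] public class Branch
{
    private List<Leaf> leaves;
    [NonSerialized] private Unused cache;   // skipped by the serializer
}
[Serializable] public class Root { private Branch[] branches; private string name; }

public static class Program
{
    public static void Main()
    {
        foreach (Type t in SerializationGraph.Collect(typeof(Root)))
            Console.WriteLine(t.FullName);
    }
}
```

For the sample graph the walk finds `Root`, `Branch` and `Leaf`, and leaves out `Unused` because its only reference is marked `[NonSerialized]`.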

I didn’t want to be using decompiled code as the basis for my new code base for a few reasons, not least of which was that it had a whole lot of UI code in there that I had no intention of ever executing. So I created a completely new solution and created a new C# project called OldDom (or something similar). Then I found the top level class that was being serialized in the original project and copied its contents to the new project, then I found all the dependencies of that class and copied them over: lather, rinse, repeat. It was a little tedious, but I was able to do multiple classes at a time because I’d done the diagramming in the original project and could tell which dependencies were which at a quick glance. If the original DOM had been larger, it might have been easier to remove UI code from the original until I was left with just the DOM, but the way I did it ensured that I only ended up with the classes I needed.

The next step was to write a deserialize method in my new solution and make an attempt to deserialize one of the original files. Binary serialization and deserialization in .NET is pretty simple and has been since .NET 1.0. It’s pretty much built in and it works, but there are a few gotchas. The first one is that binary serialization will serialize ALL the fields of a class, not just the public ones, and that includes the private fields sitting behind its properties. So it’s not enough to simply maintain the same public interface for an object; you’ve got to maintain the same internal layout as well. The second gotcha is that binary serialization embeds assembly and version information into the file format as well as type information. So if your assembly has changed version numbers, or your types have changed namespaces, since a data file was produced, then plain vanilla deserialization of that file won’t work.

In my case I had very deliberately moved the types (classes) to a completely different assembly and, again very deliberately, changed the namespace where the types resided. However, as is so often the case, it turns out I’m not a beautiful unique snowflake and people had come across this kind of thing before. The BinaryFormatter implements the IFormatter interface, which has a Binder property of type SerializationBinder. SerializationBinder is an abstract base class that you can inherit from in order to override the BindToType method. This method is passed the assembly name and the type name that the serializer has extracted from the binary file, and in it you can use this data to return the actual type you would like to be created instead of the original.
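The first gotcha is easy to see in a small round trip. This is a minimal sketch, assuming the .NET Framework (BinaryFormatter is obsolete and disabled by default in recent versions of .NET); the `Account` type is hypothetical and has nothing to do with the original application.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical type used only to demonstrate the behaviour.
[Serializable]
public class Account
{
    private int secret;      // private, yet it is written to the stream
    public string Name;

    public Account(string name, int secret)
    {
        Name = name;
        this.secret = secret;
    }

    public int RevealSecret() { return secret; }
}

public static class Program
{
    public static void Main()
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, new Account("demo", 42));
            stream.Position = 0;   // rewind before reading back

            // The constructor is not run on deserialization;
            // the private field is restored straight from the stream.
            var copy = (Account)formatter.Deserialize(stream);
            Console.WriteLine(copy.Name + " " + copy.RevealSecret());
        }
    }
}
```

Because the private field travels with the object, renaming or retyping it in the copied classes would break compatibility with existing files just as surely as changing a public member.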

Vanilla Deserialization:

    public static OldDOM.TopLevelObject DeserializeTopLevelObject(string fileName)
    {
      // OpenRead throws if the file cannot be opened, so no null check is needed;
      // the using block closes the stream even if deserialization fails.
      using (System.IO.Stream serializationStream = System.IO.File.OpenRead(fileName))
      {
        System.Runtime.Serialization.IFormatter formatter = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
        return (OldDOM.TopLevelObject)formatter.Deserialize(serializationStream);
      }
    }

Deserialization with SerializationBinder:

    public static class SerializationHelper
    {
      public static OldDOM.TopLevelObject DeserializeTopLevelObject(string fileName)
      {
        OldDOM.TopLevelObject result;
        // The using block closes the stream even if deserialization fails.
        using (System.IO.Stream serializationStream = System.IO.File.OpenRead(fileName))
        {
          System.Runtime.Serialization.IFormatter formatter = new System.Runtime.Serialization.Formatters.Binary.BinaryFormatter();
          // Redirect the type lookups for the old assembly to the types in this assembly.
          formatter.Binder = new Version1ToVersion2DeserializationBinder();
          result = (OldDOM.TopLevelObject)formatter.Deserialize(serializationStream);
        }
        result.upgradeDates();
        return result;
      }
    }

    sealed class Version1ToVersion2DeserializationBinder : System.Runtime.Serialization.SerializationBinder
    {
      public override Type BindToType(string assemblyName, string typeName)
      {
        //this is informed by a sample at http://msdn.microsoft.com/en-us/library/system.runtime.serialization.serializationbinder.aspx#Y544

        //it's going to be asking for something like: "theOriginalExe, Version=2.1.6.9, Culture=neutral, PublicKeyToken=null"
        //which is the old original assembly, we've replaced that assembly with this assembly
        //so we tell the deserialiser to use this assembly instead
        assemblyName = System.Reflection.Assembly.GetExecutingAssembly().FullName;
        //we also moved the namespace from "OriginalNameSpace" to "OldDOM"
        typeName = typeName.Replace("OriginalNameSpace.", "OldDOM.");
        Type result = Type.GetType(String.Format("{0}, {1}", typeName, assemblyName));
        System.Diagnostics.Debug.Assert((result != null), "We're trying to Deserialise an unsupported object, we should try to add support for this object.");
        return result;
      }
    }

It’s that simple.
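When working out which names the binder needs to translate, it can help to log what the formatter actually asks for. The following is a sketch rather than part of the original solution: `RecordingBinder` and `Sample` are hypothetical names, it assumes the .NET Framework, and it relies on the BinaryFormatter falling back to its default type lookup when BindToType returns null.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical helper: records every (type, assembly) pair the formatter requests.
public sealed class RecordingBinder : SerializationBinder
{
    public readonly List<string> Requests = new List<string>();

    public override Type BindToType(string assemblyName, string typeName)
    {
        Requests.Add(typeName + ", " + assemblyName);
        return null;   // null leaves the lookup to the formatter's default behaviour
    }
}

// Hypothetical sample type so the sketch can run without an old data file.
[Serializable]
public class Sample { public int Value = 7; }

public static class Program
{
    public static void Main()
    {
        var binder = new RecordingBinder();
        var formatter = new BinaryFormatter { Binder = binder };
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, new Sample());
            stream.Position = 0;
            formatter.Deserialize(stream);
        }
        // Each entry shows a type name and the full assembly name embedded in the stream.
        foreach (string request in binder.Requests)
            Console.WriteLine(request);
    }
}
```

Pointed at one of the original files without the old assembly available, a binder like this would record the first unresolved name before deserialization throws, which is a quick way to confirm the "theOriginalExe, Version=..." style strings mentioned in the comments above.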

From here I was then able to convert my OldDom objects into my new DOM, and I was away.
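The conversion itself is plain mapping code. As a rough illustration only, with every type and member name below being hypothetical rather than taken from the real project, it might look something like this:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a decompiled class. Its fields must keep the
// names and types they had in the original assembly.
namespace OldDomSketch
{
    [Serializable]
    public class Customer
    {
        public string custName;
        public List<string> phones = new List<string>();
    }
}

// Hypothetical new model, free to take whatever shape suits the new product.
namespace NewDomSketch
{
    public class Customer
    {
        public string Name { get; set; }
        public IList<string> PhoneNumbers { get; private set; }

        public Customer() { PhoneNumbers = new List<string>(); }
    }
}

public static class DomConverter
{
    // Copies the old object's state into a new-model object.
    public static NewDomSketch.Customer ToNew(OldDomSketch.Customer old)
    {
        if (old == null) throw new ArgumentNullException("old");

        var result = new NewDomSketch.Customer { Name = old.custName };

        // A deserialized list may be null if it was null when the file was written.
        if (old.phones != null)
            foreach (string phone in old.phones) result.PhoneNumbers.Add(phone);

        return result;
    }
}
```

Keeping the old classes frozen and doing all the reshaping in a converter means the file format only has to be understood in one place.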

 
