Generating unique and persisted numbers for the .Net types

In an earlier step I talked about "anonymizing" the .Net modules created by a Roslyn C# compilation. What I forgot to mention was that an important byproduct of that processing is a unique and persisted number for each module or assembly. This number will be used if / when I finally get around to implementing some Roslyn-assisted serialization / deserialization support. Binary serialization of a C# field or property in an object requires a unique identity for the field / property, for the class that it's in, and for the assembly that the class is in. This identity must also be persisted somewhere, so that it stays the same every time the program is compiled and is available to other programs that receive the serialized objects and need to deserialize them.

In this step I modify Roslyn to generate and persist a unique type number for each public class or struct in a C# program. These type numbers are persisted in a LiteDB database containing "Yacks metadata" for all of the C# projects that are a part of the overall system being developed and that will be working together.

In this modification of Roslyn I'm generating and using these type numbers for the (questionable?) purpose of "anonymizing" (obfuscating) the output modules, so they do not contain the real type names. Instead they contain constructed type names. For example, a fully-qualified C# class name like Merlinia.CommonClasses.ByteArrayAndIndex becomes Merlinia.T0633$0002.
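The constructed name format can be sketched in a few lines of C#. This is a standalone illustration, not code from the modified Roslyn: the class `AnonymizedNames` and its method `TypeName` are hypothetical, and the sketch only assumes what the example above shows, namely a "T" prefix, a four-digit zero-padded module number, a "$" separator and a four-digit zero-padded type number, all placed in the single namespace "Merlinia".

```csharp
using System.Globalization;

// Hypothetical helper (not part of the Roslyn modifications) showing how a
// fully-qualified name such as Merlinia.T0633$0002 is assembled from a module
// number and a type number.
public static class AnonymizedNames
{
    // Both numeric parts are zero-padded to four digits
    private const int ModuleNumberLength = 4;
    private const int TypeNumberLength = 4;

    public static string TypeName(int moduleNumber, int typeNumber)
    {
        string module = moduleNumber.ToString("D" + ModuleNumberLength, CultureInfo.InvariantCulture);
        string type = typeNumber.ToString("D" + TypeNumberLength, CultureInfo.InvariantCulture);

        // Every anonymized type ends up in the one namespace "Merlinia"
        return "Merlinia.T" + module + "$" + type;
    }
}
```

For example, `AnonymizedNames.TypeName(633, 2)` returns `Merlinia.T0633$0002`, matching the ByteArrayAndIndex example.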

src\Compilers\Core\Portable\YacksCompilation.cs

This is a source file added to the CodeAnalysis project in Roslyn. It has gone through several iterations - in some of the previous steps in this series of articles it was called StaticFields.cs and then Yacks-CompilationData.cs.

// Copyright (c) Merlinia A/S. All Rights Reserved. Licensed under the Apache License, Version 2.0.
// (Just to be compatible with the Microsoft Roslyn license.)

using Merlinia.Yacks;
using Merlinia.Yacks.ProjectMetadata;
using LiteDB;
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;

namespace Microsoft.CodeAnalysis
{
    /// <summary>
    /// This class contains some data and methods associated with the Yacks modifications to the
    /// Roslyn compiler. This processing is only relevant for C# compilations, but because it is used
    /// in the CodeGenerator and MetadataWriter classes it is located in the CodeAnalysis project
    /// instead of the CSharpCodeAnalysis project. Similarly, there is a reference to this object in
    /// the Compilation class instead of the CSharpCompilation class.
    ///
    /// Unlike the Compilation and CSharpCompilation classes this object is very mutable.
    ///
    /// The fact that the Compilation and CSharpCompilation classes are immutable and now contain a
    /// reference to this object implies that that reference must be copied from one instance to the
    /// next when a new instance of CSharpCompilation is created based on the previous instance.
    ///
    /// Current usage of the YacksProjectDirectory database is based on the assumption that LiteDB is
    /// "process-safe", so multiple C# compilations can be run in parallel without problem. See here:
    /// https://github.com/mbdavid/LiteDB/wiki/Concurrency
    ///
    /// In order to ensure that the YacksProjectDirectory database does not end up locked following a
    /// Roslyn compilation this class is marked IDisposable.
    /// </summary>
    internal class YacksCompilation : IDisposable
    {
        // Name of the directory where anonymized output files are written
        internal const string CAnonymized = "Anonymized";

        // Various usage of word "Merlinia" in emitted data
        internal const string CMerlinia = "Merlinia";
        internal const string CMerliniaModule = CMerlinia + " module";
        internal const string CYacks0002 = "Yacks0002";
        internal const string CMerliniaYacks0002 = CMerlinia + "." + "Yacks0002";

        // Length of numeric parts of module names such as Merlinia1585 and type names such as
        // T1535$9745
        internal const int CModuleNumberLength = 4;
        internal const int CTypeNumberLength = 4;

        // Constant strings associated with processing of [YacksSerialization()] attributes
        internal static readonly string CYacksSerializationAttribute =
            typeof(YacksSerializationAttribute).Name;
        public static readonly string CYacksSerialization = CYacksSerializationAttribute.Remove(
            CYacksSerializationAttribute.Length - "Attribute".Length);

        // References to an Array and a Dictionary specifying the constant strings which should be
        // "disguised" (if the two references are not null).
        public string[] StringyNumberToString { get; internal set; }  // Index is 1-based, not 0-based
        public Dictionary<string, int> StringToStringyNumber { get; internal set; }

        // Dictionary to convert a "stringy number" to the "fake token number" needed by the metadata
        // writer.
        public readonly ConcurrentDictionary<int, int> StringyNumberToFakeTokenNumber =
            new ConcurrentDictionary<int, int>();

        // Yacks module number for this .Net assembly. This remains -1 if the YacksProjectDirectory
        // "fake property" has not been specified.
        public int ModuleNumber { get; set; } = -1;

        // Switch indicating that the YacksAnonymizeModule "fake property" was used
        public bool AnonymizeModule { get; set; } = false;

        // Switch that gets set when emitting the anonymized output file
        public bool EmittingAnonymized { get; set; } = false;

        // Reference to the LiteDatabase object for the YacksProjectDirectory.db, needs to be disposed
        private LiteDatabase _liteDatabase = null;

        // Reference to the LiteDB "collection" for the YacksProjectInfo objects
        private LiteCollection<YacksProjectInfo> _yacksProjects = null;

        // YacksProjectInfo for this assembly / project, if database is open
        private YacksProjectInfo _projectInfo = null;

        // The non-persisted type numbers start at 9999 and work their way downwards
        private int _nonPersistedTypeNumber = 10000;

        // Part of the IDisposable pattern
        private volatile bool _isDisposed = false;

        /// <summary>
        /// Method called during C# compilation initialization to open the YacksProjectDirectory
        /// database (or create it if it doesn't exist) and to read the YacksProjectInfo object for
        /// this assembly if it exists in the database.
        ///
        /// NB. Note that the LiteDatabase object needs to be disposed when the compilation is done to
        /// avoid possible problems with the database remaining locked.
        /// </summary>
        /// <param name="yacksProjectDirectory">full path for the LiteDB YacksProjectDirectory.db file</param>
        /// <param name="assemblyName">assembly name (equal project name) for the project being compiled</param>
        /// <param name="projectBaseDirectory">full path for directory containing the .csproj file</param>
        public void OpenYacksProjectDirectory(string yacksProjectDirectory, string assemblyName,
                                              string projectBaseDirectory)
        {
            // Open or create the YacksProjectDirectory.db database
            _liteDatabase = new LiteDatabase(yacksProjectDirectory);

            // Get (or create) the LiteDB "collection" of YacksProjectInfo objects, and index current
            // and future documents using ProjectName and ModuleNumber properties
            _yacksProjects =
                _liteDatabase.GetCollection<YacksProjectInfo>(YacksProjectInfo.CCollectionName);
            _yacksProjects.EnsureIndex((YacksProjectInfo x) => x.ProjectName);
            _yacksProjects.EnsureIndex((YacksProjectInfo x) => x.ModuleNumber);

            // Get YacksProjectInfo for this assembly, if it exists in database, or create a new one
            _projectInfo = ReadProjectInfo(assemblyName);
            if (_projectInfo == null)
                _projectInfo = CreateNewProjectInfo(projectBaseDirectory, assemblyName);

            ModuleNumber = _projectInfo.ModuleNumber;
        }

        /// <summary>
        /// Method to read a YacksProjectInfo object for an assembly, if it exists in the database.
        /// </summary>
        internal YacksProjectInfo ReadProjectInfo(string projectName)
        {
            return _yacksProjects?.FindOne(
                Query.EQ(nameof(YacksProjectInfo.ProjectName), projectName));
        }

        /// <summary>
        /// Method to generate the assembly and module names used when the YacksAnonymizeModule option
        /// is on.
        /// </summary>
        public string GetModuleName()
        {
            return GetModuleName(ModuleNumber);
        }

        /// <summary>
        /// Method to generate assembly or module name used when the YacksAnonymizeModule option is
        /// on.
        /// </summary>
        public string GetModuleName(int moduleNumber)
        {
            return CMerlinia +
                   moduleNumber.ToString("D" + CModuleNumberLength, CultureInfo.InvariantCulture);
        }

        /// <summary>
        /// Method to create a new YacksProjectInfo object with a random module number that does not
        /// conflict with any of the existing YacksProjectInfo objects in the YacksProjectDirectory
        /// database.
        /// </summary>
        private YacksProjectInfo CreateNewProjectInfo(string projectBaseDirectory, string assemblyName)
        {
            // First need to generate a new random module number between 100 and 9999, checking that
            // the number is not already in use for another module. (The upper bound passed to
            // Random.Next() is exclusive, hence 10000.)
            Random pseudoRandom = new Random();
            int moduleNumber;
            while (true)
            {
                moduleNumber = pseudoRandom.Next(100, 10000);
                if (!_yacksProjects.Exists(
                        Query.EQ(nameof(YacksProjectInfo.ModuleNumber), moduleNumber)))
                    break;
            }

            // Create new YacksProjectInfo object and insert it into the database
            YacksProjectInfo projectInfo =
                new YacksProjectInfo(assemblyName, projectBaseDirectory, moduleNumber);
            _yacksProjects.Insert(projectInfo);  // Id field generated automatically
            return projectInfo;
        }

        /// <summary>
        /// Method to get a persisted type number for a type defined in the current project.
        /// </summary>
        /// <param name="fullyQualifiedTypeName">name starts with "Merlinia" and is maybe "mangled"</param>
        /// <returns>type number, 1 - 9999, or -1 if something wrong</returns>
        internal int GetTypeNumber(string fullyQualifiedTypeName)
        {
            // Check the YacksProjectDirectory database has been opened and that the YacksProjectInfo
            // object for the current project has been read or created
            if (_projectInfo == null)
                return -1;

            // Check the List<> of YacksTypeInfo objects exists, create it if not
            YacksTypeInfo typeInfo;
            if (_projectInfo.TypeInfoList == null)
                _projectInfo.TypeInfoList = new List<YacksTypeInfo>();
            else
            {
                // Find the requested YacksTypeInfo object if it's on the list
                typeInfo = _projectInfo.TypeInfoList.Find(
                    (YacksTypeInfo x) => x.TypeName == fullyQualifiedTypeName);
                if (typeInfo != null)
                    return typeInfo.TypeNumber;
            }

            // Update the LastTypeNumber field in the YacksProjectInfo object for this project and
            // create a new YacksTypeInfo object and add it to the list
            _projectInfo.LastTypeNumber += 1;
            typeInfo = new YacksTypeInfo(fullyQualifiedTypeName, _projectInfo.LastTypeNumber);
            _projectInfo.TypeInfoList.Add(typeInfo);
            Debug.Assert(_projectInfo.LastTypeNumber == _projectInfo.TypeInfoList.Count);
            _yacksProjects.Update(_projectInfo);

            // Check persisted type number hasn't reached the non-persisted type numbers
            return CheckTypeNumbersHaveNotCollided(typeInfo.TypeNumber);
        }

        /// <summary>
        /// Method to get a type number that should not be persisted, i.e., for a non-public type.
        /// </summary>
        /// <returns>type number, 9999 - 1, or -1 if something wrong</returns>
        internal int GetNonPersistedTypeNumber()
        {
            // Only valid if Yacks metadata database is OK
            if (_projectInfo == null)
                return -1;

            // Count down, 9999, 9998, ...
            _nonPersistedTypeNumber -= 1;

            // Check non-persisted type number hasn't reached the persisted type numbers
            return CheckTypeNumbersHaveNotCollided(_nonPersistedTypeNumber);
        }

        /// <summary>
        /// Method to check that the persisted type numbers and the non-persisted type numbers have
        /// not reached each other - which should be totally impossible, as it would require a project
        /// with 10000 classes and structs.
        /// </summary>
        private int CheckTypeNumbersHaveNotCollided(int typeNumberToReturn)
        {
            return _projectInfo.LastTypeNumber < _nonPersistedTypeNumber ? typeNumberToReturn : -1;
        }

        #region IDisposable stuff

        // This is copied from here: http://msdn.microsoft.com/en-us/library/system.idisposable.aspx

        /// <summary>
        /// Method to implement IDisposable. Do not make this method virtual - a derived class should
        /// not be able to override this method.
        /// </summary>
        public void Dispose()
        {
            // Call the following method
            Dispose(true);

            // This object will be cleaned up by the Dispose() method below. Therefore, we call
            // GC.SuppressFinalize to take this object off the finalization queue and prevent
            // finalization code for this object from executing a second time.
            GC.SuppressFinalize(this);
        }

        /// <summary>
        /// Dispose(bool isDisposing) executes in two distinct scenarios. If isDisposing equals true,
        /// the method has been called directly or indirectly by a user's code. Managed and unmanaged
        /// resources can be disposed. If isDisposing equals false, the method has been called by the
        /// runtime from inside the finalizer and you should not reference other objects - only
        /// unmanaged resources can be disposed.
        /// </summary>
        protected virtual void Dispose(bool isDisposing)
        {
            // Check to see if Dispose() has already been called
            if (!_isDisposed)
            {
                // If isDisposing equals true, dispose all managed and unmanaged resources
                if (isDisposing)
                {
                    try
                    {
                        // Call the Dispose() method for the LiteDB LiteDatabase object
                        if (_liteDatabase != null)
                        {
                            _liteDatabase.Dispose();
                            _liteDatabase = null;
                        }
                    }
                    catch (Exception e)
                    {
                        Console.WriteLine("Exception in Dispose() method processing: " + e.Message);
                    }
                }

                // Note disposing has been done
                _isDisposed = true;
            }
        }

        #endregion IDisposable stuff
    }
}
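The interplay between GetTypeNumber() and GetNonPersistedTypeNumber() (persisted numbers counting up from 1, non-persisted numbers counting down from 9999, and -1 once the two meet) can be modelled in memory. This is only a sketch under simplifying assumptions: the hypothetical class `TypeNumberAllocator` uses a Dictionary in place of the YacksProjectInfo.TypeInfoList that the real code stores with LiteDB, so nothing here is persisted between runs.

```csharp
using System.Collections.Generic;

// Hypothetical in-memory model of the two-ended type numbering used above.
// A Dictionary replaces the LiteDB-persisted TypeInfoList.
public sealed class TypeNumberAllocator
{
    private readonly Dictionary<string, int> _persisted = new Dictionary<string, int>();
    private int _lastTypeNumber = 0;              // persisted numbers count up: 1, 2, 3, ...
    private int _nonPersistedTypeNumber = 10000;  // non-persisted numbers count down: 9999, 9998, ...

    // Same name always gives the same number; a new name gets the next number up
    public int GetTypeNumber(string fullyQualifiedTypeName)
    {
        if (_persisted.TryGetValue(fullyQualifiedTypeName, out int existing))
            return existing;

        _lastTypeNumber += 1;
        _persisted.Add(fullyQualifiedTypeName, _lastTypeNumber);
        return CheckNotCollided(_lastTypeNumber);
    }

    // Every call hands out a fresh number from the top of the range
    public int GetNonPersistedTypeNumber()
    {
        _nonPersistedTypeNumber -= 1;
        return CheckNotCollided(_nonPersistedTypeNumber);
    }

    // -1 signals that the upward and downward ranges have met
    private int CheckNotCollided(int typeNumber)
    {
        return _lastTypeNumber < _nonPersistedTypeNumber ? typeNumber : -1;
    }
}
```

Letting the two ranges grow towards each other avoids choosing a fixed split point: in this model a project with only one non-public type could still have 9998 persisted types.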

src\Compilers\Core\Portable\YacksMetadata-ProjectInfo.cs

This file and the next one are also C# source files added to Roslyn. They define some "Yacks metadata" that is persisted in a LiteDB database.

// Copyright (c) Merlinia A/S. All Rights Reserved. Licensed under the Apache License, Version 2.0.
// (Just to be compatible with the Microsoft Roslyn license.)

using System.Collections.Generic;

namespace Merlinia.Yacks.ProjectMetadata
{
    /// <summary>
    /// This class represents a deserialized YacksProjectInfo "document" from the LiteDB
    /// YacksProjectDirectory.db database. The document collection name is "YacksProjects", and the
    /// documents are indexed by both ProjectName and ModuleNumber fields.
    ///
    /// This is only relevant for C# compilations, but because it is used in the MetadataWriter
    /// class this definition is in the CodeAnalysis project instead of the CSharpCodeAnalysis
    /// project.
    /// </summary>
    internal class YacksProjectInfo
    {
        // Name of the LiteDB "collection" containing these documents / objects
        public const string CCollectionName = "YacksProjects";

        // Needed by LiteDB
        public int Id { get; set; }

        // Project name. For .Net projects this is the assembly name, for example
        // "Merlinia.CommonClasses.MArrays". Should not contain blanks?
        public string ProjectName { get; set; }

        // Directory that the .csproj file is in. (There must be one, and only one, .csproj file in
        // the directory.)
        // This path is relative to the root folder of the repository check-out, and has trailing "\".
        // (The path can be made non-relative due to the fact that the YacksProjectDirectory.db file
        // is known to be in the root folder of the repository check-out.)
        public string BaseDirectory { get; set; }

        // Module number is a unique random four-digit number in the range 0100 - 9999 for each
        // "module". (Module number 0000 is invalid, and module numbers 0001 - 0099 are reserved for
        // special purposes.)
        // Module numbers are used for several purposes such as generating the identities of
        // serialized objects and as part of the diagnostic logging system.
        // For a .Net program a "module" is a .Net assembly, which must contain one (and only one)
        // .Net module. See here for more about .Net assembly vs. module:
        // https://stackoverflow.com/questions/9271805/net-module-vs-assembly
        public int ModuleNumber { get; set; }

        // Last (persisted) type number that has been assigned to a type in this module, i.e., next
        // type number should be this number plus one. This is only relevant for persisted type
        // numbers, which is normally only needed for public classes and structs when the
        // YacksAnonymizeModule option is on or if a [YacksSerialization()] attribute that specifies
        // "B" (binary serialization) has been used.
        // This is somewhat redundant since the number of YacksTypeInfo objects on TypeInfoList should
        // give the same result, at least for the current implementation.
        public int LastTypeNumber { get; set; } = 0;

        // List of (some of) the types in the project. Types which do not need a persisted type number
        // are normally not included in this list. A value of null implies an empty list. So far, the
        // implementation implies that YacksTypeInfo objects are exactly ordered by their TypeNumber
        // field, so the indexing for TypeInfoList is a valid way to access a YacksTypeInfo object
        // with a known TypeNumber.
        public List<YacksTypeInfo> TypeInfoList { get; set; } = null;

        // Constructor needed by LiteDB / Bson
        public YacksProjectInfo() {}

        // Normal constructor
        public YacksProjectInfo(string projectName, string baseDirectory, int moduleNumber)
        {
            ProjectName = projectName;
            BaseDirectory = baseDirectory;
            ModuleNumber = moduleNumber;
        }
    }
}
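The module number rules described in the ModuleNumber comment (random, unique, 0100 - 9999) can be illustrated without a database. In this sketch the hypothetical class `ModuleNumbers` uses a `HashSet<int>` where CreateNewProjectInfo() uses a LiteDB `Exists()` query on the ModuleNumber index. One detail worth keeping in mind: the upper bound of `Random.Next(int, int)` is exclusive, so the call must be `Next(100, 10000)` for 9999 to be a possible result.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of random module number selection. A HashSet<int> stands in
// for the LiteDB query on the ModuleNumber index.
public static class ModuleNumbers
{
    public const int Min = 100;             // 0000 is invalid, 0001 - 0099 are reserved
    public const int MaxExclusive = 10000;  // Random.Next(min, max) never returns max

    // Picks an unused number in 100 - 9999 and records it in inUse
    public static int Allocate(ISet<int> inUse, Random random)
    {
        // Guard against looping forever when every number in the range is taken
        int taken = 0;
        foreach (int n in inUse)
            if (n >= Min && n < MaxExclusive)
                taken++;
        if (taken == MaxExclusive - Min)
            throw new InvalidOperationException("All module numbers 0100 - 9999 are in use");

        while (true)
        {
            int candidate = random.Next(Min, MaxExclusive);
            // ISet<T>.Add returns false if the number is already in the set
            if (inUse.Add(candidate))
                return candidate;
        }
    }
}
```

Retrying random draws is simple and works well while the range is sparsely used; it only becomes slow when almost all 9900 numbers are taken.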

src\Compilers\Core\Portable\YacksMetadata-TypeInfo.cs

// Copyright (c) Merlinia A/S. All Rights Reserved. Licensed under the Apache License, Version 2.0.
// (Just to be compatible with the Microsoft Roslyn license.)

namespace Merlinia.Yacks.ProjectMetadata
{
    /// <summary>
    /// Class containing information about a .Net type, i.e. a C# class or struct. The
    /// YacksProjectInfo object (normally) contains a List{} of these objects.
    ///
    /// This is only relevant for C# compilations, but because it is used in the MetadataWriter
    /// class this definition is in the CodeAnalysis project instead of the CSharpCodeAnalysis
    /// project.
    /// </summary>
    internal class YacksTypeInfo
    {
        // Type name. For .Net projects this is the fully qualified "mangled / decorated" name, i.e.,
        // ending with "`1", "`2", etc. for a generic type name.
        public string TypeName { get; set; }

        // Type number is a unique four-digit number in the range 0001 - 9999 for each type number
        // that needs to be persisted. (Type numbers that don't need to be persisted are generated
        // downwards from 9999, and may be different for each new compilation.)
        // Type numbers are used for several purposes such as obfuscating type names in the output
        // module if the YacksAnonymizeModule option is on, and to identify a type in binary
        // serialized data.
        public int TypeNumber { get; set; }

        // Constructor needed by LiteDB / Bson
        public YacksTypeInfo() {}

        // Normal constructor
        public YacksTypeInfo(string typeName, int typeNumber)
        {
            TypeName = typeName;
            TypeNumber = typeNumber;
        }
    }
}
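Because TypeName holds the "mangled" name, generic types that share a simple name but differ in arity get separate YacksTypeInfo entries and therefore separate type numbers. Ordinary .Net reflection uses the same backtick-arity convention in `Type.Name`, which gives a quick way to see what such keys look like. This is an illustration only, using the hypothetical helper `MangledNames.Key`: the Roslyn code builds its key from `NamespaceName` and `GetMangledName()`, not from reflection.

```csharp
using System;

// Illustration only: builds a "Namespace.Name" key from reflection data. Type.Name
// includes the backtick-arity suffix for generic types (for example "List`1"),
// mirroring the NamespaceName + "." + mangled name key passed to GetTypeNumber().
public static class MangledNames
{
    public static string Key(Type type)
    {
        // This sketch handles namespace-level (non-nested) types only
        if (type.IsNested)
            throw new ArgumentException("Only namespace-level types are supported", nameof(type));

        return type.Namespace + "." + type.Name;
    }
}
```

Note that `Type.Name` omits the type arguments, so `List<int>` and `List<string>` produce the same key as the open `List<>`, which matches one type definition per generic class in the metadata.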

src\Compilers\Core\Portable\PEWriter\MetadataWriter.cs

This source file is part of the CodeAnalysis project in Roslyn. In the Visual Studio Solution Explorer it can be found under CodeAnalysis - PEWriter.

Compared with the previous steps, one more method has been modified: PopulateTypeDefTableRows(). It was at line 2747 in the revision of Roslyn that I was working with.

private void PopulateTypeDefTableRows()
{
    var typeDefs = this.GetTypeDefs();
    metadata.SetCapacity(TableIndex.TypeDef, typeDefs.Count);

    foreach (INamedTypeDefinition typeDef in typeDefs)
    {
        //Yacks07: Anonymize the namespace and type names if necessary
        TypeAttributes typeAttributes = GetTypeAttributes(typeDef);
        StringHandle namespaceHandle;
        StringHandle typeNameHandle;
        AnonymizeTypeName(typeDef, typeAttributes, out namespaceHandle, out typeNameHandle);

        //INamespaceTypeDefinition namespaceType = typeDef.AsNamespaceTypeDefinition(Context);
        //string mangledTypeName = GetMangledName(typeDef);
        ITypeReference baseType = typeDef.GetBaseClass(Context);

        metadata.AddTypeDefinition(
            //attributes: GetTypeAttributes(typeDef),
            attributes: typeAttributes,
            //@namespace: (namespaceType != null) ? GetStringHandleForNamespaceAndCheckLength(namespaceType, mangledTypeName) : default(StringHandle),
            @namespace: namespaceHandle,
            //name: GetStringHandleForNameAndCheckLength(mangledTypeName, typeDef),
            name: typeNameHandle,
            baseType: (baseType != null) ? GetTypeHandle(baseType) : default(EntityHandle),
            fieldList: GetFirstFieldDefinitionHandle(typeDef),
            methodList: GetFirstMethodDefinitionHandle(typeDef));
    }
}

src\Compilers\Core\Portable\PEWriter\MetadataWriter.Yacks.cs

This is a source file which has been added to the CodeAnalysis project. Two additional methods have been added.

/// <summary>
/// Method to either "anonymize" a type name (C# class or struct name) if applicable and
/// possible, or else to do standard processing to emit the type definition. (Much of the code
/// in this method is copied from original code in the
/// MetadataWriter.PopulateTypeDefTableRows() method.)
/// </summary>
private void AnonymizeTypeName(INamedTypeDefinition typeDef, TypeAttributes typeAttributes,
                               out StringHandle namespaceHandle, out StringHandle typeNameHandle)
{
    // Test if applicable to "anonymize" the type name
    INamespaceTypeDefinition namespaceType = typeDef.AsNamespaceTypeDefinition(Context);
    string mangledTypeName = GetMangledName(typeDef);
    if (EmittingAnonymized(namespaceType))
    {
        // Get the persisted Yacks metadata type number for this type if possible, or for non-
        // public types get a non-persisted type number
        TypeAttributes twoBits = typeAttributes & TypeAttributes.NestedPrivate;
        bool isPublic = twoBits != 0 && twoBits != TypeAttributes.NestedPrivate;
        YacksCompilation yacksCompilation = module.CommonCompilation._YacksCompilation;
        int typeNumber = isPublic
            ? yacksCompilation.GetTypeNumber(namespaceType.NamespaceName + "." + mangledTypeName)
            : yacksCompilation.GetNonPersistedTypeNumber();
        if (typeNumber != -1)
        {
            // The namespace name and type name get "anonymized". All namespace names are reduced
            // to just "Merlinia" and type names become "Tnnnn$mmmm", where nnnn is the project
            // number and mmmm is the type number within the project.
            namespaceHandle = metadata.GetOrAddString(YacksCompilation.CMerlinia);
            int moduleNumber = module.CommonCompilation._YacksCompilation.ModuleNumber;
            string typeName = "T" +
                moduleNumber.ToString("D" + YacksCompilation.CModuleNumberLength,
                                      CultureInfo.InvariantCulture) +
                "$" +
                typeNumber.ToString("D" + YacksCompilation.CTypeNumberLength,
                                    CultureInfo.InvariantCulture);
            typeNameHandle = metadata.GetOrAddString(typeName);
            return;
        }
    }

    // Type name does not get anonymized - do standard processing
    namespaceHandle = (namespaceType != null)
        ? GetStringHandleForNamespaceAndCheckLength(namespaceType, mangledTypeName)
        : default(StringHandle);
    typeNameHandle = GetStringHandleForNameAndCheckLength(mangledTypeName, typeDef);
}

/// <summary>
/// Method to test if Yacks modifications are in effect and if an "anonymized" module is
/// currently being built and if the namespace name starts with "Merlinia" or "Test" (but not
/// "Merlinia.Yacks0002").
/// </summary>
private bool EmittingAnonymized(INamespaceTypeDefinition namespaceType)
{
    if (!EmittingAnonymized() || namespaceType == null)
        return false;

    string namespaceName = namespaceType.NamespaceName;
    if (!(namespaceName.StartsWith(YacksCompilation.CMerlinia, StringComparison.Ordinal) ||
          namespaceName.StartsWith("Test", StringComparison.Ordinal)))
        return false;
    if (namespaceName.StartsWith(YacksCompilation.CMerliniaYacks0002, StringComparison.Ordinal))
        return false;

    return true;
}
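The namespace test inside EmittingAnonymized(INamespaceTypeDefinition) can be pulled out as a pure string predicate and tried on its own. This is a sketch only: `AnonymizeFilter.NamespaceQualifies` is a hypothetical name, and it leaves out the check of whether an anonymized module is currently being emitted.

```csharp
using System;

// Hypothetical standalone version of the namespace test in EmittingAnonymized(),
// without the check on whether an anonymized module is being emitted
public static class AnonymizeFilter
{
    private const string Merlinia = "Merlinia";
    private const string MerliniaYacks0002 = "Merlinia.Yacks0002";

    public static bool NamespaceQualifies(string namespaceName)
    {
        if (namespaceName == null)
            return false;

        // Only namespaces starting with "Merlinia" or "Test" are candidates ...
        if (!(namespaceName.StartsWith(Merlinia, StringComparison.Ordinal) ||
              namespaceName.StartsWith("Test", StringComparison.Ordinal)))
            return false;

        // ... but the Merlinia.Yacks0002 namespace is left alone
        return !namespaceName.StartsWith(MerliniaYacks0002, StringComparison.Ordinal);
    }
}
```

Because these are plain prefix tests on the whole string, a namespace such as `MerliniaTools` also qualifies, and `StringComparison.Ordinal` keeps the result independent of the current culture.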

The results so far ...

The following screen shot (with highlighting added) shows how JetBrains dotPeek displays the modules produced by my modified Roslyn compiler when I compile my MArrays library assembly. Note that these two modules are produced by a single compilation, as explained in an earlier step.

[Screen shot: the two MArrays modules as displayed by JetBrains dotPeek]

Now if you're of the opinion that this isn't really very useful, then I do understand your point of view. Unfortunately, I have to get through some more anonymizing / obfuscating steps before I'm able to turn my attention to more interesting goals, like adding serialization / deserialization support and support for improved diagnostic logging.
