Generating unique and persisted numbers for the fields in C# classes and structs

In several previous steps I described how I've modified the Roslyn compiler to generate unique numbers for the C# projects (modules / assemblies) and for the .Net types in these programs. Now it is the turn of the fields defined in the C# classes and structs in those programs.

An important aspect of this is that for the public fields these unique field numbers are persisted in a LiteDB database, so they remain the same for every compilation. This is necessary to ensure that other programs that reference these public fields via their field numbers will still work unchanged after rebuilding selected modules.
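To illustrate why persistence matters, here is a minimal sketch, not the actual Yacks code. The class name FieldNumberRegistry and its members are hypothetical, and an in-memory dictionary stands in for the LiteDB database. The point is only that a field keeps the number it was first given, even when other fields are added in a later build.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the persisted metadata. The real implementation keeps
// this information in a LiteDB database so that it survives between compilations.
public sealed class FieldNumberRegistry
{
    private readonly Dictionary<string, int> _numbers = new Dictionary<string, int>();
    private int _lastNumber = 0;

    // Returns the number already assigned to the field, or assigns the next free one.
    public int GetOrAssign(string fieldName)
    {
        if (_numbers.TryGetValue(fieldName, out int number))
            return number;

        _lastNumber += 1;
        _numbers.Add(fieldName, _lastNumber);
        return _lastNumber;
    }
}
```

With this sketch, asking for "Count" and then "Items" yields 1 and 2. A later "rebuild" that first asks for a new field "Capacity" gets 3, while "Count" still gets 1, so a referencing program that was compiled against the earlier numbers is unaffected.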

Once again, I'm demonstrating the implementation of these unique persisted numbers via their usage in generating anonymized (obfuscated) names in the output module. But the long-range intention is that these numbers will be used for more productive purposes, for example expediting serialization / deserialization of objects, and maybe improved support for diagnostic logging.

(In some ways one can consider my goals in modifying Roslyn to be somewhat similar to what one can do with an aspect-oriented programming tool like PostSharp.)

In the rest of this article I'll just list most (but not all) of the code involved in generating and persisting these field numbers, and in using them to generate anonymized field names in the output module.

Note: The code shown below is somewhat obsolete. A newer version is available for download - see this article.

src\Compilers\Core\Portable\YacksCompilation.cs

See previous articles for previous (and now obsolete) listings of this source file. Here are some of the changes made in this iteration.

Around line 53:

    // Length of numeric parts of module names such as "Merlinia1585" and type names such as
    // "T535$0045" or "t535$0045" and field names such as "F0003" or "f0003". These must be kept
    // coordinated with the following max values.
    internal const int CModuleNumberLength = 4;
    internal const int CTypeNumberLength = 4;
    internal const int CFieldNumberLength = 4;

    // Max values for the numeric parts of type names such as "T535$0045" or "t535$0045" and field
    // names such as "F0003" or "f0003". These must be kept coordinated with the above values for
    // number of digits.
    private const int CTypeNumberMax = 9999;
    private const int CFieldNumberMax = 9999;
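As a small illustration of how these length constants turn numbers into names, here is a standalone sketch. The class AnonymizedNames and its constants are assumptions made for this example; they are meant to mirror the constants above, but the real formatting is done in the MetadataWriter methods listed later.

```csharp
using System.Globalization;

public static class AnonymizedNames
{
    // Assumed to mirror CModuleNumberLength, CTypeNumberLength and CFieldNumberLength.
    private const int ModuleNumberLength = 4;
    private const int TypeNumberLength = 4;
    private const int FieldNumberLength = 4;

    // "T" + module number + "$" + type number for public types, "t" for non-public types.
    public static string FormatTypeName(int moduleNumber, int typeNumber, bool isPublic)
    {
        return (isPublic ? "T" : "t")
            + moduleNumber.ToString("D" + ModuleNumberLength, CultureInfo.InvariantCulture)
            + "$"
            + typeNumber.ToString("D" + TypeNumberLength, CultureInfo.InvariantCulture);
    }

    // "F" + field number for public fields, "f" for non-public fields.
    public static string FormatFieldName(int fieldNumber, bool isPublic)
    {
        return (isPublic ? "F" : "f")
            + fieldNumber.ToString("D" + FieldNumberLength, CultureInfo.InvariantCulture);
    }
}
```

The "D4" format pads with leading zeros, so the maximum value of 9999 is exactly what fits in the four digits. This is why the length constants and the max constants have to be kept coordinated.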

Around line 103:

    // The non-persisted type numbers and field numbers for this project start at 0001 and work
    // their way up, limit being 9999
    private int _nonPersistedTypeNumber = 0;
    private int _nonPersistedFieldNumber = 0;

Around line 210:

    /// <summary>
    /// Method to get a persisted type number for a type defined in the current project.
    /// </summary>
    /// <param name="fullyQualifiedTypeName">name starts with "Merlinia" and is maybe "mangled"</param>
    /// <param name="createSwitch">true = create YacksTypeInfo for type if necessary,
    /// false = just return YacksTypeInfo if it already exists</param>
    /// <returns>type number, 1 - 9999, or -1 if something wrong</returns>
    internal int GetTypeNumber(string fullyQualifiedTypeName, bool createSwitch)
    {
        // Check the YacksProjectDirectory database has been opened and that the YacksProjectInfo
        // object for the current project has been read or created
        if (_projectInfo == null)
            return -1;

        // Check the List<> of YacksTypeInfo objects exists, create it if not
        YacksTypeInfo typeInfo;
        if (_projectInfo.TypeInfoList == null)
            _projectInfo.TypeInfoList = new List<YacksTypeInfo>();
        else
        {
            // Find the YacksTypeInfo object, return it if found
            typeInfo = _projectInfo.GetTypeInfo(fullyQualifiedTypeName);
            if (typeInfo != null)
                return typeInfo.TypeNumber;
        }

        // Quit now if createSwitch not on
        if (!createSwitch)
            return -1;

        // Update the LastTypeNumber field in the YacksProjectInfo object for this project and
        // create a new YacksTypeInfo object and add it to the list
        _projectInfo.LastTypeNumber += 1;
        typeInfo = new YacksTypeInfo(fullyQualifiedTypeName, _projectInfo.LastTypeNumber);
        _projectInfo.TypeInfoList.Add(typeInfo);
        Debug.Assert(_projectInfo.LastTypeNumber == _projectInfo.TypeInfoList.Count);
        _yacksProjects.Update(_projectInfo);

        // Check type number hasn't reached max value
        return CheckTypeNumberHasNotReachedMax(typeInfo.TypeNumber);
    }

    /// <summary>
    /// Method to get a type number that should not be persisted, i.e., for a non-public type.
    /// </summary>
    /// <returns>type number, 1 - 9999, or -1 if something wrong</returns>
    internal int GetNonPersistedTypeNumber()
    {
        // Only valid if Yacks metadata database is OK
        if (_projectInfo == null)
            return -1;

        // Increment the non-persisted type number and check it hasn't reached max value
        _nonPersistedTypeNumber += 1;
        return CheckTypeNumberHasNotReachedMax(_nonPersistedTypeNumber);
    }

    /// <summary>
    /// Method to get a YacksTypeInfo object for a type defined in some project other than the
    /// current project. The module number for the project is also returned.
    /// </summary>
    /// <param name="projectName">name of project as used when YacksProjectInfo was added to metadata,
    /// for example "Merlinia.CommonClasses.MArrays"</param>
    /// <param name="fullyQualifiedTypeName">name starts with "Merlinia" and is maybe "mangled"</param>
    /// <returns>moduleNumber: module number or zero if something wrong,
    /// typeInfo: YacksTypeInfo object, or null if something wrong</returns>
    internal (int moduleNumber, YacksTypeInfo typeInfo) GetTypeInfo(string projectName,
            string fullyQualifiedTypeName)
    {
        // Read the YacksProjectInfo object for the assembly if it exists in the
        // YacksProjectDirectory database (if it is open)
        YacksProjectInfo projectInfo = ReadProjectInfo(projectName);
        if (projectInfo != null)
        {
            // Get the YacksTypeInfo for the specified type, return it if OK
            YacksTypeInfo typeInfo = projectInfo.GetTypeInfo(fullyQualifiedTypeName);
            if (typeInfo != null)
                return (projectInfo.ModuleNumber, typeInfo);
        }

        // Error return
        return (0, null);
    }

    /// <summary>
    /// Method to get a persisted field number for a field defined in a type in the current
    /// project.
    /// </summary>
    /// <param name="typeNumber">type number for the containing type</param>
    /// <param name="fieldName">name of the field</param>
    /// <returns>field number, 1 - 9999, or -1 if something wrong</returns>
    internal int GetFieldNumber(int typeNumber, string fieldName)
    {
        YacksTypeInfo typeInfo = _projectInfo.GetTypeInfo(typeNumber);

        // Check the List<> of YacksFieldInfo objects exists, create it if not
        YacksFieldInfo fieldInfo;
        if (typeInfo.FieldInfoList == null)
            typeInfo.FieldInfoList = new List<YacksFieldInfo>();
        else
        {
            fieldInfo = typeInfo.GetFieldInfo(fieldName);
            if (fieldInfo != null)
                return fieldInfo.FieldNumber;
        }

        // Update the LastFieldNumber field in the YacksTypeInfo object for the containing type and
        // create a new YacksFieldInfo object and add it to the list
        typeInfo.LastFieldNumber += 1;
        fieldInfo = new YacksFieldInfo(fieldName, typeInfo.LastFieldNumber);
        typeInfo.FieldInfoList.Add(fieldInfo);
        Debug.Assert(typeInfo.LastFieldNumber == typeInfo.FieldInfoList.Count);
        _yacksProjects.Update(_projectInfo);

        // Check field number hasn't reached max value
        return CheckFieldNumberHasNotReachedMax(fieldInfo.FieldNumber);
    }

    /// <summary>
    /// Method to get a field number that should not be persisted, i.e., for a non-public field,
    /// including a field in a non-public type.
    /// </summary>
    /// <returns>field number, 1 - 9999, or -1 if something wrong</returns>
    internal int GetNonPersistedFieldNumber()
    {
        // Only valid if Yacks metadata database is OK
        if (_projectInfo == null)
            return -1;

        // Increment the non-persisted field number and check it hasn't reached max value
        _nonPersistedFieldNumber += 1;
        return CheckFieldNumberHasNotReachedMax(_nonPersistedFieldNumber);
    }

    /// <summary>
    /// Method to get a persisted field number for a field defined in some project other than the
    /// current project.
    /// </summary>
    /// <param name="typeInfo">YacksTypeInfo object for the containing type</param>
    /// <param name="fieldName">name of the field</param>
    /// <returns>field number, 1 - 9999, or -1 if something wrong</returns>
    internal int GetFieldNumber(YacksTypeInfo typeInfo, string fieldName)
    {
        // Get the YacksFieldInfo for the specified field, return field number if OK
        YacksFieldInfo fieldInfo = typeInfo.GetFieldInfo(fieldName);
        return fieldInfo?.FieldNumber ?? -1;
    }

    /// <summary>
    /// Method to read a YacksProjectInfo object for an assembly, if it exists in the database.
    /// This makes use of a Dictionary{} to keep track of the YacksProjectInfo objects that have
    /// already been read.
    /// </summary>
    private YacksProjectInfo ReadProjectInfo(string projectName)
    {
        YacksProjectInfo projectInfo;
        if (!_projectInfoDictionary.TryGetValue(projectName, out projectInfo))
        {
            projectInfo = _yacksProjects?.FindOne(Query.EQ(nameof(YacksProjectInfo.ProjectName),
                    projectName));
            if (projectInfo != null)
                _projectInfoDictionary.Add(projectName, projectInfo);
        }
        return projectInfo;
    }

    /// <summary>
    /// Method to check that the type numbers have not reached their max value - which should be
    /// totally impossible, would require a project with over 10000 classes and structs.
    /// </summary>
    private static int CheckTypeNumberHasNotReachedMax(int typeNumberToReturn)
    {
        return typeNumberToReturn <= CTypeNumberMax ? typeNumberToReturn : -1;
    }

    /// <summary>
    /// Method to check that the field numbers have not reached their max value - which should be
    /// totally impossible, would require a project with 10000 non-public fields altogether.
    /// </summary>
    private static int CheckFieldNumberHasNotReachedMax(int fieldNumberToReturn)
    {
        return fieldNumberToReturn <= CFieldNumberMax ? fieldNumberToReturn : -1;
    }
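The non-persisted numbers follow a simple pattern: increment a per-compilation counter and return -1 once the four-digit limit has been passed. The following standalone sketch shows that pattern in isolation. NonPersistedCounter is a hypothetical name used only here, and the limit of 9999 is assumed to match CFieldNumberMax.

```csharp
public sealed class NonPersistedCounter
{
    // Assumed to match CTypeNumberMax / CFieldNumberMax in the listing above.
    private const int MaxNumber = 9999;

    private int _last = 0;

    // Returns 1, 2, 3, ... up to MaxNumber, and -1 for every call after that.
    public int Next()
    {
        _last += 1;
        return _last <= MaxNumber ? _last : -1;
    }
}
```

A return value of -1 is a signal to the caller rather than an exception. In the MetadataWriter code shown later, a -1 result means that the name is simply emitted without being anonymized.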

src\Compilers\Core\Portable\YacksMetadata-TypeInfo.cs

This file was listed in a previous step. In this article I'll just list some of the changes since then.

    // Type number is a unique four-digit number in the range 0001 - 9999 for each type number
    // that needs to be persisted. (Type numbers that don't need to be persisted are also
    // generated in the range 0001 - 9999, and may be different for each new compilation, and
    // they must be used such that they do not conflict with the persisted type numbers.)
    // Type numbers are used for several purposes such as obfuscating type names in the output
    // module if the YacksAnonymizeModule option is on, and to identify a type in binary
    // serialized data (persisted type numbers only).
    public int TypeNumber { get; set; }

    // Last (persisted) field number that has been assigned to a field in this type, i.e., next
    // field number should be this number plus one. This is only relevant for persisted field
    // numbers, which is normally only needed for public fields when the YacksAnonymizeModule
    // option is on or if a [YacksSerialization()] attribute that specifies "B" (binary
    // serialization) has been used.
    // This is somewhat redundant since the number of YacksFieldInfo objects on FieldInfoList
    // should give the same result, at least for the current implementation.
    public int LastFieldNumber { get; set; } = 0;

    // List of (some of) the fields in the class or struct. Fields which do not need a persisted
    // field number are normally not included in this list. A value of null implies an empty
    // list. So far, the implementation implies that YacksFieldInfo objects are exactly ordered
    // by their FieldNumber field, so the indexing for FieldInfoList is a valid way to access a
    // YacksFieldInfo object with a known FieldNumber.
    public List<YacksFieldInfo> FieldInfoList { get; set; } = null;

    /// <summary>
    /// Method to find a specified YacksFieldInfo object if it's on FieldInfoList list (if it
    /// exists).
    /// </summary>
    /// <param name="fieldName">name of field</param>
    /// <returns>YacksFieldInfo object or null</returns>
    public YacksFieldInfo GetFieldInfo(string fieldName)
    {
        return FieldInfoList?.Find((YacksFieldInfo x) => x.FieldName == fieldName);
    }
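The comment on FieldInfoList notes that the list is ordered by field number, so a known field number can be used as an index. That lookup is not part of the listing; the following is a sketch of how it could look. SketchFieldInfo and FieldLookup are hypothetical names, with SketchFieldInfo standing in for YacksFieldInfo.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for YacksFieldInfo, reduced to the two persisted properties.
public sealed class SketchFieldInfo
{
    public string FieldName { get; set; }
    public int FieldNumber { get; set; }
}

public static class FieldLookup
{
    // Field numbers start at 1, so field number n is expected at list index n - 1.
    // Returns null if the list is missing, the number is out of range, or the
    // ordering assumption does not hold for that entry.
    public static SketchFieldInfo GetByNumber(List<SketchFieldInfo> fieldInfoList, int fieldNumber)
    {
        if (fieldInfoList == null || fieldNumber < 1 || fieldNumber > fieldInfoList.Count)
            return null;

        SketchFieldInfo candidate = fieldInfoList[fieldNumber - 1];
        return candidate.FieldNumber == fieldNumber ? candidate : null;
    }
}
```

The extra check on candidate.FieldNumber is a defensive choice made for this sketch, since the ordering is described as a property of the current implementation rather than a guarantee.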

src\Compilers\Core\Portable\YacksMetadata-FieldInfo.cs

This is a new file, added to persist the information needed to keep track of field numbers.

    // Copyright (c) Merlinia A/S. All Rights Reserved. Licensed under the Apache License, Version 2.0. (Just to be compatible with the Microsoft Roslyn license.)

    namespace Merlinia.Yacks.ProjectMetadata
    {
        /// <summary>
        /// Class containing information about a field in a .Net type, i.e. in a C# class or struct. The
        /// YacksTypeInfo object (normally) contains a List{} of these objects.
        ///
        /// This is only relevant for C# compilations, but because it is used in the MetadataWriter
        /// class this definition is in the CodeAnalysis project instead of the CSharpCodeAnalysis
        /// project.
        /// </summary>
        internal class YacksFieldInfo
        {
            // Field name
            public string FieldName { get; set; }

            // Field number is a unique four-digit number in the range 0001 - 9999 for each field number
            // that needs to be persisted. (Field numbers that don't need to be persisted are also
            // generated in the range 0001 - 9999, and may be different for each new compilation, and
            // they must be used such that they do not conflict with the persisted field numbers.)
            // Field numbers are used for several purposes such as obfuscating field names in the output
            // module if the YacksAnonymizeModule option is on, and to identify a field in binary
            // serialized data (persisted field numbers only).
            public int FieldNumber { get; set; }

            // Constructor needed by LiteDB / BSON
            public YacksFieldInfo() {}

            // Normal constructor
            public YacksFieldInfo(string fieldName, int fieldNumber)
            {
                FieldName = fieldName;
                FieldNumber = fieldNumber;
            }
        }
    }

src\Compilers\Core\Portable\PEWriter\MetadataWriter.cs

This source file is part of the CodeAnalysis project in Roslyn. In the Visual Studio Solution Explorer it can be found under CodeAnalysis - PEWriter.

Compared with the previous steps, two more methods have been modified: PopulateFieldTableRows() and PopulateMemberRefTableRows().

    private void PopulateFieldTableRows()
    {
        var fieldDefs = this.GetFieldDefs();
        metadata.SetCapacity(TableIndex.Field, fieldDefs.Count);

        foreach (IFieldDefinition fieldDef in fieldDefs)
        {
            if (fieldDef.IsContextualNamedEntity)
            {
                ((IContextualNamedEntity)fieldDef).AssociateWithMetadataWriter(this);
            }

            //Yacks09: Anonymize the field name if necessary
            FieldAttributes fieldAttributes = GetFieldAttributes(fieldDef);
            StringHandle fieldNameHandle = AnonymizeFieldName(fieldDef, fieldAttributes);

            metadata.AddFieldDefinition(
                //attributes: GetFieldAttributes(fieldDef),
                attributes: fieldAttributes,
                //name: GetStringHandleForNameAndCheckLength(fieldDef.Name, fieldDef),
                name: fieldNameHandle,
                signature: GetFieldSignatureIndex(fieldDef));
        }
    }

    private void PopulateMemberRefTableRows()
    {
        var memberRefs = this.GetMemberRefs();
        metadata.SetCapacity(TableIndex.MemberRef, memberRefs.Count);

        foreach (ITypeMemberReference memberRef in memberRefs)
        {
            //Yacks09: May need to resolve anonymized field and method names if emitting
            // anonymized module
            StringHandle memberNameHandle = GetMemberReference(memberRef);

            metadata.AddMemberReference(
                parent: GetMemberReferenceParent(memberRef),
                //name: GetStringHandleForNameAndCheckLength(memberRef.Name, memberRef),
                name: memberNameHandle,
                signature: GetMemberReferenceSignatureHandle(memberRef));
        }
    }

src\Compilers\Core\Portable\PEWriter\MetadataWriter.Yacks.cs

This is a source file which has been added to the CodeAnalysis project. Several methods previously shown have been modified and several additional methods have been added.

    /// <summary>
    /// Method to either "anonymize" a type name (C# class or struct name) if applicable and
    /// possible, or else to do standard processing to emit the type definition. (Some of the code
    /// in this method is copied from original code in the
    /// MetadataWriter.PopulateTypeDefTableRows() method.)
    /// </summary>
    private void AnonymizeTypeName(INamedTypeDefinition typeDef, TypeAttributes typeAttributes,
            out StringHandle namespaceHandle, out StringHandle typeNameHandle)
    {
        // Test if applicable to "anonymize" the type name
        INamespaceTypeDefinition namespaceType = typeDef.AsNamespaceTypeDefinition(Context);
        string mangledTypeName = GetMangledName(typeDef);
        if (namespaceType != null && EmittingAnonymized(namespaceType.NamespaceName))
        {
            // Get the persisted Yacks metadata type number for this type if possible, or for
            // non-public types get a non-persisted type number
            TypeAttributes twoBits = typeAttributes & TypeAttributes.NestedPrivate;
            bool isPublic = twoBits != 0 && twoBits != TypeAttributes.NestedPrivate;
            YacksCompilation yacksCompilation = module.CommonCompilation._YacksCompilation;
            int typeNumber = isPublic ?
                    yacksCompilation.GetTypeNumber(FullyQualifiedTypeName(
                            namespaceType.NamespaceName, mangledTypeName), true) :
                    yacksCompilation.GetNonPersistedTypeNumber();
            if (typeNumber != -1)
            {
                // The namespace name and type name get "anonymized". All namespace names are reduced
                // to just "Merlinia". See comments on GetHandleForAnonymizedTypeName() re the type
                // names.
                namespaceHandle = metadata.GetOrAddString(YacksCompilation.CMerlinia);
                typeNameHandle = GetHandleForAnonymizedTypeName(yacksCompilation.ModuleNumber,
                        typeNumber, isPublic);
                return;
            }
        }

        // Type name does not get anonymized - do standard processing
        namespaceHandle = (namespaceType != null) ?
                GetStringHandleForNamespaceAndCheckLength(namespaceType, mangledTypeName) :
                default(StringHandle);
        typeNameHandle = GetStringHandleForNameAndCheckLength(mangledTypeName, typeDef);
    }

    /// <summary>
    /// Method to process emitting references to type names. Special processing is needed if an
    /// anonymized module is being emitted, and it is referencing types in another anonymized
    /// module. (Some of the code in this method is copied from original code in the
    /// MetadataWriter.PopulateTypeRefTableRows() method.)
    /// </summary>
    private void GetTypeNameReference(INamespaceTypeReference namespaceTypeRef,
            out EntityHandle resolutionScopeHandle, out StringHandle typeNameHandle,
            out StringHandle namespaceHandle)
    {
        // Some common processing ...
        IUnitReference unitReference = namespaceTypeRef.GetUnit(Context);
        string mangledTypeName = GetMangledName(namespaceTypeRef);
        string namespaceName = namespaceTypeRef.NamespaceName;
        resolutionScopeHandle = this.GetResolutionScopeHandle(unitReference);

        // Test if applicable to "anonymize" the type name reference
        if (EmittingAnonymized(namespaceName))
        {
            // Get the persisted type info for this type from Yacks metadata database if possible
            (int moduleNumber, YacksTypeInfo typeInfo) =
                    module.CommonCompilation._YacksCompilation.GetTypeInfo(unitReference.Name,
                            FullyQualifiedTypeName(namespaceName, mangledTypeName));
            if (typeInfo != null)
            {
                // The namespace name and type name have been "anonymized", and must be referenced as
                // such. The namespace name was reduced to just "Merlinia". See comments on
                // GetHandleForAnonymizedTypeName() re the type name.
                typeNameHandle = GetHandleForAnonymizedTypeName(moduleNumber, typeInfo.TypeNumber,
                        true);
                namespaceHandle = metadata.GetOrAddString(YacksCompilation.CMerlinia);
                return;
            }
        }

        // The type name has not been anonymized - do standard processing
        typeNameHandle = this.GetStringHandleForNameAndCheckLength(mangledTypeName,
                namespaceTypeRef);
        namespaceHandle = this.GetStringHandleForNamespaceAndCheckLength(namespaceTypeRef,
                mangledTypeName);
    }

    /// <summary>
    /// Method to format an anonymized type name and then to add it to the PE module #Strings
    /// stream and return the "handle" for the string. An anonymized type name is "Tnnnn$mmmm" or
    /// "tnnnn$mmmm", where nnnn is the project number and mmmm is the persisted or non-persisted
    /// type number within the project, respectively.
    /// </summary>
    private StringHandle GetHandleForAnonymizedTypeName(int moduleNumber, int typeNumber,
            bool isPublic)
    {
        return metadata.GetOrAddString((isPublic ? "T" : "t") +
                moduleNumber.ToString("D" + YacksCompilation.CModuleNumberLength,
                        CultureInfo.InvariantCulture) +
                "$" +
                typeNumber.ToString("D" + YacksCompilation.CTypeNumberLength,
                        CultureInfo.InvariantCulture));
    }

    /// <summary>
    /// Method to either "anonymize" a field name in a C# class or struct if applicable and
    /// possible, or else to do standard processing to emit the field definition. (Some of the
    /// code in this method is copied from original code in the
    /// MetadataWriter.PopulateFieldTableRows() method.)
    /// </summary>
    private StringHandle AnonymizeFieldName(IFieldDefinition fieldDef,
            FieldAttributes fieldAttributes)
    {
        // Test if applicable to "anonymize" the field name
        INamedTypeDefinition containingTypeDef =
                fieldDef.ContainingTypeDefinition as INamedTypeDefinition;
        string namespaceName =
                containingTypeDef?.AsNamespaceTypeDefinition(Context)?.NamespaceName;
        if (namespaceName != null && EmittingAnonymized(namespaceName))
        {
            // Get the type number for the containing type. If this is not available it indicates
            // the field is defined in a non-public type.
            YacksCompilation yacksCompilation = module.CommonCompilation._YacksCompilation;
            int typeNumber = yacksCompilation.GetTypeNumber(FullyQualifiedTypeName(namespaceName,
                    GetMangledName(containingTypeDef)), false);

            // Get the persisted Yacks metadata field number for this field if possible, or for
            // non-public fields get a non-persisted field number
            bool isPublic = typeNumber != -1 &&
                    (fieldAttributes & FieldAttributes.Public) == FieldAttributes.Public;
            int fieldNumber = isPublic ?
                    yacksCompilation.GetFieldNumber(typeNumber, fieldDef.Name) :
                    yacksCompilation.GetNonPersistedFieldNumber();
            if (fieldNumber != -1)
            {
                // The field name gets "anonymized", becoming "Fnnnn" or "fnnnn"
                return GetHandleForAnonymizedFieldName(fieldNumber, isPublic);
            }
        }

        // Field name does not get anonymized - do standard processing
        return GetStringHandleForNameAndCheckLength(fieldDef.Name, fieldDef);
    }

    /// <summary>
    /// Method to process emitting references to member names, i.e. field and method names.
    /// Special processing is needed if an anonymized module is being emitted, and it is
    /// referencing fields and methods in another anonymized module. (Some of the code in this
    /// method is copied from original code in the MetadataWriter.PopulateMemberRefTableRows()
    /// method.)
    ///
    /// This currently only works for field references, not method references.
    /// </summary>
    private StringHandle GetMemberReference(ITypeMemberReference memberRef)
    {
        // Some common processing ...
        //ITypeReference containingTypeRef = memberRef.GetContainingType(Context);
        INamedTypeReference containingTypeRef =
                memberRef.GetContainingType(Context) as INamedTypeReference;
        string namespaceName = containingTypeRef?.AsNamespaceTypeReference?.NamespaceName;
        ISymbol memberSymbol = memberRef as ISymbol;

        // Test if applicable to "anonymize" the member name reference
        if (namespaceName != null && memberSymbol != null && EmittingAnonymized(namespaceName))
        {
            // Get the persisted type info for the containing type from Yacks metadata database if
            // possible
            YacksCompilation yacksCompilation = module.CommonCompilation._YacksCompilation;
            (int moduleNumber, YacksTypeInfo typeInfo) = yacksCompilation.GetTypeInfo(
                    memberSymbol.ContainingModule.Name,
                    FullyQualifiedTypeName(namespaceName, GetMangledName(containingTypeRef)));
            if (typeInfo != null)
            {
                // Process fields and methods separately
                IFieldReference fieldReference = memberRef as IFieldReference;
                if (fieldReference != null)
                {
                    Debug.Assert(memberSymbol.Kind == SymbolKind.Field);
                    int fieldNumber = yacksCompilation.GetFieldNumber(typeInfo, memberRef.Name);
                    if (fieldNumber != -1)  // Should not be possible, but just in case ...
                        return GetHandleForAnonymizedFieldName(fieldNumber, true);
                }
            }
        }

        // The member name has not been anonymized - do standard processing
        return GetStringHandleForNameAndCheckLength(memberRef.Name, memberRef);
    }

    /// <summary>
    /// Method to format an anonymized field name and then to add it to the PE module #Strings
    /// stream and return the "handle" for the string. An anonymized field name is "Fnnnn" or
    /// "fnnnn", where nnnn is the persisted field number within the type or the non-persisted
    /// field number within the project, respectively.
    /// </summary>
    private StringHandle GetHandleForAnonymizedFieldName(int fieldNumber, bool isPublic)
    {
        return metadata.GetOrAddString((isPublic ? "F" : "f") + fieldNumber.ToString(
                "D" + YacksCompilation.CFieldNumberLength, CultureInfo.InvariantCulture));
    }

    /// <summary>
    /// Method to combine namespace name and mangled type name into a fully-qualified type name.
    /// </summary>
    private static string FullyQualifiedTypeName(string namespaceName, string mangledTypeName)
    {
        return namespaceName + "." + mangledTypeName;
    }

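The public / non-public decision in AnonymizeFieldName() rests on a bit test against FieldAttributes.Public. The following standalone sketch shows that test next to an alternative that masks with FieldAttributes.FieldAccessMask first. The class and method names are hypothetical. The enum values come from System.Reflection, where the access levels occupy the low three bits and Public has the value 6.

```csharp
using System.Reflection;

public static class FieldVisibility
{
    // The same test as in the listing: both bits of Public (value 6) must be set.
    public static bool IsPublicAsInListing(FieldAttributes attributes)
    {
        return (attributes & FieldAttributes.Public) == FieldAttributes.Public;
    }

    // Alternative formulation: isolate the access bits, then compare with Public.
    public static bool IsPublicMasked(FieldAttributes attributes)
    {
        return (attributes & FieldAttributes.FieldAccessMask) == FieldAttributes.Public;
    }
}
```

For the defined access levels (Private, FamANDAssem, Assembly, Family, FamORAssem and Public) the two tests give the same answer, and other flags such as Static or InitOnly do not affect either of them. The masked form is shown only because it states the intent a little more directly.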
Testing

As usual, I used a (slightly modified) library assembly Merlinia.CommonClasses.MArrays to test the modified Roslyn compiler. These screen shots (with highlighting added) show the Field metadata for the non-anonymized and the anonymized versions of the MArrays assembly as displayed by JetBrains dotPeek.

ModRos 9 Snap1

ModRos 9 Snap2

To test the modified referencing I compiled a program TestDynamicBitArrays, which tests parts of the Merlinia.CommonClasses.MArrays library assembly. Both the non-anonymized and the anonymized versions of the two programs worked.

Here are a couple of screen shots (with highlighting added) that show the MemberRef metadata for the non-anonymized and the anonymized versions of the TestDynamicBitArrays program as displayed by JetBrains dotPeek.

ModRos 9 Snap3

ModRos 9 Snap4
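A check like the one done by eye in the screen shots could also be automated. The following is a sketch of a helper that tests whether a field name has the "Fnnnn" or "fnnnn" form. It was not part of my testing, and AnonymizedNameCheck is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class AnonymizedNameCheck
{
    // "F" or "f" followed by exactly four decimal digits, and nothing else.
    private static readonly Regex FieldNamePattern = new Regex(@"^[Ff][0-9]{4}\z");

    public static bool LooksAnonymized(string fieldName)
    {
        return fieldName != null && FieldNamePattern.IsMatch(fieldName);
    }
}
```

Such a helper could be applied to the field names obtained via reflection from a compiled assembly, for example to confirm that no original field names remain in the anonymized version.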
