If at first you don't succeed ...

This article series continues to document my slow but sure (emphasis on "slow") adventures in modifying the Microsoft Roslyn compiler to get it to do some strange and wonderful (?) things.

The previous article was a sort of "proof of concept" set of modifications that resulted in the string constants (string literals) in a C# program getting "disguised" - a very, very easy-to-undo kind of obfuscation. In this article I'll document my next iteration, a somewhat better implementation of the same basic principles, but in a way that is closer to something that could be used in a production setting.

The biggest difference is that my test program (Program.cs) now invokes Roslyn in a way that allows the input to be a .csproj file, so a compilation for a complete C# project with multiple source files can be run. This also means that if my test program's usage of the modified Roslyn compiler is working, then it is very likely that an invocation of the modified Roslyn compiler from MSBuild will also work.

Another difference is that instead of generating an in-memory C# source file and then using Roslyn to parse it into a syntax tree, I now use SyntaxFactory to create the syntax tree for the fictitious source file directly. There are also some other improvements here and there that make the processing less kludgy.

The rest of this article is mostly just a listing of the various modifications.

Note: The code shown below is somewhat obsolete. A newer version is available for download - see this article.


////------------------------------------------------------------------------------------------
//
// Merlinia Project Yacks
// Copyright © Merlinia 2018, All Rights Reserved.
// Licensed under the Apache License, Version 2.0.
// (Just to be compatible with the Microsoft Roslyn license.)
//
////------------------------------------------------------------------------------------------

using System;
using System.IO;

// From local build of (modified) Roslyn
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Emit;
using Microsoft.CodeAnalysis.MSBuild;


namespace ModifyRoslynProject3
{
    /// <summary>
    /// Program to invoke the modified Roslyn compiler in a way that (very slightly) resembles what
    /// happens when MSBuild does a C# compilation by launching csc.exe. It is also (very slightly)
    /// similar to what happens when Visual Studio launches csc.exe to do a build - but is in no way
    /// similar to how Visual Studio uses the Roslyn compiler to parse C# statements in the IDE.
    ///
    /// The modified Roslyn compiler that is exercised by this program expects that there is a
    /// YacksProgramInfo.json file in the same folder as the .csproj file.
    /// </summary>
    public class Program
    {
        /// <summary>
        /// Entry point for the program.
        ///
        /// One argument is needed: the full path and filename for the .csproj file.
        ///
        /// This is partly based on these:
        /// https://joshvarty.com/2014/09/12/learn-roslyn-now-part-6-working-with-workspaces/
        /// https://stackoverflow.com/a/35235773/253938
        /// </summary>
        public static void Main(string[] args)
        {
            if (args.Length != 1)
            {
                DisplayErrorOrInfo("Exactly one argument is needed.");
                return;
            }

            string fileName = args[0];
            if (!File.Exists(fileName))
            {
                DisplayErrorOrInfo(string.Format("File {0} does not exist.", fileName));
                return;
            }

            // Create an MSBuildWorkspace to represent the project, and display some info just to be
            // sure that it worked. If no "documents" are being listed it may be due to not having
            // included the NuGet package Microsoft.Build in the references.
            MSBuildWorkspace msBuildWorkspace = MSBuildWorkspace.Create();
            Project msBuildWorkspaceProject = msBuildWorkspace.OpenProjectAsync(fileName).Result;
            Console.WriteLine("Project name = {0}", msBuildWorkspaceProject.Name);
            foreach (Document projectDocument in msBuildWorkspaceProject.Documents)
                Console.WriteLine(" " + projectDocument.Name);

            // Create a CSharpCompilation object to perform the compilation of the project. It is first
            // at this point that code common with usage of csc.exe finally gets hit, and first at
            // this point that a few of the Yacks modifications to Roslyn come into play.
            Compilation roslynCompilation = msBuildWorkspaceProject.GetCompilationAsync().Result;

            // Do the C# compilation and produce an output file. (Emitting to file is available through
            // an extension method in the Microsoft.CodeAnalysis namespace.)
            EmitResult emitResult = roslynCompilation.Emit(msBuildWorkspaceProject.OutputFilePath);

            // If our compilation failed, we can discover exactly why
            if (!emitResult.Success)
            {
                foreach (Diagnostic roslynDiagnostic in emitResult.Diagnostics)
                    Console.WriteLine(roslynDiagnostic.ToString());
            }

            DisplayErrorOrInfo("End of test program, hit any key to terminate.");
        }


        /// <summary>
        /// Method to display an error or information message on the console window.
        /// </summary>
        private static void DisplayErrorOrInfo(string textString)
        {
            Console.WriteLine(textString);
            Console.ReadKey();
        }
    }
}


This is a source file added to the CodeAnalysis project. It contains a class with information about the constant strings that are being "disguised", and it adds a property to the Compilation class that references an object of this class.

// Copyright (c) Merlinia A/S. All Rights Reserved. Licensed under the Apache License, Version 2.0. (Just to be compatible with the Microsoft Roslyn license.)

using System.Collections.Concurrent;
using System.Collections.Generic;

namespace Microsoft.CodeAnalysis
{
    /// <summary>
    /// This class contains some data associated with the Yacks modifications to the Roslyn compiler.
    /// It is only relevant for C# compilations, but because it is used in the CodeGenerator and
    /// MetadataWriter classes there is a reference to it in the Compilation class instead of the
    /// CSharpCompilation class.
    ///
    /// Adding this class and these fields results in some RS0016 errors, which can then be "fixed"
    /// by hovering the mouse over the field name (with the squiggly underline) and selecting "Add to
    /// public API" from the light bulb menu.
    /// </summary>
    public class Yacks_CompilationData
    {
        // References to an Array and a Dictionary specifying the constant strings which should be
        // "disguised" (if the two references are not null).
        public string[] StringyNumberToString { get; }  // Stringy numbers are 1-based: number n is at index n - 1
        public Dictionary<string, int> StringToStringyNumber { get; }

        // Dictionary to convert a "stringy number" to the "fake token number" needed by the metadata
        // writer.
        internal readonly ConcurrentDictionary<int, int> _StringyNumberToFakeTokenNumber =
            new ConcurrentDictionary<int, int>();


        // Constructor
        public Yacks_CompilationData(string[] stringyNumberToString,
                                     Dictionary<string, int> stringToStringyNumber)
        {
            StringyNumberToString = stringyNumberToString;
            StringToStringyNumber = stringToStringyNumber;
        }
    }


    /// <summary>
    /// Add a reference to the above object to the Compilation class, and thus to the derived
    /// CSharpCompilation class.
    /// </summary>
    partial class Compilation
    {
        // Reference to a data object containing some Yacks-related information. This field should be
        // copied from one instance of CSharpCompilation to the next one for at least some of the
        // situations where CSharpCompilation is recreated due to it being immutable. If this field
        // is null it implies no Yacks-related modifications are in force.
        public Yacks_CompilationData YacksData { get; set; }
    }
}
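Because stringy numbers are one-based while C# arrays are zero-based, it is easy to get the two lookups in Yacks_CompilationData out of step. The following is a minimal, standard-library-only sketch of how such a pair could be built from a list of string literals. The StringyTable class and its Build() method are hypothetical names and are not part of the modified Roslyn code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the modified Roslyn code) that builds the two
// lookups held by Yacks_CompilationData.
public static class StringyTable
{
    // Returns an array in which stringy number n is stored at index n - 1, plus the
    // inverse Dictionary from string to stringy number. Duplicate strings are dropped
    // and 1-character strings are skipped, as in the TokenWalker class. As with the
    // HashSet used there, the order in which numbers are assigned is not guaranteed.
    public static (string[] NumberToString, Dictionary<string, int> StringToNumber) Build(
        IEnumerable<string> literals)
    {
        string[] numberToString = literals
            .Where(s => s.Length > 1)
            .Distinct(StringComparer.Ordinal)
            .ToArray();

        var stringToNumber = new Dictionary<string, int>(numberToString.Length, StringComparer.Ordinal);
        for (int stringyNumber = 1; stringyNumber <= numberToString.Length; stringyNumber++)
            stringToNumber.Add(numberToString[stringyNumber - 1], stringyNumber);

        return (numberToString, stringToNumber);
    }
}
```

The syntax tree generation goes from string to number through the Dictionary, and the code generator goes from number back to string with NumberToString[stringyNumber - 1].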


This is a new file added to the CSharpCodeAnalysis project to contain code called via modifications to the CSharpCompilation class.

// Copyright (c) Merlinia A/S. All Rights Reserved. Licensed under the Apache License, Version 2.0. (Just to be compatible with the Microsoft Roslyn license.)

using Merlinia.Yacks;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;
using Newtonsoft.Json;

namespace Microsoft.CodeAnalysis.CSharp
{
    /// <summary>
    /// This source file contains a couple of methods that are called when a C# compilation is
    /// getting initialized.
    /// </summary>
    partial class CSharpCompilation
    {
        // The length of the "stringy number", as used when naming the static classes:
        //    "public static class Yacksnnnnn"
        internal const int CStringyNumberLength = 5;

        // This is a one-time switch, does not get copied to a new instance of CSharpCompilation
        internal bool _DisguiseStringConstants = false;


        /// <summary>
        /// Method to read the YacksProgramInfo.json file that should be in the same folder as the
        /// .csproj file.
        /// </summary>
        private void ReadYacksProgramInfo()
        {
            const string CYacksProgramInfo = "YacksProgramInfo.json";

            // Get path to where the .csproj file is
            SourceFileResolver sourceFileResolver = _options.SourceReferenceResolver as SourceFileResolver;
            if (sourceFileResolver == null)
                return;  // Something seriously wrong
            if (sourceFileResolver.BaseDirectory == null)
                return;  // Something else seriously wrong

            // Check the YacksProgramInfo.json file exists - this should emit an error message, but I
            // haven't figured out how to do that yet
            string programInfoFileName = Path.Combine(sourceFileResolver.BaseDirectory, CYacksProgramInfo);
            Debug.Assert(File.Exists(programInfoFileName), CYacksProgramInfo + " file not found.");

            // Read and deserialize the YacksProgramInfo.json file
            YacksProgramInfo programInfo =
                JsonConvert.DeserializeObject<YacksProgramInfo>(File.ReadAllText(programInfoFileName));

            // Test if disguising of string constants is wanted, prepare for that processing if so
            FYacksOptionDisguiseStringConstants flagToTest =
                _options.OptimizationLevel == OptimizationLevel.Debug
                    ? FYacksOptionDisguiseStringConstants.fDebug
                    : FYacksOptionDisguiseStringConstants.fRelease;
            if ((programInfo.OptionDisguiseStringConstants & flagToTest) != 0)
            {
                Debug.Assert(this.SyntaxTrees.Length == 0);  // Syntax trees get added later
                _DisguiseStringConstants = true;
            }
        }


        /// <summary>
        /// Method to analyze the C# program via the TokenWalker class (derived from the Roslyn
        /// CSharpSyntaxWalker class), and to find the string literals in the program that will become
        /// string constants. Then a Roslyn SyntaxTree is created to describe a fictional C# source
        /// file containing one static class definition for each string constant.
        ///
        /// This also produces an array of the string literals (referred to as "stringys") which
        /// maps each "stringy number" to its stringy. Finally, it produces a Dictionary to convert
        /// stringys into their stringy numbers.
        /// </summary>
        private IEnumerable<SyntaxTree> CreateSourceForStaticStringObjects(
                                                              IEnumerable<SyntaxTree> syntaxTrees)
        {
            Debug.Assert(syntaxTrees.Any());  // Must be at least one syntax tree

            // Use the Roslyn CSharpSyntaxWalker facility to find all of the literal strings in the
            // program
            TokenWalker tokenWalker = new TokenWalker();
            foreach (SyntaxTree syntaxTree in syntaxTrees)
                tokenWalker.Visit(syntaxTree.GetRoot());

            // Don't generate anything if no literal strings
            if (tokenWalker._AllLiteralStrings.Length == 0)
                return syntaxTrees;

            // Display a warning if there are too many literal strings (more than 65535)
            if (tokenWalker._AllLiteralStrings.Length > YacksCore.C64K - 1)
                Console.WriteLine("Yacks: Warning: Too many string literals.");

            // Create a Yacks_CompilationData object and anchor it in the current (about to be
            // replaced) instance of CSharpCompilation
            int numberClasses = Math.Min(tokenWalker._AllLiteralStrings.Length, YacksCore.C64K - 1);
            YacksData = new Yacks_CompilationData(tokenWalker._AllLiteralStrings,
                                                  new Dictionary<string, int>(numberClasses));

            // Use SyntaxFactory to build a SyntaxTree for the static classes that will implement the
            // disguised strings. Determining how to do this was only possible due to this great tool:
            //    http://roslynquoter.azurewebsites.net/
            // This requires "using static Microsoft.CodeAnalysis.CSharp.SyntaxFactory;"

            // First build the syntax nodes for the static class definition for each literal string.
            // The "stringy numbers" are one-based, so stringy number 1 corresponds to
            // tokenWalker._AllLiteralStrings[0]. At the same time create the Dictionary of strings to
            // stringy numbers.
            MemberDeclarationSyntax[] classDeclarationArray = new MemberDeclarationSyntax[numberClasses];
            for (int stringyNumber = 1; stringyNumber <= numberClasses; stringyNumber++)
            {
                string literalString = tokenWalker._AllLiteralStrings[stringyNumber - 1];
                YacksData.StringToStringyNumber.Add(literalString, stringyNumber);

                // Class name = "Yacksnnnnn", where nnnnn is the stringy number
                string className = YacksCore.CYacks +
                    stringyNumber.ToString("D" + CStringyNumberLength, CultureInfo.InvariantCulture);

                // The xxx value below is arbitrary, could alternatively be a random number
                int shiftOrUnshift = stringyNumber;

                // Build the syntax tree for one static class, corresponding to these C# statements:
                //
                //    public static class Yacksnnnnn
                //    {
                //       public static readonly string s;
                //
                //       static Yacksnnnnn()
                //       {
                //          s = YacksCore.M0002("", xxx);
                //       }
                //    }
                //
                classDeclarationArray[stringyNumber - 1] =
                    ClassDeclaration(className)
                    .WithModifiers(
                        TokenList(
                            new []{
                                Token(SyntaxKind.PublicKeyword),
                                Token(SyntaxKind.StaticKeyword)}))
                    .WithMembers(
                        List<MemberDeclarationSyntax>(
                            new MemberDeclarationSyntax[]{
                                FieldDeclaration(
                                    VariableDeclaration(
                                        PredefinedType(
                                            Token(SyntaxKind.StringKeyword)))
                                    .WithVariables(
                                        SingletonSeparatedList<VariableDeclaratorSyntax>(
                                            VariableDeclarator(
                                                Identifier("s")))))
                                .WithModifiers(
                                    TokenList(
                                        new [] {
                                            Token(SyntaxKind.PublicKeyword),
                                            Token(SyntaxKind.StaticKeyword),
                                            Token(SyntaxKind.ReadOnlyKeyword) })),
                                ConstructorDeclaration(
                                    Identifier(className))
                                .WithModifiers(
                                    TokenList(
                                        Token(SyntaxKind.StaticKeyword)))
                                .WithBody(
                                    Block(
                                        SingletonList<StatementSyntax>(
                                            ExpressionStatement(
                                                AssignmentExpression(
                                                    SyntaxKind.SimpleAssignmentExpression,
                                                    IdentifierName("s"),
                                                    InvocationExpression(
                                                        MemberAccessExpression(
                                                            SyntaxKind.SimpleMemberAccessExpression,
                                                            IdentifierName(YacksCore.CYacksCore),
                                                            IdentifierName(YacksCore.CM0002)))
                                                    .WithArgumentList(
                                                        ArgumentList(
                                                            SeparatedList<ArgumentSyntax>(
                                                                new SyntaxNodeOrToken[] {
                                                                    Argument(
                                                                        LiteralExpression(
                                                                            SyntaxKind.StringLiteralExpression,
                                                                            Literal(""))),
                                                                    Token(SyntaxKind.CommaToken),
                                                                    Argument(
                                                                        LiteralExpression(
                                                                            SyntaxKind.NumericLiteralExpression,
                                                                            Literal(shiftOrUnshift)))}))))))))}));
            }

            // Now build the outer syntax tree with "using" and "namespace" statements, and enclosing
            // all of the static classes defined above
            SyntaxTree newSyntaxTree = CSharpSyntaxTree.Create(
                CompilationUnit()
                .WithUsings(
                    SingletonList<UsingDirectiveSyntax>(
                        UsingDirective(
                            QualifiedName(
                                IdentifierName("Merlinia"),
                                IdentifierName("Yacks")))))
                .WithMembers(
                    SingletonList<MemberDeclarationSyntax>(
                        NamespaceDeclaration(
                            QualifiedName(
                                IdentifierName("Merlinia"),
                                IdentifierName("Yacks0002")))
                        .WithMembers(
                            List<MemberDeclarationSyntax>(
                                classDeclarationArray))))
                .NormalizeWhitespace()
                );

            // Create and return a list of syntax trees with the additional syntax tree for the static
            // classes added to the end
            List<SyntaxTree> syntaxTreeList = syntaxTrees.ToList();
            syntaxTreeList.Add(newSyntaxTree);
            return syntaxTreeList;
        }
    }


    /// <summary>
    /// Class used to find all of the string literal tokens in a syntax tree, and to filter out the
    /// ones that are not string constants (typically string literals used in C# attributes). This is
    /// largely based on this:
    /// https://joshvarty.com/2014/07/26/learn-roslyn-now-part-4-csharpsyntaxwalker/
    /// </summary>
    internal class TokenWalker : CSharpSyntaxWalker
    {
        // HashSet used to accumulate some of the literal strings in the program, but only one entry
        // in the case of duplicate strings
        private readonly HashSet<string> _hashSet = new HashSet<string>();

        // Above HashSet converted to a string array
        private string[] _allLiteralStrings = null;

        // Property to return the above HashSet as a string array. Should only be used after all of
        // the string literals in all source files have been found.
        internal string[] _AllLiteralStrings
        {
            get
            {
                if (_allLiteralStrings == null)
                {
                    _allLiteralStrings = new string[_hashSet.Count];
                    _hashSet.CopyTo(_allLiteralStrings);
                }
                else
                {
                    Debug.Assert(_hashSet.Count == _allLiteralStrings.Length);
                }
                return _allLiteralStrings;
            }
        }


        /// <summary>
        /// Constructor. It is necessary to call the base constructor and specify
        /// SyntaxWalkerDepth.Token, otherwise the VisitToken() method does not get called.
        /// </summary>
        internal TokenWalker() : base(SyntaxWalkerDepth.Token) {}


        /// <summary>
        /// Method called by CSharpSyntaxWalker for each token in the syntax tree.
        /// </summary>
        public override void VisitToken(SyntaxToken syntaxToken)
        {
            // Only interested in string literals
            if (syntaxToken.Kind() == SyntaxKind.StringLiteralToken &&
                syntaxToken.ValueText.Length > 1)  // Not worth disguising 1-character strings
            {
                // Only interested in string literals that are passed as arguments (e.g. in method calls)
                SyntaxNode parentNodeParent = syntaxToken.Parent?.Parent;
                if (parentNodeParent != null && parentNodeParent.Kind() == SyntaxKind.Argument)
                {
                    // Add this string to the HashSet, if we haven't already encountered it once
                    _hashSet.Add(syntaxToken.ValueText);
                }
                else
                {
                    // Check that this is not unexpected syntax that needs to be further analyzed
                    Debug.Assert(parentNodeParent != null &&
                                 parentNodeParent.Kind() == SyntaxKind.AttributeArgument);
                }
            }

            base.VisitToken(syntaxToken);  // Probably not necessary?
        }
    }
}
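The nested SyntaxFactory calls above are hard to read, so it may help to see roughly what source text the generated tree corresponds to. The sketch below is for illustration only, because the modified compiler builds the tree directly and never produces this text. GeneratedSourceSketch is a hypothetical name, and its whitespace will differ from what NormalizeWhitespace() produces.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical sketch: renders as plain text roughly the C# source that the SyntaxFactory
// calls in CreateSourceForStaticStringObjects() build as a syntax tree.
public static class GeneratedSourceSketch
{
    public const string CYacks = "Yacks";       // same value as YacksCore.CYacks
    public const int CStringyNumberLength = 5;  // same value as CSharpCompilation.CStringyNumberLength

    // Class name = "Yacks" + stringy number padded to five digits, e.g. "Yacks00042"
    public static string ClassName(int stringyNumber) =>
        CYacks + stringyNumber.ToString("D" + CStringyNumberLength, CultureInfo.InvariantCulture);

    public static string Render(int numberClasses)
    {
        var sb = new StringBuilder();
        sb.AppendLine("using Merlinia.Yacks;");
        sb.AppendLine("namespace Merlinia.Yacks0002");
        sb.AppendLine("{");
        for (int stringyNumber = 1; stringyNumber <= numberClasses; stringyNumber++)
        {
            string className = ClassName(stringyNumber);
            int shiftOrUnshift = stringyNumber;  // same arbitrary choice as in the article's code
            sb.AppendLine("    public static class " + className);
            sb.AppendLine("    {");
            sb.AppendLine("        public static readonly string s;");
            sb.AppendLine("        static " + className + "()");
            sb.AppendLine("        {");
            sb.AppendLine("            s = YacksCore.M0002(\"\", " +
                          shiftOrUnshift.ToString(CultureInfo.InvariantCulture) + ");");
            sb.AppendLine("        }");
            sb.AppendLine("    }");
        }
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

The empty "" argument is only a placeholder. The code generator later recognizes it and emits the disguised string in its place.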

To better understand what some of the above code is doing, please see the previous article in this series.


The YacksCore class library is almost exactly the same as in the previous iteration, except that I figured out how to make one project that can be used both by the modified Roslyn compiler and by the code that it compiles.

////------------------------------------------------------------------------------------------
//
// Merlinia Project Yacks
// Copyright © Merlinia 2018, All Rights Reserved.
// Licensed under the Apache License, Version 2.0.
// (Just to be compatible with the Microsoft Roslyn license.)
//
////------------------------------------------------------------------------------------------

using System;
using System.Diagnostics.CodeAnalysis;


namespace Merlinia.Yacks
{
    /// <summary>
    /// Library assembly containing some basic methods used by programs produced by the modified
    /// Roslyn compiler.
    ///
    /// This is built as a .Net Standard 1.3 project so it can be used by the modified Roslyn
    /// compiler, and also as a .Net Framework 2.0 project, mostly for the sake of nostalgia.
    ///
    /// The names of the methods, fields, etc. in this program are pre-obfuscated. Sorry about that.
    /// </summary>
    public static class YacksCore
    {
        // Used to make generating source code that calls this method more self-defining
        public const string CYacks = "Yacks";
        public const string CYacksCore = nameof(YacksCore);
        public const string CM0002 = nameof(M0002);

        // Limit + 1 for the shiftOrUnshift argument of M0002()
        public const int C64K = UInt16.MaxValue + 1;


        /// <summary>
        /// Method to "disguise" or "undisguise" a string using a version of the Caesar Cipher.
        ///
        /// The shiftOrUnshift argument (b) is an arbitrary "key value", and must be a non-zero
        /// integer between -65535 and 65535 (inclusive). To undisguise the disguised string you use
        /// the negative value. For example, if you disguise with -42, then you undisguise with +42,
        /// or vice-versa.
        ///
        /// The basic code comes from here: https://stackoverflow.com/a/48474433/253938
        /// </summary>
        /// <param name="a">input string</param>
        /// <param name="b">shiftOrUnshift argument, see above</param>
        /// <returns>disguised or undisguised string</returns>
        [SuppressMessage("Microsoft.Naming", "CA1704:IdentifiersShouldBeSpelledCorrectly", MessageId = "a")]
        [SuppressMessage("Microsoft.Naming", "CA1704:IdentifiersShouldBeSpelledCorrectly", MessageId = "b")]
        public static string M0002(string a, int b)
        {
            char[] d = new char[a.Length];
            for (int i = 0; i < a.Length; i++)
                d[i] = Convert.ToChar((Convert.ToInt32(a[i]) + b + C64K) % C64K);
            return new string(d);
        }
    }
}
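The disguising relies on a pair of calls to M0002() with opposite keys. The modified compiler calls M0002(original, -shiftOrUnshift) and emits the result with ldstr, and the generated static constructor calls M0002(disguised, +shiftOrUnshift) at run time. The following standalone sketch copies the body of M0002() into a hypothetical CaesarDemo class, so that the round trip can be checked without building Roslyn. The check includes wrap-around at the ends of the 16-bit character range.

```csharp
using System;

// Hypothetical demo class; the M0002() body is copied from YacksCore.M0002().
public static class CaesarDemo
{
    private const int C64K = UInt16.MaxValue + 1;  // 65536

    public static string M0002(string a, int b)
    {
        char[] d = new char[a.Length];
        for (int i = 0; i < a.Length; i++)
            d[i] = Convert.ToChar((Convert.ToInt32(a[i]) + b + C64K) % C64K);
        return new string(d);
    }

    // What the modified compiler does before emitting the ldstr: disguise with -shiftOrUnshift
    public static string DisguiseAtCompileTime(string original, int shiftOrUnshift) =>
        M0002(original, -shiftOrUnshift);

    // What the generated static constructor does at run time: undisguise with +shiftOrUnshift
    public static string UndisguiseAtRunTime(string disguised, int shiftOrUnshift) =>
        M0002(disguised, shiftOrUnshift);
}
```

For example, DisguiseAtCompileTime("Hello", 3) shifts each character down by three and gives "Ebiil". UndisguiseAtRunTime("Ebiil", 3) gives "Hello" again.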


This is another new file added to the CSharpCodeAnalysis project. Its intended purpose and usage will only become fully apparent if/when I get further with my modifications of Roslyn.

// Copyright (c) Merlinia A/S. All Rights Reserved. Licensed under the Apache License, Version 2.0. (Just to be compatible with the Microsoft Roslyn license.)

using System;

namespace Microsoft.CodeAnalysis.CSharp
{
    /// <summary>
    /// Flags-style enum indicating whether optional "disguising" of string constants is wanted.
    /// </summary>
    [Flags]
    internal enum FYacksOptionDisguiseStringConstants
    {
        fNone = 0,
        fDebug = 1 << 0,
        fRelease = 1 << 1,
        fBoth = fDebug | fRelease
    }


    /// <summary>
    /// This class represents the deserialized YacksProgramInfo.json file.
    /// </summary>
    internal class YacksProgramInfo
    {
        // Project name - this is the name of the .csproj file with no path and no ".csproj"
        public string ProjectName { get; set; }

        // Module number is a unique random four-digit number (0100 - 9999) for each "module". (The
        // module numbers 0001 - 0099 are reserved for special purposes.)
        // Module numbers are used for several purposes such as generating the identities of
        // serialized objects and as part of the diagnostic logging system.
        // For a .Net program a "module" is a .Net assembly.
        public int ModuleNumber { get; set; }

        // Option to control whether string constants should be "disguised"
        public FYacksOptionDisguiseStringConstants OptionDisguiseStringConstants { get; set; }
    }
}
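The following minimal sketch shows how the flags are meant to combine with the build configuration. It deserializes a sample YacksProgramInfo.json and applies the same test as ReadYacksProgramInfo(). It uses System.Text.Json in place of the Newtonsoft.Json package that the modified compiler actually uses, and writes the enum as a number (2 = fRelease, meaning disguise only in Release builds). The YacksOptionSketch class and the public copies of the two types are hypothetical.

```csharp
using System;
using System.Text.Json;

// Hypothetical public copies of the two types above, so that they can be used standalone
[Flags]
public enum FYacksOptionDisguiseStringConstants
{
    fNone = 0,
    fDebug = 1 << 0,
    fRelease = 1 << 1,
    fBoth = fDebug | fRelease
}

public class YacksProgramInfo
{
    public string ProjectName { get; set; }
    public int ModuleNumber { get; set; }
    public FYacksOptionDisguiseStringConstants OptionDisguiseStringConstants { get; set; }
}

public static class YacksOptionSketch
{
    // Mirrors the test in ReadYacksProgramInfo(): pick the flag that matches the build
    // configuration, then check whether that flag is set.
    public static bool ShouldDisguise(YacksProgramInfo info, bool isDebugBuild)
    {
        FYacksOptionDisguiseStringConstants flagToTest = isDebugBuild
            ? FYacksOptionDisguiseStringConstants.fDebug
            : FYacksOptionDisguiseStringConstants.fRelease;
        return (info.OptionDisguiseStringConstants & flagToTest) != 0;
    }

    // System.Text.Json (an assumption, not what the article uses) reads enums as numbers by
    // default, and matches property names case-sensitively.
    public static YacksProgramInfo Parse(string json) =>
        JsonSerializer.Deserialize<YacksProgramInfo>(json);
}
```

A sample file in this form would be {"ProjectName":"TestApp","ModuleNumber":1234,"OptionDisguiseStringConstants":2}. With that file, strings would be disguised in Release builds but not in Debug builds.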


Now for the modifications made to the CSharpCompilation class. In the Visual Studio Solution Explorer this file can be found under CSharpCodeAnalysis - Compilation.

Unfortunately, because the CSharpCompilation class is immutable, adding a new reference to the (base) class means that you have to be careful to copy that reference to each new instance of the class that gets created whenever the compilation is "modified".
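The pitfall can be shown with a small immutable class. MiniCompilation is a hypothetical name, not a Roslyn type. As in the modified CSharpCompilation, the extra reference is a settable property that the private constructor initializes from an optional parameter. The reference therefore survives only on the code paths that explicitly pass it along.

```csharp
// Hypothetical miniature of the pattern used by CSharpCompilation: every "modification"
// returns a new instance created through a private constructor.
public sealed class MiniCompilation
{
    public string AssemblyName { get; }

    // Stands in for the YacksData property (Yacks_CompilationData) added to Compilation
    public object YacksData { get; set; }

    private MiniCompilation(string assemblyName, object yacksData = null)
    {
        AssemblyName = assemblyName;
        YacksData = yacksData;  // null unless the caller remembered to pass it
    }

    public static MiniCompilation Create(string assemblyName) => new MiniCompilation(assemblyName);

    // Correct: forwards the extra reference to the new instance
    public MiniCompilation WithAssemblyName(string name) => new MiniCompilation(name, YacksData);

    // Deliberately wrong: relies on the optional parameter, so the reference is silently lost
    public MiniCompilation WithAssemblyNameForgetful(string name) => new MiniCompilation(name);
}
```

This is why the yacksData argument is added below to Clone(), Update() and WithReferences(). Any other path that calls the private constructor without that argument ends up with YacksData set to null.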

public static CSharpCompilation Create(
    string assemblyName,
    IEnumerable<SyntaxTree> syntaxTrees = null,
    IEnumerable<MetadataReference> references = null,
    CSharpCompilationOptions options = null)
{
    //Yacks03: After creating CSharpCompilation object, call Yacks method to read the
    //         YacksProgramInfo.json file.
    /*
    return Create(
        assemblyName,
        options ?? s_defaultOptions,
        syntaxTrees,
        references,
        previousSubmission: null,
        returnType: null,
        hostObjectType: null,
        isSubmission: false);
    */
    CSharpCompilation cSharpCompilation = Create(
        assemblyName,
        // Switch non-comment vs commented-out for next two lines to turn off multi-threading,
        // making it easier to trace Roslyn code with the debugger
        options ?? s_defaultOptions,
        //(options ?? s_defaultOptions).WithConcurrentBuild(false),
        syntaxTrees,
        references,
        previousSubmission: null,
        returnType: null,
        hostObjectType: null,
        isSubmission: false);
    cSharpCompilation.ReadYacksProgramInfo();
    return cSharpCompilation;
}

The above code is at around line 210 in the revision of Roslyn that I was working with.

private CSharpCompilation(
    string assemblyName,
    CSharpCompilationOptions options,
    ImmutableArray<MetadataReference> references,
    CSharpCompilation previousSubmission,
    Type submissionReturnType,
    Type hostObjectType,
    bool isSubmission,
    ReferenceManager referenceManager,
    bool reuseReferenceManager,
    SyntaxAndDeclarationManager syntaxAndDeclarations,
    AsyncQueue<CompilationEvent> eventQueue = null,
    //Yacks03: Added parameter to pass Yacks_CompilationData object on to new instance
    Yacks_CompilationData yacksData = null)
    : base(assemblyName, references, SyntaxTreeCommonFeatures(syntaxAndDeclarations.ExternalSyntaxTrees),
           isSubmission, eventQueue)
{

This is the start of the class constructor, around line 316 in the revision of Roslyn that I was working with.

    //Yacks03: Copy Yacks_CompilationData object from previous instance, or set to null
    YacksData = yacksData;
}

And the above code is the completion of the constructor, around line 371.

public new CSharpCompilation Clone()
{
    return new CSharpCompilation(
        this.AssemblyName,
        _options,
        this.ExternalReferences,
        this.PreviousSubmission,
        this.SubmissionReturnType,
        this.HostObjectType,
        this.IsSubmission,
        _referenceManager,
        reuseReferenceManager: true,
        syntaxAndDeclarations: _syntaxAndDeclarations,
        //Yacks03: Added argument to pass Yacks_CompilationData object on to new instance
        yacksData: this.YacksData);
}

private CSharpCompilation Update(
    ReferenceManager referenceManager,
    bool reuseReferenceManager,
    SyntaxAndDeclarationManager syntaxAndDeclarations)
{
    return new CSharpCompilation(
        this.AssemblyName,
        _options,
        this.ExternalReferences,
        this.PreviousSubmission,
        this.SubmissionReturnType,
        this.HostObjectType,
        this.IsSubmission,
        referenceManager,
        reuseReferenceManager,
        syntaxAndDeclarations,
        //Yacks03: Added argument to pass Yacks_CompilationData object on to new instance
        yacksData: this.YacksData);
}

A single line was added to both the Clone() and Update() methods, around lines 409 and 426.

public new CSharpCompilation WithReferences(IEnumerable<MetadataReference> references)
{
    // References might have changed, don't reuse reference manager.
    // Don't even reuse observed metadata - let the manager query for the metadata again.

    return new CSharpCompilation(
        this.AssemblyName,
        _options,
        ValidateReferences<CSharpCompilationReference>(references),
        this.PreviousSubmission,
        this.SubmissionReturnType,
        this.HostObjectType,
        this.IsSubmission,
        referenceManager: null,
        reuseReferenceManager: false,
        syntaxAndDeclarations: _syntaxAndDeclarations,
        //Yacks03: Added argument to pass Yacks_CompilationData object on to new instance
        yacksData: this.YacksData);
}

Similarly, a single line was added to the WithReferences() method, around line 479.

    //Yacks03: If string constants should be disguised then generate a syntax tree for the
    //         (fictive) C# source code that defines the static objects that will implement the
    //         string constants.
    //         Should be OK to ignore warnings about multiple enumeration of IEnumerable.
    if (_DisguiseStringConstants)
        trees = CreateSourceForStaticStringObjects(trees);

    syntaxAndDeclarations = syntaxAndDeclarations.AddSyntaxTrees(trees);

    return Update(_referenceManager, reuseReferenceManager, syntaxAndDeclarations);
}

Finally, something a bit more interesting. The above code was inserted near the end of the AddSyntaxTrees() method, around line 722.

That completes the modifications to class CSharpCompilation.

The remaining modifications are very similar to, or exactly the same as, the modifications made in the previous article in this series, plus some added code to pass references to the Yacks_CompilationData object around.


This source file is part of the CSharpCodeAnalysis project in Roslyn. In the Visual Studio Solution Explorer it can be found under CSharpCodeAnalysis - Compiler.

//Yacks03: Add extra argument to pass reference to the Yacks_CompilationData object
//         (or null)
Debug.Assert(compilation.YacksData != null);  // Temporary - fails if modifying off
ILBuilder builder = new ILBuilder(moduleBuilder, localSlotManager, optimizations, compilation.YacksData);

One line is modified in the GenerateMethodBody() method, around line 1352 in the revision of Roslyn I was working with.


This source file is part of the CSharpCodeAnalysis project in Roslyn. In the Visual Studio Solution Explorer it can be found under CSharpCodeAnalysis - CodeGen.

private void EmitConstantExpression(TypeSymbol type, ConstantValue constantValue, bool used, SyntaxNode syntaxNode)
{
    if (used)  // unused constant has no side-effects
    {
        // Null type parameter values must be emitted as 'initobj' rather than 'ldnull'.
        if (((object)type != null) && (type.TypeKind == TypeKind.TypeParameter) && constantValue.IsNull)
        {
            EmitInitObj(type, used, syntaxNode);
        }
        else
        {
            //Yacks02: Call local method as front-end to _builder.EmitConstantValue()
            //_builder.EmitConstantValue(constantValue);
            EmitConstantValue(constantValue, syntaxNode);
        }
    }
}

This code is at line 2843 in the revision of Roslyn I was working with. One line has been commented out, and one line has been added.


This is a source file which has been added to the CSharpCodeAnalysis project.

// Copyright (c) Merlinia A/S. All Rights Reserved.
// Licensed under the Apache License, Version 2.0.
// (Just to be compatible with the Microsoft Roslyn license.)

using Merlinia.Yacks;
using System;
using System.Diagnostics;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp.Syntax;

namespace Microsoft.CodeAnalysis.CSharp.CodeGen
{
    partial class CodeGenerator
    {
        /// <summary>
        /// Method to either do standard processing for a constant expression value, or to emit a
        /// ldstr opcode for a disguised string because we are processing the constructor of one of
        /// the static objects that implements disguised strings.
        /// </summary>
        private void EmitConstantValue(ConstantValue constantValue, SyntaxNode syntaxNode)
        {
            // Get reference to the Yacks_CompilationData object (or null)
            Yacks_CompilationData yacksDataOpt = ((CSharpCompilation)_module.CommonCompilation).YacksData;
            Debug.Assert(yacksDataOpt != null); // This is temporary - fails if modifying turned off

            // Determine if we are processing the constructor of a static object implementing a
            // disguised string
            int stringyNumber = 0;
            int shiftOrUnshift = 0;
            if (constantValue.Discriminator != ConstantValueTypeDiscriminator.String ||
                constantValue.StringValue == null || constantValue.StringValue.Length != 0 ||
                yacksDataOpt == null ||
                !GetStringyNumberIfRelevant(syntaxNode, yacksDataOpt, ref stringyNumber, ref shiftOrUnshift))
            {
                // Standard processing for all constant values other than the "" strings in the static
                // object constructors generated to implement undisguising of disguised strings
                _builder.EmitConstantValue(constantValue);
                return;
            }

            // This syntax node represents the "" string in a static object constructor associated with
            // a disguised string. Disguise the string associated with the stringy number and emit a
            // ldstr opcode to load it. This results in the disguised string being added to the #US
            // (user string) "metadata stream" in the PE file.
            _builder.ProcessAssignmentInStaticClassConstructorRhs(
                YacksCore.M0002(yacksDataOpt.StringyNumberToString[stringyNumber - 1], -shiftOrUnshift),
                stringyNumber);
        }

        /// <summary>
        /// Method to examine the syntax node and its parent nodes to determine if this node
        /// represents the "" string in the "s = YacksCore.M0002("", ??);" statement in the
        /// constructor of one of the static classes that implement a disguised string.
        ///
        /// The processing in this method can be compared with the generation of the syntax tree nodes
        /// in the CreateSourceForStaticStringObjects() method. Some of this processing is redundant,
        /// but is done just to be slavishly consistent with the definition of the syntax tree.
        /// </summary>
        /// <returns>true = statement recognized and stringy number extracted</returns>
        private static bool GetStringyNumberIfRelevant(SyntaxNode syntaxNode,
                                                       Yacks_CompilationData yacksData,
                                                       ref int stringyNumber, ref int shiftOrUnshift)
        {
            if (!(syntaxNode is LiteralExpressionSyntax) || syntaxNode.ChildTokens().Count() != 1)
                return false;
            SyntaxToken argumentToken1 = syntaxNode.GetFirstToken();
            if (argumentToken1.Kind() != SyntaxKind.StringLiteralToken || argumentToken1.ValueText != "")
                return false;

            if (!(syntaxNode.Parent is ArgumentSyntax))
                return false;
            ArgumentListSyntax argumentListNode = syntaxNode.Parent?.Parent as ArgumentListSyntax;
            if (argumentListNode == null || argumentListNode.Arguments.Count != 2)
                return false;

            LiteralExpressionSyntax argument2 =
                argumentListNode.Arguments[1].Expression as LiteralExpressionSyntax;
            if (argument2 == null || argument2.ChildTokens().Count() != 1)
                return false;
            SyntaxToken argumentToken2 = argument2.GetFirstToken();
            if (argumentToken2.Kind() != SyntaxKind.NumericLiteralToken ||
                !int.TryParse(argumentToken2.ValueText, out shiftOrUnshift))
                return false;

            InvocationExpressionSyntax invocationExpressionNode =
                argumentListNode.Parent as InvocationExpressionSyntax;
            if (invocationExpressionNode == null)
                return false;
            MemberAccessExpressionSyntax memberAccessExpressionNode =
                invocationExpressionNode.Expression as MemberAccessExpressionSyntax;
            if (memberAccessExpressionNode == null)
                return false;
            IdentifierNameSyntax identifierNameNode1 =
                memberAccessExpressionNode.Expression as IdentifierNameSyntax;
            if (identifierNameNode1 == null || identifierNameNode1.Identifier.Text != YacksCore.CYacksCore)
                return false;
            IdentifierNameSyntax identifierNameNode2 = memberAccessExpressionNode.Name as IdentifierNameSyntax;
            if (identifierNameNode2 == null || identifierNameNode2.Identifier.Text != YacksCore.CM0002)
                return false;

            ConstructorDeclarationSyntax constructorDeclarationNode =
                invocationExpressionNode.Parent?.Parent?.Parent?.Parent as ConstructorDeclarationSyntax;
            if (constructorDeclarationNode == null || constructorDeclarationNode.Modifiers.Count != 1 ||
                constructorDeclarationNode.Modifiers[0].Text != "static")
                return false;

            string className = constructorDeclarationNode.Identifier.Text;
            if (className.Length != YacksCore.CYacks.Length + CSharpCompilation.CStringyNumberLength ||
                !className.StartsWith(YacksCore.CYacks, StringComparison.Ordinal) ||
                !int.TryParse(className.Substring(YacksCore.CYacks.Length,
                                                  CSharpCompilation.CStringyNumberLength),
                              out stringyNumber) ||
                stringyNumber <= 0 || stringyNumber > yacksData.StringyNumberToString.Length)
                return false;

            return true;
        }
    }
}
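The final check in GetStringyNumberIfRelevant() recovers the stringy number from the name of the static class. Here is a minimal, Roslyn-free sketch of just that check. It assumes the prefix is "Yacks" and that the number is five digits long; the real values come from YacksCore.CYacks and CSharpCompilation.CStringyNumberLength, which aren't listed here, and the class name StringyNameSketch is made up for illustration.

```csharp
using System;
using System.Globalization;

// Standalone sketch of the class-name check at the end of GetStringyNumberIfRelevant().
// ASSUMPTIONS (hypothetical values, not taken from the Yacks source):
//   Prefix       = "Yacks" stands in for YacksCore.CYacks
//   NumberLength = 5       stands in for CSharpCompilation.CStringyNumberLength
// A generated class might then look like this (the shift value 7 is made up):
//   static class Yacks00042 { public static string s; static Yacks00042() { s = YacksCore.M0002("", 7); } }
public static class StringyNameSketch
{
    public const string Prefix = "Yacks";
    public const int NumberLength = 5;

    // Returns true and sets stringyNumber when className is Prefix followed by exactly
    // NumberLength digits that form a number in the range 1..stringCount.
    public static bool TryGetStringyNumber(string className, int stringCount, out int stringyNumber)
    {
        stringyNumber = 0;
        if (className.Length != Prefix.Length + NumberLength ||
            !className.StartsWith(Prefix, StringComparison.Ordinal))
            return false;

        // NumberStyles.None accepts digits only (no sign, no surrounding whitespace)
        if (!int.TryParse(className.Substring(Prefix.Length, NumberLength),
                          NumberStyles.None, CultureInfo.InvariantCulture, out stringyNumber))
            return false;

        // Stringy numbers are 1-based indexes into the StringyNumberToString array
        return stringyNumber > 0 && stringyNumber <= stringCount;
    }
}
```

One small deviation: the sketch parses with NumberStyles.None, whereas the original int.TryParse() call would also accept a leading sign. Since a C# class name can't contain a sign anyway, that difference shouldn't matter in practice.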


The next modified file, which contains the ILBuilder class, is in the CodeAnalysis project. In the Visual Studio Solution Explorer it can be found under CodeAnalysis - CodeGen.

The constructor for this class is modified to accept and store a reference to the Yacks_CompilationData object.

//Yacks03: Add parameter to allow passing reference to Yacks_CompilationData object (or
//         null) into this object
internal ILBuilder(ITokenDeferral module, LocalSlotManager localSlotManager, OptimizationLevel optimizations,
                   Yacks_CompilationData yacksDataOpt)
{
    Debug.Assert(BitConverter.IsLittleEndian);

    this.module = module;
    this.LocalSlotManager = localSlotManager;
    _emitState = default(EmitState);
    _scopeManager = new LocalScopeManager();

    leaderBlock = _currentBlock = _scopeManager.CreateBlock(this);

    _labelInfos = new SmallDictionary<object, LabelInfo>(ReferenceEqualityComparer.Instance);
    _optimizations = optimizations;

    //Yacks03: Note reference to Yacks_CompilationData object (or null)
    _yacksDataOpt = yacksDataOpt;
}

This code is at around line 67 in the revision of Roslyn I was working with.


The next modified source file, which contains the ILBuilder methods EmitToken() and EmitStringConstant(), is also part of the CodeAnalysis project in Roslyn. In the Visual Studio Solution Explorer it can be found under CodeAnalysis - CodeGen.

I modified this file in two different places.

internal void EmitToken(Cci.IReference value, SyntaxNode syntaxNode, DiagnosticBag diagnostics,
                        bool encodeAsRawToken = false)
{
    uint token = module?.GetFakeSymbolTokenForIL(value, syntaxNode, diagnostics) ?? 0xFFFF;
    // Setting the high bit indicates that the token value is to be interpreted literally rather than as a handle.
    if (encodeAsRawToken)
    {
        token |= Cci.MetadataWriter.LiteralMethodDefinitionToken;
    }
    this.GetCurrentWriter().WriteUInt32(token);

    //Yacks02: Test if this is LHS of assignment statement in static object constructor for
    //         disguised strings, record the fake token if so
    ProcessPossibleAssignmentInStaticClassConstructorLhs(value, token);
}

This is at line 46 in the revision of Roslyn that I was working with. One line of code has been added.

internal void EmitStringConstant(string value)
{
    if (value == null)
    {
        EmitNullConstant();
    }
    else
    {
        //Yacks02: Implement constant string reference, maybe to disguised string
        EmitStringOrLoadDisguisedString(value);
    }
}

This is at around line 682 in the revision of Roslyn that I was working with. (If you've been following along you may recognize this code from my first article about modifying Roslyn - this is where I added code to convert "Hello world" programs into "Hello universe" programs.) For this modification I've removed several lines and replaced them with a call to a method that is in the following file.
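The method called here, EmitStringOrLoadDisguisedString() in the following file, either emits an ordinary ldstr or, for a disguised string, a ldstr whose operand is the negated stringy number. A minimal sketch of that decision outside Roslyn, using a plain List<byte> as the IL stream, might look like the following. PlaceholderEmitSketch, the stringToStringyNumber table and the userStringToken callback (a stand-in for EmitToken(string)) are all hypothetical; only the opcode value 0x72 for ldstr and the little-endian 4-byte operand come from the ECMA-335 IL encoding.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of EmitStringOrLoadDisguisedString(): either a normal ldstr with a
// (pseudo) user-string token, or a ldstr whose operand is the negated stringy number,
// which is a placeholder to be patched later once the field token is known.
public static class PlaceholderEmitSketch
{
    public const byte Ldstr = 0x72;  // ECMA-335 opcode for ldstr

    public static void EmitString(List<byte> il, string value,
                                  IDictionary<string, int> stringToStringyNumber,
                                  Func<string, int> userStringToken)
    {
        il.Add(Ldstr);
        int stringyNumber;
        // As in the article: strings of length 0 or 1, and strings not in the table, are not disguised
        if (value.Length <= 1 || !stringToStringyNumber.TryGetValue(value, out stringyNumber))
            WriteInt32(il, userStringToken(value));
        else
            WriteInt32(il, -stringyNumber);  // Negative operand marks a placeholder
    }

    // IL operands are stored little-endian; masking keeps the casts safe even in a checked context
    public static void WriteInt32(List<byte> il, int value)
    {
        il.Add((byte)(value & 0xFF));
        il.Add((byte)((value >> 8) & 0xFF));
        il.Add((byte)((value >> 16) & 0xFF));
        il.Add((byte)((value >> 24) & 0xFF));
    }
}
```

Marking the placeholder with a negative number only works because ordinary tokens are never negative, which is the same assumption made by the "pseudoToken >= 0" test in the MetadataWriter changes further down.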


The following new source file, another part of the partial ILBuilder class, has been added to the CodeAnalysis project.

// Copyright (c) Merlinia A/S. All Rights Reserved.
// Licensed under the Apache License, Version 2.0.
// (Just to be compatible with the Microsoft Roslyn license.)

using System.Diagnostics;
using System.Reflection.Metadata;

namespace Microsoft.CodeAnalysis.CodeGen
{
    partial class ILBuilder
    {
        // Reference to Yacks_CompilationData object (or null)
        private readonly Yacks_CompilationData _yacksDataOpt;

        // If not zero, this is a "stringy number", and it implies that the constructor of one of the
        // static objects associated with disguised strings is currently being processed by this
        // instance of ILBuilder
        private int _stringyNumberOrZero = 0;

        /// <summary>
        /// Method to do two things related to emitting processing for the "s =
        /// YacksCore.M0002("", ??);" statement in the constructor of one of the static classes that
        /// implement a disguised string.
        /// 1. A ldstr opcode is emitted to load the disguised string (instead of the "" string).
        /// 2. The "stringy number" is noted so that when the assignment to the "s" field gets
        ///    emitted (LHS gets emitted immediately after emitting RHS), the "fake token number" will
        ///    be recorded as the token associated with the disguised string.
        /// </summary>
        internal void ProcessAssignmentInStaticClassConstructorRhs(string disguisedString, int stringyNumber)
        {
            Debug.Assert(_stringyNumberOrZero == 0); // RHS then LHS processed for assignment
            EmitLdstrForConstantString(disguisedString);
            _stringyNumberOrZero = stringyNumber;
        }

        /// <summary>
        /// Method to record the fake token numbers associated with syntax nodes for the static s
        /// field in the static objects defined to implement the disguised strings.
        /// </summary>
        private void ProcessPossibleAssignmentInStaticClassConstructorLhs(Cci.IReference fieldSymbol,
                                                                          uint fakeToken)
        {
            if (_stringyNumberOrZero != 0 && fieldSymbol is IFieldSymbol)
            {
                Debug.Assert(_yacksDataOpt != null && (fakeToken & 0xff000000) == 0);
                if (!_yacksDataOpt._StringyNumberToFakeTokenNumber.TryAdd(_stringyNumberOrZero, (int)fakeToken))
                    Debug.Assert(false, "stringy number already in dictionary");
                _stringyNumberOrZero = 0;
            }
        }

        /// <summary>
        /// This method replaces the standard processing for emitting a ldstr opcode for a constant
        /// string in method EmitStringConstant().
        /// </summary>
        private void EmitStringOrLoadDisguisedString(string stringValue)
        {
            // Test if string is "disguised" and should be fetched via static object
            int stringyNumber;
            if (stringValue.Length <= 1 || _yacksDataOpt == null ||
                _yacksDataOpt.StringToStringyNumber == null ||
                !_yacksDataOpt.StringToStringyNumber.TryGetValue(stringValue, out stringyNumber))
            {
                // Standard processing for non-disguised string
                EmitLdstrForConstantString(stringValue);
                return;
            }

            // At this point we would like to emit a ldsfld (load static field) opcode referencing the
            // static field "s" in the associated static class for this disguised string. But the "fake
            // token" for field "s" may not have been recorded yet. So we do something very kludgy: we
            // emit a ldstr opcode followed by the stringy number as a negative number. Then, later, in
            // Microsoft.Cci.MetadataWriter.WriteInstructions(), we replace the ldstr with an ldsfld,
            // and emit the fake token number (which must now be known) converted to an entity handle.
            EmitOpCode(ILOpCode.Ldstr);
            this.GetCurrentWriter().WriteInt32(-stringyNumber);
        }

        /// <summary>
        /// Method containing code moved here from the standard processing in method
        /// EmitStringConstant().
        /// </summary>
        private void EmitLdstrForConstantString(string stringValue)
        {
            EmitOpCode(ILOpCode.Ldstr);
            EmitToken(stringValue);
        }
    }
}
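The RHS/LHS handshake between ProcessAssignmentInStaticClassConstructorRhs() and the hook added to EmitToken() is easy to lose track of, so here is a small, Roslyn-free model of it. The class and method names are invented, a Dictionary stands in for _StringyNumberToFakeTokenNumber, and a bool stands in for the "is IFieldSymbol" test. The call to M0002 presumably emits a method token between the RHS string and the "s" field, which is why non-field tokens have to be skipped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, Roslyn-free model of the RHS-then-LHS handshake in the modified ILBuilder.
// Method names are invented for illustration; a bool stands in for "fieldSymbol is IFieldSymbol".
public sealed class HandshakeSketch
{
    // Stands in for Yacks_CompilationData._StringyNumberToFakeTokenNumber
    public readonly Dictionary<int, int> StringyNumberToFakeToken = new Dictionary<int, int>();

    // Non-zero between emitting the RHS of "s = YacksCore.M0002(...)" and emitting its LHS
    private int _stringyNumberOrZero;

    // Counterpart of ProcessAssignmentInStaticClassConstructorRhs(): remember the pending number
    public void OnRhsEmitted(int stringyNumber)
    {
        if (_stringyNumberOrZero != 0)
            throw new InvalidOperationException("RHS emitted twice without an LHS in between");
        _stringyNumberOrZero = stringyNumber;
    }

    // Counterpart of the hook added to EmitToken(): called for every token written to the IL
    public void OnTokenEmitted(bool isFieldReference, uint fakeToken)
    {
        if (_stringyNumberOrZero == 0 || !isFieldReference)
            return;  // Nothing pending, or not a field token (e.g. the M0002 method token)
        if (StringyNumberToFakeToken.ContainsKey(_stringyNumberOrZero))
            throw new InvalidOperationException("stringy number already recorded");
        StringyNumberToFakeToken.Add(_stringyNumberOrZero, (int)fakeToken);
        _stringyNumberOrZero = 0;  // Handshake complete
    }
}
```

The sketch throws exceptions where the article's code uses Debug.Assert(). Because Debug.Assert() calls are removed from Release builds, a duplicate stringy number would otherwise go unnoticed there.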

We're almost done now, but as mentioned in the comments in EmitStringOrLoadDisguisedString() above, there is one final place in Roslyn that needs to be modified.


The final modified file is MetadataWriter.cs, which is also part of the CodeAnalysis project in Roslyn. In the Visual Studio Solution Explorer it can be found under CodeAnalysis - PEWriter.

case OperandType.InlineString:
    {
        writer.Offset = offset;

        int pseudoToken = ReadInt32(generatedIL, offset);

        //Yacks02: Test for temporary ldstr that needs to be replaced by ldsfld
        if (ProcessPossibleDisguisedStringReference(pseudoToken, generatedIL, ref offset, writer))
            break;

        UserStringHandle handle;

This code is at line 3207 in the revision of Roslyn I was working with. One if statement and one break statement have been added.


The following new source file, a partial MetadataWriter class, has also been added to the CodeAnalysis project.

// Copyright (c) Merlinia A/S. All Rights Reserved.
// Licensed under the Apache License, Version 2.0.
// (Just to be compatible with the Microsoft Roslyn license.)

using System.Collections.Immutable;
using System.Diagnostics;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;
using Microsoft.CodeAnalysis;

namespace Microsoft.Cci
{
    internal partial class MetadataWriter
    {
        /// <summary>
        /// Method to perform some kludgy processing needed to get loading the reference to the static
        /// field "s" in the static objects that implement disguised strings to work.
        /// </summary>
        /// <returns>true = kludgy processing done, false = not disguised string reference</returns>
        private bool ProcessPossibleDisguisedStringReference(int pseudoToken, ImmutableArray<byte> generatedIL,
                                                             ref int localOffset, BlobWriter blobWriter)
        {
            // Test if the pseudo/fake token is actually a (negative) stringy number, exit if not
            if (pseudoToken >= 0)
                return false;

            Yacks_CompilationData yacksData = module.CommonCompilation.YacksData;
            Debug.Assert(yacksData != null);

            // Replace the ldstr opcode with ldsfld opcode in BlobWriter's data area
            Debug.Assert(ReadByte(generatedIL, localOffset - 1) == (byte)ILOpCode.Ldstr);
            blobWriter.Offset = localOffset - 1;
            blobWriter.WriteByte((byte)ILOpCode.Ldsfld);

            // Look up the fake/pseudo token for the "s" field, convert it to a handle and emit it
            int fakeToken;
            if (!yacksData._StringyNumberToFakeTokenNumber.TryGetValue(-pseudoToken, out fakeToken))
                Debug.Assert(false, "stringy number not in dictionary");
            blobWriter.WriteInt32(
                MetadataTokens.GetToken(ResolveEntityHandleFromPseudoToken(fakeToken)));
            localOffset += 4;
            return true; // No further processing for this opcode
        }
    }
}

Incidentally, this is indeed rather kludgy code, but it is more or less copied from existing code in MetadataWriter.cs, around line 3180.
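Stripped of the Roslyn plumbing, the patch performed by ProcessPossibleDisguisedStringReference() amounts to rewriting a 5-byte ldstr placeholder in place as a 5-byte ldsfld. The sketch below is hypothetical (PlaceholderPatchSketch is not part of Yacks or Roslyn). It uses a dictionary that maps stringy numbers directly to final field tokens instead of fake tokens, and, to stay short, it assumes the IL contains only ldstr instructions, whereas the real WriteInstructions() decodes the operand type of every opcode. The opcode values 0x72 (ldstr) and 0x7E (ldsfld) are from ECMA-335.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the patch step in ProcessPossibleDisguisedStringReference(): a
// placeholder "ldstr <negative stringy number>" is rewritten in place as "ldsfld <field token>".
// Simplification: the IL is assumed to contain only 5-byte ldstr instructions.
public static class PlaceholderPatchSketch
{
    public const byte Ldstr = 0x72;   // ECMA-335 opcode for ldstr
    public const byte Ldsfld = 0x7E;  // ECMA-335 opcode for ldsfld

    public static void Patch(byte[] il, IReadOnlyDictionary<int, int> stringyNumberToFieldToken)
    {
        for (int offset = 0; offset < il.Length; offset += 5)
        {
            if (il[offset] != Ldstr)
                throw new InvalidOperationException("this sketch only handles ldstr");
            int operand = ReadInt32(il, offset + 1);
            if (operand >= 0)
                continue;  // Ordinary string reference, left alone

            int fieldToken;
            if (!stringyNumberToFieldToken.TryGetValue(-operand, out fieldToken))
                throw new KeyNotFoundException("stringy number " + (-operand) + " not recorded");
            il[offset] = Ldsfld;                     // Same size: 1-byte opcode + 4-byte token
            WriteInt32(il, offset + 1, fieldToken);
        }
    }

    // Little-endian 32-bit read of an IL operand
    static int ReadInt32(byte[] il, int pos) =>
        il[pos] | (il[pos + 1] << 8) | (il[pos + 2] << 16) | (il[pos + 3] << 24);

    // Little-endian 32-bit write of an IL operand
    static void WriteInt32(byte[] il, int pos, int value)
    {
        il[pos] = (byte)(value & 0xFF);
        il[pos + 1] = (byte)((value >> 8) & 0xFF);
        il[pos + 2] = (byte)((value >> 16) & 0xFF);
        il[pos + 3] = (byte)((value >> 24) & 0xFF);
    }
}
```

The in-place rewrite is safe because both instructions are a 1-byte opcode followed by a 4-byte token, so no branch offsets move, and both leave a single string reference on the evaluation stack. A FieldDef token such as 0x04000002 (table 0x04, row 2) ends up in the IL as the bytes 02 00 00 04.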

Testing and disclaimers

I've tested this modified Roslyn compiler on a C# project containing three source files and about 600 lines of code, and examined the output with JetBrains dotPeek. The few string constants in the program were indeed disguised. Further testing is needed, of course.

As for disclaimers, please see the previous article.

