Why "disguising" strings?

This is my second adventure in learning how to modify Roslyn, and in getting an idea of what is possible and how difficult it is.

As a kind of arbitrary goal I decided to see if I could modify all C# literal string constants so they weren't immediately obvious when examining the resulting .dll or .exe file, for example with a tool such as JetBrains dotPeek. In other words, a kind of obfuscation, although a very, very, very simple obfuscation - undisguising the strings will be child's play for anyone with any knowledge of how .Net IL (intermediate language) works.

Note: The code shown below is somewhat obsolete. A newer version is available for download - see this article.

ModifyRoslynProject2\Program.cs

The program that runs the Roslyn compiler is based on the principles discussed in the first article in this series:

////------------------------------------------------------------------------------------------
//
// Merlinia Project Yacks
// Copyright © Merlinia 2018, All Rights Reserved.
// Licensed under the Apache License, Version 2.0.
// (Just to be compatible with the Microsoft Roslyn license.)
//
////------------------------------------------------------------------------------------------

using Merlinia.Yacks;
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.Globalization;
using System.IO;
using System.Text;

// From the Roslyn build output
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.Emit;

namespace ModifyRoslynProject2
{
    /// <summary>
    /// Program to perform a C# compilation using the modified Roslyn compiler.
    /// </summary>
    public static class Program
    {
        public static void Main()
        {
            DoRoslynCompile(@"..\CompilerInput\Test1.cs", "Test1.exe");

            Console.WriteLine("Hit any key to terminate this program.");
            Console.ReadKey();
        }

        /// <summary>
        /// Method to perform a Roslyn compilation of a C# program, using the modified Roslyn compiler
        /// modules that are referenced by this project. This is based on
        /// https://joshvarty.com/2016/01/16/learn-roslyn-now-part-16-the-emit-api/
        /// </summary>
        private static void DoRoslynCompile(string sourceFile, string outputFile)
        {
            try
            {
                Console.WriteLine("Beginning Roslyn compilation for " + sourceFile);

                SyntaxTree syntaxTree1 = CSharpSyntaxTree.ParseText(File.ReadAllText(sourceFile));

                // Create C# source code for the static objects that implement disguised strings, and
                // also generate the Dictionary and Array collections needed by the modified compiler
                string[] arrayOfStringys;  // Index = stringy number - 1 (stringy numbers are 1-based)
                Dictionary<string, int> dictionaryOfStringys;
                SyntaxTree syntaxTree2 = CreateSourceForStaticStringObjects(syntaxTree1,
                    out arrayOfStringys, out dictionaryOfStringys);

                /* un-comment this to compile with no attempts to disguise constant strings
                arrayOfStringys = null;
                dictionaryOfStringys = null;
                syntaxTree2 = null;
                */

                SyntaxTree[] syntaxTrees = syntaxTree2 == null ?
                    new [] { syntaxTree1 } : new [] { syntaxTree1, syntaxTree2 };

                PortableExecutableReference mscorlib =
                    MetadataReference.CreateFromFile(typeof(object).Assembly.Location);

                // Needed because YacksCore.M0002() is referenced in the generated static classes for
                // the strings to be disguised
                PortableExecutableReference yacksCore =
                    MetadataReference.CreateFromFile(@"..\..\YacksCore\bin\Merlinia.YacksCore.dll");

                // Set this to false to avoid problems when tracing parts of Roslyn
                bool concurrentBuildOption = true;

                CSharpCompilationOptions roslynOptions = new CSharpCompilationOptions(
                    outputFile.EndsWith(".exe", StringComparison.OrdinalIgnoreCase) ?
                        OutputKind.ConsoleApplication : OutputKind.DynamicallyLinkedLibrary,
                    concurrentBuild: concurrentBuildOption);

                CSharpCompilation roslynCompilation = CSharpCompilation.Create("MyCompilation",
                    syntaxTrees: syntaxTrees, references: new [] { mscorlib, yacksCore },
                    options: roslynOptions);

                // Set two static fields in class YacksForRoslyn.StaticFields (which has been added to
                // modified Roslyn) for the Array and Dictionary collections needed by EmitExpression
                // and ILBuilderEmit to convert strings to be disguised into "stringy numbers". See
                // comments in class YacksForRoslyn.StaticFields.
                YacksForRoslyn.StaticFields.StringyNumberToString = arrayOfStringys;       // May be null
                YacksForRoslyn.StaticFields.StringToStringyNumber = dictionaryOfStringys;  // May be null

                // Emitting to file is available through an extension method in the
                // Microsoft.CodeAnalysis namespace
                EmitResult emitResult = roslynCompilation.Emit(@"..\CompilerOutput\" + outputFile);

                // If our compilation failed, we can discover exactly why
                if (!emitResult.Success)
                {
                    foreach (Diagnostic roslynDiagnostic in emitResult.Diagnostics)
                        Console.WriteLine(roslynDiagnostic.ToString());
                }

                Console.WriteLine("End of Roslyn compilation.");
            }
            catch (Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }

        /// <summary>
        /// Method to analyze a C# program via the Roslyn TokenWalker class, and to find all string
        /// literals in the program. Then a temporary C# source file is created in-storage containing
        /// one static class definition for each string literal, and this is parsed to yield a new
        /// additional Roslyn syntax tree.
        ///
        /// This also produces an array of the string literals (referred to as "stringys") which
        /// provides a "stringy number" to stringy collection. Finally, it produces a Dictionary to
        /// convert stringys into their stringy number.
        /// </summary>
        private static SyntaxTree CreateSourceForStaticStringObjects(SyntaxTree inputSyntaxTree,
            out string[] arrayOfStringys, out Dictionary<string, int> dictionaryOfStringys)
        {
            // Use the Roslyn CSharpSyntaxWalker facility to find all of the literal strings in the
            // program
            TokenWalker tokenWalker = new TokenWalker();
            tokenWalker.Visit(inputSyntaxTree.GetRoot());

            // Don't generate anything if no literal strings
            if (tokenWalker._AllLiteralStrings.Length == 0)
            {
                arrayOfStringys = null;
                dictionaryOfStringys = null;
                return null;
            }
            arrayOfStringys = tokenWalker._AllLiteralStrings;

            // Display a warning if there are too many literal strings
            if (tokenWalker._AllLiteralStrings.Length >= YacksCore.C64K - 1)
                Console.WriteLine("Yacks: Warning: Too many string literals.");

            // Prepare to generate the C# source for the static class definitions
            StringBuilder stringBuilder = new StringBuilder();
            stringBuilder.Append("using Merlinia.Yacks; \r\n\r\n" +
                                 "namespace Merlinia.Yacks0002 \r\n" +
                                 "{ \r\n\r\n");

            // Add a class definition for each literal string. The "stringy numbers" are one-based, so
            // stringy number 1 corresponds to tokenWalker._AllLiteralStrings[0]. At the same time
            // create the Dictionary of strings to stringy numbers.
            dictionaryOfStringys = new Dictionary<string, int>();
            for (int stringyNumber = 1;
                 stringyNumber <= Math.Min(tokenWalker._AllLiteralStrings.Length, YacksCore.C64K - 1);
                 stringyNumber++)
            {
                string literalString = tokenWalker._AllLiteralStrings[stringyNumber - 1];
                dictionaryOfStringys.Add(literalString, stringyNumber);
                string s = "public static class ?1 \r\n" +
                           "{ \r\n" +
                           " public static readonly string s; \r\n\r\n" +
                           " static ?1() \r\n" +
                           " { \r\n" +
                           " s = " + YacksCore.CYacksCore + "." + YacksCore.CM0002 + "(\"\", ?2); \r\n" +
                           " } \r\n" +
                           "} \r\n\r\n";
                stringBuilder.Append(
                    s.Replace("?1", "Yacks" + stringyNumber.ToString(
                                  "D" + YacksForRoslyn.StaticFields.CStringyNumberLength,
                                  CultureInfo.InvariantCulture))
                     .Replace("?2", stringyNumber.ToString(CultureInfo.InvariantCulture)));
            }

            // Complete the C# source file and parse it, producing a secondary syntax tree
            stringBuilder.Append("} \r\n");
            //Console.WriteLine(stringBuilder.ToString());
            return CSharpSyntaxTree.ParseText(stringBuilder.ToString());
        }
    }

    /// <summary>
    /// Class used to find all of the string literal tokens in a syntax tree. This is largely based
    /// on this: https://joshvarty.com/2014/07/26/learn-roslyn-now-part-4-csharpsyntaxwalker/
    /// </summary>
    internal class TokenWalker : CSharpSyntaxWalker
    {
        // HashSet used to accumulate all literal strings in the program, but only one entry in the
        // case of duplicate strings
        private readonly HashSet<string> _hashSet = new HashSet<string>();

        // Above HashSet converted to a string array
        private string[] _allLiteralStrings = null;

        // Property to return the above HashSet as a string array
        internal string[] _AllLiteralStrings
        {
            get
            {
                if (_allLiteralStrings == null)
                {
                    _allLiteralStrings = new string[_hashSet.Count];
                    _hashSet.CopyTo(_allLiteralStrings);
                }
                return _allLiteralStrings;
            }
        }

        /// <summary>
        /// Constructor. It is necessary to call the base constructor and specify
        /// SyntaxWalkerDepth.Token, otherwise the VisitToken() method does not get called.
        /// </summary>
        internal TokenWalker() : base(SyntaxWalkerDepth.Token) {}

        /// <summary>
        /// Method called by CSharpSyntaxWalker for each token in the syntax tree.
        /// </summary>
        public override void VisitToken(SyntaxToken syntaxToken)
        {
            if (syntaxToken.Kind() == SyntaxKind.StringLiteralToken &&
                syntaxToken.ValueText.Length > 1)  // Not worth disguising 1-character strings
            {
                // Add this string to the HashSet, if we haven't already encountered it once
                _hashSet.Add(syntaxToken.ValueText);
            }

            base.VisitToken(syntaxToken);  // Probably not necessary?
        }
    }
}

The first parts of the above code (the Main() and DoRoslynCompile() methods) are fairly straightforward, and based on what I learned when I did the first project.

The CreateSourceForStaticStringObjects() method and the TokenWalker class are the important new parts. This code uses Roslyn's CSharpSyntaxWalker facility to find all of the string literals in the C# program. The list of literal strings is then used to create an additional C# source file containing one very small static class per string literal, which implements the undisguising of that string. These extra classes look like this:

public static class Yacks00001
{
    public static readonly string s;

    static Yacks00001()
    {
        s = YacksCore.M0002("", 1);
    }
}

public static class Yacks00002
{
    public static readonly string s;

    static Yacks00002()
    {
        s = YacksCore.M0002("", 2);
    }
}

This obviously doesn't make much sense, but the trick is that the "" strings in the constructors are going to be replaced by disguised strings when they get compiled by the modified Roslyn compiler, and the YacksCore.M0002() method is a method that undisguises (or disguises) strings.
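To make the trick a bit more concrete, here is a minimal sketch of what one of those static constructors effectively does once the modified compiler has swapped the disguised text in for the "" string. The Shift() method repeats the arithmetic of YacksCore.M0002() under a readable name; the class name DisguiseSketch and the sample string are assumptions made for illustration, and the real compiler output is of course IL rather than C# source.

```csharp
using System;

public static class DisguiseSketch
{
    private const int C64K = UInt16.MaxValue + 1;

    // Same arithmetic as YacksCore.M0002(): shift every UTF-16 code unit
    // by 'shift', wrapping around modulo 65536.
    public static string Shift(string input, int shift)
    {
        char[] result = new char[input.Length];
        for (int i = 0; i < input.Length; i++)
            result[i] = Convert.ToChar((Convert.ToInt32(input[i]) + shift + C64K) % C64K);
        return new string(result);
    }

    public static void Main()
    {
        const int stringyNumber = 1;   // the stringy number doubles as the shift key
        string original = "Hello";

        // What the modified compiler stores in the assembly in place of "":
        string disguised = Shift(original, -stringyNumber);

        // What the generated static constructor computes at run time:
        string undisguised = Shift(disguised, stringyNumber);

        Console.WriteLine(disguised);    // Gdkkn
        Console.WriteLine(undisguised);  // Hello
    }
}
```

Because the key is simply the stringy number, every string literal ends up with a different shift, but anyone who reads the static constructor can see both the disguised text and the key.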

Speaking of YacksCore, here's what it looks like:

YacksCore.cs

////------------------------------------------------------------------------------------------
//
// Merlinia Project Yacks
// Copyright © Merlinia 2018, All Rights Reserved.
// Licensed under the Apache License, Version 2.0.
// (Just to be compatible with the Microsoft Roslyn license.)
//
////------------------------------------------------------------------------------------------

using System;
using System.Diagnostics.CodeAnalysis;

namespace Merlinia.Yacks
{
    /// <summary>
    /// Library assembly containing some basic methods used by programs produced by the modified
    /// Roslyn compiler.
    ///
    /// The names of the methods, fields, etc. in this program are pre-obfuscated. Sorry about that.
    /// </summary>
    public static class YacksCore
    {
        // Used to make generating source code that calls this method more self-defining
        public const string CYacksCore = nameof(YacksCore);
        public const string CM0002 = nameof(M0002);

        // Limit + 1 for the shiftOrUnshift argument of M0002()
        public const int C64K = UInt16.MaxValue + 1;

        /// <summary>
        /// Method to "disguise" or "undisguise" a string using a version of the Caesar Cipher.
        ///
        /// The shiftOrUnshift argument (b) is an arbitrary "key value", and must be a non-zero
        /// integer between -65535 and 65535 (inclusive). To undisguise the disguised string you use
        /// the negative value. For example, if you disguise with -42, then you undisguise with +42,
        /// or vice-versa.
        ///
        /// The basic code comes from here: https://stackoverflow.com/a/48474433/253938
        /// </summary>
        /// <param name="a">input string</param>
        /// <param name="b">shiftOrUnshift argument, see above</param>
        /// <returns>undisguised string</returns>
        public static string M0002(string a, int b)
        {
            char[] d = new char[a.Length];
            for (int i = 0; i < a.Length; i++)
                d[i] = Convert.ToChar((Convert.ToInt32(a[i]) + b + C64K) % C64K);
            return new string(d);
        }
    }
}

This is in a separate project, and gets compiled to a .Net Framework assembly (a .dll), partly because I still haven't embraced the .Net Standard paradigm.

All of the above code is external to Roslyn. But I did promise that I'd get into modifying Roslyn, so here goes:

src\Compilers\Core\Portable\YacksForRoslyn.StaticFields.cs

This is a small source file that I added to the CodeAnalysis project in Roslyn.

// Copyright (c) Merlinia A/S. All Rights Reserved. Licensed under the Apache License, Version 2.0. (Just to be compatible with the Microsoft Roslyn license.)

using System.Collections.Generic;

namespace YacksForRoslyn
{
    /// <summary>
    /// This class is added to the Roslyn CodeAnalysis project to provide a kludgy way to communicate
    /// some information between the modified Roslyn compiler and the outside world. This use of
    /// static fields is not expected to work if the compiler is being used in a GUI environment, but
    /// the Roslyn modifications being made are only intended for use when doing builds, not when
    /// using Visual Studio for program development.
    ///
    /// Adding this class and these fields results in some RS0016 errors, which can then be "fixed"
    /// by hovering the mouse over the field name (with the squiggly underline) and selecting "Add to
    /// public API" from the light bulb menu.
    /// </summary>
    public static class StaticFields
    {
        // The length of the "stringy number", as used when naming the static classes:
        // "public static class Yacksxxxxx"
        public const int CStringyNumberLength = 5;

        // Static references to an Array and a Dictionary specifying the constant strings which
        // should be "disguised" (if the two references are not null).
        // Array index = stringy number - 1 (stringy numbers are 1-based).
        public static string[] StringyNumberToString = null;
        public static Dictionary<string, int> StringToStringyNumber = null;
    }
}

In addition to adding the above source file to Roslyn, I made changes to four places in three classes in Roslyn. In each case, to reduce maintenance problems when re-applying my changes to future revisions of Roslyn, I limited myself to just adding or replacing a few lines in the Roslyn source file, and then placing the bulk of my code in a separate source file. This was possible by using the C# "partial class" facility, which was already being used on the three classes in question.
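The pattern is easy to show in miniature. The sketch below is not Roslyn code: Emitter, EmitString() and EmitStringOrSubstitute() are hypothetical names, and the two halves would normally live in two separate source files. The first half stands in for the existing file with its one-line change, the second for the added ".Yacks.cs" companion file.

```csharp
using System;

// File 1: stands in for an existing Roslyn source file. Only one line is changed.
public partial class Emitter
{
    public string EmitString(string value)
    {
        // The original body is replaced by a single call into the companion file.
        return EmitStringOrSubstitute(value);
    }
}

// File 2: stands in for the added companion file that holds the bulk of the new code.
public partial class Emitter
{
    private string EmitStringOrSubstitute(string value)
    {
        // Hypothetical custom behaviour, only here to make the effect visible.
        return value.Length > 1 ? "[disguised] " + value : value;
    }
}

public static class PartialClassDemo
{
    public static void Main()
    {
        Emitter emitter = new Emitter();
        Console.WriteLine(emitter.EmitString("Hello world!"));  // [disguised] Hello world!
        Console.WriteLine(emitter.EmitString("x"));             // x
    }
}
```

Since the private helper is part of the same class, it has full access to the class's private state, while the diff against the upstream file stays at one or two lines.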

src\Compilers\CSharp\Portable\CodeGen\EmitExpression.cs

This source file is part of the CSharpCodeAnalysis project in Roslyn. In the Visual Studio Solution Explorer it can be found under CSharpCodeAnalysis - CodeGen.

private void EmitConstantExpression(TypeSymbol type, ConstantValue constantValue, bool used, SyntaxNode syntaxNode)
{
    if (used)  // unused constant has no side-effects
    {
        // Null type parameter values must be emitted as 'initobj' rather than 'ldnull'.
        if (((object)type != null) && (type.TypeKind == TypeKind.TypeParameter) && constantValue.IsNull)
        {
            EmitInitObj(type, used, syntaxNode);
        }
        else
        {
            //Yacks02: Call local method as front-end to _builder.EmitConstantValue()
            //_builder.EmitConstantValue(constantValue);
            EmitConstantValue(constantValue, syntaxNode);
        }
    }
}

This code is at line 2839 in the revision of Roslyn I was working with. One line has been commented out, and one line has been added.

src\Compilers\CSharp\Portable\CodeGen\EmitExpression.Yacks.cs

This is a source file which has been added to the CSharpCodeAnalysis project.

// Copyright (c) Merlinia A/S. All Rights Reserved. Licensed under the Apache License, Version 2.0. (Just to be compatible with the Microsoft Roslyn license.)

using System;
using Microsoft.CodeAnalysis.CSharp.Syntax;

namespace Microsoft.CodeAnalysis.CSharp.CodeGen
{
    partial class CodeGenerator
    {
        /// <summary>
        /// Method to either do standard processing for a constant expression value, or to emit a
        /// ldstr opcode for a disguised string.
        /// </summary>
        private void EmitConstantValue(ConstantValue constantValue, SyntaxNode syntaxNode)
        {
            int stringyNumber;
            int shiftOrUnshift;
            if (constantValue.Discriminator != ConstantValueTypeDiscriminator.String ||
                constantValue.StringValue == null ||
                constantValue.StringValue.Length != 0 ||
                !GetStringyNumberIfRelevant(syntaxNode, out stringyNumber, out shiftOrUnshift))
            {
                // Standard processing for all constant values other than the "" strings in the static
                // object constructors generated to implement undisguising of disguised strings
                _builder.EmitConstantValue(constantValue);
                return;
            }

            // This syntax node represents the "" string in a static object constructor associated with
            // a disguised string. Disguise the string associated with the stringy number and emit a
            // ldstr opcode to load it. This results in the disguised string being added to the #US
            // (user string) "metadata stream" in the PE file.
            _builder.ProcessAssignmentInStaticClassConstructorRhs(
                YacksCore.M0002(YacksForRoslyn.StaticFields.StringyNumberToString[stringyNumber - 1],
                                -shiftOrUnshift),
                stringyNumber);
        }

        /// <summary>
        /// Method to examine the syntax node and its parent nodes to determine if this node
        /// represents the "" string in the "s = YacksCore.M0002("", ??);" statement in the
        /// constructor of one of the static classes that implement a disguised string.
        /// </summary>
        /// <returns>true = statement recognized and stringy number extracted</returns>
        private static bool GetStringyNumberIfRelevant(SyntaxNode syntaxNode, out int stringyNumber,
                                                       out int shiftOrUnshift)
        {
            stringyNumber = 0;  // Just to satisfy "out" keyword
            shiftOrUnshift = 0;

            if (!(syntaxNode.Parent is ArgumentSyntax))
                return false;

            ArgumentListSyntax argumentListNode = syntaxNode.Parent.Parent as ArgumentListSyntax;
            if (argumentListNode == null)
                return false;

            InvocationExpressionSyntax invocationExpressionNode =
                syntaxNode.Parent.Parent.Parent as InvocationExpressionSyntax;
            if (invocationExpressionNode == null)
                return false;

            MemberAccessExpressionSyntax memberAccessExpressionNode =
                invocationExpressionNode.Expression as MemberAccessExpressionSyntax;
            if (memberAccessExpressionNode == null)
                return false;

            IdentifierNameSyntax identifierNameNode1 =
                memberAccessExpressionNode.Expression as IdentifierNameSyntax;
            if (identifierNameNode1 == null ||
                identifierNameNode1.Identifier.Text != YacksCore.CYacksCore)
                return false;

            IdentifierNameSyntax identifierNameNode2 =
                memberAccessExpressionNode.Name as IdentifierNameSyntax;
            if (identifierNameNode2 == null ||
                identifierNameNode2.Identifier.Text != YacksCore.CM0002)
                return false;

            if (argumentListNode.Arguments.Count != 2 ||
                argumentListNode.Arguments[0].ToString() != "\"\"")
                return false;

            if (!int.TryParse(argumentListNode.Arguments[1].ToString(), out shiftOrUnshift))
                return false;

            const string CStaticYacks = "static Yacks";
            ConstructorDeclarationSyntax constructorDeclarationNode =
                invocationExpressionNode.Parent?.Parent?.Parent?.Parent as ConstructorDeclarationSyntax;
            if (constructorDeclarationNode == null ||
                !constructorDeclarationNode.ToString().StartsWith(CStaticYacks, StringComparison.Ordinal))
                return false;

            if (!int.TryParse(constructorDeclarationNode.ToString().Substring(CStaticYacks.Length,
                                  YacksForRoslyn.StaticFields.CStringyNumberLength),
                              out stringyNumber) ||
                stringyNumber <= 0 ||
                stringyNumber > YacksForRoslyn.StaticFields.StringyNumberToString.Length)
                return false;

            return true;
        }
    }

    /// <summary>
    /// This is an extract from the YacksCore library assembly which contains some basic methods used
    /// by programs produced by the modified Roslyn compiler.
    ///
    /// This is an unfortunate violation of DRY (don't repeat yourself), but an attempt to convert
    /// the YacksCore project to .Net Standard caused problems. I may try again after Roslyn is
    /// upgraded to .Net Standard 2.0.
    ///
    /// The names of the methods, fields, etc. in this program are pre-obfuscated. Sorry about that.
    /// </summary>
    internal static class YacksCore
    {
        // Used to make generating source code that calls this method more self-defining
        public const string CYacksCore = nameof(YacksCore);
        public const string CM0002 = nameof(M0002);

        /// <summary>
        /// Method to "disguise" or "undisguise" a string using a version of the Caesar Cipher.
        ///
        /// The shiftOrUnshift argument (b) is an arbitrary "key value", and must be a non-zero
        /// integer between -65535 and 65535 (inclusive). To undisguise the disguised string you use
        /// the negative value. For example, if you disguise with -42, then you undisguise with +42,
        /// or vice-versa.
        ///
        /// The basic code comes from here: https://stackoverflow.com/a/48474433/253938
        /// </summary>
        /// <param name="a">input string</param>
        /// <param name="b">shiftOrUnshift argument, see above</param>
        /// <returns>undisguised string</returns>
        public static string M0002(string a, int b)
        {
            const int c = UInt16.MaxValue + 1;
            char[] d = new char[a.Length];
            for (int i = 0; i < a.Length; i++)
                d[i] = Convert.ToChar((Convert.ToInt32(a[i]) + b + c) % c);
            return new string(d);
        }
    }
}

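The purpose of all this code is to identify the assignment statement in the constructor of the static classes generated by the CreateSourceForStaticStringObjects() method. This is needed for two purposes: to emit a ldstr opcode that loads the disguised string in place of the "" string, and to note the stringy number so that the token for the s field can be recorded when the left-hand side of the assignment is emitted. Both are handled in ILBuilder, so keep reading.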

src\Compilers\Core\Portable\CodeGen\ILBuilderEmit.cs

This source file is part of the CodeAnalysis project in Roslyn. In the Visual Studio Solution Explorer it can be found under CodeAnalysis - CodeGen.

I modified this file in two different places.

internal void EmitToken(Cci.IReference value, SyntaxNode syntaxNode, DiagnosticBag diagnostics, bool encodeAsRawToken = false)
{
    uint token = module?.GetFakeSymbolTokenForIL(value, syntaxNode, diagnostics) ?? 0xFFFF;
    // Setting the high bit indicates that the token value is to be interpreted literally rather than as a handle.
    if (encodeAsRawToken)
    {
        token |= Cci.MetadataWriter.LiteralMethodDefinitionToken;
    }
    this.GetCurrentWriter().WriteUInt32(token);

    //Yacks02: Test if this is LHS of assignment statement in static object constructor for
    // disguised strings, record the fake token if so
    ProcessPossibleAssignmentInStaticClassConstructorLhs(value, token);
}

This is at line 48 in the revision of Roslyn that I was working with. One line of code has been added.

internal void EmitStringConstant(string value)
{
    if (value == null)
    {
        EmitNullConstant();
    }
    else
    {
        //Yacks02: Implement constant string reference, maybe to disguised string
        EmitStringOrLoadDisguisedString(value);
    }
}

This is around line 682 in the revision of Roslyn that I was working with. (If you've been following along you may recognize this code from my first article about modifying Roslyn - this is where I added code to convert "Hello world" programs into "Hello universe" programs.) For this modification I've removed several lines and replaced them by a call to a method that is in the following file.

src\Compilers\Core\Portable\CodeGen\ILBuilderEmit.Yacks.cs

This is a source file which has been added to the CodeAnalysis project.

// Copyright (c) Merlinia A/S. All Rights Reserved. Licensed under the Apache License, Version 2.0. (Just to be compatible with the Microsoft Roslyn license.)

using System.Collections.Concurrent;
using System.Diagnostics;
using System.Reflection.Metadata;

namespace Microsoft.CodeAnalysis.CodeGen
{
    partial class ILBuilder
    {
        // Dictionary to convert a "stringy number" to the "fake token number" needed by the metadata
        // writer. This is static, so it is common for all instances of ILBuilder.
        internal static readonly ConcurrentDictionary<int, int> _StringyNumberToFakeTokenNumber =
            new ConcurrentDictionary<int, int>();

        // If not zero, this is a "stringy number", and the constructor of one of the static objects
        // associated with disguised strings is currently being processed. This is not static, so one
        // of these fields exists for each instance of ILBuilder.
        private int _stringyNumberOrZero = 0;

        /// <summary>
        /// Method to do two things related to emitting processing for the "s =
        /// YacksCore.M0002("", ??);" statement in the constructor of one of the static classes that
        /// implement a disguised string.
        /// 1. A ldstr opcode is emitted to load the disguised string (instead of the "" string).
        /// 2. The "stringy number" is noted so that when the assignment to the "s" field gets
        ///    emitted (LHS gets emitted immediately after emitting RHS), the "fake token number" will
        ///    be recorded as the token associated with the disguised string.
        /// </summary>
        internal void ProcessAssignmentInStaticClassConstructorRhs(string disguisedString,
                                                                   int stringyNumber)
        {
            Debug.Assert(_stringyNumberOrZero == 0);  // RHS then LHS processed for assignment

            EmitLdstrForConstantString(disguisedString);

            _stringyNumberOrZero = stringyNumber;
        }

        /// <summary>
        /// Method to record the fake token numbers associated with syntax nodes for the static s
        /// field in the static objects defined to implement the disguised strings.
        /// </summary>
        private void ProcessPossibleAssignmentInStaticClassConstructorLhs(Cci.IReference fieldSymbol,
                                                                          uint fakeToken)
        {
            if (_stringyNumberOrZero != 0 && fieldSymbol is IFieldSymbol)
            {
                Debug.Assert((fakeToken & 0xff000000) == 0);
                if (!_StringyNumberToFakeTokenNumber.TryAdd(_stringyNumberOrZero, (int)fakeToken))
                    Debug.Assert(false, "stringy number already in dictionary");
                _stringyNumberOrZero = 0;
            }
        }

        /// <summary>
        /// This method replaces the standard processing for emitting a ldstr opcode for a constant
        /// string in method EmitStringConstant().
        /// </summary>
        private void EmitStringOrLoadDisguisedString(string stringValue)
        {
            // Test if string is "disguised" and should be fetched via static object
            int stringyNumber;
            if (stringValue.Length <= 1 ||
                YacksForRoslyn.StaticFields.StringToStringyNumber == null ||
                !YacksForRoslyn.StaticFields.StringToStringyNumber.TryGetValue(stringValue,
                                                                               out stringyNumber))
            {
                // Standard processing for non-disguised string
                EmitLdstrForConstantString(stringValue);
                return;
            }

            // At this point we would like to emit a ldsfld (load static field) opcode referencing the
            // static field "s" in the associated static class for this disguised string. But the "fake
            // token" for field "s" may not have been recorded yet. So we do something very kludgy: we
            // emit a ldstr opcode followed by the stringy number as a negative number. Then, later, in
            // Microsoft.Cci.MetadataWriter.WriteInstructions(), we replace the ldstr with an ldsfld,
            // and emit the fake token number (which must now be known) converted to an entity handle.
            EmitOpCode(ILOpCode.Ldstr);
            this.GetCurrentWriter().WriteInt32(-stringyNumber);
        }

        /// <summary>
        /// Method containing code moved here from the standard processing in method
        /// EmitStringConstant().
        /// </summary>
        private void EmitLdstrForConstantString(string stringValue)
        {
            EmitOpCode(ILOpCode.Ldstr);
            EmitToken(stringValue);
        }
    }
}

We're almost done now, but as mentioned in comments in the above code there is one final place in Roslyn that needs to be modified.

src\Compilers\Core\Portable\PEWriter\MetadataWriter.cs

This source file is also part of the CodeAnalysis project in Roslyn. In the Visual Studio Solution Explorer it can be found under CodeAnalysis - PEWriter.

case OperandType.InlineString:
    {
        writer.Offset = offset;

        int pseudoToken = ReadInt32(generatedIL, offset);

        //Yacks02: Test for temporary ldstr that needs to be replaced by ldsfld
        if (ProcessPossibleDisguisedStringReference(pseudoToken, generatedIL, ref offset, writer))
            break;

        UserStringHandle handle;

This code is at line 3207 in the revision of Roslyn I was working with. One if statement and one break statement have been added.

src\Compilers\Core\Portable\PEWriter\MetadataWriter.Yacks.cs

This is another source file which has been added to the CodeAnalysis project.

// Copyright (c) Merlinia A/S. All Rights Reserved. Licensed under the Apache License, Version 2.0. (Just to be compatible with the Microsoft Roslyn license.)

using System.Collections.Immutable;
using System.Diagnostics;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;
using Microsoft.CodeAnalysis.CodeGen;

namespace Microsoft.Cci
{
    internal partial class MetadataWriter
    {
        /// <summary>
        /// Method to perform some kludgy processing needed to get loading the reference to the static
        /// field "s" in the static objects that implement disguised strings to work.
        /// </summary>
        /// <returns>true = kludgy processing done, false = not disguised string reference</returns>
        private bool ProcessPossibleDisguisedStringReference(int pseudoToken,
            ImmutableArray<byte> generatedIL, ref int localOffset, BlobWriter blobWriter)
        {
            // Test if the pseudo/fake token is actually a (negative) stringy number, exit if not
            if (pseudoToken >= 0)
                return false;

            // Replace the ldstr opcode with ldsfld opcode in BlobWriter's data area
            Debug.Assert(ReadByte(generatedIL, localOffset - 1) == (byte)ILOpCode.Ldstr);
            blobWriter.Offset = localOffset - 1;
            blobWriter.WriteByte((byte)ILOpCode.Ldsfld);

            // Look up the fake/pseudo token for the "s" field, convert it to a handle and emit it
            int fakeToken;
            if (!ILBuilder._StringyNumberToFakeTokenNumber.TryGetValue(-pseudoToken, out fakeToken))
                Debug.Assert(false, "stringy number not in dictionary");
            blobWriter.WriteInt32(
                MetadataTokens.GetToken(ResolveEntityHandleFromPseudoToken(fakeToken)));

            localOffset += 4;
            return true;  // No further processing for this opcode
        }
    }
}

Incidentally, this is indeed some rather kludgy code, but it is more-or-less copied from some existing code in MetadataWriter.cs, around line 3180.
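Stripped of all the Roslyn plumbing, the two-pass placeholder trick can be sketched on a plain byte array. This is only an illustration under simplifying assumptions: PlaceholderPatchSketch and its methods are hypothetical names, the dictionary maps stringy numbers straight to a final field token (the real code stores a "fake token" that MetadataWriter still has to resolve), and the token value used in Main() is made up. The opcode values 0x72 (ldstr) and 0x7E (ldsfld) are the ones defined by ECMA-335.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

public static class PlaceholderPatchSketch
{
    private const byte Ldstr = 0x72;   // ECMA-335 opcode for ldstr
    private const byte Ldsfld = 0x7E;  // ECMA-335 opcode for ldsfld

    // Pass 1: the field token is not known yet, so emit ldstr followed by the
    // negated stringy number as a marker (IL operands are little-endian).
    public static byte[] EmitPlaceholder(int stringyNumber)
    {
        byte[] il = new byte[5];
        il[0] = Ldstr;
        BinaryPrimitives.WriteInt32LittleEndian(il.AsSpan(1), -stringyNumber);
        return il;
    }

    // Pass 2: a negative operand marks a placeholder. Overwrite the opcode with
    // ldsfld and the operand with the token recorded for that stringy number.
    public static bool TryPatch(byte[] il, int opcodeOffset,
                                IReadOnlyDictionary<int, int> fieldTokens)
    {
        if (il[opcodeOffset] != Ldstr)
            return false;
        int operand = BinaryPrimitives.ReadInt32LittleEndian(il.AsSpan(opcodeOffset + 1));
        if (operand >= 0)
            return false;  // an ordinary ldstr, leave it alone
        if (!fieldTokens.TryGetValue(-operand, out int token))
            return false;  // stringy number was never recorded
        il[opcodeOffset] = Ldsfld;
        BinaryPrimitives.WriteInt32LittleEndian(il.AsSpan(opcodeOffset + 1), token);
        return true;
    }

    public static void Main()
    {
        byte[] il = EmitPlaceholder(2);

        // Hypothetical token for field "s" of class Yacks00002.
        var fieldTokens = new Dictionary<int, int> { [2] = 0x04000007 };

        Console.WriteLine(TryPatch(il, 0, fieldTokens));  // True
        Console.WriteLine(il[0].ToString("X2"));          // 7E
    }
}
```

The sketch relies on the same observation as the real code: a legitimate ldstr operand is never negative at this stage, so the sign bit is free to act as the "this is really a stringy number" flag.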

So, does it work?

Here's the test program in Test1.cs that I was using when testing this modification:

using System;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main()
        {
            AnotherClass anotherClass = new AnotherClass();
            anotherClass.WriteAString();

            Console.WriteLine("Hello world!");
            Console.WriteLine("Hello world again!");
            Console.ReadKey();
        }
    }

    class AnotherClass
    {
        internal void WriteAString()
        {
            Console.WriteLine("Hello everybody!!");
            Console.WriteLine("Hello world again!");
        }
    }
}

If I disable the disguising of strings and compile this program, and then double-click on Test1.exe I get this:

[Screenshot: ModRos 2 Snap1]

And if I use the JetBrains dotPeek program to examine the Test1.exe file I can see this:

[Screenshot: ModRos 2 Snap2]

When I enable the disguising of strings and recompile the program, and double-click on the Test1.exe file I get this:

[Screenshot: ModRos 2 Snap1]

But now when I use dotPeek this is what I see:

[Screenshot: ModRos 2 Snap3]

Success!

Disclaimers

There are a lot of disclaimers I'd like to attach to this article, including, but not limited to, the following:

  1. I haven't tested this code with any real programs, so there may be bugs that prevent it from working in the real world.

  2. The so-called disguising of strings that is done here is trivial to reverse-engineer and undo.

  3. This adds some extra processing time to every reference to a string literal. The references now have to check if the static object has been initialized or not, and for the first reference the string has to be undisguised. I have not tried to measure this added overhead.

  4. As indicated in the comments in file YacksForRoslyn.StaticFields.cs, this is practically guaranteed to not work in a GUI situation, where the Roslyn compiler is being used by Visual Studio or another IDE while the programmer is entering code. So it should only be used for non-interactive builds.

  5. The whole idea of inserting modifications into Roslyn source code is going to be difficult to maintain as Roslyn is constantly being updated. And the more modifications I make the more it will become a nightmare, unless I figure out some way to automate the re-insertion of my modifications from time to time.

  6. There may be moral problems associated with using an open source program to produce obfuscated programs. I leave this question as an exercise for the reader.
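As a footnote to disclaimer 3, the run-once behaviour that the overhead comes from can be seen in a small sketch. The names InitCounter and LazyStringSketch are assumptions for illustration, and the assignment of a plain literal stands in for the call to YacksCore.M0002(); this shows when the work happens, not how much it costs.

```csharp
using System;

// Kept in a separate class so that reading the counter does not itself
// trigger the static constructor under observation.
public static class InitCounter
{
    public static int Count;
}

// Same shape as the generated Yacks00001-style classes.
public static class LazyStringSketch
{
    public static readonly string s;

    static LazyStringSketch()
    {
        InitCounter.Count++;
        s = "Hello world!";  // stands in for undisguising the string
    }
}

public static class LazyInitDemo
{
    public static void Main()
    {
        Console.WriteLine(InitCounter.Count);  // 0: nothing has touched the class yet

        string first = LazyStringSketch.s;     // first reference runs the static constructor
        string second = LazyStringSketch.s;    // later references only read the field

        Console.WriteLine(InitCounter.Count);  // 1
        Console.WriteLine(ReferenceEquals(first, second));  // True
    }
}
```

So the undisguising cost is paid once per distinct string literal, on first use, and every later reference pays only for the initialization check and the field load.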