Wednesday, January 22, 2020

Constant reservation and Git hooks using C#

Let me tell you a story. Once upon a time, there were two developers: Sam and Bob. They worked on a project that used a database. When a developer wanted to change the database structure, they had to create a step file stepNNN.sql, where NNN was some number. To avoid number collisions between developers, they had a simple Web application. Before starting to write an SQL file, each developer had to go to the application and reserve a new number for their modifications.

The time came for Sam and Bob to make changes to the database. Sam obediently went to the Web application and reserved number 333. But Bob forgot to do it. He just used 333 for his new step file. It so happened that Bob was the first to commit his changes to the version control system. When Sam was ready to commit, it turned out that step333.sql already existed. He contacted Bob, explained that step 333 was already reserved, and asked Bob to fix the collision. But Bob answered:

- Hey, man. You know, my code is already in the 'master' branch, many developers already took it. And also it is already on production. Could you just fix your code instead?

Have you noticed it? The person who followed all the rules was the one who was punished. Sam had to change his files, modify his local database, etc. Personally, I hate such situations. Let's see how we can avoid it.

General idea


How can we prevent such things from happening? What if Bob were unable to commit his changes unless he had reserved the corresponding number in the Web application?

And it can be implemented. We can use Git hooks to execute custom code before each commit. This code will check all changes that a developer wants to commit. If these changes contain a new step file, the code will contact the Web application and check if the number of the step file is reserved by the current developer. And if the number is not reserved, the code will prevent the commit.

This is the main idea. Now let's dig into details.

Git hooks in C#


Git does not restrict which language you use to write hooks. As a C# developer, I'd like to use the familiar C# for this purpose. Can I do it?

Yes, I can. I took the main idea from this article by Max Hamulyák. It requires the dotnet-script global tool, which in turn requires the .NET Core 2.1+ SDK to be installed on the developer's machine. I think it is not unreasonable to have it installed if you are doing .NET development. Installing dotnet-script is very straightforward:

> dotnet tool install -g dotnet-script

Now we can write Git hooks using C#. To do so, go to the .git\hooks directory inside your project folder and create a pre-commit file (without any extension):

#!/usr/bin/env dotnet-script


Console.WriteLine("Git hook");

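From this moment on, every time you run the git commit command, you'll see the Git hook message in your console. (On Linux and macOS the pre-commit file must also be marked as executable, otherwise Git will not run it.)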

Several processors for one hook


Well, it was a start. Now we can write anything in the pre-commit file. But I don't like this idea very much.

First, writing a script file is not very convenient. I'd prefer to use my favorite IDE with all its features. And I want to split complex code across several files.

But there is one more thing I don't like. Consider the following situation. You created a pre-commit file with some checks. But later you decided to add some more checks. You'll have to open the file, decide where to insert new code, decide how to interact with old code, etc. Personally, I prefer to write new code, not modify existing code.

Let's deal with these problems one at a time.

Call of external code


Here is what we'll do. We'll create some folder (e.g. gitHookAssemblies). In this folder, I'll place some .NET Core assembly (e.g. GitHooks). My script in the pre-commit file will just call some method from this assembly.

using System;

namespace GitHooks
{
    public class RunHooks
    {
        public static void RunPreCommitHook()
        {
            Console.WriteLine("Git hook from assembly");
        }
    }
}

I can create the assembly in my favorite IDE using any tools I want.

Now in the pre-commit file, I can write:

#!/usr/bin/env dotnet-script

#r "../../gitHookAssemblies/GitHooks.dll"

GitHooks.RunHooks.RunPreCommitHook();


See how cool it is! Now I must only make changes in the GitHooks assembly. The code of pre-commit file will never change. Any time I need some new check, I'll change the code of RunPreCommitHook method, recompile the assembly and place it into the gitHookAssemblies folder. And that's it!

Well, not quite.

Fighting with cache 


Let's try to follow this process. Let's change the message for Console.WriteLine to something different, recompile the assembly and put it into the gitHookAssemblies folder. After that, run git commit again. What will we see? The old message. Our changes were not picked up. Why is that?

Let's say that your project is in the c:\project folder. It means that Git hooks are stored in the c:\project\.git\hooks folder. Now, if you are on Windows 10, go to the c:\Users\<UserName>\AppData\Local\Temp\scripts\c\project\.git\hooks\ folder. Here <UserName> should be the name of your current user. What do we have here? When we run the pre-commit script, a compiled version of the script is created in this folder. Here you can also find all referenced assemblies (including our GitHooks.dll). And in the execution-cache sub-folder you can find a file with a SHA256 extension. I assume that this file contains the SHA256 hash of our pre-commit file. Every time we run the script, the runtime compares the current hash of the file with the stored hash. If they are equal, the stored version of the compiled script is used.

It means that, since we never change our pre-commit file, changes in GitHooks.dll will never reach the cache and will never be used.
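To make the suspected mechanism concrete, here is a minimal sketch of a cache check that depends only on the script text. It is my illustration of the idea, not the actual dotnet-script implementation; the class and method names (ScriptCacheSketch, HashOf, CanReuseCache) are made up for this example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical illustration of a hash-based script cache.
public static class ScriptCacheSketch
{
    // Hex-encoded SHA256 of the script text; stands in for the stored hash file.
    public static string HashOf(string scriptText)
    {
        using var sha = SHA256.Create();
        var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(scriptText));
        return BitConverter.ToString(hash).Replace("-", string.Empty);
    }

    // The cached build is reused whenever the script text is unchanged.
    // Nothing here looks at the assemblies the script references.
    public static bool CanReuseCache(string storedHash, string currentScriptText)
    {
        return string.Equals(storedHash, HashOf(currentScriptText), StringComparison.OrdinalIgnoreCase);
    }
}
```

Under this assumption, replacing GitHooks.dll leaves the script text, and therefore the hash, untouched, so the old compiled copy keeps being used.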

What can we do about it? Well, Reflection will help. I'll rewrite my script file to use Reflection instead of a direct reference to the GitHooks assembly. Here is what our pre-commit file will look like:

#!/usr/bin/env dotnet-script

#r "nuget: System.Runtime.Loader, 4.3.0"

using System.IO;
using System.Runtime.Loader;

var hooksDirectory = Path.Combine(Environment.CurrentDirectory, "gitHookAssemblies");

var assemblyPath = Path.Combine(hooksDirectory, "GitHooks.dll");

var assembly = AssemblyLoadContext.Default.LoadFromAssemblyPath(assemblyPath);

if(assembly == null)
{
    Console.WriteLine($"Can't load assembly from '{assemblyPath}'.");
    // Stop here: continuing would only fail with a NullReferenceException.
    Environment.Exit(1);
}

var collectorsType = assembly.GetType("GitHooks.RunHooks");

if(collectorsType == null)
{
    Console.WriteLine("Can't find entry type.");
    Environment.Exit(1);
}

var method = collectorsType.GetMethod("RunPreCommitHook", System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Static);

if(method == null)
{
    Console.WriteLine("Can't find method for pre-commit hooks.");
    Environment.Exit(1);
}

method.Invoke(null, new object[0]);

Now we can update GitHooks.dll in our gitHookAssemblies folder at any moment, and the same script will pick up all changes. No need to change it at all.

It sounds fine, but still, there is one more problem we need to solve before going further. I'm talking about references.

Referencing assemblies


Everything looks fine when the only thing our RunHooks.RunPreCommitHook does is write a string to the console. But, frankly speaking, we usually do not have much interest in writing strings. We need to do more complex things. And to do them we need to reference other assemblies and NuGet packages. Let's see how to do it.

I'll modify my RunHooks.RunPreCommitHook to use the LibGit2Sharp package:

public static void RunPreCommitHook()
{
    using var repo = new Repository(Environment.CurrentDirectory);

    Console.WriteLine(repo.Info.WorkingDirectory);
}

Now, if I try to run git commit, I'll get the following error message:

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> System.IO.FileLoadException: Could not load file or assembly 'LibGit2Sharp, Version=0.26.0.0, Culture=neutral, PublicKeyToken=7cbde695407f0333'. General Exception (0x80131500)

So we need some way to provide all referenced assemblies. The main idea here is the following. I'll place all assemblies required for the execution into the same gitHookAssemblies folder alongside GitHooks.dll. To get all referenced assemblies of a .NET Core project you can use the dotnet publish command. In our case, we need to place LibGit2Sharp.dll and git2-7ce88e6.dll in this folder.

Also, we have to modify our pre-commit script. We'll add the following code:

#!/usr/bin/env dotnet-script

#r "nuget: System.Runtime.Loader, 4.3.0"

using System.IO;
using System.Runtime.Loader;

var hooksDirectory = Path.Combine(Environment.CurrentDirectory, "gitHookAssemblies");

var assemblyPath = Path.Combine(hooksDirectory, "GitHooks.dll");

AssemblyLoadContext.Default.Resolving += (context, assemblyName) => {
    var assemblyPath = Path.Combine(hooksDirectory, $"{assemblyName.Name}.dll");
    if(File.Exists(assemblyPath))
    {
        return AssemblyLoadContext.Default.LoadFromAssemblyPath(assemblyPath);
    }

    return null;
};

...

This code will try to find all unknown assemblies in the gitHookAssemblies folder.

Now we can run git commit and it will execute without problems.

Improve extensibility


Now our pre-commit is complete. We don't need to modify it anymore. But in case of any changes, we'll need to modify RunHooks.RunPreCommitHook method. We just moved this problem to another level. Personally, I'd prefer to have some sort of plug-in system. Every time I need to add some action that must be executed before commit, I just write another plug-in and don't modify anything. Is it hard to implement?

Not at all. Let's use MEF. Here is how it works.

First, we define an interface for all hook handlers:

public interface IPreCommitHook
{
    bool Process(IList<string> args);
}

Each Git hook can get some string arguments passed by Git. These arguments will be in the args parameter. The Process method will return true if it allows changes to be committed, and false otherwise.

We definitely can define similar interfaces for other hooks, but in this article, we'll concentrate on pre-commit hook.

Now we implement this interface:

[Export(typeof(IPreCommitHook))]
public class MessageHook : IPreCommitHook
{
    public bool Process(IList<string> args)
    {
        Console.WriteLine("Message hook...");

        if(args != null)
        {
            Console.WriteLine("Arguments are:");
            foreach(var arg in args)
            {
                Console.WriteLine(arg);
            }
        }

        return true;
    }
}

Such classes can be defined in different assemblies if we want. Literally, there are no limitations. The Export attribute comes from the System.ComponentModel.Composition NuGet package.

And we'll define a helper method that collects all implementations of the IPreCommitHook interface marked with the Export attribute, runs them all and reports whether any of them refused to let the commit continue. I placed this code into a separate GitHooksCollector assembly, but that is not so important:

public class Collectors
{
    private class PreCommitHooks
    {
        [ImportMany(typeof(IPreCommitHook))]
        public IPreCommitHook[] Hooks { get; set; }
    }

    public static int RunPreCommitHooks(IList<string> args, string directory)
    {
        var catalog = new DirectoryCatalog(directory, "*Hooks.dll");
        var container = new CompositionContainer(catalog);
        var obj = new PreCommitHooks();
        container.ComposeParts(obj);

        bool success = true;

        foreach(var hook in obj.Hooks)
        {
            success &= hook.Process(args);
        }

        return success ? 0 : 1;
    }
}

This code also uses the System.ComponentModel.Composition NuGet package. First, we say that we'll look into all assemblies in the directory folder whose names match the *Hooks.dll pattern. You may use any pattern you want here. Then we collect all exported implementations of the IPreCommitHook interface into a PreCommitHooks object. And finally, we run all handlers and compute the aggregated execution result.
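One detail of the aggregation is easy to miss: success &= hook.Process(args) keeps calling the remaining hooks even after one of them has failed, so a developer sees every complaint in one go. The small sketch below isolates just that behaviour with plain delegates; the HookAggregation class is a hypothetical helper for illustration and not part of the article's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that mirrors the aggregation loop of RunPreCommitHooks.
public static class HookAggregation
{
    // Runs every check and returns 0 only if all of them passed.
    public static int RunAll(IEnumerable<Func<bool>> checks)
    {
        bool success = true;

        foreach (var check in checks)
        {
            // '&' on booleans does not short-circuit, so check() is
            // evaluated even when success is already false.
            success &= check();
        }

        return success ? 0 : 1;
    }
}
```

If you would rather stop at the first failing hook, replacing the compound assignment with success = success && check() should give that behaviour instead.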

The last thing to do is to slightly change pre-commit file:

#!/usr/bin/env dotnet-script

#r "nuget: System.Runtime.Loader, 4.3.0"

using System.IO;
using System.Runtime.Loader;

var hooksDirectory = Path.Combine(Environment.CurrentDirectory, "gitHookAssemblies");

var assemblyPath = Path.Combine(hooksDirectory, "GitHooksCollector.dll");

AssemblyLoadContext.Default.Resolving += (context, assemblyName) => {
    var assemblyPath = Path.Combine(hooksDirectory, $"{assemblyName.Name}.dll");
    if(File.Exists(assemblyPath))
    {
        return AssemblyLoadContext.Default.LoadFromAssemblyPath(assemblyPath);
    }
    return null;
};

var assembly = AssemblyLoadContext.Default.LoadFromAssemblyPath(assemblyPath);
if(assembly == null)
{
    Console.WriteLine($"Can't load assembly from '{assemblyPath}'.");
    // Stop here: continuing would only fail with a NullReferenceException.
    Environment.Exit(1);
}

var collectorsType = assembly.GetType("GitHooksCollector.Collectors");
if(collectorsType == null)
{
    Console.WriteLine("Can't find collector's type.");
    Environment.Exit(1);
}

var method = collectorsType.GetMethod("RunPreCommitHooks", System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Static);
if(method == null)
{
    Console.WriteLine("Can't find collector's method for pre-commit hooks.");
    Environment.Exit(1);
}

int exitCode = (int) method.Invoke(null, new object[] { Args, hooksDirectory });

Environment.Exit(exitCode);

And don't forget to place all participating assemblies into the gitHookAssemblies folder.

Wow, that was a long preamble. But now we have a pretty robust solution for writing Git hooks using C#. All we need to do is modify the content of the gitHookAssemblies folder. The content of this folder can be placed under version control and thus distributed to all developers.

Anyway, it is time to solve our initial problem.

Web service for constants registration


We wanted to make sure that developers will not be able to commit changes if they forgot to register the corresponding constants on a Web service. Let's create a simple Web service for our needs. I'll use an ASP.NET Core Web service with Windows authentication. But actually, many other options can be used here.

using System.Collections.Generic;
using System.Linq;
using Microsoft.AspNetCore.Authorization;
using Microsoft.AspNetCore.Mvc;

namespace ListsService.Controllers
{
    public sealed class ListItem<T>
    {
        public ListItem(T value, string owner)
        {
            Value = value;
            Owner = owner;
        }

        public T Value { get; }
        public string Owner { get; }
    }

    public static class Lists
    {
        public static List<ListItem<int>> SqlVersions = new List<ListItem<int>>
        {
            new ListItem<int>(1, @"DOMAIN\Iakimov")
        };

        public static Dictionary<int, List<ListItem<int>>> AllLists = new Dictionary<int, List<ListItem<int>>>
        {
            {1, SqlVersions}
        };
    }

    [Authorize]
    public class ListsController : Controller
    {
        [Route("/api/lists/{listId}/ownerOf/{itemId}")]
        [HttpGet]
        public IActionResult GetOwner(int listId, int itemId)
        {
            if (!Lists.AllLists.ContainsKey(listId))
                return NotFound();

            var item = Lists.AllLists[listId].FirstOrDefault(li => li.Value == itemId);
            if(item == null)
                return NotFound();

            return Json(item.Owner);

        }
    }
}

Here I use the static class Lists as a storage mechanism for testing purposes only. Each list has an integer identifier. Each list contains integer items together with information about the people who registered them. The GetOwner method of the ListsController class returns an identifier of the person who registered the corresponding list item.
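The service above can only answer who owns a number; it has no way to hand out a new one. Here is a rough sketch of what the reservation side could look like, kept in memory just like the Lists class. Everything in it (the ReservationStore class and its ReserveNext and GetOwner methods) is hypothetical and is not part of the code for this article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory reservation store, for illustration only.
public sealed class ReservationStore
{
    private readonly object _sync = new object();

    // listId -> (reserved value -> owner)
    private readonly Dictionary<int, Dictionary<int, string>> _lists =
        new Dictionary<int, Dictionary<int, string>>();

    // Reserves the value following the current maximum of the list (1 for an empty list).
    public int ReserveNext(int listId, string owner)
    {
        // The lock makes "find the next value and store it" one atomic step,
        // so two concurrent requests can never receive the same number.
        lock (_sync)
        {
            if (!_lists.TryGetValue(listId, out var items))
            {
                items = new Dictionary<int, string>();
                _lists[listId] = items;
            }

            var next = items.Count == 0 ? 1 : items.Keys.Max() + 1;
            items[next] = owner;
            return next;
        }
    }

    // Returns the owner of a reserved value, or null if it was never reserved.
    public string GetOwner(int listId, int value)
    {
        lock (_sync)
        {
            if (_lists.TryGetValue(listId, out var items) && items.TryGetValue(value, out var owner))
                return owner;

            return null;
        }
    }
}
```

A real service would of course keep reservations in a database and take the owner from the authenticated user rather than from a parameter.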

Checking SQL step files


Now we are ready to check if we can commit a new SQL step file or not. Let's say that we store SQL step files the following way. In the main folder of the project, we have an sql sub-folder. In this folder, every developer can create a verXXX folder, where XXX is some number that must be registered in the Web service. Inside the verXXX folder there should be one or several .sql files that provide modifications to the database. We'll not discuss the order of execution of these .sql files here, as it is not relevant to our discussion. All we want to do is the following. If a developer wants to commit any new file inside some sql/verXXX folder, we must check that constant XXX was registered by this developer.

Here is the code of corresponding Git hook:

[Export(typeof(IPreCommitHook))]
public class SqlStepsHook : IPreCommitHook
{
    private static readonly Regex _expr = new Regex("\\bver(\\d+)\\b");

    public bool Process(IList<string> args)
    {
        using var repo = new Repository(Environment.CurrentDirectory);

        var items = repo.RetrieveStatus()
            .Where(i => !i.State.HasFlag(FileStatus.Ignored))
            .Where(i => i.State.HasFlag(FileStatus.NewInIndex))
            .Where(i => i.FilePath.StartsWith(@"sql"));

        var versions = new HashSet<int>(
            items
            .Select(i => _expr.Match(i.FilePath))
            .Where(m => m.Success)
            .Select(m => m.Groups[1].Value)
            .Select(d => int.Parse(d))
            );

        foreach(var version in versions)
        {
            if (!ListItemOwnerChecker.DoesCurrentUserOwnListItem(1, version))
                return false;
        }

        return true;
    }
}

Here we use the Repository class from the LibGit2Sharp NuGet package. The items variable will contain all new files in the Git index located inside the sql folder. You can improve the procedure of finding such files if you wish. The versions variable collects all distinct XXX constants from the verXXX folders. And, finally, the ListItemOwnerChecker.DoesCurrentUserOwnListItem method checks whether the version is registered by the current user in list 1 on the Web service.
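As one possible improvement, the path check and the number extraction can be merged into a single, stricter pattern that accepts only files lying directly under sql/verXXX/. The sketch below is independent of LibGit2Sharp and works on plain path strings; the SqlVersionPaths class is a hypothetical name, and accepting both slash styles is my assumption about what paths may look like on different machines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts XXX from paths of the form sql/verXXX/...
public static class SqlVersionPaths
{
    // Anchored at the start of the path; '/' and '\' are both accepted as separators.
    private static readonly Regex VersionFolder = new Regex(@"^sql[\\/]ver(\d+)[\\/]");

    public static ISet<int> ExtractVersions(IEnumerable<string> paths)
    {
        var versions = new HashSet<int>();

        foreach (var path in paths)
        {
            var match = VersionFolder.Match(path);

            // TryParse also guards against digit runs too long for an int.
            if (match.Success && int.TryParse(match.Groups[1].Value, out var version))
                versions.Add(version);
        }

        return versions;
    }
}
```

With this pattern a file such as sqlscripts/ver5/a.sql or docs/ver7/readme.md is ignored, while the StartsWith check together with the original expression would still pick up the first of them.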

Implementation of ListItemOwnerChecker.DoesCurrentUserOwnListItem is quite simple:

class ListItemOwnerChecker
{
    public static string GetListItemOwner(int listId, int itemId)
    {
        var handler = new HttpClientHandler
        {
            UseDefaultCredentials = true
        };

        var client = new HttpClient(handler);

        var response = client.GetAsync($"https://localhost:44389/api/lists/{listId}/ownerOf/{itemId}")
            .ConfigureAwait(false)
            .GetAwaiter()
            .GetResult();

        if (response.StatusCode == System.Net.HttpStatusCode.NotFound)
        {
            return null;
        }

        var owner = response.Content
            .ReadAsStringAsync()
            .ConfigureAwait(false)
            .GetAwaiter()
            .GetResult();

        return JsonConvert.DeserializeObject<string>(owner);
    }

    public static bool DoesCurrentUserOwnListItem(int listId, int itemId)
    {
        var owner = GetListItemOwner(listId, itemId);

        if (owner == null)
        {
            Console.WriteLine($"There is no item '{itemId}' in the list '{listId}' registered on the lists service.");
            return false;
        }

        if (owner != WindowsIdentity.GetCurrent().Name)
        {
            Console.WriteLine($"Item '{itemId}' in the list '{listId}' registered by '{owner}' and you are '{WindowsIdentity.GetCurrent().Name}'.");
            return false;
        }

        return true;
    }
}

Here we ask the Web service for the identifier of the user who registered the required constant (the GetListItemOwner method). Then we compare it with the name of the current Windows user. This is only one of many possible ways to implement this functionality. For example, you can use the name or e-mail of a user from the Git config.
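For the Git config variant, one simple approach is to ask the git executable itself, which is certainly available while a hook is running. The following is only a sketch under that assumption; the GitIdentity class and both of its methods are hypothetical, and comparing owners case-insensitively is a design choice you may not want.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for identifying the current user through Git config.
public static class GitIdentity
{
    // Runs "git config --get <key>" and returns the value,
    // or null when the key is not set.
    public static string GetConfigValue(string key)
    {
        var startInfo = new ProcessStartInfo("git", "config --get " + key)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using var process = Process.Start(startInfo);
        var output = process.StandardOutput.ReadToEnd().Trim();
        process.WaitForExit();

        return process.ExitCode == 0 && output.Length > 0 ? output : null;
    }

    // True when both values are present and equal, ignoring case.
    public static bool IsOwner(string registeredOwner, string currentUser)
    {
        return registeredOwner != null
            && currentUser != null
            && string.Equals(registeredOwner, currentUser, StringComparison.OrdinalIgnoreCase);
    }
}
```

In DoesCurrentUserOwnListItem the comparison would then become something like GitIdentity.IsOwner(owner, GitIdentity.GetConfigValue("user.email")), provided the Web service stores e-mail addresses as owners. Keep in mind that anyone can change their Git config, so this is weaker than Windows authentication.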

And that is it. Just build the corresponding assembly and place it into the gitHookAssemblies folder with all referenced assemblies. Everything will work automatically.

Checking enum values


Well, it's great. Now nobody can commit new changes for SQL database without registering the corresponding constant in the Web service first. But we can use this method in other places where some constants should be reserved.

For example, somewhere in the code, there can be an enum. Every developer can add some member into the enum and assign some integer value for the member:

enum Constants
{
    Val1 = 1,
    Val2 = 2,
    Val3 = 3
}

We want to avoid collisions of values for members of this enum. This is why we require the corresponding integer constant to be registered in the Web service first. How hard is it to implement the check of registration for such constants?

Here is the code of new Git hook:

[Export(typeof(IPreCommitHook))]
public class ConstantValuesHook : IPreCommitHook
{
    public bool Process(IList<string> args)
    {
        using var repo = new Repository(Environment.CurrentDirectory);

        var constantsItem = repo.RetrieveStatus()
            .Staged
            .FirstOrDefault(i => i.FilePath == @"src/GitInteraction/Constants.cs");

        if (constantsItem == null)
            return true;

        if (!constantsItem.State.HasFlag(FileStatus.NewInIndex)
            && !constantsItem.State.HasFlag(FileStatus.ModifiedInIndex))
            return true;

        var initialContent = GetInitialContent(repo, constantsItem);
        var indexContent = GetIndexContent(repo, constantsItem);

        var initialConstantValues = GetConstantValues(initialContent);
        var indexConstantValues = GetConstantValues(indexContent);

        indexConstantValues.ExceptWith(initialConstantValues);

        if (indexConstantValues.Count == 0)
            return true;

        foreach (var version in indexConstantValues)
        {
            if (!ListItemOwnerChecker.DoesCurrentUserOwnListItem(2, version))
                return false;
        }

        return true;
    }

    ...
}

First, we check if the corresponding file with our enum was modified. Then we extract the content of this file from Git storage (previously committed version) and from Git index using GetInitialContent and GetIndexContent methods. Here are their implementations:

private string GetInitialContent(Repository repo, StatusEntry item)
{
    var blob = repo.Head.Tip[item.FilePath]?.Target as Blob;

    if (blob == null)
        return null;

    using var content = new StreamReader(blob.GetContentStream(), Encoding.UTF8);

    return content.ReadToEnd();
}

private string GetIndexContent(Repository repo, StatusEntry item)
{
    var id = repo.Index[item.FilePath]?.Id;
    if (id == null)
        return null;

    var itemBlob = repo.Lookup<Blob>(id);
    if (itemBlob == null)
        return null;

    using var content = new StreamReader(itemBlob.GetContentStream(), Encoding.UTF8);

    return content.ReadToEnd();
}

Then we extract integer values of the enum members from both versions of the enum. It is done in the GetConstantValues method. I have used Roslyn to implement this functionality. You can take it from Microsoft.CodeAnalysis.CSharp NuGet package.

private ISet<int> GetConstantValues(string fileContent)
{
    if (string.IsNullOrWhiteSpace(fileContent))
        return new HashSet<int>();

    var tree = CSharpSyntaxTree.ParseText(fileContent);

    var root = tree.GetCompilationUnitRoot();

    var enumDeclaration = root
        .DescendantNodes()
        .OfType<EnumDeclarationSyntax>()
        .FirstOrDefault(e => e.Identifier.Text == "Constants");

    if(enumDeclaration == null)
        return new HashSet<int>();

    var result = new HashSet<int>();

    foreach (var member in enumDeclaration.Members)
    {
        // Members without an explicit value have no EqualsValue clause; skip them.
        if(int.TryParse(member.EqualsValue?.Value.ToString(), out var value))
        {
            result.Add(value);
        }
    }

    return result;
}

When using Roslyn, I ran into the following problem. When I wrote my code, the latest version of the Microsoft.CodeAnalysis.CSharp NuGet package was 3.4.0. I placed the assembly into the gitHookAssemblies folder, but at run time the hook failed because the corresponding version of the assembly could not be found. Here is the reason. dotnet-script itself uses Roslyn, so some version of the Microsoft.CodeAnalysis.CSharp assembly was already loaded into the process. For me, it was version 3.3.1. When I switched to this version of the NuGet package, the problem vanished.

Finally, in the Process method of our hook handler, we choose all new values and check their owners on our Web service.

Points of interest


Here we are. Our system to check the constant reservations is built. In the end, I'd like to talk about some problems that we should think about.

1. We created a pre-commit hook file, but we have not talked about how to place it into .git\hooks folder on the computers of all developers. We can use --template parameter of git init command. Or something like this:

git config init.templatedir git_template_dir

git init

Or, if you have Git 2.9 or later, we can use the core.hooksPath Git configuration option. Note that it must point at the directory that contains the hook files themselves:

git config core.hooksPath git_template_dir/hooks

Or we can make it a part of the build process for our project.

2. The same question comes about the installation of dotnet-script. We either can pre-install it on all developer machines with some version of .NET Core, or we can install it as a part of the build process.

3. Personally, I see the biggest problem with the location of referenced assemblies. We agreed to place all of them into gitHookAssemblies folder, but I'm not sure it can help in all situations. For example, LibGit2Sharp package comes with many native libraries for different operating systems. Here I used git2-7ce88e6.dll suitable for Win-x64. But if different developers use different operating systems we can face some problems.

4. We said almost nothing about the implementation of the Web service. Here we used Windows authentication, but there are many possible options. Also, the Web service should provide some UI for the reservation of new constants and for the creation of new lists.

5. Maybe you have noticed that the usage of async operations in our Git hook handlers was awkward. I think better support for such operations should be implemented.
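Regarding the last point, one direction would be an asynchronous variant of the hook interface, so that handlers can simply await their HTTP calls and only the collector blocks, once, at the very top. This is a sketch of the idea and not something the article's code contains; IAsyncPreCommitHook and AsyncHookRunner are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical asynchronous counterpart of IPreCommitHook.
public interface IAsyncPreCommitHook
{
    Task<bool> ProcessAsync(IList<string> args);
}

public static class AsyncHookRunner
{
    // Starts all hooks and awaits them together; returns 0 only if all passed.
    public static async Task<int> RunAsync(IEnumerable<IAsyncPreCommitHook> hooks, IList<string> args)
    {
        var results = await Task.WhenAll(hooks.Select(h => h.ProcessAsync(args)));
        return results.All(ok => ok) ? 0 : 1;
    }

    // The single blocking call, for an entry point that is invoked
    // through Reflection and therefore cannot await.
    public static int Run(IEnumerable<IAsyncPreCommitHook> hooks, IList<string> args)
    {
        return RunAsync(hooks, args).GetAwaiter().GetResult();
    }
}
```

Because the hooks may now run concurrently, their console messages can interleave. If that matters, awaiting the hooks one by one in a loop keeps the output ordered at the cost of some speed.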

Conclusion


In this article, we learned how to build a robust system for writing Git hooks using .NET languages. On this basis, we wrote several hook handlers that allow us to check the reservation of different constants and prevent commits in case of violations.

I hope this information will be helpful to you. Good luck!

P.S. You can find the code for the article on GitHub.
