Wednesday, December 27, 2017

Finding typos and usage of obsolete properties in JSON

JSON is now a very widespread format. Many Web APIs return their results in JSON, and many accept incoming requests in the same format. The structure of an incoming JSON request can be very complex, and it is not uncommon to make a typo in such a document. In this article I'd like to discuss how we can detect these typos and report them to users in a friendly way.

Let's start with a simple example. I have the following class:

public class Range
{
    public int? From { get; set; }
    public int? To { get; set; }
}

I want to deserialize a user request, given as a JSON string, into this object:

using Newtonsoft.Json;
using Newtonsoft.Json.Converters;
using Newtonsoft.Json.Serialization;

var settings = new JsonSerializerSettings
{
    Converters =
    {
        new StringEnumConverter {CamelCaseText = false}
    },

    ContractResolver = new CamelCasePropertyNamesContractResolver()
};

var result = JsonConvert.DeserializeObject<Range>(jsonString, settings);

Console.WriteLine("Range is from: " + result.From);
Console.WriteLine("Range is to: " + result.To);

What do you think this code will print if jsonString is:

{
    form: 3,
    to: 5
}

Here is the result:

Range is from:
Range is to: 5

The reason for this strange result is that we wrote form instead of from in our JSON.

In this simple example it is rather easy to find out why the result differs from what we expected. But consider a very long JSON document with deep nesting: there the problem is much harder to spot. I suggest helping users find such problems by providing useful warning messages whenever a typo occurs.

Looking for typos


How can we tell whether something is a typo? In general, if during deserialization we encounter a property in the JSON that has no corresponding property in the object model, we can treat it as a typo.
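To make this definition concrete, here is a minimal sketch that compares JSON property names against the public properties of a model type using plain reflection. UnknownPropertyFinder is a hypothetical helper, not part of Json.Net, and it ignores renames such as [JsonProperty("...")]; it only illustrates what counts as a missing member. The comparison ignores case because Json.Net falls back to a case-insensitive match when it looks up a property by name. A copy of the Range class is included so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical helper, not part of Json.Net: reports JSON property names
// that have no matching public instance property on the model type.
public static class UnknownPropertyFinder
{
    public static IReadOnlyList<string> Find(Type modelType, IEnumerable<string> jsonPropertyNames)
    {
        // Names of the model's public instance properties, e.g. "From" and "To".
        // Case is ignored, mirroring Json.Net's case-insensitive fallback.
        var known = new HashSet<string>(
            modelType.GetProperties(BindingFlags.Public | BindingFlags.Instance)
                     .Select(p => p.Name),
            StringComparer.OrdinalIgnoreCase);

        // Every name the model does not know about is a candidate typo.
        return jsonPropertyNames.Where(name => !known.Contains(name)).ToList();
    }
}

// Copy of the Range class from above, so the sketch compiles on its own.
public class Range
{
    public int? From { get; set; }
    public int? To { get; set; }
}
```

For the JSON above, Find(typeof(Range), new[] { "form", "to" }) returns a single entry, form.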

By default, Json.Net simply ignores such properties. We can change this behavior with the MissingMemberHandling property of the serializer settings. Setting it to MissingMemberHandling.Error makes the serializer throw an exception whenever a JSON property has no corresponding member. We can then handle this exception in the Error handler of the serializer settings:

var settings = new JsonSerializerSettings
{
    Converters =
    {
        new StringEnumConverter {CamelCaseText = false}
    },

    ContractResolver = new CamelCasePropertyNamesContractResolver(),
    MissingMemberHandling = MissingMemberHandling.Error,
    Error = (sender, args) =>
    {
        // ... handle the error here
    }
};

All we need to do here is distinguish errors caused by a missing member from every other kind of error. Unfortunately, Json.Net does not give us much help with this. The only thing we can do is check the exception message:

var discriminator = new Regex("^Could not find member '[^']*' on object of type '[^']*'");

var messages = new List<string>();

var settings = new JsonSerializerSettings
{
    Converters =
    {
        new StringEnumConverter {CamelCaseText = false}
    },

    ContractResolver = new CamelCasePropertyNamesContractResolver(),
    MissingMemberHandling = MissingMemberHandling.Error,
    Error = (sender, args) =>
    {
        if (discriminator.IsMatch(args.ErrorContext.Error.Message))
        {
            args.ErrorContext.Handled = true;
            messages.Add($"Property {args.ErrorContext.Member} ({args.ErrorContext.Path}) is not defined on objects of '{args.CurrentObject.GetType().Name}' class.");
        }
    }
};

var result = JsonConvert.DeserializeObject<Range>(jsonString, settings);

foreach (var message in messages)
{
    Console.WriteLine(message);
}

Console.WriteLine("-----------------------------------");

Console.WriteLine("Range is from: " + result.From);
Console.WriteLine("Range is to: " + result.To);


Notice that we set args.ErrorContext.Handled to true. This tells the serializer that the error has been dealt with and allows it to continue deserializing.

I want to emphasize that this is a very fragile way to distinguish error types. If the Json.Net team changes the error message or adds localization support, this code will break.

Nevertheless, now we have our error message:

Property form (form) is not defined on objects of 'Range' class.

Even better, the args.ErrorContext.Path property tells us exactly where the typo is. Try to deserialize the following array of ranges (you should use JsonConvert.DeserializeObject<Range[]> now):

[
    {
        from: 1,
        to: 3
    },
    {
        form: 3,
        to: 5
    },
    {
        from: 5,
        to: 10
    }
]

You'll get the following warning message:

Property form ([1].form) is not defined on objects of 'Range' class.

As you can see, we have the exact path to the typo: the second element in the root array.
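Because the discriminator depends on the exact English wording of the message, it is worth keeping that dependency in one place. The sketch below is a hypothetical helper (MissingMemberMessage is not a Json.Net type) that uses named groups in the same regular expression to both recognize the message and extract the member and type names. The sample message in the check is modelled on the pattern above; the real message may be followed by path and line information, which the regex ignores because it is anchored only at the start.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: recognizes the missing-member message and extracts
// the member and type names. It relies on the exact English wording, so it
// is just as fragile as the discriminator, but the fragility lives in one place.
public static class MissingMemberMessage
{
    private static readonly Regex Pattern = new Regex(
        "^Could not find member '(?<member>[^']*)' on object of type '(?<type>[^']*)'");

    public static bool TryParse(string message, out string member, out string type)
    {
        member = null;
        type = null;

        // Treat a missing message as "not a missing-member error".
        var match = Pattern.Match(message ?? "");
        if (!match.Success)
            return false;

        member = match.Groups["member"].Value;
        type = match.Groups["type"].Value;
        return true;
    }
}
```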

It looks great! Are we done? Not yet. There are a couple of things left to do.

Discriminator fields


Let's consider a slightly more complex example. I want to deserialize objects belonging to a hierarchy of classes:

public abstract class Value
{
    public Value[] Values { get; set; }
}

public class IntValue : Value
{
    public int Value { get; set; }
}

public class StringValue : Value
{
    public string Value { get; set; }
}

To do this, I need to implement a custom converter:

public enum ValueType
{
    Integer,
    String
}

public class ValueJsonConverter : JsonConverter
{
    public override bool CanWrite => false;

    public override bool CanConvert(Type objectType)
    {
        return typeof(Value).IsAssignableFrom(objectType);
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        throw new NotSupportedException("Custom converter should only be used while deserializing.");
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue,
        JsonSerializer serializer)
    {
        if (reader.TokenType == JsonToken.Null)
            return null;

        // Load JObject from stream
        JObject jObject = JObject.Load(reader);
        if (jObject == null)
            return null;

        ValueType valueType;
        if (Enum.TryParse(jObject.Value<string>("type"), true, out valueType))
        {
            switch (valueType)
            {
                case ValueType.String:
                    var stringValueModel = new StringValue();
                    serializer.Populate(jObject.CreateReader(), stringValueModel);
                    return stringValueModel;
                case ValueType.Integer:
                    var intValueModel = new IntValue();
                    serializer.Populate(jObject.CreateReader(), intValueModel);
                    return intValueModel;
                default:
                    throw new ArgumentException($"Unknown value type '{valueType}'");
            }
        }

        throw new ArgumentException("Unable to parse value object");
    }
}

Now I can use it to deserialize objects of Value class:

var jsonString = @"
[
    {
        type: 'integer',
        value: 3
    },
    {
        type: 'string',
        value: 'aaa'
    }
]
";

var discriminator = new Regex("^Could not find member '[^']*' on object of type '[^']*'");

var messages = new List<string>();

var settings = new JsonSerializerSettings
{
    Converters =
    {
        new StringEnumConverter {CamelCaseText = false},
        new ValueJsonConverter()
    },

    ContractResolver = new CamelCasePropertyNamesContractResolver(),
    MissingMemberHandling = MissingMemberHandling.Error,
    Error = (sender, args) =>
    {
        if (discriminator.IsMatch(args.ErrorContext.Error.Message))
        {
            args.ErrorContext.Handled = true;
            messages.Add($"Property {args.ErrorContext.Member} ({args.ErrorContext.Path}) is not defined on objects of '{args.CurrentObject.GetType().Name}' class.");
        }
    }
};

var result = JsonConvert.DeserializeObject<Value[]>(jsonString, settings);

foreach (var message in messages)
{
    Console.WriteLine(message);
}

What do you think this code will print? Here is the output:

Property type (type) is not defined on objects of 'IntValue' class.
Property type (type) is not defined on objects of 'StringValue' class.

Indeed, the 'type' property is not a member of the Value class or any of its descendants. We use it only to tell the concrete classes apart.

So we need a way to exclude some properties from our warning messages. Here is how we'll do it.

First of all, I'll extract the typo-handling logic into a separate class:

public class TyposHandler
{
    private static readonly Regex Discriminator = new Regex("^Could not find member '[^']*' on object of type '[^']*'");

    private readonly List<string> _messages = new List<string>();
    private readonly List<Predicate<ErrorEventArgs>> _ignored = new List<Predicate<ErrorEventArgs>>();

    public IReadOnlyList<string> Messages => _messages;

    public void Handle(object sender, ErrorEventArgs args)
    {
        if (!Discriminator.IsMatch(args.ErrorContext.Error?.Message ?? ""))
            return;

        args.ErrorContext.Handled = true;

        if (!_ignored.Any(p => p(args)))
        {
            _messages.Add($"Property {args.ErrorContext.Member} ({args.ErrorContext.Path}) is not defined on objects of '{args.CurrentObject.GetType().Name}' class.");
        }
    }

    public void Ignore(Predicate<ErrorEventArgs> selector)
    {
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        _ignored.Add(selector);
    }
}

Its Ignore method lets us register predicates that suppress warnings for specific missing members. Here is how we can use it:

var jsonString = @"
[
    {
        type: 'integer',
        value: 3
    },
    {
        type: 'string',
        value: 'aaa'
    }
]
";

var handler = new TyposHandler();
handler.Ignore(e => e.CurrentObject is Value && e.ErrorContext.Member.ToString() == "type");

var settings = new JsonSerializerSettings
{
    Converters =
    {
        new StringEnumConverter {CamelCaseText = false},
        new ValueJsonConverter()
    },

    ContractResolver = new CamelCasePropertyNamesContractResolver(),
    MissingMemberHandling = MissingMemberHandling.Error,
    Error = handler.Handle
};

var result = JsonConvert.DeserializeObject<Value[]>(jsonString, settings);

foreach (var message in handler.Messages)
{
    Console.WriteLine(message);
}

Now we don't have any warning messages for 'type' property.
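The predicates registered with Ignore are OR-combined: a warning is dropped as soon as any of them matches. The sketch below isolates that rule without Json.Net. MissingMemberInfo is a hypothetical stand-in for the two parts of ErrorEventArgs the predicates inspect (CurrentObject and ErrorContext.Member), and IgnoreRules.Member<T> is a hypothetical factory for the kind of predicate passed to Ignore above. Because it tests "is T", a single rule registered for the abstract Value class also covers IntValue and StringValue.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the parts of ErrorEventArgs used by the predicates.
public sealed class MissingMemberInfo
{
    public MissingMemberInfo(object currentObject, string member)
    {
        CurrentObject = currentObject;
        Member = member;
    }

    public object CurrentObject { get; }
    public string Member { get; }
}

public static class IgnoreRules
{
    // Matches the named member on T and on every class derived from T.
    public static Predicate<MissingMemberInfo> Member<T>(string name) =>
        e => e.CurrentObject is T && e.Member == name;

    // A warning is suppressed as soon as any registered predicate matches.
    public static bool IsIgnored(IEnumerable<Predicate<MissingMemberInfo>> rules, MissingMemberInfo info) =>
        rules.Any(rule => rule(info));
}

// Stripped-down copies of the hierarchy from the article.
public abstract class Value { }
public class IntValue : Value { }
public class StringValue : Value { }
```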

Incorrect path


Let me add a property that does not exist in the model to the JSON I want to deserialize:

[
    {
        type: 'integer',
        value: 3,
        unknown: 'aaa'
    },
    {
        type: 'string',
        value: 'aaa'
    }
]

Now I get the following warning message:

Property unknown (unknown) is not defined on objects of 'IntValue' class.

Do you see the problem? The path (unknown) is incorrect; it should be ([0].unknown). What causes this?

The reason lies in our ValueJsonConverter class, where we create a new standalone JObject:

JObject jObject = JObject.Load(reader);

and then populate our model from properties of this object:

serializer.Populate(jObject.CreateReader(), model);

If you look at the implementation of the Path property of JToken, you'll see that it relies on the path of the parent token. But the object we created with JObject.Load has no parent; it is standalone. This means we lose the context here.

To fix this problem, we'll introduce a stack of paths:

var jsonString = @"
[
    {
        type: 'integer',
        value: 3,
        unknown: 'aaa'
    },
    {
        type: 'string',
        value: 'aaa'
    }
]
";

var paths = new Stack<string>();

var handler = new TyposHandlerWithPath(paths);
handler.Ignore(e => e.CurrentObject is Value && e.ErrorContext.Member.ToString() == "type");

var settings = new JsonSerializerSettings
{
    Converters =
    {
        new StringEnumConverter {CamelCaseText = false},
        new ValueJsonConverterWithPath(paths)
    },

    ContractResolver = new CamelCasePropertyNamesContractResolver(),
    MissingMemberHandling = MissingMemberHandling.Error,
    Error = handler.Handle
};

var result = JsonConvert.DeserializeObject<Value[]>(jsonString, settings);

foreach (var message in handler.Messages)
{
    Console.WriteLine(message);
}

We pass this stack to our typo handler and to every value converter we use. Here is how the stack is used in the ReadJson method of a value converter:

if (reader.TokenType == JsonToken.Null)
    return null;

// Remember where this object sits in the original document before
// JObject.Load detaches it from its context.
var path = reader.Path;

// Load JObject from stream
JObject jObject = JObject.Load(reader);
if (jObject == null)
    return null;

ValueType valueType;
if (Enum.TryParse(jObject.Value<string>("type"), true, out valueType))
{
    Value model;
    switch (valueType)
    {
        case ValueType.String:
            model = new StringValue();
            break;
        case ValueType.Integer:
            model = new IntValue();
            break;
        default:
            throw new ArgumentException($"Unknown value type '{valueType}'");
    }

    // Push the outer path for the duration of Populate; the finally block
    // keeps the stack balanced even if Populate throws.
    _pathsStack.Push(path);
    try
    {
        serializer.Populate(jObject.CreateReader(), model);
    }
    finally
    {
        _pathsStack.Pop();
    }

    return model;
}

throw new ArgumentException("Unable to parse value object");

We push the current path onto the stack before calling serializer.Populate and pop it after the call. As a result, the stack always contains all the parts of the full path from the root of the JSON document.
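The push and pop must stay balanced: if Populate throws and the pop is skipped, every later warning gets a stale prefix. One way to package that guarantee is a small disposable scope object. PathScope is a hypothetical name, and this is a sketch of the pattern rather than part of the original converter.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pushes a path when created and pops it when disposed,
// so a using block keeps the stack balanced even if the body throws.
public sealed class PathScope : IDisposable
{
    private readonly Stack<string> _paths;
    private bool _disposed;

    public PathScope(Stack<string> paths, string path)
    {
        _paths = paths ?? throw new ArgumentNullException(nameof(paths));
        _paths.Push(path);
    }

    public void Dispose()
    {
        // Guard against double disposal popping an entry that belongs to an outer scope.
        if (_disposed)
            return;
        _disposed = true;
        _paths.Pop();
    }
}
```

Inside ReadJson, the push/Populate/pop sequence would then read: using (new PathScope(_pathsStack, path)) { serializer.Populate(jObject.CreateReader(), model); }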

Here is how we use it in our typo handler. Take a look at the GetPath method:

public class TyposHandlerWithPath
{
    private readonly Stack<string> _paths;

    private static readonly Regex Discriminator = new Regex("^Could not find member '[^']*' on object of type '[^']*'");

    private readonly List<string> _messages = new List<string>();
    private readonly List<Predicate<ErrorEventArgs>> _ignored = new List<Predicate<ErrorEventArgs>>();

    public IReadOnlyList<string> Messages => _messages;

    public TyposHandlerWithPath(Stack<string> paths)
    {
        _paths = paths;
    }

    public void Handle(object sender, ErrorEventArgs args)
    {
        if (!Discriminator.IsMatch(args.ErrorContext.Error?.Message ?? ""))
            return;

        args.ErrorContext.Handled = true;

        if (!_ignored.Any(p => p(args)))
        {
            _messages.Add($"Property {args.ErrorContext.Member} ({GetPath(args.ErrorContext.Path)}) is not defined on objects of '{args.CurrentObject.GetType().Name}' class.");
        }
    }

    private string GetPath(string path)
    {
        var pathBuilder = new StringBuilder();

        foreach (var pathPart in _paths.Reverse())
        {
            AddPathPart(pathBuilder, pathPart);
        }

        if (!string.IsNullOrWhiteSpace(path))
        {
            AddPathPart(pathBuilder, path);
        }

        return pathBuilder.ToString();
    }

    private void AddPathPart(StringBuilder pathBuilder, string pathPart)
    {
        if (pathBuilder.Length == 0)
            pathBuilder.Append(pathPart);
        else if (pathPart.StartsWith("["))
            pathBuilder.Append(pathPart); // array index: no separator, e.g. "values[0]"
        else
            pathBuilder.Append(@"." + pathPart);
    }

    public void Ignore(Predicate<ErrorEventArgs> selector)
    {
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        _ignored.Add(selector);
    }
}

Here we combine the current path with all the paths previously stored in the stack. This allows us to reconstruct the correct path to any JSON property. In our case we get the following warning message:

Property unknown ([0].unknown) is not defined on objects of 'IntValue' class.
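The joining rules can be checked in isolation. Below is a sketch of the GetPath/AddPathPart logic moved into a static helper; JsonPathJoiner is a hypothetical name. Property segments are joined with a dot, while a segment that starts with an array index such as [2] is appended directly, which yields Json.Net-style paths like [0].values[1].unknown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper mirroring GetPath/AddPathPart from the handler.
public static class JsonPathJoiner
{
    public static string Join(Stack<string> outerPaths, string innerPath)
    {
        var builder = new StringBuilder();

        // Stack<T> enumerates from the most recently pushed item, so reverse
        // it to start at the root of the document.
        foreach (var part in outerPaths.Reverse())
            Append(builder, part);

        if (!string.IsNullOrWhiteSpace(innerPath))
            Append(builder, innerPath);

        return builder.ToString();
    }

    private static void Append(StringBuilder builder, string part)
    {
        if (builder.Length == 0)
            builder.Append(part);                 // first segment: no separator
        else if (part.StartsWith("[", StringComparison.Ordinal))
            builder.Append(part);                 // "values" + "[2]" -> "values[2]"
        else
            builder.Append('.').Append(part);     // "[0]" + "unknown" -> "[0].unknown"
    }
}
```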

Now it is time to consider the last problem we have.

Obsolete properties


What can I say? Things change, even APIs, and some ways of interacting with them become obsolete. In .NET there is ObsoleteAttribute, which you can use to mark members that should no longer be used. How can we do the same in JSON?

The problem here is that using an obsolete property is not a typo: the .NET type we deserialize into does have this property. So how can we make the serializer report that the property has been used? We will throw an exception.

The Error property of the JsonSerializerSettings class lets us set a handler for exceptions raised during deserialization (at least for JsonSerializationException exceptions). When the serializer sets a value on an obsolete property, we'll throw our own exception derived from JsonSerializationException. Then we'll catch this exception in the Error handler and process it.

But how can we throw an exception when a property is set? We'll use a contract resolver for this. So far we have been using a standard one:

ContractResolver = new CamelCasePropertyNamesContractResolver()

Instead, let's create our own contract resolver:

public class ObsoletePropertiesContractResolver : CamelCasePropertyNamesContractResolver
{
    protected override IValueProvider CreateMemberValueProvider(MemberInfo member)
    {
        var provider = base.CreateMemberValueProvider(member);

        if (member.GetCustomAttributes(typeof(ObsoleteAttribute)).Any())
            return new ObsoletePropertyValueProvider(provider, member);

        return provider;
    }
}

public class ObsoletePropertyValueProvider : IValueProvider
{
    private readonly IValueProvider _valueProvider;
    private readonly MemberInfo _memberInfo;

    public ObsoletePropertyValueProvider(
        IValueProvider valueProvider, 
        MemberInfo memberInfo)
    {
        _valueProvider = valueProvider;
        _memberInfo = memberInfo;
    }

    public void SetValue(object target, object value)
    {
        _valueProvider.SetValue(target, value);
        throw new ObsoletePropertyException(_memberInfo.DeclaringType, _memberInfo.Name);
    }

    public object GetValue(object target)
    {
        return _valueProvider.GetValue(target);
    }
}

[Serializable]
public class ObsoletePropertyException : JsonSerializationException
{
    public Type MemberType { get; }
    public string PropertyName { get; }

    public ObsoletePropertyException(Type memberType, string propertyName)
    {
        MemberType = memberType;
        PropertyName = propertyName;
    }
}

As you can see, we return our own value provider for all properties marked with the Obsolete attribute. This provider throws our exception right after setting the property value. Now we can catch it:

public class TyposAndObsoleteHandlerWithPath
{
    private static readonly Regex Discriminator = new Regex("^Could not find member '[^']*' on object of type '[^']*'");

    private readonly Stack<string> _paths;

    private readonly List<string> _messages = new List<string>();
    private readonly List<Predicate<ErrorEventArgs>> _ignored = new List<Predicate<ErrorEventArgs>>();
    private readonly List<Func<Type, string, string>> _obsoleteMessages = new List<Func<Type, string, string>>();

    public TyposAndObsoleteHandlerWithPath(Stack<string> paths)
    {
        _paths = paths ?? throw new ArgumentNullException(nameof(paths));
    }

    public IReadOnlyList<string> Messages => _messages;

    public void Handle(object sender, ErrorEventArgs args)
    {
        if (args.ErrorContext.Error is ObsoletePropertyException)
        {
            HandleObsoleteProperty(args, (ObsoletePropertyException) args.ErrorContext.Error);
            args.ErrorContext.Handled = true;
            return;
        }

        if(!Discriminator.IsMatch(args.ErrorContext.Error?.Message ?? ""))
            return;

        args.ErrorContext.Handled = true;

        if (!_ignored.Any(p => p(args)))
        {
            _messages.Add($"Property {args.ErrorContext.Member} ({GetPath(args.ErrorContext.Path)}) is not defined on objects of '{args.CurrentObject.GetType().Name}' class.");
        }
    }

    private void HandleObsoleteProperty(ErrorEventArgs args, ObsoletePropertyException errorContextError)
    {
        var message = _obsoleteMessages
            .Select(p => p(errorContextError.MemberType, errorContextError.PropertyName))
            .FirstOrDefault(m => !string.IsNullOrWhiteSpace(m));

        if(!string.IsNullOrWhiteSpace(message))
            _messages.Add($"Property {args.ErrorContext.Member} ({GetPath(args.ErrorContext.Path)}) is obsolete on objects of '{args.CurrentObject.GetType().Name}' class. {message}");
        else
            _messages.Add($"Property {args.ErrorContext.Member} ({GetPath(args.ErrorContext.Path)}) is obsolete on objects of '{args.CurrentObject.GetType().Name}' class.");
    }

    private string GetPath(string path)
    {
        var pathBuilder = new StringBuilder();

        foreach (var pathPart in _paths.Reverse())
        {
            AddPathPart(pathBuilder, pathPart);
        }

        if (!string.IsNullOrWhiteSpace(path))
        {
            AddPathPart(pathBuilder, path);
        }

        return pathBuilder.ToString();
    }

    private void AddPathPart(StringBuilder pathBuilder, string pathPart)
    {
        if (pathBuilder.Length == 0)
            pathBuilder.Append(pathPart);
        else if (pathPart.StartsWith("["))
            pathBuilder.Append(pathPart); // array index: no separator, e.g. "values[0]"
        else
            pathBuilder.Append(@"." + pathPart);
    }

    public void Ignore(Predicate<ErrorEventArgs> selector)
    {
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        _ignored.Add(selector);
    }

    public void AddObsoleteMessage(Func<Type, string, string> messageProvider)
    {
        if (messageProvider == null) throw new ArgumentNullException(nameof(messageProvider));
        _obsoleteMessages.Add(messageProvider);
    }
}

Here we also add custom messages for obsolete properties. These messages should explain how to achieve the same result without using the obsolete property. In fact, we could extract a message from the Obsolete attribute, but that message usually refers to the .NET API, not to the JSON API. This is why I think these messages should be different.
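For comparison, reading the text of the Obsolete attribute takes only a few lines of reflection. The sketch below uses a hypothetical helper, ObsoleteMessages.FromAttribute, and a sample LegacyModel class. Its signature matches Func<Type, string, string>, so if you did want the .NET message as a last resort, it could be registered with AddObsoleteMessage after the JSON-specific providers; the handler picks the first non-empty message in registration order.

```csharp
using System;
using System.Reflection;

// Hypothetical helper: returns the message passed to [Obsolete("...")] on a
// public instance property, or null if the property is missing, not obsolete,
// or marked [Obsolete] without a message.
public static class ObsoleteMessages
{
    public static string FromAttribute(Type type, string propertyName)
    {
        var property = type.GetProperty(propertyName, BindingFlags.Public | BindingFlags.Instance);
        var attribute = property?.GetCustomAttribute<ObsoleteAttribute>();
        return attribute?.Message;
    }
}

// Sample model for illustration only.
public class LegacyModel
{
    [Obsolete("Use Key instead.")]
    public string Id { get; set; }

    public string Key { get; set; }
}
```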

Let's test our code now. I'll add an obsolete property to the StringValue class:

public class StringValue : Value
{
    public string Value { get; set; }

    [Obsolete]
    public string Id { get; set; }
}

Now we'll deserialize JSON that sets the obsolete property:

var jsonString = @"
[
    {
        type: 'integer',
        value: 3
    },
    {
        type: 'string',
        value: 'aaa',
        id: 'bbb'
    }
]
";

Stack<string> pathsStack = new Stack<string>();

var handler = new TyposAndObsoleteHandlerWithPath(pathsStack);
handler.Ignore(e => e.CurrentObject is Value && e.ErrorContext.Member.ToString() == "type");
handler.AddObsoleteMessage((type, name) =>
{
    if (type == typeof(StringValue) && name == "Id")
        return "Use another property here";
    return null;
});

var settings = new JsonSerializerSettings
{
    Converters =
    {
        new StringEnumConverter {CamelCaseText = false},
        new ValueJsonConverterWithPath(pathsStack)
    },

    ContractResolver = new ObsoletePropertiesContractResolver(),

    MissingMemberHandling = MissingMemberHandling.Error,
    Error = handler.Handle
};

var result = JsonConvert.DeserializeObject<Value[]>(jsonString, settings);

foreach (var message in handler.Messages)
{
    Console.WriteLine(message);
}

As a result, we'll have the following warning message:

Property id ([1].id) is obsolete on objects of 'StringValue' class. Use another property here

Conclusion


That's it. The code here is not production-ready, but I think it is a good place to start. Such warning messages can make your Web API more user-friendly.

Another interesting problem is how to make this work with ASP.NET Web API. There we don't have direct access to the JSON serializer, and all serializer instances use the same JsonSerializerSettings object, so we must somehow distinguish one request from another. But that is a question for another article.
