A Small DSL Boolean Compiler

by breeve 2. January 2011 15:56

Too many software engineers when they hear the word custom compiler run for the nearest soda machine. Although writing a fully functional compiler takes lots of work, writing a small compiler is simple and can reap huge benefits.

At work we have a product that allows customers to write boolean logic to determine when certain rules run to make a car ad. The resulting boolean code is stored in a database as a string. At runtime, the code is interpreted and the variables are resolved using reflection. Problem is when some developer decides a property name isn’t to their liking and changes it, the reflection lookup fails and things break. Worse, we don’t know until runtime.

The solution is a compiler I wrote that will run on boolean code before the code is deployed. The compiler will create a .NET assembly that can be used at runtime. The assembly created contains not just boolean logic but a template which runs to create phrases that go into the ad. I have simplified it just to do boolean logic to illustrate the basics workings of a simple compiler. The entire example can be downloaded BooleanCompiler.zip (113.19 kb)

The boolean logic operators we use are not standard operators a programmer would recognize. A valid expression could look something like ([A]&!(([B],[C]&[D]),[E])). The & is equal to the and operator and the , is equal to the or operator. The brackets contain variables which map to a .NET class that defines them. The great thing about writing a compiler is you can make up your own language. Instead of a text syntax language just shown—which confuses car dealers anyways—the language could be something as abstract as a picture.

I have defined an example class that implements the boolean properties. It is listed below and will be used by the final generated code.

public class VehicleConditions
{
    public VehicleConditions(VehicleArgs args)
    {
        //do database query and populate properties based on data fields
    }
    public bool OneOwner
    {
        get
        {
            return true;
        }
    }


    public bool GoodGasMileage
    {
        get
        {
            return false;
        }
    }

    public bool HasWarrantyRemaining
    {
        get
        {
            return false;
        }
    }
    public bool IsLuxury
    {
        get
        {
            return false;
        }
    }
    public bool IsCertified
    {
        get
        {
            return true;
        }
    }
}

The class is passed a VehicleArgs which contains enough info to do a database query to populate the properties. They are just stubbed for now.

The code example to be compiled is below.

([OneOwner] & [GoodGasMileage]),(![IsLuxury] & !([IsCertified] & [HasWarrantyRemaining]))

This expression is saying we want the template to run when the vehicle has one owner and good gas mileage or when it is not a luxury car and not both certified and has a warranty.

The first part of the compiler tokenizes the string. The token base class is defined below:

internal abstract class Token
{
    protected Token(string value, int characterIndex)
    {
        Value = value;
        CharacterIndex = characterIndex;
    }

    public string Value { get; private set; }

    public int CharacterIndex { get; private set; }

    public abstract Node CreateNode();
    public abstract void AcceptVisitor(ITokenVisitor visitor);
}

The concrete classes that derive from token are:

VariableToken
OpenParenToken
ClosedParenToken
NotToken
AndToken
OrToken

The token base class implements the visitor pattern that is used by the parser when creating the AST (abstract syntax tree) from the tokens. The create node method is called by the parser to create nodes that will be part of the AST.

The tokenize implementation goes through the string character by character and creates the .NET token objects above. The result is a linked list of tokens that the parser will use. The class implementation is listed below.

internal class ConditionTokenizer
{
    private LinkedList<Token> _list;
    private int _index;
    private string _condition;

    public ConditionTokenizer(string condition)
    {
        _list = new LinkedList<Token>();
        _index = 0;
        _condition = condition;
    }

    public LinkedList<Token> Tokenize()
    {
        bool tokenAdded = false;

        while (true)
        {
            while (_condition[_index] == ' ' && _index < _condition.Length)
                _index++;

            if (_index >= _condition.Length)
                break;

            Token token = ParseVariable();
            AddTokenIfNotNull(token, ref tokenAdded);
            if (_index >= _condition.Length)
                break;

            token = ParseOperator();
            AddTokenIfNotNull(token, ref tokenAdded);
            if (_index >= _condition.Length)
                break;

            token = ParseParen();
            AddTokenIfNotNull(token, ref tokenAdded);
            if (_index >= _condition.Length)
                break;

            if (!tokenAdded)
                throw new InvalidOperationException("Character not recognized.");

            tokenAdded = false;
        }
        return _list;
    }

    private void AddTokenIfNotNull(Token token, ref bool tokenAdded)
    {
        if (token != null)
        {
            tokenAdded = true;
            _list.AddLast(token);
        }
    }

    private Token ParseParen()
    {
        Token token = null;
        if (_condition[_index] == '(')
            token = new OpenParenToken("(", _index);
        if (_condition[_index] == ')')
            token = new ClosedParenToken(")", _index);

        if (token != null)
            _index++;

        return token;
    }

    private Token ParseOperator()
    {
        Token token = null;
        if (_condition[_index] == ',')
            token = new OrToken(",", _index);
        if (_condition[_index] == '&')
            token = new AndToken("&", _index);
        if (_condition[_index] == '!')
            token = new NotToken("!", _index);

        if (token != null)
            _index++;

        return token;
    }

    private Token ParseVariable()
    {
        Token token = null;

        if (_condition[_index] == '[')
        {
            List<char> charList = new List<char>();
            _index++;
            while (_condition[_index] != ']')
            {
                charList.Add(_condition[_index]);
                _index++;
            }

            string variable = new string(charList.ToArray());
            token = new VariableToken(variable, _index);
        }

        if (token != null)
            _index++;

        return token;
    }
}

The parser instantiates this class passing it the boolean string and calls the Tokenize method getting the list of tokens. After this happens our list of tokens based on our example of

([OneOwner] & [GoodGasMileage]),(![IsLuxury] & !([IsCertified] & [HasWarrantyRemaining]))

looks like the following in the debugger:

[0] {BooleanCompiler.Internal.ConditionModel.OpenParenToken} 
[1] {BooleanCompiler.Internal.ConditionModel.VariableToken} 
[2] {BooleanCompiler.Internal.ConditionModel.AndToken} 
[3] {BooleanCompiler.Internal.ConditionModel.VariableToken} 
[4] {BooleanCompiler.Internal.ConditionModel.ClosedParenToken} 
[5] {BooleanCompiler.Internal.ConditionModel.OrToken} 
[6] {BooleanCompiler.Internal.ConditionModel.OpenParenToken} 
[7] {BooleanCompiler.Internal.ConditionModel.NotToken} 
[8] {BooleanCompiler.Internal.ConditionModel.VariableToken} 
[9] {BooleanCompiler.Internal.ConditionModel.AndToken} 
[10] {BooleanCompiler.Internal.ConditionModel.NotToken}
[11] {BooleanCompiler.Internal.ConditionModel.OpenParenToken} 
[12] {BooleanCompiler.Internal.ConditionModel.VariableToken}
[13] {BooleanCompiler.Internal.ConditionModel.AndToken} 
[14] {BooleanCompiler.Internal.ConditionModel.VariableToken} 
[15] {BooleanCompiler.Internal.ConditionModel.ClosedParenToken} 
[16] {BooleanCompiler.Internal.ConditionModel.ClosedParenToken} 

When the parser gets the tokens it visits each token—using the visitor pattern—and with the use of a stack and simple grammar rules, creates an AST and catches syntax errors like mismatched parenthesis. Once the AST is built each node is evaluated recursively to get a string that represents a valid C# boolean expression. A class is then created by using CodeDOM and compiled into an assembly but we are getting ahead of ourselves. First, the parser.

The parser creates its own model that represents the AST. The abstract Node below represents the base AST node.

internal abstract class Node
{
    protected Node(string value)
    {
        Value = value;
    }

    public string Value { get; private set; }
    public abstract string Expression { get; }
}

The concrete classes are:

AndNode
NotNode
OrNode
ParenScopeNode
VariableNode

Depending on the node, they can have one or more children. For example, the operator nodes (AndNode and OrNode) both have left and right children.

The Parser is listed below. The compiler creates the parser which gets the tokenize list in its constructor. The compiler then calls the Parse method which returns an AST. The Parse method visits each token and depending on the token visited, creates nodes and adds them to the stack.

The stack is periodically reduced by running the method RunGrammerRules which takes one or more nodes from the stack and combines them according to the grammer. When all tokens are visited, the code checks to make sure only one Node is left on the stack which represents the root of the AST. That is then returned from the Parse method.

internal class ConditionalParser : ITokenVisitor
{
    private LinkedListNode<Token> _currentToken;
    private Stack<Object> _parseStack;

    public ConditionalParser(string condition)
    {
        _parseStack = new Stack<object>();
        var tokenizer = new ConditionTokenizer(condition);
        LinkedList<Token> tokens = tokenizer.Tokenize();
        _currentToken = tokens.First;
    }

    private void VisitNext()
    {
        _currentToken = _currentToken.Next;
        if (_currentToken != null)
            _currentToken.Value.AcceptVisitor(this);
    }

    private void RunGrammerRules()
    {
        bool successfulCompression = true;
        while (successfulCompression)
        {
            successfulCompression = NodeOperatorNodeGrammer.Compress(_parseStack);
            if (NotNodeGrammer.Compress(_parseStack) && !successfulCompression)
                successfulCompression = true;
        }
    }

    public Node Parse()
    {
        _currentToken.Value.AcceptVisitor(this);
        RunGrammerRules();
        if (_parseStack.Count != 1 || !(_parseStack.Peek() is Node))
            throw new InvalidOperationException("wrong syntax");

        return _parseStack.Pop() as Node;
    }

    void ITokenVisitor.VisitOpenParen(OpenParenToken token)
    {
        _parseStack.Push(token);
        VisitNext();
    }

    void ITokenVisitor.VisitClosedParen(ClosedParenToken token)
    {
        RunGrammerRules();

        if (!(_parseStack.Peek() is Node))
            throw new InvalidOperationException("wrong syntax");
        Node node = _parseStack.Pop() as Node;

        if (!(_parseStack.Peek() is OpenParenToken))
            throw new InvalidOperationException("wrong syntax");
        _parseStack.Pop();

        ParenScopeNode parenScope = new ParenScopeNode();
        parenScope.Child = node;
        _parseStack.Push(parenScope);

        VisitNext();
    }

    void ITokenVisitor.VisitAnd(AndToken token)
    {
        _parseStack.Push(token);
        VisitNext();
    }

    void ITokenVisitor.VisitOr(OrToken token)
    {
        _parseStack.Push(token);
        VisitNext();
    }

    void ITokenVisitor.VisitNot(NotToken token)
    {
        _parseStack.Push(token);
        VisitNext();
    }

    void ITokenVisitor.VisitVariable(VariableToken token)
    {
        _parseStack.Push(new VariableNode(CodeDomArgs.FirstLookConditionsField + "." + token.Value));
        RunGrammerRules();

        VisitNext();
    }

    private class NodeOperatorNodeGrammer //[Expression][Operator][Expression]
    {

        public static bool Compress(Stack<Object> parseStack)
        {
            bool compressed = false;
            if (parseStack.Peek() is Node)
            {
                Node rightExpression = parseStack.Pop() as Node;
                if (parseStack.Count != 0 && (parseStack.Peek() is OrToken || parseStack.Peek() is AndToken))
                {
                    Token operatorToken = parseStack.Pop() as Token;
                    if (parseStack.Peek() is Node)
                    {
                        Node leftExpression = parseStack.Pop() as Node;
                        OperatorNode operatorNode = operatorToken.CreateNode() as OperatorNode;

                        operatorNode.Left = leftExpression;
                        operatorNode.Right = rightExpression;
                        parseStack.Push(operatorNode);
                        compressed = true;
                    }
                    else
                    {
                        parseStack.Push(operatorToken);
                        parseStack.Push(rightExpression);
                    }
                }
                else
                {
                    parseStack.Push(rightExpression);
                }
            }

            return compressed;
        }

    }


    private class NotNodeGrammer //![Expression]
    {
        public static bool Compress(Stack<Object> parseStack)
        {
            bool compressed = false;
            if (parseStack.Peek() is Node)
            {
                Node rightExpression = parseStack.Pop() as Node;
                if (parseStack.Count != 0 && parseStack.Peek() is NotToken)
                {
                    Token notToken = parseStack.Pop() as Token;
                    NotNode node = notToken.CreateNode() as NotNode;



                    node.Child = rightExpression;
                    parseStack.Push(node);
                    compressed = true;
                }
                else
                {
                    parseStack.Push(rightExpression);
                }
            }
            return compressed;
        }
    }
}

Let’s walk through how the AST is built from the stack given the same example of

([OneOwner] & [GoodGasMileage]),(![IsLuxury] & !([IsCertified] & [HasWarrantyRemaining]))

Notice the stack contains both nodes and tokens. This is a must in order to figure out what has been processed.

 

 

Once the final AST is constructed, as shown above, the compiler just calls the Expression property on the root node which then calls its children and so forth. The result is the following:

(_conditions.OneOwner&&_conditions.GoodGasMileage)||(!_conditions.IsLuxury&&!(_conditions.IsCertified&&_conditions.HasWarrantyRemaining))

The AST could contain another property to actually evaluate the AST but the entire point of this exercise was to avoid reflection or string/property lookups at runtime by having compiled code that catches errors and executes quickly. The compiled code is loaded into the app domain at startup and used over and over.

At this point, we have our C# compatible expression and will use CodeDOM to generate a .NET class. CodeDOM is a .NET API to generate a code graph. Once created, the code graph can be compiled or generate source code. CodeDOM is much easier to use than raw reflection emit which writes raw IL. Below is the CodeDom builder class.

internal class CodeDomBuilder
{
    private CodeDomArgs _args;

    public CodeDomBuilder(CodeDomArgs args)
    {
        _args = args;
    }

    public CodeCompileUnit Build()
    {
        CodeCompileUnit unit = new CodeCompileUnit();

        CodeNamespace mainNamespace = new CodeNamespace("GeneratedCode");
        unit.Namespaces.Add(mainNamespace);

        AddNamespaces(mainNamespace);

        var type = CreateType();
        mainNamespace.Types.Add(type);

        var fields = CreateFields();
        type.Members.AddRange(fields);

        var constructor = CreateConstructor();
        type.Members.Add(constructor);

        var properties = CreateProperties();
        type.Members.AddRange(properties);


        return unit;
    }

    private void AddNamespaces(CodeNamespace mainNamespace)
    {
        mainNamespace.Imports.Add(new CodeNamespaceImport("System"));
        mainNamespace.Imports.Add(new CodeNamespaceImport("BooleanSDK"));
    }

    private CodeTypeDeclaration CreateType()
    {
        CodeTypeDeclaration type = new CodeTypeDeclaration(_args.TypeName);
        type.BaseTypes.Add(new CodeTypeReference("IConditionExecute"));
        return type;
    }

    private CodeConstructor CreateConstructor()
    {
        CodeConstructor classConstructor = new CodeConstructor();
        classConstructor.Attributes = MemberAttributes.Public;
        classConstructor.Parameters.Add(new CodeParameterDeclarationExpression(new CodeTypeReference(typeof(VehicleArgs)), "vehicleArgs"));

        classConstructor.Statements.Add(new CodeAssignStatement(new CodeSnippetExpression("_conditions"),
                                                                new CodeObjectCreateExpression(
                                                                    "VehicleConditions",
                                                                    new CodeVariableReferenceExpression("vehicleArgs"))));

        return classConstructor;
    }

    private CodeTypeMemberCollection CreateFields()
    {
        CodeTypeMemberCollection fieldCollection = new CodeTypeMemberCollection();

        CodeMemberField vehicleConditionField = new CodeMemberField();
        vehicleConditionField.Name = CodeDomArgs.FirstLookConditionsField;
        vehicleConditionField.Type = new CodeTypeReference("VehicleConditions");
        fieldCollection.Add(vehicleConditionField);

        return fieldCollection;
    }

    private CodeTypeMemberCollection CreateProperties()
    {
        CodeTypeMemberCollection propertyCollection = new CodeTypeMemberCollection();

        CodeMemberProperty conditionProperty = new CodeMemberProperty();
        conditionProperty.Name = "Condition";
        conditionProperty.Type = new CodeTypeReference(typeof(bool));
        conditionProperty.Attributes = MemberAttributes.Public | MemberAttributes.Final;
        conditionProperty.HasGet = true;
        conditionProperty.HasSet = false;
        conditionProperty.GetStatements.Add(new CodeMethodReturnStatement(new CodeSnippetExpression(_args.ConditionStatement)));
        propertyCollection.Add(conditionProperty);

        return propertyCollection;
    }

It isn’t complicated to create small CodeDOM classes. Once the code graph is constructed it is compiled into an assembly which is loaded into memory and the class is instantiated. Here is the generated code that is compiled for our example:

 //------------------------------------------------------------------------------
// <auto-generated>
//     This code was generated by a tool.
//     Runtime Version:2.0.50727.3615
//
//     Changes to this file may cause incorrect behavior and will be lost if
//     the code is regenerated.
// </auto-generated>
//------------------------------------------------------------------------------


namespace GeneratedCode
{
    using System;
    using BooleanSDK;
   
   
    public class Test : IConditionExecute
    {
       
        private VehicleConditions _conditions;
       
        public Test(BooleanSDK.VehicleArgs vehicleArgs)
        {
            _conditions = new VehicleConditions(vehicleArgs);
        }
       
        public bool Condition
        {
            get
            {
                return (_conditions.OneOwner&&_conditions.GoodGasMileage)||(!_conditions.IsLuxury&&!(_conditions.IsCertified&&_conditions.HasWarrantyRemaining));
            }
        }
    }
}

The class implements an interface which defines the property Condition. Inside that property is the expression generated from our AST. Also, inside the constructor a VehicleConditions class, defined at the beginning of this tutorial, is instantiated passing the vehicleArgs that define enough info for the VehicleConditions class to define its properties. A new class is instantiated for every vehicle the code needs to run against.

Once the assembly is generated the class can be created using reflection passing the created vehicleArgs for a specific vehicle. The returned object can be cast to the IConditionExecute interface and the Condition property called to evaluate the boolean logic.

If I change the expression defined in the test console application to

([OneOwne] & [GoodGasMileage]),(![IsLuxury] & !([IsCertified] & [HasWarrantyRemaining]))

Notice OneOwner is OneOwne which is wrong. When I run the compiler, I get real compile errors.

It is complaining there is no definition for ‘OneOwne’. That is nice. The expression returns true when run with correct variables

Tags:

Software

Powered by BlogEngine.NET 1.5.0.7
Theme by Mads Kristensen

About Me

I am a Principal Engineer with 13 years experience developing and releasing software products. I started developing in C/C++ then moved into .NET and C# and have tech lead multiple projects. I have developed products in Windows Forms, ASP.NET/MVC, Silverlight, and WPF. I currently reside in Austin, Texas.

Own Projects

Pickaxe - An easy to use web page scraper. If you know a little SQL and how basic CSS selectors work, no web scraping product will be easier to use.

Download Page

Source Code

Created ASP.NET MVC forum originally targeting home owner associations but now in use by an investor group.

http://vtss.brockreeve.com

A language for querying PGATour golf strokes.

http://pga.brockreeve.com/

Real time bidder for car ads demo

http://cargawk.com/

Simple front end tic tac toe

http://tictac.brockreeve.com/

Currently Reading