Capturing line numbers from antlr parsing errors

(copied from http://goobertron.wordpress.com/2011/07/01/capturing-line…parsing-errors/)

ANTLR (http://www.antlr.org/) is a fantastic tool for writing parsers for external DSLs (http://martinfowler.com/bliki/DomainSpecificLanguage.html).  Briefly, you define a set of tokens that your language should recognise, and the grammar itself defining the, er, grammar 🙂 .  ANTLR then generates two artefacts (Java, in my case), a lexer and a parser, which you then use in your code.  Inside the grammar you can embed chunks of code that are executed when particular parts of the grammar are recognised, typically to build up your own representation of the input.  It is very similar to flex and yacc, for the oldies amongst us.

If you ever need to process an external DSL then ANTLR is well worth checking out.

When ANTLR comes across a sequence of tokens that it doesn’t recognise it reports an error.  You can capture these messages in a collection of strings, for example, which is useful up to a point.  I wanted a bit more context, in particular *where* the error occurred (i.e. the position in the source text which wasn’t recognised), so I could highlight failures in the HTML editor (more on that in the future!).

It turns out to be trivial (excuse the formatting…still learning WordPress…):

First, define an error container (in Groovy):

public final class ParserValidationFailure {

    String token
    int lineNumber
    int character
}
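
Groovy generates a getter and setter for each of those properties, which is why the grammar code can call setLineNumber(...) and friends on the container. For anyone not using Groovy, a rough plain-Java equivalent might look like the sketch below. It is an illustration only, not code from the original project, and the field comments are my reading of what each value holds.

```java
// Hand-written sketch of the Groovy container in plain Java.
// In real use you would probably declare it public in its own
// ParserValidationFailure.java file so the generated parser can see it.
final class ParserValidationFailure {

    private String token;    // text of the offending token
    private int lineNumber;  // line on which the error was reported
    private int character;   // character position within that line

    public String getToken() { return token; }
    public void setToken(String token) { this.token = token; }

    public int getLineNumber() { return lineNumber; }
    public void setLineNumber(int lineNumber) { this.lineNumber = lineNumber; }

    public int getCharacter() { return character; }
    public void setCharacter(int character) { this.character = character; }
}
```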

Then update the grammar (.g) file:

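@parser::members {

    // List and ArrayList must be imported, e.g. from java.util in the @parser::header section.
    private List<ParserValidationFailure> errors = new ArrayList<ParserValidationFailure>();

    // Called by ANTLR for each recognition error: keep the default output, then record the details.
    public void displayRecognitionError(String[] tokenNames, RecognitionException e) {
        super.displayRecognitionError(tokenNames, e);
        ParserValidationFailure failure = new ParserValidationFailure();
        failure.setLineNumber(e.line);
        failure.setCharacter(e.charPositionInLine);
        // Guard against a missing token so that error reporting itself cannot throw.
        failure.setToken(e.token != null ? e.token.getText() : null);
        this.errors.add(failure);
    }

    public List<ParserValidationFailure> getErrors() { return errors; }

}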

What that snippet does is override the “displayRecognitionError” method, extract the relevant information into our container and stick it in a list.  It also defines a new method, getErrors(), which allows me to retrieve those errors later, to do with as I like.
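
As one idea of what to do with a captured position, here is a minimal sketch that prints the offending source line with a caret under the reported column. The FailureHighlighter class and its highlight method are hypothetical names of mine, not part of ANTLR or the original code. It assumes the line number is 1-based and the character position is 0-based, which is how ANTLR 3 reports them.

```java
// Hypothetical helper: turns a (line, character) position into a
// two-line "source line + caret" marker for display.
final class FailureHighlighter {

    // lineNumber is assumed to be 1-based, character 0-based.
    static String highlight(String source, int lineNumber, int character) {
        String[] lines = source.split("\r?\n", -1);
        if (lineNumber < 1 || lineNumber > lines.length) {
            return "(no source line " + lineNumber + ")";
        }
        String line = lines[lineNumber - 1];
        // Clamp the column so an error at end of line still gets a caret.
        int column = Math.max(0, Math.min(character, line.length()));
        StringBuilder marker = new StringBuilder();
        for (int i = 0; i < column; i++) {
            // Copy tabs through so the caret lines up in a monospaced view.
            marker.append(line.charAt(i) == '\t' ? '\t' : ' ');
        }
        marker.append('^');
        return line + "\n" + marker;
    }
}
```

With the container from earlier, a call would look something like FailureHighlighter.highlight(sourceText, failure.getLineNumber(), failure.getCharacter()), once for each entry returned by getErrors().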

And that’s it!
