aperiot

aperiot at a glance

aperiot is both a grammar description language and a parser generator for Python. Its purpose is to provide the means to describe a language's grammar and automatically generate a parser to recognize and process text written in that language.[1] It is intended to be used mainly for programming and modelling languages.

The basic idea is this:

  1. Write the grammar of a language you want to parse using the aperiot meta-language described in the documentation. Save this in a plain text file with a .apr extension.
  2. Use the aperiot grammar compiler script to produce one of two possible grammar representations.
  3. In your application, load one of the generated representations using a simple API provided in aperiot, which results in a Python object, the parser, that can parse strings or files given as input.

To illustrate this process, we'll consider a simple language for arithmetic expressions. The application is a simple calculator.

  1. Write the grammar below in a plain text file named aexpr.apr.

    # This is a simple language for arithmetic expressions

    numbers
        number

    operators
        plus   "+"
        times  "*"
        minus  "-"
        div    "/"

    brackets
        lpar  "("
        rpar  ")"

    start
        EXPR

    rules
    EXPR -> TERM              : "$1"
          | TERM plus EXPR    : "$1 + $3"
          | TERM minus EXPR   : "$1 - $3"

    TERM -> FACTOR               : "$1"
          | FACTOR times TERM    : "$1 * $3"
          | FACTOR div TERM      : "$1 / $3"

    FACTOR -> number          : "float($1)"
            | minus FACTOR    : "-$2"
            | lpar EXPR rpar  : "$2"

    In this file, the sections titled "numbers," "operators," and "brackets" define symbolic names for the input tokens, and the "start" section names the start symbol. The last section provides the actual rules. Each rule is annotated with a "Python expression template," that is, a Python expression that uses placeholders (numbers preceded by $). Each placeholder refers to the symbol at the corresponding position in the rule's symbol sequence. For example, in FACTOR times TERM : "$1 * $3", $1 refers to FACTOR and $3 refers to TERM. When this rule is applied during parsing, the result of the actions that yield FACTOR replaces $1, the result of the actions that yield TERM replaces $3, and the value of the full Python expression becomes the result of the rule.
  2. Use the aperiot grammar compiler script to produce one of two possible grammar representations.

    At the command-line prompt, execute the grammar compiler by typing:

    apr aexpr.apr

    This will generate a Python package called aexpr_cfg in the same directory where aexpr.apr is located. This package contains a module called aexpr.py.
  3. In your application, load one of the generated representations using a simple API provided in aperiot, which results in a Python object, the parser, that can parse strings or files given as input.

    Assuming that the aperiot package and the directory where you generated aexpr_cfg are in the Python path, in your application you can write something like this:

    from aperiot.parsergen import build_parser

    # Build the parser at run time from the generated aexpr_cfg package.
    myparser = build_parser('aexpr')
    text_to_parse = "56 +43* -21/(12-7)"
    outcome = myparser.parse(text_to_parse)
    print(outcome)

    Alternatively, you can split the parsing process into two steps: 1) obtaining the parse tree, and 2) applying the rule actions to the parse tree:

    from aperiot.parsergen import build_parser

    myparser = build_parser('aexpr')
    text_to_parse = "56 +43* -21/(12-7)"
    # Step 1: build the parse tree only, without evaluating the templates.
    tree = myparser.parse(text_to_parse, apply_actions=False)
    # Step 2: evaluate the rule actions on the tree.
    outcome = myparser.apply_actions(tree)
    print(outcome)

    Furthermore, the input provided to the parser can be a file:

    from aperiot.parsergen import build_parser

    myparser = build_parser('aexpr')
    # Pass an open file object instead of a string.
    text_to_parse = open("myfile.txt", "r")
    outcome = myparser.parse(text_to_parse)
    text_to_parse.close()
    print(outcome)
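
To make the meaning of the rule templates concrete, here is a minimal hand-written sketch in plain Python that mirrors the three rules of aexpr.apr. It does not use aperiot, and it is not a description of how aperiot parses internally; the names tokenize, Calculator, and calculate are illustrative only. Each method corresponds to one nonterminal, and the comments show which template it stands in for.

```python
import re

# Symbolic token names, matching the "operators" and "brackets" sections.
NAMES = {"+": "plus", "-": "minus", "*": "times", "/": "div",
         "(": "lpar", ")": "rpar"}


def tokenize(text):
    """Split text into (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    for lexeme in re.findall(r"\d+(?:\.\d+)?|\S", text):
        if lexeme[0].isdigit():
            tokens.append(("number", lexeme))
        elif lexeme in NAMES:
            tokens.append((NAMES[lexeme], lexeme))
        else:
            raise ValueError("unexpected character %r" % lexeme)
    return tokens


class Calculator:
    """Recursive-descent evaluator with one method per nonterminal."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        """Return the kind of the next token, or None at end of input."""
        if self.pos < len(self.tokens):
            return self.tokens[self.pos][0]
        return None

    def take(self, kind):
        """Consume a token of the given kind and return its lexeme."""
        if self.peek() != kind:
            raise ValueError("expected %s at token %d" % (kind, self.pos))
        lexeme = self.tokens[self.pos][1]
        self.pos += 1
        return lexeme

    def expr(self):
        left = self.term()
        if self.peek() == "plus":
            self.take("plus")
            return left + self.expr()    # TERM plus EXPR : "$1 + $3"
        if self.peek() == "minus":
            self.take("minus")
            return left - self.expr()    # TERM minus EXPR : "$1 - $3"
        return left                      # TERM : "$1"

    def term(self):
        left = self.factor()
        if self.peek() == "times":
            self.take("times")
            return left * self.term()    # FACTOR times TERM : "$1 * $3"
        if self.peek() == "div":
            self.take("div")
            return left / self.term()    # FACTOR div TERM : "$1 / $3"
        return left                      # FACTOR : "$1"

    def factor(self):
        if self.peek() == "number":
            return float(self.take("number"))   # number : "float($1)"
        if self.peek() == "minus":
            self.take("minus")
            return -self.factor()               # minus FACTOR : "-$2"
        self.take("lpar")
        value = self.expr()                     # lpar EXPR rpar : "$2"
        self.take("rpar")
        return value


def calculate(text):
    """Evaluate text as an EXPR and reject trailing input."""
    calc = Calculator(text)
    value = calc.expr()
    if calc.peek() is not None:
        raise ValueError("unexpected trailing input")
    return value
```

Because the rules are right-recursive, this reading of the grammar groups operators to the right: calculate("10 - 2 - 3") evaluates 10 - (2 - 3), and the sample input "56 +43* -21/(12-7)" groups as 56 + (43 * (-21 / (12 - 7))), which is about -124.6. If conventional left-to-right grouping matters for your language, the grammar rules would need to be written accordingly.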

The scheme described above generates a minimal Python representation of the grammar in the aexpr.py module within the aexpr_cfg package, and the build_parser function builds the parser object at run time in the client application. This approach, however, may be time-consuming if the language's grammar is large. aperiot provides an alternative approach, in which the parser object is built during grammar compilation and saved into a special file (with a .pkl extension), which the application can then load quickly. To do this, use the -f command-line option of the apr script:

apr -f aexpr.apr

This will generate other files in the aexpr_cfg package, in particular a file called aexpr.pkl, containing the parser object itself.

Then, in the client Python application, use the load_parser function instead of the build_parser function:

from aperiot.parsergen import load_parser

# Load the prebuilt parser from aexpr.pkl instead of building it.
myparser = load_parser('aexpr')
text_to_parse = open("myfile.txt", "r")
outcome = myparser.parse(text_to_parse)
text_to_parse.close()
print(outcome)
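
The trade-off between building at run time and loading a prebuilt object can be illustrated with Python's standard pickle module. This is only a sketch of the general build-once, load-many pattern: it assumes the .pkl extension refers to a pickled object, which the text above does not state, and build_table, save_prebuilt, and load_prebuilt are hypothetical names that are not part of aperiot.

```python
import os
import pickle
import tempfile


def build_table(size):
    """Stand-in for an expensive construction step (hypothetical)."""
    return {i: i * i for i in range(size)}


def save_prebuilt(obj, path):
    """Serialize obj once, e.g. at grammar-compilation time."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_prebuilt(path):
    """Deserialize the saved object, e.g. at application start-up."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Build and save once ...
workdir = tempfile.mkdtemp()
pkl_path = os.path.join(workdir, "demo.pkl")
table = build_table(1000)
save_prebuilt(table, pkl_path)

# ... then load the saved copy instead of rebuilding it.
restored = load_prebuilt(pkl_path)
```

The loaded object is equal to the one that was built, so the application can skip the construction step entirely. As with any pickled data, a file like this should only be loaded from a trusted location.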

Usually you want to report parsing errors in a user-friendly way. To do that, wrap the parse method invocation in an exception handler as follows:

from aperiot.parsergen import load_parser
from aperiot.llparser import ParsingException

myparser = load_parser('aexpr')
text_to_parse = open("myfile.txt", "r")
try:
    outcome = myparser.parse(text_to_parse)
    print(outcome)
except ParsingException as e:
    # Report the parsing error instead of letting it propagate.
    print(e)
text_to_parse.close()

The printout of the parsing error can be made nicer by keeping a separate copy of the source lines:

from aperiot.parsergen import load_parser
from aperiot.llparser import ParsingException

myparser = load_parser('aexpr')

# Keep the source lines so the offending line can be shown on error.
text = open("myfile.txt", "r")
lines = text.readlines()
text.close()

text_to_parse = open("myfile.txt", "r")
try:
    outcome = myparser.parse(text_to_parse)
    print(outcome)
except ParsingException as e:
    # e.linenum - 1 indexes the offending line in lines.
    e.pprint(lines[e.linenum-1])
text_to_parse.close()
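
Since the parser accepts strings as well as files, one possible variation is to read the source once and reuse the same text both for parsing and for error reporting. The sketch below shows that pattern with stand-in classes so that it runs without aperiot: DemoParser and DemoParsingError are hypothetical, and the only property borrowed from the examples above is an exception that carries a linenum attribute. The report format is also an illustration and does not reproduce what pprint prints.

```python
class DemoParsingError(Exception):
    """Stand-in error type carrying a line number (hypothetical)."""

    def __init__(self, message, linenum):
        Exception.__init__(self, message)
        self.linenum = linenum


class DemoParser:
    """Stand-in parser: rejects the first line that contains a '?'."""

    def parse(self, text):
        for number, line in enumerate(text.splitlines(), 1):
            if "?" in line:
                raise DemoParsingError("unexpected '?'", number)
        return "ok"


def parse_with_report(parser, text, error_type):
    """Parse text; return (outcome, None) or (None, report) on error."""
    lines = text.splitlines()
    try:
        return parser.parse(text), None
    except error_type as error:
        # linenum - 1 indexes the offending line, as in the example above.
        source_line = lines[error.linenum - 1]
        report = "line %d: %s\n    %s" % (error.linenum, error, source_line)
        return None, report


outcome, report = parse_with_report(
    DemoParser(), "1 + 2\n3 ? 4\n", DemoParsingError)
```

With a real parser, the same helper could be called with the text read from myfile.txt and with ParsingException as the error type, assuming line numbers are reported the same way for string input as for file input.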


Footnotes:

[1] "Aperio" is a Latin word meaning "to uncover" or "to unearth." A parser is, after all, a tool that uncovers the structure of text.