aperiot at a glance
aperiot is both a grammar description language and a parser generator
for Python. Its purpose is to provide the means to describe a language's
grammar and automatically generate a parser to recognize and process
text written in that language.[1] It is intended to be used mainly for programming and modelling languages.
The basic idea is this:
- Write the grammar of a language you want to parse using the aperiot meta-language
described in the documentation. Save this in a plain
text file with a .apr extension.
- Use the aperiot grammar compiler script to produce one of two possible
grammar representations.
- In your application, load one of the generated representations using
a simple API provided in aperiot, which results in a Python object,
the parser, that can parse strings or files given as input.
To illustrate this process, we'll consider a simple language for arithmetic
expressions. The application is a simple calculator.
- Write the grammar below in a plain text file named aexpr.apr.
-
# This is a simple language for arithmetic expressions
numbers
    number
operators
    plus "+"
    times "*"
    minus "-"
    div "/"
brackets
    lpar "("
    rpar ")"
start
    EXPR
rules
    EXPR -> TERM : "$1"
          | TERM plus EXPR : "$1 + $3"
          | TERM minus EXPR : "$1 - $3"
    TERM -> FACTOR : "$1"
          | FACTOR times TERM : "$1 * $3"
          | FACTOR div TERM : "$1 / $3"
    FACTOR -> number : "float($1)"
            | minus FACTOR : "-$2"
            | lpar EXPR rpar : "$2"
In this file, the sections titled "numbers," "operators,"
and "brackets" define symbolic names for the input tokens, and the
"start" section names the grammar's start symbol. The
last section provides the actual rules. Each rule is annotated with
a "Python expression template," that is, a Python expression
that uses placeholders (numbers preceded by "$"). The placeholders
refer to the corresponding symbol in the symbol sequence. For example,
in FACTOR times TERM : "$1 * $3", $1
refers to FACTOR, and $3 refers to TERM.
When parsing, if this rule is applied, the result of applying the
actions that yield a FACTOR will replace the $1
entry and the result of applying the actions that yield TERM
will replace the entry $3, and the result of evaluating
the full Python expression will be the result of applying this rule.
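As a rough illustration of how such a template behaves, the sketch below
fills the placeholders with already-computed values and evaluates the
resulting Python expression. It is not aperiot's implementation: the
function name apply_template and the use of re.sub and eval are
assumptions made for this example only.

```python
import re

def apply_template(template, values):
    """Fill $1, $2, ... with the given values and evaluate the result.

    Hypothetical helper for illustration; aperiot's own mechanism may differ.
    """
    def fill(match):
        index = int(match.group(1)) - 1      # $1 refers to the first symbol
        return "(%r)" % (values[index],)     # parenthesize to keep signs intact
    expression = re.sub(r"\$(\d+)", fill, template)
    return eval(expression)

# FACTOR times TERM : "$1 * $3", with FACTOR = 2.0 and TERM = 3.0
product = apply_template("$1 * $3", [2.0, "*", 3.0])
# FACTOR -> number : "float($1)", with the token text "42"
number = apply_template("float($1)", ["42"])
print(product)
print(number)
```

Note that $2 in the first call (the times token) is simply never
referenced by the template, which mirrors how the rule ignores the
operator symbol itself.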
- Use the aperiot grammar compiler script to produce one of two possible
grammar representations.
At the command-line prompt, run the grammar compiler by typing:
-
apr aexpr.apr
This will generate a Python package called aexpr_cfg in
the same directory where aexpr.apr is located. This package
contains a module called aexpr.py.
- In your application, load one of the generated representations using
a simple API provided in aperiot, which results in a Python object,
the parser, that can parse strings or files given as input.
Assuming that the aperiot package and the directory where you generated
aexpr_cfg are in the Python path, in your application you
can write something like this:
-
from aperiot.parsergen import build_parser
# Build the parser at run time from the generated aexpr_cfg package.
myparser = build_parser('aexpr')
text_to_parse = "56 +43* -21/(12-7)"
outcome = myparser.parse(text_to_parse)
print(outcome)
Alternatively, you can split the parsing process into two steps: 1)
obtaining the parse tree, and 2) applying the rule actions to the
parse tree:
-
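from aperiot.parsergen import build_parser
myparser = build_parser('aexpr')
text_to_parse = "56 +43* -21/(12-7)"
# Step 1: obtain the parse tree without running the rule actions.
tree = myparser.parse(text_to_parse, apply_actions=False)
# Step 2: apply the rule actions to the tree.
outcome = myparser.apply_actions(tree)
print(outcome)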
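To make the two-step idea concrete without relying on aperiot itself,
here is a minimal hand-written sketch for the same grammar: one function
builds a tree, another walks it and applies the actions. The tree shape
(nested tuples), the token pattern for numbers, and the names tokenize,
build_tree and evaluate_tree are all assumptions for this example; they
say nothing about aperiot's internal tree format or parsing algorithm.

```python
import re

# Assumed token shapes: unsigned decimal numbers, operators, parentheses.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_expr(tokens):
    # EXPR -> TERM | TERM plus EXPR | TERM minus EXPR
    left = parse_term(tokens)
    if tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        return (op, left, parse_expr(tokens))
    return left

def parse_term(tokens):
    # TERM -> FACTOR | FACTOR times TERM | FACTOR div TERM
    left = parse_factor(tokens)
    if tokens and tokens[0] in ("*", "/"):
        op = tokens.pop(0)
        return (op, left, parse_term(tokens))
    return left

def parse_factor(tokens):
    # FACTOR -> number | minus FACTOR | lpar EXPR rpar
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == "-":
        return ("neg", parse_factor(tokens))
    if token == "(":
        inner = parse_expr(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("expected ')'")
        return inner
    if token in ("+", "*", "/", ")"):
        raise ValueError("unexpected token %r" % token)
    return ("num", token)

def build_tree(text):
    """Step 1: turn the text into a tree, without evaluating anything."""
    tokens = tokenize(text)
    tree = parse_expr(tokens)
    if tokens:
        raise ValueError("unexpected token %r" % tokens[0])
    return tree

def evaluate_tree(tree):
    """Step 2: walk the tree and apply the action of each rule."""
    kind = tree[0]
    if kind == "num":
        return float(tree[1])             # FACTOR -> number : "float($1)"
    if kind == "neg":
        return -evaluate_tree(tree[1])    # minus FACTOR : "-$2"
    left = evaluate_tree(tree[1])
    right = evaluate_tree(tree[2])
    if kind == "+":
        return left + right
    if kind == "-":
        return left - right
    if kind == "*":
        return left * right
    return left / right

tree = build_tree("56 +43* -21/(12-7)")
print(tree)
print(evaluate_tree(tree))
```

Because the rules are right-recursive, an input such as 1 - 2 - 3 groups
as 1 - (2 - 3) under this reading of the grammar, so the sketch evaluates
it to 2.0 rather than -4.0. Keeping the tree separate from the evaluation
also makes it possible to inspect or transform the tree before any action
runs.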
Furthermore, the input provided to the parser could be a file:
-
from aperiot.parsergen import build_parser
myparser = build_parser('aexpr')
# Pass an open file object instead of a string.
text_to_parse = open("myfile.txt", 'r')
outcome = myparser.parse(text_to_parse)
text_to_parse.close()
print(outcome)
The scheme described above generates a minimal Python representation
of the grammar in the aexpr.py module within the aexpr_cfg
package, and the parser object is built at run-time in the client
application by the build_parser function. This approach,
however, may be time-consuming if the language's grammar is large.
aperiot provides an alternative approach, in which the parser object
is built during the grammar compilation and saved into a special file
(with a .pkl extension), which can then be quickly loaded
by the application. To do this, use the -f command-line option
of the apr script:
-
apr -f aexpr.apr
This will generate other files in the aexpr_cfg package,
in particular a file called aexpr.pkl, containing the parser
object itself.
Then, in the client Python application, use the load_parser
function instead of the build_parser function:
-
from aperiot.parsergen import load_parser
# Load the prebuilt parser object saved by "apr -f".
myparser = load_parser('aexpr')
text_to_parse = open("myfile.txt", 'r')
outcome = myparser.parse(text_to_parse)
text_to_parse.close()
print(outcome)
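The build-once, load-later idea behind the -f option can be sketched
with Python's standard pickle module. That aperiot's .pkl file is
written with pickle is an assumption suggested only by the extension;
the stand-in object and the names build_table, save_object and
load_object are hypothetical.

```python
import os
import pickle
import tempfile

def build_table():
    """Stand-in for an expensive build step; returns a picklable object."""
    return {"start": "EXPR", "nonterminals": ["EXPR", "TERM", "FACTOR"]}

def save_object(obj, path):
    # Done once, at "compile" time.
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)

def load_object(path):
    # Done at every application start; skips the build step entirely.
    with open(path, "rb") as handle:
        return pickle.load(handle)

cache_dir = tempfile.mkdtemp()
cache_path = os.path.join(cache_dir, "aexpr_demo.pkl")
save_object(build_table(), cache_path)
restored = load_object(cache_path)
print(restored["start"])
```

The trade-off is the usual one for cached artifacts: loading is fast,
but the saved file has to be regenerated whenever the grammar changes.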
Usually you want to report parsing errors in a user-friendly way.
To do that, wrap the parse method invocation in an exception
handler as follows:
-
from aperiot.parsergen import load_parser
from aperiot.llparser import ParsingException
myparser = load_parser('aexpr')
text_to_parse = open("myfile.txt", 'r')
try:
    outcome = myparser.parse(text_to_parse)
    print(outcome)
except ParsingException as e:
    # Report the parsing error instead of letting it propagate.
    print(e)
text_to_parse.close()
The printout of the parsing error can be made nicer by keeping a separate
copy of the source file:
-
from aperiot.parsergen import load_parser
from aperiot.llparser import ParsingException
myparser = load_parser('aexpr')
# Keep a copy of the source lines for error reporting.
text = open("myfile.txt", 'r')
lines = text.readlines()
text.close()
text_to_parse = open("myfile.txt", 'r')
try:
    outcome = myparser.parse(text_to_parse)
    print(outcome)
except ParsingException as e:
    # Show the offending source line alongside the error.
    e.pprint(lines[e.linenum-1])
text_to_parse.close()
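For readers who want a feel for what a line-aware error report can look
like, here is a small stand-alone sketch. It does not use or reproduce
aperiot's ParsingException or its pprint method: the class
DemoParsingError, its column attribute, the function format_error and
the output layout are all assumptions for illustration. Only the idea of
indexing the saved lines with linenum - 1 comes from the listing above.

```python
class DemoParsingError(Exception):
    """Hypothetical error type carrying a position in the source text."""
    def __init__(self, message, linenum, column):
        Exception.__init__(self, message)
        self.linenum = linenum    # 1-based line number
        self.column = column      # 0-based column (assumed attribute)

def format_error(error, lines):
    """Return the message, the offending line, and a caret under the column."""
    source_line = lines[error.linenum - 1].rstrip("\n")
    pointer = " " * error.column + "^"
    return "line %d: %s\n%s\n%s" % (error.linenum, error, source_line, pointer)

lines = ["1 + 2\n", "3 * * 4\n"]
try:
    raise DemoParsingError("unexpected token '*'", 2, 4)
except DemoParsingError as e:
    report = format_error(e, lines)
    print(report)
```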
Footnotes:
[1] "Aperio" is a Latin word meaning "to uncover," "to
unearth." A parser is, after all, a tool that uncovers the structure
of text.