COMP 304B Object-Oriented Software Design -- Assignment 2\ Due date: Monday 15 March 2004 before 23:55

COMP 304B Object-Oriented Software Design - Assignment 2
Due date: Monday 15 March 2004 before 23:55

Marc Provost

[printable version of this document (pdf)]| [COMP 304B Home]

Practical information

Goals

In this assignment, we look at the part of the spreadsheet design which is concerned with support for formulas such as =2+3*4+MAX(A1:A10)-$B$2 which are entered in spreadsheet cells. At this point, the spreadsheet application does not yet have a graphical user interface component. The assignment covers the design using UML Class Diagrams and an implementation in Python of an Abstract Syntax Tree (AST). The AST is the data structure used to represent a formula in memory. You will also use the visitor Design Pattern to obtain a string representation of such a formula internal representation. You will present a Collaboration Diagrams to describe the interaction between various objects to produce the formula's string representation.

The front-end

First, an overview of the process followed to convert input strings into an appropriate internal structure will be given. In order to obtain this data-structure, several steps need to be taken. The first step for the spreadsheet to "understand" formulas entered as strings is to recognize the lexical entities ("words" or "tokens") that can be found in the formula language. A module which performs the task of converting a string into a series of tokens is called a scanner. A scanner can be generated automatically from a specification consisting of regular expressions specifying the structure of legal tokens. The syntax of a regular expression adheres to the following:
a an ordinary character stands for itself
e empty string
another way to write empty string
M | N Alternation, M or N
M ·N Concatenation, M followed by N
MN Another way to write concatenation
M* Repetition, zero or more times
M+ Repetition, one or more times
M? Optional, zero or one occurrence
[a-zA-z] Alternation in a set of characters
. A period stands for any single character except newline
"a.+*" String in quote stands for itself literally
For example, valid expressions could be
"while" Represent the string "while"
[a-z][a-z0-9]* Represent legal variable
names in a programming language
[0-9]+ Represent an integer number
([0-9]+"."[0-9]*)|(([0-9]*+"."[0-9]+) Represent a real number
Once a regular expression is specified for each legal token that can be found in a formula, a scanner can be generated, which will take a potential formula string as input and will output tokens. For instance, consider the formula
=2+3+MAX(A1:A10)
The scanner will read the string from left to right and output the token stream
int(2) plus int(3) plus name("MAX") leftparen ref("A1") colon ref("A10") rightparen
This token stream will be fed to a parser module which verifies the order of the tokens and produces a parse tree representing the formula in memory. A grammar is used to specify the parser. A grammar, G, is a structure < N,T,P,S > where N is a set of non-terminals, T is a set of terminals, P is a set of productions, and S is a special non-terminal called the start symbol of the grammar. The set of non-terminals recursively describes the structures of a language in terms of other non-terminals and terminals. As expected, the terminals are simply the tokens output by the scanner. For example, consider the following grammar:

S
®
EXP
EXP
®
EXP  plus  EXP  |  int
This grammar recognizes simple addition expressions (e.g., 4+5+4+832;). It specifies the structure the stream of the tokens produced by the scanner should have.
For this assignment, the formula language has been defined for you (in the form of regular expressions and a grammar). The specification can be found in the file parser.g. The module Parser.py was generated for you from parser.g using the Yapps compiler compiler (http://theory.stanford.edu/ ~ amitp/Yapps/). The utility file yappsrt.py is required for the parser to function.

Requirements

Your goal is to design and implement a tree structure, called an Abstract Syntax Tree (AST). An AST represents information from the parse tree constructed by the parser in an "optimal" way. It does away with redundant information and is optimally suited for further processing.
First, the structure of the formulas needs to be discussed. A formula extends basic arithmetic expressions with functions and references to other formulas. Below are the potential nodes the AST can have:
Note that references and range-references do not have children. A reference in a formula F simply holds the (column,row) information of a cell containing another formula F¢ to be evaluated (later). A first draft for a design could be to create a class for each of the structures described above. However, we soon realize that the composite structures are all very similar. Indeed, all of them have operands (or arguments in the case of a function). They all apply an operation to the operands/arguments and produce an output. Addition, subtraction, multiplication, division and exponentiation can all be replaced by built-in functions. For instance, the expression 4+5 would be represented internally as:
Function(name="sum", [Number(4), Number(5)])
So, instead of creating distinct node types for each arithmetic operator, we reduce them to functions thus simplifying the design. The requirements for each class are described in the following.
A few notes:

ASTVisitor

StrReprVisitor

ASTNode

Number

CellRef

RangeRef

Function

The Design

Class Diagram

You must provide a UML class diagram. Both attributes and associations (with corresponding role name) must be present.

Collaboration Diagram

Assume that we have an AST named ast in memory representing the formula 5+min(A1:A2)*10 and str(ast) is called. To obtain the string representation, your design must use the visitor pattern described in

Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns, Elements of Reusable Object-Oriented Software. Professional Computing Series. Addison-Wesley, 1995.

In particular, str(ast) (implemented in the __str__ method of ASTNodes will instantiate a StrReprVisitor and use it to visit the tree with ast as root. The visiting will be initiated by passing the StrReprVisitor instance
as an argument to ast's accept method. When this visiting stops, __str__() will access the full string representation by calling the StrReprVisitor's method getStrRepr.
The formatting of the string representation must be such that it reproduces results given below (under "Interacting with the parser").
You must provide a complete UML collaboration diagram representing all the interactions between the instance of the StrReprVisitor class and the ASTNode instances corresponding to the above formula.

Implementation

An implementation of the design must be provided. In particular, the files AST.py, ASTVisitor.py, and StrReprVisitor.py must be provided.

Requirements common to all classes

Interacting with the parser

Once you have finished implementing, you experiment with the parser interactively. Simply run main.py. It will parse string inputs and print the AST string representation. You should get something like this:
        Welcome to the formula parser!
        Enter a formula (e.g. =4+5)
        >>> =3+4-5
        parse result: sum(3.0,4.0,invSum(5.0))
        >>> =2*3+4^3+min($A$1:A2)
        parse result: sum(mul(2.0,3.0),exp(4.0,3.0),min($A$1:A2))
        >>> =3+4*2+min(2,3)
        parse result: sum(3.0,mul(4.0,2.0),min(2.0,3.0))
        >>> quit
      

Files

Required, but need not be modified:
Required, must be modified/created:

Tools

The produce the diagrams, you may for example use the diagram editor dia
(http://www.lysator.liu.se/ ~ alla/dia/) which is installed on the UNIX machines in the labs.



File translated from TEX by TTH, version 3.40.
On 12 Mar 2004, 18:14.