Team

Nurudeen Lameed
McGill University, Montreal.

Motivation

AToM3[1] is a graphical meta-modeling tool that is being developed by the Simulation and Modeling group at McGill University. It is written in Python and has capabilities to create models graphically in Causal Block Diagrams (CBD), Petri- net, among others. A causal block diagram is a graph of connected operational blocks; AToM3 transforms the concrete syntax (the block diagram) into an abstract syntax (the dependency graph). The software generates Python code representing the models. Experiment/simulation is then set-up for some number of iterations and finally, results are generated by computing each block at every iteration. Scientific models often require intensive computations over large number of iterations. But Python interprets computation for every block in the CBD repeatedly. Furthermore, in many cases, the values of some blocks remain constant throughout the simulation. However, the current implementation of CBD in AToM3 re-computes these blocks at each iteration. All this generally results in low performance for large models.


Proposal/Requirements

In the foregoing paragraph, it was mentioned that code interpretation combined with redundant computations generally results in low performance for large models. To speed up performance for most computation intensive applications, we propose the following optimizations over the existing implementation.


Design/Models

The optimizing compiler developed for this project generates efficient C programs for causal block diagrams. It works by in-lining C code into AToM3 simulator program. AToM3 flattens models during the first iteration and generates a dependency graph for every model. The optimizer performs some analysis on this dependency graphs to determine among others, which inputs are constants; which blocks require repeated computations and which blocks do not require repeated computations. Nodes of the dependency graph are marked depending on whether they are part of varying computations or not. For example, if a block has mark = 1 then such block should be computed only once; any other value indicates that the block must be recomputed at every iteration. The following rules are used to mark nodes:
  1. if a block is a constant block, then the mark for the node is 1.
  2. if a block is a timedelayed block, then the mark for the node is 0.
  3. if a block is not a constant block but all its input blocks are marked 1, then the mark for the node is 1 otherwise, the mark is 0; loops are handled by considering all the input blocks to the current block and checking for 1 and 2 above.
Combining the above rules with the dependency graph, appropriate code is generated for every block and constant values propagated, where necessary.


Implementation

This project is implemented in Python. Code generation is in-lined with AToM3 code. The implementation uses LAPACK[3] library to solve systems of linear equations generated when an algebraic loop is detected. It is possible to have more than one loop in a model. Each loop is unique and generates a unique system of linear equations. A loop generates a system of linear equation of the form:
      Ax = b 
   
where A is the coefficient matrix formed from the relationships between the variables given by the vector x; b is the vector for containing the right-hand sides of the equations. Because the dependency graph at time = 0 might be different from the dependency graph at time > 0; code generation occurs only during the first two iterations (time = 0 and time = 1). The generated C code includes function for printing the results to the standard output device, file and for collecting timing statistics.


Experiments

This section describes some of the experiments that were developed and, run to test the capabilities of the optimizing compiler.


Performance Evaluation

The circle test and Physbe experiments were run with both AToM3 and the corresponding C programs. The following table compares the performance of the models in AToM3 (Python) and in the corresponding C programs.
Table comparison performance of AToM3 with the generated C program
ApplicationPython(time in seconds)C (time in seconds)
Physbe(3,000 iterations)23.87s0.82s
Circletest (60,000 iterations)22.33s0.94s
Although AToM3 performs some tasks of converting concrete syntax of a model into abstract syntax (flattening, building the dependency graphs, and others). This does not have any significant effect on the total time as collecting the timing statistics after from the third iteration to the last iteration yields similar result. The C program generally always outperforms Python especially for large number of iterations. For the experiments/simulations, the generated C program is over 20 times faster than Python.


Conclusions

Through the project, we show that hierarchical ordering in CBD provides opportunities for optimizations and, elimination of redundant computations and compiling code instead of interpreting code as in Python result in performance gains.


References

  1. AToM3.http://atom3.cs.mcgill.ca/
  2. Steven S. Muchnick. Advanced Compiler Design and Implementation. MORGAN KAUFFMAN, CA. ISBN-13: 978-1-55860-320-2, 1997.
  3. LAPACK.http://www.netlib.org/lapack/
  4. McLeod J, PHYSBE...a physiological simulation benchmark experiment, SIMULATION, 324-329, vol.7, no.6, 1966.
  5. The Mathworks. http://www.mathworks.com


Presentation

Presentation for the project can be found here.