Timewarp-enabled Distributed Extended Statechart Simulation in SVM

Thomas Huining Feng
MSDL, School of Computer Science, McGill
http://msdl.cs.mcgill.ca/people/tfeng/
hfeng2@cs.mcgill.ca


Contents



Abstract:

This page gives a brief overview of the current development of SVM, mostly focused on timewarp simulation. All the functions and features described here will be completed in Summer, 2003.

Step-by-step Development Process

To view the SVM page for more information on the latest version, click here.

Main Line

\includegraphics[width=12cm]{formalisms}

  1. Basics - Finite State Automata (FSA) \bgroup\color{red}$\surd$\egroup
  2. Statechart - FSA + hierarchy + orthogonal components + history + enter/exit actions, etc. \bgroup\color{red}$\surd$\egroup
  3. Extended Statechart (ES) - Statechart + importation + parametrized templates + transition priority schemes [1] \bgroup\color{red}$\surd$\egroup
  4. Distributed Extended Statechart (DES) - Extended Statechart + ports + dynamic component lookup mechanisms \bgroup\color{red}$\surd$\egroup
  5. Timewarp-enabled Distributed Extended Statechart (TDES) - Distributed Extended Statechart + virtual-time timewarp simulation support (in process ...)

  Realtime Simulation (Execution)1 Scaled-time Simulation2 As-fast-as-possible Simulation3
Single-machine \bgroup\color{red}$\surd$\egroup \bgroup\color{red}$\surd$\egroup \bgroup\color{red}$\surd$\egroup
Distributed \bgroup\color{red}$\surd$\egroup \bgroup\color{red}$\surd$\egroup timewarp (in process)

Concurrent Side Lines

  1. Extended Regular Expression to automatically verify the output simulation trace \bgroup\color{red}$\surd$\egroup
  2. Default graphical/textural interface for Python and Jython \bgroup\color{red}$\surd$\egroup
  3. A complete model of CD player with a model-specific graphical interface \bgroup\color{red}$\surd$\egroup
  4. Graphical/textural run-time model debugger \bgroup\color{red}$\surd$\egroup
  5. Built-in syntax highlighting and syntax highlighting for popular editors including XEmacs (Linux) and UltraEdit (Windows) \bgroup\color{red}$\surd$\egroup

General Scheme

Timewarp-enabled Distributed SVM uses the optimistic scheme to implement timewarp simulation. Every component is a model of the extended statechart formalism in its own right, running in a standalone SVM instance. This is to run multiple logical processes (LP) in multiple processes. Running multiple logical processes sequentially is done by model importation4. Running multiple logical processes in multiple threads will be studied in the future.

  Sequential multi-threaded multi-processed
LP (tight coupling) \bgroup\color{red}$\surd$\egroup future study impossible
LP (loose coupling) future study future study timewarp (in process)

SVM instances may reside on the same machine or be distributed across the network. Two libraries are built to support this distributed computation: one based on PVM, the other based on CORBA. The choice between them is left to the designer.

Every component is allowed to advance as far as possible, unless something is noticed to be wrong in the system: i.e., a component receives a message from another component, which is tagged with a time in its past. In this case, the receiver component must rollback to EXACTLY5 the message time, and all the components which it contacts after that time must also rollback accordingly. (Those components may request more rollbacks in turn.)

Global Time

$GlobalTime=min\{min\{CLT_i\}, min\{MT_i\}\}$
where
$CLT_i$ is the component local time of component $i$;
$MT_i$ is the delivery time of message $i$, which is sent out by the sender but has not yet been received (on the way).

$GlobalTime$ has a special implication. All behavior in its past can be forgotten safely, since the system will never rollback to a time less then the $GlobalTime$.

It is not easy to determine $GlobalTime$. One important reason is that it takes variable time to transfer a message. I.e., $min\{MT_i\}$ is hard to determine, especially for a design with a separate Clock component (discussed in the latter part).

\includegraphics[width=11cm]{timelines}

Preparations

This section discusses some preparations for timewarp simulation. Though they are crucial for timewarp, they can be also used for other purposes.

Dynamic Component Lookup

One issue of distributed simulation is how to locate different components and connect them. These components may run on a single machine or be distributed across a network. Their location may be statically decided at design time. But in most cases this is not practical. The designer is seldom interested in specifying this physical deployment at design time. Moreover, it is also desirable to run the same model on different physical platforms with different deployment schemes, and compare the outcome.

A more flexible approach is to dynamically locate the required components. The tester can thus plug-and-play different components at run-time. When a component is executed, it may require the functionality of some other components. Once it finds these required components, it uses the mechanisms provided by the SVM environment to establish necessary connections between ports. Each pair of connected ports represents a communication channel, which is set up by one component and passively accepted by the other.

To locate a component, there are three schemes, which can also combine to form more sophisticated query:

A DNS (Dynamic Naming Service) server runs in the background, which records information of all the running components. When a component starts, its SVM solver registers itself and also queries the DNS server for required components. When those components are found, direct connections between them are established, so that it can call the functionality of them. Those components can also call its functionality, provided that the connections allow incoming calls.

\includegraphics[width=13cm]{physical}

Snapshotting

Two approaches are possible for a rollback: undoing all the changes from the last checkpoint or restoring a complete snapshot, which is taken at the checkpoint.

For an FSA model, both the approaches are feasible. Either all the state changes after the checkpoint are undone, or the old state which is recorded at the checkpoint is restored. However, for an Extended Statechart model, it is not easy to undo all the changes, since the output actions may modify arbitrary variables.

For the undoing approach, a C++ implementation is available here (the project report by Shuqing Wu and me).

SVM takes the snapshotting approach by making it possible to snapshot any model at any time in its execution. The snapshot result is a string which records all the current states of the model, including user-defined variables and objects. For a simple model, a snapshot takes about 6000 bytes of memory.6

Event Parameters and Serialization

Each event is allowed to carry a parameter. If the parameter is a list or tuple, it can be imagined that an arbitrary number of parameters are passed with the event.

The meaning of the parameters are a decision of the designer. In the output of a transition, the parameter accompanying the current event can be retrieved. In the guard, such a parameter can also be used to determine run-time condition.

For distributed simulation, one SVM instance automatically serialize the parameters before sending them to another one (possibly via a network). The serialization is recursive if the parameters are tuples or objects. The SVM instance on the other side automatically reconstructs the parameters from the serialization strings before passing them to the running component.

Automatic serialization is useful for a lot of applications. For SVM, I implemented it in Python (or Jython), which, like Java, allows dynamic reflection into an object. Another implementation, which does not use any reflection, is built in C++. The readers can find it here (the project report by Shuqing Wu and me).

Synchronous Calls

\includegraphics[width=11cm]{calls}

In theory, every message is sent asynchronously. The sender dispatches the message, which may be transfered via the underlying PVM or CORBA to the other side, and then immediately continues to do other jobs. Neither does it wait for a reply from the callee. On top of PVM or CORBA, it is guaranteed that the callee will receive all the messages eventually and in their order. However, the transfer time is unpredictable.

In practice, asynchronous calls complicate model design, since, after sending a message, if a reply is expected, the model must remember this state. If communication between models are common, the state space are increased dramatically.

SVM supports built-in synchronous calls. Synchronous calls can be used in the output, which is similar to sending an external event. Suppose event $e$ is sent via output port $p$. The external event is written as $p.e$. Also suppose an event $f$ should be received from input port $q$ afterward. The synchronous call must specify both $p.e$ and $q.f$. When this call is being executed, the component is blocked until $f$ is actually received from $q$.

It is the designer's responsibility to make sure that the reply will be received in the future. Otherwise the component will be blocked forever. There is no way to awake it, because an output part is executed as a critical section and global locks are acquired for it, which also block other concurrent threads. If a model cannot leave an output part, it is dead.

Like some other extensions, synchronous calls do not add functionality to statechart. They are just a shorthand notation for practical purpose.

Time and Sender Tags

A message must be tagged with the delivery time. When the receiver receives a message, it uses the Time tag to determine which of the following three actions are taken:

A message also carries a Sender tag, which gives the global ID of its sender (for PVM implementation, the ID is a PVM ID; for CORBA implementation, the ID is an IDL ID). This tag is useful especially when the behavior of the receiver depends on the sender of the message, or when the receiver must send back a result to the sender.

Theoretically speaking, an output port of message sender is connected to one or more input ports of its receivers. When it sends a message to this output port, all the connected receivers should receive a copy of the message, whether they are interested in it or not.

However, in practice it is usually desirable to explicitly send a message to only part of those receivers. Loads on the network are greatly decrease. An example is the Clock component discussed in the next section. It has a timewarp port supporting timewarp simulation. Every other component is connected to this port. It is not practical to broadcast a message to all other components while only one or two of them will actually handle it.

With the Sender tag, the receiver can determine where a message comes from, and explicitly reply to it later.

Clock Component

A special component, Clock, manages time advance in the whole system. Other components cannot advance their local time unless they receive a notification from the Clock. The Clock also broadcasts the global time whenever it is changed.

\includegraphics[width=9.5cm]{communication}

Time Retrieval

The Clock maintains the local times of all the other components. Those components cannot assume their time without synchronization with the Clock.7

To retrieve its local time, the component uses a synchronous call:

t=[SYNCALL("timewarp.get", None, "timewarp.time")]
8

(Suppose this component has an in/out port called timewarp. It is connected to the in/out port timewarp of the Clock component.)

Since a component has to get its time frequently with this synchronous call, the performance of the Clock is crucial. If the Clock component crashes, all the other components are blocked, because the reply of the synchronous call will never be returned.

On the Clock side, it must handle this time query event no matter which state it is in (the Clock itself is also a component written in the DES formalism). The following transition is taken from its description file:

TRANSITION: [HS]
S: TIMEWARP
N: TIMEWARP
E: timewarp.get
O: [EXTEVENT("timewarp.time", [[CURRENT]], [[SENDER]])]
9
\includegraphics[width=8cm]{tran1}

(This transition example also shows that the textural model description is 1:1 representation of the graphical semantics.)

Time Scheduling

A component asynchronously schedules events in its future by sending message schedule. to the Clock (the port is schedule, the event name is omitted). The scheduled time is a parameter sent with the message.

The Clock should not immediately notify the component, which may be still busy handling events at its current local time. It keeps this scheduling message in a priority queue. (There is a priority queue ordered by scheduled time for each connected component.)

When a component finished handling all the events at the current time (i.e., its current queue becomes empty), it synchronously sends message timewarp.notifyidle to the Clock, and waits for reply timewarp.goahead.

On receiving timewarp.notifyidle, the Clock checks if the idle component has scheduled an event in its future. If so, it does the following things:

  1. reply with message timewarp.goahead
  2. increase the local time of the component according to its minimum scheduled time
  3. send message notify. to the component

If the idle component has not scheduled any event, the Clock simply sends timewarp.goahead to it. It will be truly idle and waiting for later incoming events.

Rollback

Rollback is an important issue in timewarp.

\includegraphics[width=21cm]{rollback}

Discovering a Time Skew

It is important for a component to know that a message it receives is tagged with a time in its past. In this case, it stops the event handler immediately and sends message timewarp.timeskew to the Clock.

As to solve a time skew, a rollback of this component is necessary. Its rollback also affects other components, which have received or will receive its messages sent after the rollback time. So the component must remember all its influencees when sending out messages. It sends the set of affected influencees with message timewarp.timeskew to the Clock.

The rollback time is also sent as another parameter.

Central Rollback Management

When it receives the timewarp.timeskew message, the Clock uses a set $S_a$ to record all affected components. Another set $S_r$ is used to record rollbacked components, which is initialized to be empty:

\begin{displaymath}
\left \{
\begin{array}{l}
S_a=\{c \vert c is an affected component\}\\
S_r=\emptyset
\end{array}\right .
\end{displaymath}

The Clock broadcasts timewarp.rollback to all the components in $S_a$, including the one that first discovers time skew. The rollback time is sent as a parameter.

Rollback in Individual Components

Every component that receives this rollback message must immediately rollback to exactly the rollback time. All the events received after the rollback time which are sent from affected components are deleted. This is because those senders rollback to a time when these messages are not sent yet. However, events from other components which are not affected should be kept and handled later.

When it is done, it sends back a timewarp.rollbacked message to the Clock, and waits for timewarp.goahead message. It should not immediately start handling new events, because other affected components may not have rollbacked at this time and are still sending out erroneous messages.

A fact is very important: these rollbacked components may also send back their influencees. The Clock listens for timewarp.rollback messages. If more affected components are found, they will be added to $S_a$ and timewarp.rollback will be sent to them.

Finalizing the Rollback

After broadcasting rollback messages, the Clock listens for timewarp.rollbacked. When this message is received, the sender is added to $S_r$. If more affected components are returned, they are added to $S_a$.

When $S_a$=$S_r$, the rollback process is complete. The Clock broadcasts timewarp.goahead message to all the components in $S_a$.

Distribution

From version 0.3, SVM (and jSVM implemented in Jython) will have timewarp support that makes building timewarp simulation easy. Changing a simulation of other kinds to timewarp only needs to add a few deployment keywords to some of the participating components.

Current stable version is 0.23, which is available here.

The new version will be released later this summer.

Acknowledgment

Special thanks to Prof. Hans Vangheluwe for his excellent supervision and advice!

Bibliography

1
Thomas Feng.
A virtual machine supporting multiple statechart extensions.
In 2003 Summer Computer Simulation Conference, Montreal, Canada, July 20-24 2003.

2
Richard M. Fujimoto.
Parallel and Distributed Simulation Systems.
Wiley-Interscience, 2000.

3
Jie Liu and Edward A. Lee.
A component-based approach to modeling and simulating mixed-signal and hybrid systems.
ACM Transactions on Modeling and Computer Simulation (TOMACS), 12(4):343-368, 2002.



Footnotes

... (Execution)1
The simulation (or execution) is synchronous with wall clock.
... Simulation2
Simulation time is linearly related to wall-clock time.
... Simulation3
Simulation time is advanced instantaneously when an event occurs. The model state is constant between two successive events and is not simulated at all.
... importation4
Importing a model is to put all its states and transitions between them in one of the states of the importing model, and thus form a larger model.
...EXACTLY5
It cannot rollback even a little more. If so, some messages it sent before the necessary rollback time have to be canceled, and their receivers must rollback to the message delivery time (which is less than the necessary rollback time). Those receivers may also rollback more than necessary. ...At last it is possible that the system rollbacks to the very beginning!
... memory.6
To compensate for this, later version will improve the performance greatly and thus snapshotting is less costly. Old snapshots will be compressed. Out-of-date snapshots (snapshots taken before the global time) will be deleted.
... Clock.7
The only exception is that their local times are not changed during the execution of a single output part, which may consists of a sequence of commands.
... "timewarp.time")]8
SYNCALL is a predefined SVM macro. SVM model descriptions use macros heavily, which reduce language dependency. All the macros are used in square brackets. [SYNCALL(...)] carries 3 parameters: an output event (in this case, event get via port timewarp, the parameter of the output event (in this case, None since no parameter is required), and an input event (time from port timewarp). The result of this call is the parameter coming alone with the input event.
... 9
A transition starts with TRANSITION: tag. The source state (S) is TIMEWARP; the new state (N) is also TIMEWARP; the triggering event is timewarp.get (get event from port timewarp); the output contains only one commands, which asynchronously sends event time via port timewarp, with parameter [CURRENT] returned to the [SENDER] component. The transition goes to the history state ([HS]) of state TIMEWARP instead of its default substate.


Thomas Huining Feng 2003-11-24