Model Checking Contest @ Petri Nets 2015
Bruxelles, Belgium, June 23, 2015
Complete Results for the 2015 Edition of the Model Checking Contest
Last Updated
August 19, 2015

1. Introduction

This page summarizes the results of the 2015 edition of the Model Checking Contest (MCC'2015). It is divided into the following sections:

2. List of Qualified Tools in 2015

Ten tools were submitted this year. All of them successfully went through a qualification process requiring a total of more than 25,000 runs (each tool had to answer each examination for the first instance of each «known» model).

Data about these tools are summarized in the table below. For each tool, you can download the disk image that was submitted with all its data. You may use these images to reproduce the measures locally and to compare your own tool on the same benchmark.

Once downloaded, please unpack each disk image with the following command:

tar xjf image.vmdk.bz2

Summary of the Participating Tools (alphabetical order)
Tool name Representative Author Origin Type of execution Link to the submitted disk image Reported Techniques
Cunf C. Rodriguez Univ. Paris 13 (France) Sequential NET_UNFOLDING SAT_SMT
GreatSPN-Meddly E. Amparore and M. Beccuti Univ. Torino (Italy) Sequential DECISION_DIAGRAMS SYMMETRIES
ITS-Tools Y. Thierry-Mieg Univ. P. & M. Curie (France) Parallel (collateral processing) CEGAR DECISION_DIAGRAMS SAT_SMT TOPOLOGICAL USE_NUPN
LTSMin J. Meijer and J. van de Pol Univ. Twente (Netherlands) Parallel (parallel processing) DECISION_DIAGRAMS STATIC_VARIABLE_REORDERING USE_NUPN
Marcie C. Rohr and M. Heiner Univ. Cottbus (Germany) Sequential DECISION_DIAGRAMS UNFOLDING_TO_PT
pnmc A. Hamez IRT Saint-Exupéry (France) Sequential DECISION_DIAGRAMS USE_NUPN
StrataGEM 0.5.0 E. Lopez-Bobeda and D. Buchs Univ. Genève (Switzerland) Parallel (collateral processing) DECISION_DIAGRAMS TOPOLOGICAL
TAPAAL J. Srba Univ. Aalborg (Denmark) 4 variants (2 sequential, 2 parallel: collateral and parallel processing) EXPLICIT STATE_EQUATIONS STRUCTURAL_REDUCTION

3. Experimental Conditions of the MCC'2015

Each tool was submitted to 12,077 executions in various conditions (929 model instances × 13 examinations per instance), for which it could report: DNC (do not compete), CC (cannot compute), or the result of the query. These executions were handled by BenchKit, a tool developed in the context of the MCC for massive testing of software. From the raw data produced by BenchKit, post-analysis scripts then consolidated the results and computed a ranking.

16 GB of memory were allocated to each virtual machine (for both parallel and sequential tools) and a confinement of one hour was applied (executions were aborted after one hour). In total, 169,078 runs (one run being the execution of one examination by a virtual machine) generated 24 GB of raw data (essentially log files and CSV files of sampled data).
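When replaying a disk image locally, a comparable wall-clock confinement can be approximated with GNU coreutils' timeout; the sketch below uses toy durations instead of 3600 seconds (the MCC itself enforced the limit at the virtual-machine level, not with this command):

```shell
#!/bin/sh
# Sketch: reproduce the one-hour confinement with GNU timeout.
# 'timeout' kills the command when the limit expires and exits with status 124.
timeout 2 sleep 1 && echo "finished within the limit"
timeout 1 sleep 5 || echo "aborted: exit status $?"   # prints 124
```

For a real replay, the limit would be `timeout 3600` around the tool invocation inside the virtual machine.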

The table below shows some data about the involved machines and their contribution to the computation of these results.

Involved Machines and Execution of the Benchmarks

                                   bluewhale03         Ebro                Quadhexa-2          Small               Total
Cores                              40 @ 2.8 GHz        64 @ 2.7 GHz        24 @ 2.66 GHz       5×24 @ 2.4 GHz      -
Memory (GB)                        512                 1024                128                 5×64                -
Used cores (sequential tools)      31, 31 VM in //     63 VM in //         7 VM in //          5×3 VM in //        -
Used cores (parallel tools)        36 (4 per VM),      60 (4 per VM),      20 (4 per VM),      5×12 (4 per VM),    -
                                   9 VM in //          15 VM in //         5 VM in //          5×3 VM in //
Number of runs                     42 406              79 534              20 748              26 390              169 078
Total CPU required                 400d 12h 48m 36s    713d 04h 30m 32s    175d 17h 25m 55s    249d 19h 19m 20s    1539d 6h 4m 23s
Total CPU                          about 4 years, 2 months and 18 days
Time spent to complete benchmarks  about 23 days and 12 hours
Estimated boot time of VMs + management (overhead): 58d 16h 59m

We are pleased to thank those who helped with the execution of the tools:

4. The Results of the MCC’2015

The first table below presents detailed results of the MCC'2015.

Details about the Examinations in the MCC'2015 (part I):
Details about Results and Scoring + Model Performance Charts
No tool participated in the LTL examinations.

The second table below presents some performance analysis of the tools during the MCC'2015.

Details about the examinations in the MCC'2015 (part II)
Tool Performance Charts

You can download the full archive (1.5 GB compressed, 24 GB uncompressed) of the 169,078 runs processed to compute the results of the MCC'2015. This archive contains execution traces, execution logs and sampling data, as well as a large CSV file (24 MB) that summarizes all the executions.

5. The Winners for the MCC'2015

This section presents the results for the main examinations, which are:

Each examination for a given model brought 16 points (see the details in the slides). A multiplier was applied depending on the model category:

One problem was detected with the SmallOperatingSystem model: a last-minute change led to a mistake that reduced its marking graph to a single state. It was thus handled by all tools ;-). This will be corrected next time. Another consistency problem was detected in QuasiCertifProtocol, where the colored version was accidentally not equivalent to the P/T one (consistency checks for this model were therefore disabled).

5.1. Winners in the StateSpace Category

This examination had the highest participation rate (11 tools). The results, based on the scoring shown below, are:

Let us note that all the variants of TAPAAL were considered as one tool. We also discovered, too late, a problem with PNXDD (a Java 8 configuration error resulted in numerous «Cannot Compute» answers) that was not detected during the qualification phase because it only affected the «Stripped» and «Surprise» models. For next year, the qualification protocol will be extended to avoid such a situation. We also discovered that the LTSmin PNML front-end ignores inscriptions on arcs (i.e., it always assumes one token is consumed/produced per arc). This explains the errors on at least BridgeAndVehicles, PhaseVariation and DrinkVendingMachine.

In some cases, the automatic analysis scripts (no one could handle such an amount of results manually ;-) are fooled by the way numeric values are displayed by tools (depending on the library they used). This can be observed on instance «864» of Parking, where all results can be considered true even though the rounding algorithm is sometimes "fuzzy" (compare the execution report with the values used to be convinced). Next year, the minimum precision required for numeric values will be specified in the call for tool participation.
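This kind of display discrepancy can be handled with a tolerant numeric comparison. A minimal sketch in Python (the tolerance value and function name are illustrative assumptions, not what the MCC scripts actually use):

```python
import math

def same_value(a: str, b: str, rel_tol: float = 1e-6) -> bool:
    """Compare two tool-reported answers, tolerating differences in
    displayed precision (e.g. '2.778e+06' vs '2778000')."""
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except ValueError:
        # Non-numeric answers (e.g. TRUE/FALSE): exact match only.
        return a.strip() == b.strip()

# Two renderings of the same value agree under a relative tolerance:
print(same_value("2.778e+06", "2778000"))   # True
print(same_value("2.778e+06", "2900000"))   # False
```

Specifying a minimum display precision, as planned for next year, makes the choice of `rel_tol` well-defined.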

5.2. Winners in the Reachability Formulas Category

This examination was also quite successful, with a high participation rate (10 tools). The results, based on the scoring shown below, are:

Bugs were detected by tool developers, but too late to be considered (several years of CPU time had already been spent). In ITS-Tools, some student code mistranslated some kinds of formulas, and the management of hierarchical NUPN data was inappropriate. Another problem was revealed in TAPAAL On the Fly. These two tools had been intensively upgraded since their last participation in the MCC (2014 for TAPAAL and 2013 for ITS-Tools), which may explain the situation. A mistake was also discovered in the on-the-fly symbolic invariant detection in LTSmin's symbolic back-end (when optimizing formulas). This explains all the errors in the ReachabilityCardinality category.

5.3. Winners in the CTL Formulas Category

Only two tools participated. The results, based on the scoring shown below, are:

A bug was revealed in the translation of formulas for GreatSPN-Meddly. However, it was discovered by the developers too late (years of CPU time had already been spent).

5.4. Winners in the LTL Formulas Category

In 2015 no tool competed in the LTL Formulas Category.

6. Estimation of Tool Reliability

This year, we tried to introduce some reliability analysis to strengthen the computation of «right results» based on the answers of the participating tools. We considered each value provided in the contest (a value is a partial result, such as the result of a formula or a number provided for state space or bound computation). We proceeded as follows:

  1. For each «line» (all tools for a given examination of a given instance), we selected all «significant values», i.e. those on which at least 3 tools agree.
  2. Based on this subset of values, we computed, for each tool, the ratio between its number of correct answers and its number of selected values. This ratio gave us the tool reliability rate provided in the table below.
  3. This tool reliability rate was then applied to compute the scores presented in the dedicated section.
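The three steps above can be sketched as follows in Python (the data layout, names and tie handling are illustrative assumptions, not the actual MCC post-analysis scripts):

```python
from collections import Counter

def reliability(lines):
    """lines: list of dicts {tool_name: answer}, one dict per
    (examination, instance) pair. Returns per-tool reliability rates."""
    correct, selected = Counter(), Counter()
    for answers in lines:
        # Consensus value on this «line» (ties resolved arbitrarily here).
        consensus, votes = Counter(answers.values()).most_common(1)[0]
        if votes < 3:               # keep only values where >= 3 tools agree
            continue
        for tool, value in answers.items():
            selected[tool] += 1     # this value counts for the tool
            if value == consensus:
                correct[tool] += 1  # ... and matches the consensus
    return {t: correct[t] / selected[t] for t in selected}

# Three tools agree; the fourth disagrees and loses reliability:
rates = reliability([{"A": "yes", "B": "yes", "C": "yes", "D": "no"}])
print(rates["A"])   # 1.0
print(rates["D"])   # 0.0
```

Lines where no value reaches three agreeing tools contribute nothing, which is why the «significant values» counts in the table below differ between tools.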

The table below provides, in the first column, the computed reliability rate (which is naturally lower for tools in which a bug was detected). The table then provides the number of correct results (column 2) out of the number of «significant values» selected for the tool (column 3). The last column shows the number of examinations (and their types) the tool was involved in.

Estimated Tool Reliability rate (based on the «significant values» computed by tools)
Tool name Reliability Correct Values «significant values» Involved Examinations
Cunf 96.96% 4 728 4 875 3 (Reachability)
GreatSPN-Meddly 62.30% 11 966 19 206 10 (StateSpace, Reachability, CTL)
ITS-Tools 64.05% 10 890 17 003 4 (StateSpace, Reachability)
LoLA 2.0 97.80% 25 796 26 378 6 (Reachability)
LTSMin 79.13% 13 995 17 687 5 (StateSpace, Reachability)
Marcie 92.52% 18 443 19 934 10 (StateSpace, Reachability, CTL)
pnmc 99.59% 741 744 1 (StateSpace)
PNXDD 88.89% 56 63 1 (StateSpace)
StrataGEM 0.5.0 100.00% 243 243 1 (StateSpace)
TAPAAL (MC) 99.75% 23 247 23 306 7 (StateSpace, Reachability)
TAPAAL (SEQ) 99.88% 22 880 22 907 7 (StateSpace, Reachability)
TAPAAL-OTF (PAR) 88.43% 15 253 17 248 7 (StateSpace, Reachability)
TAPAAL-OTF (SEQ) 96.19% 19 001 19 733 7 (StateSpace, Reachability)