# Introduction

The Model Checking Contest is a yearly scientific event dedicated to the assessment of formal verification tools for concurrent systems.

The Model Checking Contest has two different parts: the Call for Models, which gathers Petri net models proposed by the scientific community, and the Call for Tools, which benchmarks verification tools developed within the scientific community.

# I. Rules specific to Model Submission and their Publication

M-1. The Organizers of the Call for Models of the 2017 edition of the Model Checking Contest are: Hubert Garavel, Lom Hillah, and Fabrice Kordon.

M-2. The detailed instructions for proposing a model are given in the Call for Models and the model submission toolkit.

M-3. Any model submitted should be in the public domain. Any model that is proprietary, copyrighted, and/or confidential should not be submitted. Persons who submit a model hereby allow the organizers to freely use this model and publish it on the web if this model is selected.

M-4. If a submitted model is improper (e.g., invalid PNML file, missing or incomplete model description form, etc.), the organizers may contact the person(s) who submitted the model and ask for corrections. If corrections are not done or are unsatisfactory, the organizers can reject the model.

M-5. Among all submitted models, the Organizers will select approximately a dozen, according to criteria such as: originality, complexity, diversity with respect to other models, industrial relevance, etc.

M-6. The list of models selected for the 2017 edition of the Model Checking Contest will be kept confidential by the Organizers until all tools have been submitted, so that selected models are not known in advance by tool developers.

M-7. When the tool contest start, the selected models will be published on the web site and the names of the authors of selected models will be mentioned on the same web site. The Organizers of the Call for Models will not submit tools themselves, and will ensure fair competition between all tool developers competing in the Call for Tools.

# II. Rules specific to Tool Submission and their Publication

T-1. The Organizers of the Call for Tools of the 2017 edition of the Model Checking Contest are: Francis Hulin-Hubard, and Fabrice Kordon.

T-2. Developers of a tool (or a tool extension) can participate with their own work. Tools may also be submitted by others, provided the latest version available on the web is presented.

T-3. All MCC models are provided in PNML, the standard interchange format for Petri nets. In principle, each submitted tool should be able to read these PNML files; if a tool is not capable of reading PNML, it may read alternative files provided by the submitter(s) in formats other than PNML. This only holds for «known» models, i.e., those already known when the Call for Tools is issued; for «surprise» models, the use of PNML is mandatory.

T-4. Tools may participate for certain models only and/or certain examinations only (e.g., state space generation, some types of formulas). See the Submission Manual for details.

T-5. Submitted tools must honestly perform the computations required by the MCC examinations. Using precomputed results is not allowed and, if detected, will disqualify the tool submission.

T-6. Each examination must be computed independently, i.e. without using previous results of an earlier computation. However, when a file contains several formulas, the tool may reuse the results of certain formulas to compute other formulas.

T-7. Participants authorize the publication of the MCC results.

# III. Rules specific to the Evaluation and Scoring of Tools

This section defines the way we intend to operate tools, evaluate tools, and compute scores.

### III.1. Description of the Examinations for 2017

**This section defines the examinations for 2017, as well as the outputs
expected by tools for these examinations.**

E-1.1. There are five categories of
examinations: *StateSpace*, *UpperBounds*, *Reachability*
formulas, *CTL* formulas, and *LTL* formulas.

E-1.2. For the *StateSpace* category,
each participating tool must provide the following results: the number of states
in the marking graph, the number of transition firings in the marking graph, the
maximum number of tokens per marking in the net, and the maximum number of
tokens that can be found in a single place.

E-1.3. For the *UpperBounds* category,
each participating tool must provide, for each place designated in a formula,
an integer value specifying the upper bound of that place. The
provided value is expected to be the exact upper bound of the corresponding place
in the system.

For each examination of the *UpperBounds* category, 16 formulas
(selected places) are proposed.

E-1.4. The *Reachability* formulas
category contains three subcategories: *ReachabilityDeadlock*,
*ReachabilityFireability*, and *ReachabilityCardinality*. Each
tool must provide (when it answers):

- for *ReachabilityDeadlock*, a boolean TRUE or FALSE indicating the existence (or not) of at least one deadlock state in the system,
- for *ReachabilityFireability* and *ReachabilityCardinality*, a sequence of booleans TRUE or FALSE stating whether each formula is satisfied.

The difference between the three subcategories lies only in the atomic
propositions of the formulas. *ReachabilityDeadlock* corresponds to a
simple formula stating the existence of a deadlock state. In
*ReachabilityFireability*, atomic propositions are boolean combinations
of propositions checking for the firability of transitions. In
*ReachabilityCardinality* atomic propositions are boolean combinations
of propositions comparing the number of tokens in places.

For each examination of the *ReachabilityFireability* and
*ReachabilityCardinality* categories, 16 formulas are
proposed.

E-1.5. The *CTL* formulas category
contains two subcategories: *CTLFireability* and
*CTLCardinality*. Each tool must provide (when it answers) a sequence
of booleans TRUE or FALSE stating whether each formula is satisfied.

The difference between the two subcategories lies only in the atomic
propositions of the formulas. In *CTLFireability*, atomic propositions
are boolean combinations of propositions checking for the firability of
transitions. In *CTLCardinality*, atomic propositions are boolean
combinations of propositions comparing the number of tokens in places.

For each examination of the *CTL* category, 16 formulas are proposed.

E-1.6. The *LTL* formulas category
contains two subcategories: *LTLFireability* and
*LTLCardinality*. Each tool must provide (when it answers) a sequence
of booleans TRUE or FALSE stating whether each formula is satisfied.

The difference between the two subcategories lies only in the atomic
propositions of the formulas. In *LTLFireability*, atomic propositions
are boolean combinations of propositions checking for the firability of
transitions. In *LTLCardinality*, atomic propositions are boolean
combinations of propositions comparing the number of tokens in places.

For each examination of the *LTL* category, 16 formulas are
proposed.

E-1.7. If a tool participates in an examination in a category, its answers must be returned using a dedicated keyword. If the tool does not participate, it must state so explicitly (see the submission manual). If the tool cannot compute a result, it must state so explicitly with a dedicated keyword. When the analysis script does not find any appropriately formatted answer, it will assume that the tool could not compute anything.

All instructions regarding outputs are presented in the submission manual. These instructions must be respected; mistakes in the automatic evaluation of results that stem from outputs not following them are not the responsibility of the MCC organizers.

E-1.8. For all examinations involving formulas, the reference is the XML representation of these formulas; the textual format is provided only to ease readability.

### III.2. Evaluation of Tools Confidence in 2017

**This section defines the notion of tool Confidence, to be computed
during a preliminary pass, once all the tools have been processed for all the
examinations (and have therefore provided all the expected results).**

E-2.1. The Confidence of a tool is expressed by
a value *R _{tool}* ∈ [0, 1], where 1 means the tool always
finds the commonly agreed result within a set of consistent values, and
0 means that the tool never finds the commonly agreed result within the same
set of consistent values.

E-2.2. The evaluation of the participating tools' Confidence is performed during a preliminary pass that checks all the values provided by tools. A value is a piece of result provided by a tool for a given examination. For example, if an examination requires N values, each of these will be considered separately to evaluate tool Confidence. Typically, the state space examination requires tools to compute 4 values that are considered separately when evaluating tool Confidence.

From the full list of values provided by tools for the Model Checking
Contest, let us consider *V*, the subset of consistent values that are
defined in the following way:

- A majority of tools agree on the same value,
- and at least three tools agree on this value.

Let us now consider *V _{tool}*, the set of values provided by
a given tool that belong to the consistent set of values *V*
(results not computed by the tool are of course not considered) and that agree with those of *V*.
Then, the Confidence of a tool is computed as follows:

*C _{tool}* = ⎪*V _{tool}*⎪ ÷ ⎪*V*⎪

where ⎪*Set*⎪ represents the cardinality of the set.
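The Confidence computation above can be sketched in a few lines. This is an illustrative reading of rules E-2.1 and E-2.2, not the official MCC scripts; the function names and data layout are assumptions, and "majority" is interpreted here as a majority of the answering tools.

```python
from collections import Counter

def consistent_value(answers):
    """Return the consistent value for one expected result, if any.

    `answers` maps tool names to the value each tool reported (tools
    that could not compute are simply absent). A value is consistent
    when a majority of the answering tools agree on it AND at least
    three tools report it (assumed reading of rule E-2.2).
    """
    if not answers:
        return None
    value, count = Counter(answers.values()).most_common(1)[0]
    if count >= 3 and count > len(answers) / 2:
        return value
    return None

def confidence(tool, results):
    """Compute C_tool = |V_tool| / |V| over all expected values.

    `results` is a list of per-value answer dictionaries, one per
    expected value across all examinations.
    """
    v = 0       # |V|: values for which a consistent reference exists
    v_tool = 0  # |V_tool|: values where this tool agrees with V
    for answers in results:
        ref = consistent_value(answers)
        if ref is None:
            continue
        v += 1
        if answers.get(tool) == ref:
            v_tool += 1
    return v_tool / v if v else 0.0
```

For instance, a tool that matches the consistent value every time it exists gets Confidence 1.0, while a tool that never matches it gets 0.0, as stated in rule E-2.1.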

### III.3. Computing the Reference Values in 2017

**This section deals with how we compute the reference values that are used
to evaluate the results provided by tools. The objective is to determine the
correct result that is a priori unknown in most cases. We must consider two
cases: when the expected result is a value and when the expected result is an
interval.**

E-3.1. For a given examination where exact
values are expected, all the values provided by the participating tools (i.e.,
those which answered something other than «cannot compute» or
«do not compete») are examined to determine the reference one,
according to the majority of the participating tools. In that process, each
tool is weighted by its Confidence rate *R _{tool}*. Several
situations are considered:

- All tools agree: the reference value is a global agreement.
- A weighted majority of tools agree on one value: that value is considered to be the reference one.
- Only two tools (t1 and t2) provide different values: if *R _{t1}* > *R _{t2}*, the value provided by t1 is considered; if *R _{t2}* > *R _{t1}*, the value provided by t2 is considered; otherwise, the reference value is declared to be unknown.
- Only one tool provides a value: if *R _{tool}* > 0.95, the value is considered to be the reference value; otherwise, the reference value is declared to be unknown.
- No tool provides any value: the reference value is declared to be unknown.
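The case analysis of rule E-3.1 can be sketched as follows. This is one possible reading of the rule, not the official evaluation script; in particular, "weighted majority" is interpreted here as more than half of the total Confidence weight, which is an assumption.

```python
from collections import defaultdict

def reference_value(answers, confidence):
    """Pick the reference value for one expected result (rule E-3.1).

    `answers` maps tool names to the exact value each tool reported;
    `confidence` maps tool names to R_tool. Returns the reference
    value, or None when it is declared unknown.
    """
    if not answers:
        return None                       # no tool provided a value
    values = set(answers.values())
    if len(values) == 1:
        if len(answers) == 1:             # a single tool answered
            (tool,) = answers
            return answers[tool] if confidence[tool] > 0.95 else None
        return values.pop()               # global agreement
    if len(answers) == 2:                 # exactly two tools, different values
        (t1, v1), (t2, v2) = answers.items()
        if confidence[t1] > confidence[t2]:
            return v1
        if confidence[t2] > confidence[t1]:
            return v2
        return None                       # equal Confidence: unknown
    # otherwise: weighted majority over the reported values
    weight = defaultdict(float)
    for tool, value in answers.items():
        weight[value] += confidence[tool]
    best = max(weight, key=weight.get)
    return best if weight[best] > sum(weight.values()) / 2 else None
```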

E-3.2. **This rule is for information
purposes only in 2017 because, after a poll of the pre-registered tool
developers concerning the UpperBounds examination, all tools are able
to compute an exact upper bound for places. So far, manipulation of intervals
appears to be useless, but it might be activated in future editions of the Model
Checking Contest.**

For a given examination where an interval is expected as a result, a more complex algorithm must be defined. First, let us define that:

- **N** is the set of natural numbers {0, 1, 2, ...},
- for a given expected value, we get, from T tools, a list of answers noted [a_{i}, b_{i}], where i varies from 1 to T.

We can then define a function *f*: **N** → [0..T] as follows:
for any x in **N**, we set f(x) to be the number of tools that returned an
interval that contains x. Formally, we have f(x) = ⎪{i ∈ {1..T}: x
∈ [a_{i}, b_{i}]}⎪.

Let us observe that the intersection of all intervals is not empty if there
exists some x ∈ **N** such that f(x) = T. We now consider y ∈ [1, T]
to be the maximum of function f, so that y = max({f(x) : x ∈ **N**}). We
can then define the reference set *S* as follows:

*S* = {x ∈ **N**: f(x) = y}

The reference set *S* is the set of naturals on which the maximum
number of tools agree to locate the computed value. It is the best guess we use
to evaluate the accuracy of results provided by the participating tools.
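The construction of f and S above can be sketched directly. This is an illustrative sketch, not the official script; note that since every interval is bounded, f is zero beyond the largest bound, so only finitely many naturals need to be inspected.

```python
def reference_set(intervals):
    """Compute the reference set S from interval answers (rule E-3.2).

    `intervals` is a list of (a_i, b_i) pairs over the naturals, one
    per answering tool. f(x) counts the tools whose interval contains
    x; S is the set of x on which f reaches its maximum y.
    """
    if not intervals:
        return set()
    hi = max(b for _, b in intervals)    # f(x) = 0 for all x > hi
    f = {x: sum(1 for a, b in intervals if a <= x <= b)
         for x in range(hi + 1)}
    y = max(f.values())                  # y = max({f(x) : x in N})
    return {x for x, fx in f.items() if fx == y}
```

For example, with answers [2, 5], [3, 7], and [4, 6], all three intervals contain 4 and 5 but no other natural, so S = {4, 5}.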

### III.4. Scoring for 2017

**This section provides information on how we compute a score from a value
provided by a tool. Please note that some details may change in the final
evaluation (due to implementation matters, for example).**

E-4.1. Let us define the following constants that are used to compute scores in the model checking contest:

- *ValueScore*: this is the number of points a tool gets when it computes a value correctly in an examination. The corresponding number depends on the examination, the total number of points a tool can receive from an examination being 16 (e.g., when 16 formulas are provided, each correctly computed formula is rewarded with 1 point).
- *ValuePenalty*: this is the number of points a tool loses when a value is considered to be wrong. It is worth twice the points a tool gets when it provides the correct value.
- *ExaminationBonusCPU*: this is the number of extra points a tool receives for being the fastest to compute an examination (the result must not contain an erroneous value). The number associated with this constant is 2 (*ValueScore* ÷ 8).
- *ExaminationBonusMemory*: this is the number of extra points a tool gets for having the lowest memory footprint when computing an examination (the result must not contain an erroneous value). The number associated with this constant is 2 (*ValueScore* ÷ 8).

E-4.2. When a tool provides a value
corresponding to the reference value for an examination, it gets
*ValueScore* points. The tool gets 0 points for non computed values. The
score for the examination is a sum of the scores obtained for the values,
unless there is a mistake, as stated in rule E-4.3.

E-4.3. To get a score for an examination, a
tool must provide at least one correct value. If no value is provided (the tool
states «cannot compute» or «do not compete»), the score
for the corresponding examination is 0. For each wrong value provided, the
tool gets the (penalty) score of *ValuePenalty* points. To get bonus
points for the examination (CPU and/or memory), a tool must:

- compute at least half of the values required for the examination, and
- provide no erroneous value.

E-4.4. Scores for «known» models are multiplied by 1, scores for «stripped» models by 2, and scores for «surprise» models by 6.

E-4.5. For each category, a score is computed as the sum of all the examination scores in this category. Then, tools are ranked in this category with respect to this sum. If a tool is submitted in several variants, the best variant for the current category is considered for the podium only (the scores of the other variants are displayed for information).
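The scoring rules E-4.1 to E-4.3 can be sketched for a single examination as follows. This is an illustrative reading, not the official scoring script; the per-value score defaults to 1 (16 values per examination), the penalty is twice that, and the CPU/memory bonuses are 2 points each, as stated in E-4.1.

```python
def examination_score(answers, reference, value_score=1.0,
                      fastest=False, lowest_memory=False):
    """Score one examination, following rules E-4.1 to E-4.3 (sketch).

    `answers` maps value indices to what the tool reported (values the
    tool could not compute are absent); `reference` maps the same
    indices to the reference values.
    """
    if not answers:
        return 0.0  # «cannot compute» or «do not compete»: score is 0
    correct = sum(1 for i, v in answers.items() if reference.get(i) == v)
    wrong = len(answers) - correct
    # ValueScore per correct value, ValuePenalty = 2 * ValueScore per wrong one
    score = value_score * correct - 2 * value_score * wrong
    # bonuses require at least half the values computed and no wrong value
    if wrong == 0 and len(answers) >= len(reference) / 2:
        score += (2 if fastest else 0) + (2 if lowest_memory else 0)
    return score
```

The category score (rule E-4.5) is then simply the sum of these examination scores, after applying the ×1/×2/×6 multipliers of rule E-4.4.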

### III.5. Execution Conditions for Tools in 2017

**This section defines the execution conditions for all tools.**

E-5.1. For a given examination, each tool is executed in a virtual machine. This machine is booted from a reference disk image, then the computation is performed, results are retrieved, and the machine is halted so that no information can be carried over from one execution to another.

For a given model, all examinations on all instances are processed for all tools on the same physical machine so that results can be compared fairly in terms of performances (CPU, memory).

For each examination, there is a CPU time limit of one hour. Once this time limit is reached, the virtual machine is halted and values not provided are considered as «cannot compute».

E-5.2. The CPU emulated in the virtual machine
corresponds to the `SandyBridge` parameter in the -cpu option of the
qemu-system command. It enables the following flags:

Depending on the configuration of the execution machine, some other flags may be enabled too:

| flag | signification |
|------|---------------|
| vme | Virtual 8086 mode enhancements |
| rdtscp | Read Time-Stamp Counter and Processor ID |
| constant_tsc | TSC ticks at a constant rate |
| rep_good | REP microcode works well |
| eagerfpu | Non-lazy FPU restore |
| arat | Always Running APIC Timer |
| xsaveopt | Optimized XSAVE |
| xtopology | CPU topology enumeration extensions |
| vmmcall | Prefer VMMCALL to VMCALL |

**IMPORTANT:** if incompatible CPU-based optimizations are enabled in
the provided executable files, this may cause crashes (and bad results for
the tools concerned) that will not be the responsibility of the organizers.

E-5.3. For sequential tools, the virtual machine characteristics are defined as follows:

- 16 GB of memory,
- 1 core.

E-5.4. For concurrent tools, the virtual machine characteristics are defined as follows:

- 16 GB of memory,
- 4 cores.

# IV. Miscellaneous Provisions

P-1. In case of technical problems or litigious situations, participants should first discuss the matter with the organizers of the Call for Models or the Call for Tools, respectively.

P-2. Any issue that cannot be solved by discussion with the organizers (call for models, call for tools) should be brought before Didier Buchs, who is one of the two General Chairs.

P-3. The organizers may withdraw a participating tool at any moment if it does not respect the present rules.

P-4. A participating tool may resign at any moment; it will then be completely removed from the record of the contest.

P-5. The Model Checking Contest is governed by French laws. Only the justice courts of Paris (France) are legally competent for settling disputes related to this contest.