Beem DataBase Combined Results

The BEEM Database

One of the problems in formal verification has long been that there is no generally agreed upon benchmark set that researchers can use to compare the performance of different algorithms, different implementation, or different verification tools.

There are several problems with the development of a standard approach. Different verification tools (such as Spin, SMV, Murphi, or Uppaal) are typically designed for very specific domains of problems that do not always overlap. Spin, for instance, is designed for models of asynchronous software systems, SMV was designed primarily for the verification of hardware circuits, and Uppaal was designed primarily for real-time verification problems. Modeling a hardware circuit as a Spin model is typically not a good idea, as is the verification of a real-time asynchronous software verification problem in SMV.

Each verification tool tends to have its own specification format, and a specific set of primitives for which it can give optimized performance. Translation from one format into another is always plagued with the problem that an encoding that is efficient for one tool is translated into an encoding that is inefficient for another tool. An experienced user can in these cases typically come up with a new version of the converted model that performs significantly better than the default mechanically translated version.

The most meaningful tool comparisons that have been done to date, therefore, have involved the original designers of each verification tool to craft optimal encodings of the same basic model for each different tool. A comparison of the resulting tool performance, as used by expert users on the same verification problems now becomes a much better indication of a tool's relative strengths and weaknesses. Only a handful of such studies has been done to date.

Beem

The research group in Brno has made a very noteworthy attempt to address this problem by making available a large benchmark set of verification models: the BEEM database.

The collection contains 57 separately models, categorized in 8 different groups (communication protocols, mutual exclusion algorithms, leader election algorithms, other protocols, controllers, planning and scheduling, puzzles, and miscellaneous).

Each model is parameterized to yield between 3 and 8 different problem instances, which brings the total number of models in the database to 298. For each instance there is a wealth of information provided in the database.

Most, though not all, models in the database have both a version in a basic DVE format that supports the DiVinE Distributed Verification Environment toolset, together with a small number of relevant correctness properties. For many, though not all, model instances there is also a mechanically generated Promela version of the model, that can be verified with the Spin tool. At the time of writing (March 31, 2007) we counted 232 of the 298 models as having a Spin version. The Spin versions are not in all cases comparable to the DVE version (as measured by the number of reachable states), but in quite a few cases they are.

We downloaded all 232 Spin models, and the statistics on the verification performed with the DiVinE toolset (as available in xml files for some of the problem instances, or from the main pages for each model). We then setup a test-script to run verifications for each of the models, with settings that should be roughly comparable to those used for the DiVinE tests. Specifically: we ran the tests on a 64-bit Linux workstation with 8 GB of memory, running at 2.3 GHz. The verification settings were as follows (note that we use the default 64-bit compilation which can increases memory use):

	#!/bin/sh

	echo Safety Verification
	for i in *.pm
	do	echo $i
		spin -a $i > stderr/$i 2>&1
		cc -O3 -DSAFETY -DMEMLIM=7000 -DNSUCC -o pan pan.c 2>> stderr/$i
		./pan -m10000000 -c0 -n -w27 > stdout/$i 2>> stderr/$i
	done
The -DNSUCC directives prompts Spin version 4.3.0 to generate some additional statistics about the number of successors per reachable state, which matches statistics collected in the BEEM database for the DiVinE verifications. From the output, we collect selected statistics, such as runtimes, numbers of states, memory size, and whether or not the verification completed successfully. We have populated the following table with this information, and compare it with the results from the BEEM database.

We list all models here in alphabetical order, rather than grouped in categories as in the BEEM database. None of the Spin versions of the models were modified from the mechanically generated versions in the BEEM database, with one exception. The exception is the production_cell example where a syntax error exists in the generated model: the name done is used as both a labelname and as a variable name in some of the problem instances. We have renamed the variable name into Done to avoid this problem. Clearly, rewriting the generated models could lead to huge reductions in complexity (e.g., by using channels instead of buffers) but we have resisted doing so.

In the table below, all results from Spin are given in blue, all results from the BEEM database are in red. Where one of the two tools performed better (in terms of given a shorter overall runtime without failing the run) the corresponding number is set in bold, red or blue. When no number from the BEEM database was present in the xml files, the corresponding field in the table has been left blank. A red or blue x in the last column indicates that BEEM or Spin did not complete the search, either by exhausting the maximum stack depth or by exhausting the maximum amount of available memory.

At the bottom of the table some statistics are generated for the relative performance of the BEEM and Spin verifications (where comparisons were possible).

One of the objectives of the BEEM database is to facilitate evaluating the performance of multi-core algorithms for model-checking. We have not done multi-core performance tests with Spin yet for these models (and none are present in the BEEM database for DiVinE yet either), but we plan to add these later.

Beem DataBase: B=BEEM S=Spin
Model NameTime (sec.)StatesFanOutDepthComplete
adding.10.50.01737273721.511671.5124039--
adding.27.40.988368388368381.54177--
adding.319.12.21189437618943801.54284--
adding.428.33.95337068033706801.54390--
adding.552.36.22527145652714601.54395--
adding.664.89760968476096801.54499--
anderson.14.10.593470393526661.9972639--
anderson.20.40145914612.539412.535627630--
anderson.31179.6401755739251317770002.9461465064--
anderson.40.90.0829641296433.28993.291201712021--
anderson.5982.7546281436731350490004.4939999999xx
anderson.6378.686.918206917182069004.7786295728--
anderson.7859.2417285581751118580005.1299999999xx
anderson.81054.5514278588521111920005.1269999999-x
at.11.10.0939354393562.755452.75557225585--
at.21.30.1249443494452.971912.97265546307--
at.338.85.65171162017116203.549207582--
at.4152.823.1659724565972503.861851760--
at.5821.112331999440319994003.9143826060--
at.61529.2481374127611188550004.3489999999xx
at.71802.534628881350973722004.3529999999xx
bakery.10.40150614851.790841.788302240--
bakery.20.40114611251.819371.816291254--
bakery.30.90.0732919299582.583952.56135343762--
bakery.42.70.361570031512902.61424156--
bakery.5143.619.3786640172385503.414349824--
bakery.6228.73011845035111080003.3931729048--
bakery.7540.875.529047471275317003.4531691697--
bakery.81027.4441304426981183910004.1011841293xx
blocks.20.50.01705770592.628882.62842634264--
blocks.3142.096954186954203.012487724--
blocks.4745.525223598882889878003.4299999999xx
bopdp.10.70.0312642114471.901521.522109118--
bopdp.21.10.0625685180052.840882.07828832283--
bopdp.329.42.4210409537643751.954221--
bridge.10.60.223186961941.432831.1814084--
bridge.29.533.29692393147301.974351.15488124--
brp.11.10.0218928101461.88991.15627692512--
brp.21.60.0529188183511.992291.18925111858--
brp.343.21.369966275795411.21510589--
brp.4564.516.91206844773553201.20635004--
brp.5818.224.817740267108417001.20646574--
brp.62192.762.642728113267536001.20160765--
cambridge.11.21.3113393358472.36071.071176146019--
cambridge.21.72.27159405317993.821021.121236648408--
cambridge.31.72.44181385932102.510531.064354678506--
cambridge.45.19.186046322879502.546281.0744942109783--
cambridge.559.8168698912347003001.1331717959--
cambridge.62682373354295601976001.045219019-x
cambridge.71108.725211465015547350001.0921854342-x
driving_phils.10.70.011488952031.920551.9451770843--
driving_phils.210.023317375752.467492.5383702557--
driving_phils.3838.218523583477602338002.244413xx
driving_phils.4841.133.421317183111781002.647288766x-
driving_phils.5507.715418601771556005001.995304xx
elevator.10.70.338543874611.862811.188145525079--
elevator.20.50.082825239691.86691.1864388979--
elevator.323.2111416935186877001.1554519007--
elevator.443.1408888053598434001.189999999-x
elevator.51802.542814082548509614001.2169999999xx
elevator2.10.50172817282.759262.759569452--
elevator2.240.831792001792005.78618092--
elevator2.3176.945.1766771276677107.222883189--
elevator_planning.110.0927630276325.931235.931481014811--
elevator_planning.2324.768.811428767114288008.1627923957--
elevator_planning.31.70.2552496524988.887698.8863104631047--
extinction.11.11.2489933458812.640941.28873166--
extinction.21.31.71100614360432.652121.29673166--
extinction.396.8239751930509136001.286266--
extinction.4328.73312001372657152001.256356-x
firewire_link.11.10.01172423781.914730.826368158--
firewire_link.2230.1855887501602.402540.7403129305--
firewire_link.33601.83885683833656434001.171183491xx
firewire_link.423.60.0724330180782.310690.837496221--
firewire_link.53601.831.6337121959261500.6792581x-
firewire_link.63601.93123076225449737001.128670164xx
firewire_link.7641.40.723995981269380.8624285--
fischer.10.406346362.200322.191195157--
fischer.20.80.0621733217353.110023.10932473035--
fischer.367.911.5289670528967104.239141602--
fischer.432.14.3127225412722603.62328932--
fischer.51249.7493310772461010280004.73951946x-
fischer.6226.533832172883217304.02105361--
fischer.71105.6607281130011164310004.9042031495xx
frogs.10.50.01509450961.040641.048285--
frogs.20.70.0318207182091.823971.8243134--
frogs.313.21.127607897607911.007260--
frogs.4363.844.117443219174432002.0864--
frogs.5561.8288203115381032910002.10574-x
gear.10.50.162689531711.326521.2342124545--
gear.21.10.89166893249711.304271.219116229995--
hanoi.10.50.02656165632.999542.99743766567--
hanoi.29.51.875314415314433531447--
hanoi.325454.1143489071432150039999999-x
hanoi.4432.6131188903963457620039999999xx
iprotocol.10.60.04681498283.303791.361193314--
iprotocol.21.40.129994367753.35031.363443445--
iprotocol.329.84.09101345612379401.351075--
iprotocol.491.211.2329091637173901.347981--
iprotocol.5918.411131071582349860001.3431624--
iprotocol.61210.312241387484385171001.3552455--
iprotocol.71216.618824673777584881001.3461701x-
krebs.10.50.176027592023.159121.86666102--
krebs.21.72.36624767215184.170272.19388132--
krebs.35.51423887640453602.26161--
krebs.424.365.21047405170658002.411163--
lamport.10.90.012924267272.642982.5719171543--
lamport.22.10.04110920212682.732222.68730472736--
lamport.310.013806786582.699112.6326522342--
lamport.520.60.4210668001755843.31341640--
lamport.6181.12.4287176889762463.539138997--
lamport.71001.1153871784647208204.0591115500--
lamport.81503.123.76266931773435604.1941877113--
lamport_nonatomic.13.80.46204341252533.207111.3741054341491--
lamport_nonatomic.22.60.37129581071773.240551.388151011240--
lamport_nonatomic.36.20.92369832464983.334961.3892058381622--
lamport_nonatomic.4553881257304161946001.4495489457--
lamport_nonatomic.53601.84333488300791061001.7389999999xx
lann.11.10.2518424716212.153331.193160612617--
lann.210.44127841041302.674591.539651127--
lann.388.131.3183213946660601.7571376484--
lann.460.987.3966855125950001.3663778024--
lann.562.4209993914288605001.6894088--
lann.63601.8594571208271031220001.6589999999xx
lann.73601.873052237514903772001.8949999999xx
lann.83601.854845091872902999001.6927600550xx
leader_filters.10.50.01496648071.890250.99343938--
leader_filters.20.80.0229284164092.255221.2293938--
leader_filters.31.80.1391093877092.458811.0994544--
leader_filters.41.20.0750025470252.534411.0243534--
leader_filters.527.41572886-
leader_filters.6802.128480454x
leader_filters.7546.626302351-
loyd.10.407207222.334721.555225227--
loyd.25.90.923628803628821.778199348--
loyd.3601.7254302478861246510002.8989999999xx
mcs.10.50796342412.700362.35321371101--
mcs.20.4014086962.288351.7658344--
mcs.311.70.685714592870343.09571527--
mcs.40.70.011638439923.252.36714677--
mcs.50.710260556519290025003.8427785275--
mcs.67.20.07332544366002.701136--
msmie.10.50233423361.326911.326111129--
msmie.20.90.0210558105601.125020.964219232--
msmie.38.50.251348441348461.487751.48818521969--
msmie.4778.818.3712544171254401.55252287--
needham.10.504977841.515090.96841625--
needham.29.10.0454976150722.480661.12540--
needham.336.10.13206925502141.10247--
needham.41488.61.0165250193609431.10954--
peg_solitaire.12.70.1732181321834.84184.8421516--
peg_solitaire.23601.8108013264450803501007.07333xx
peg_solitaire.33601.917405439599803501007.60833xx
peg_solitaire.478.96.318733268733286.26724--
peg_solitaire.514.90.7484191841933.856093.8561617--
peg_solitaire.61802.610307330153904079007.49636xx
peterson.10.60.021249881452.669952.46318621596--
peterson.22.20.261247041145163.200682.7252681323505--
peterson.32.80.08170156351422.72112899--
peterson.421.42.0811195607524603.25268989--
peterson.5823.229531656891910712003.5839999999xx
peterson.6778.129.53176215685725803.5413187515x-
peterson.7967.339629618148958416004.1314582953x-
phils.10.4080802.652.655850--
phils.20.405815814.044754.045407412--
phils.30.4072972944518495--
phils.49.52.723407893407899.166291549--
phils.513.94.265314405314408434031--
phils.6589.714414348906139566009.9859999999-x
phils.73965.6568141939726440660012.539999999xx
phils.83840.4359144871193699100010.659999999xx
pouring.13.50.025035038.908551.279348837--
pouring.2231.17.92516245162423.87871.44450993403--
production_cell.10.80.3614586635852.688191.19395919676--
production_cell.20.60.229003553262.354991.213151387--
production_cell.346.757.482261257626101.1451013424--
production_cell.414.972340685100017001.226474--
production_cell.5291.34374211856334881001.1214110153--
production_cell.61273.478314520700553597001.0899999999-x
protocols.10.50243015042.666671.413728893--
protocols.20.60.011128638153.744021.4192194914--
protocols.30.50.01281754692.778131.42591559--
protocols.48.53.4943924511984401.492199849--
protocols.5209.3699634531252801.501720752--
public_subscribe.10.405808071.494831.143140227--
public_subscribe.22121.718466034858441.34483783--
public_subscribe.3206.41.4918466034858441.34483783--
public_subscribe.4206.11.4418466034858441.34483783--
public_subscribe.53601.829116788621707643001.4429999999xx
reader_writer.10.70.05266633683.997751.109247648--
reader_writer.210.154104821111.98591.243409710085--
reader_writer.3137.324.36044987519521.14281942--
rether.10.50.01245847311.120831.20712513918--
rether.210.029278108791.113281.20144997980--
rether.3220.15305334679691.19441788--
rether.4154.50.6111570522525401.218135238--
rether.56340.7630170442934691.182158879--
rether.61239.31.4859196945329761.213303756--
rether.71607.22.1347894097260481.173428668--
rushhour.10.40.01104810505.196563.767535479--
rushhour.20.50.01224222445.621324.29906896--
rushhour.310.82.1715672315672510.11137647--
rushhour.422.14.732767532767710.35295929--
schedule_world.11.20.012306138866.206585.262852118--
schedule_world.275.70.6215703401061007.64873--
schedule_world.31508.238.514420303426798010.246636x-
sokoban.12.20.2691453914552.496512.49615241525--
sokoban.215.42.047616337616352.643690--
sokoban.3177.21117034432248750002.41737252xx
sorter.10.90.0220544147191.494211.456617506--
sorter.20.60.01759247441.381721.344247223--
sorter.3412.0912884787794812.106800--
sorter.442136.912958752131844002.05213197--
sorter.59.70.42961481519472.124749--
szymanski.10.80.0420264184352.798112.46820642079--
szymanski.210.0631875297482.777132.51440952051--
szymanski.325.92.4711284249987943.01446277--
telephony.10.40128012822.732032.626146183--
telephony.21.60.1451826518283.865283.86572787379--
telephony.321.12.587653797653813.86247009--
telephony.4558.150.812291552122916005.2161133794--
telephony.51802.6436211286621014570005.4721629482xx
telephony.61801.5539238257321198810004.707462532xx
telephony.71017.290.121960308219603004.6881345692--
telephony.81802.6428223803341028500004.596397109xx
train-gate.10.40.091020296002.11.205273944--
train-gate.21.565.522076178781002.150031.2282455117373--
train-gate.31.574.322496198322002.150781.2422455208457--
train-gate.451.6299789808778792001.1891865412-x
train-gate.551.6294803458763866001.1983192892-x
train-gate.6476.23185904140797734001.183413954-x
train-gate.73601.829434562600714876001.181609304xx

Stats (unmodified Spin models as mechanically generated from DVE format):
Nr of cases where Spin failed: 42 of 232 fraction: 0.181034
Nr of cases where Beem failed: 38 of 232 fraction: 0.163793
Nr of cases where Spin has shorter runtime: 166 of 232 fraction: 0.715517
Nr of cases where Beem has shorter runtime : 33 of 232 fraction: 0.142241
Nr of cases where Spin has fewer states: 79 of 232 fraction: 0.340517
Nr of cases where Beem has fewer states: 107 of 232 fraction: 0.461207
Nr of cases with matching nrs of states : 13 of 232 fraction: 0.0560345
Nr of cases with matching nrs of states +- 10%: 105 of 232 fraction: 0.452586
Nr of cases where Beem has fewer states but a longer runtime: 107 of 107 fraction: 1

generated: Sun Apr 1 19:35:35 PDT 2007
BEEM database
Spin homepage