Spinroot

A forum for Spin users

#1 2013-05-29 10:22:23

Mara
Member
Registered: 2012-05-30
Posts: 24

Pan compilation error when running several swarm jobs on a cluster

Dear Spin experts,
I've run into an error while running a batch of swarm jobs. I am on a cluster, so I can run several swarm jobs at the same time. I already know that each spin call (i.e. each swarm job) needs its own directory, and I have set things up that way. However, when I run all the jobs together, some of them fail with the error message attached below. Note that any single swarm job run on its own completes without an error.
My hypothesis is that, with a huge number (about 1500) of swarm jobs each generating the pan.* files, some kind of race condition occurs in the access to a library or similar resource. I would like to hear your opinion on that... maybe there is something I'm missing?
BTW, are you planning to release a parallel-safe version of Spin in the future?

Thanks,
Mara


Here is the beginning of the error; it is only a (very) small part of the whole output:

In file included from /usr/include/stdio.h:28,
                 from pan.c:7:
/usr/include/features.h:361:25: error: /usr/include/sys/cdefs.h: Input/output error
In file included from /usr/include/stdio.h:34,
                 from pan.c:7:
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/include/stddef.h:211: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'typedef'
In file included from pan.c:7:
/usr/include/stdio.h:49: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'typedef'
/usr/include/stdio.h:54: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__USING_NAMESPACE_STD'
In file included from /usr/include/stdio.h:75,
                 from pan.c:7:
/usr/include/libio.h:332: error: expected specifier-qualifier-list before 'size_t'
/usr/include/libio.h:364: error: expected declaration specifiers or '...' before 'size_t'
/usr/include/libio.h:373: error: expected declaration specifiers or '...' before 'size_t'
/usr/include/libio.h: In function '_IO_feof':
/usr/include/libio.h:462: error: expected declaration specifiers before '__THROW'
/usr/include/libio.h:463: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__THROW'
/usr/include/libio.h:465: error: storage class specified for parameter '_IO_peekc_locked'
/usr/include/libio.h:471: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__THROW'
/usr/include/libio.h:472: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__THROW'
/usr/include/libio.h:473: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__THROW'
/usr/include/libio.h:491: error: storage class specified for parameter '_IO_vfscanf'
/usr/include/libio.h:493: error: storage class specified for parameter '_IO_vfprintf'
/usr/include/libio.h:494: error: storage class specified for parameter '_IO_padn'
/usr/include/libio.h:495: error: expected '=', ',', ';', 'asm' or '__attribute__' before '_IO_sgetn'
/usr/include/libio.h:497: error: storage class specified for parameter '_IO_seekoff'
/usr/include/libio.h:498: error: storage class specified for parameter '_IO_seekpos'
/usr/include/libio.h:500: error: expected '=', ',', ';', 'asm' or '__attribute__' before '__THROW'

Last edited by Mara (2013-05-29 10:52:12)

#2 2013-05-30 06:21:15

spinroot
forum
Registered: 2010-11-18
Posts: 695

Re: Pan compilation error when running several swarm jobs on a cluster

this seems to be an error encountered by the compiler -- and i agree that, most likely,
different runs of the compiler are stepping on each other's temporary files.
you could try to generate the pan executables for the swarm jobs separately first,
before you invoke all the swarm jobs for the actual verification.
this will slow things down a bit at the start -- but only while generating the executables that
are needed.
if you do invoke swarm more than once in parallel, i wouldn't be surprised if this happened.
it would then be better to define all the jobs (all 1500 or so) in a single swarm configuration file
and execute it once -- and let swarm farm out the executables to all the cores.
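
The "compile first, verify afterwards" idea above could be sketched roughly as follows. This is only an illustration, not how swarm itself works: the file name model.pml, the jobs/ directory layout, the -O2 flag, the worker count and all function names are assumptions made up for the example. The only real commands it relies on are `spin -a` (which writes pan.c and friends into the current directory) and a plain `cc` call to build the verifier.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def compile_commands(model="model.pml", cflags=("-O2",)):
    """Commands that turn a Promela model into a pan executable.

    model and cflags are placeholders; use whatever your jobs really need.
    """
    return [
        ["spin", "-a", model],                  # writes pan.c, pan.h, ... into the cwd
        ["cc", *cflags, "-o", "pan", "pan.c"],  # builds the verifier from pan.c
    ]


def build_serially(job_dirs, model="model.pml", run=subprocess.run):
    """Phase 1: build one pan executable per job directory, one at a time,
    so that no two compiler runs are active at the same moment."""
    for d in job_dirs:
        for cmd in compile_commands(model):
            run(cmd, cwd=d, check=True)


def verify_in_parallel(job_dirs, workers=4, run=subprocess.run):
    """Phase 2: run the already-built verifiers concurrently."""
    def one(d):
        # No compiler is involved here, only the finished executable.
        return run(["./pan"], cwd=d, check=False)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, job_dirs))


if __name__ == "__main__":
    # Hypothetical layout: jobs/<name>/model.pml, one directory per job.
    dirs = sorted(p for p in Path("jobs").iterdir() if p.is_dir())
    build_serially(dirs)
    verify_in_parallel(dirs)
```

The `run` parameter exists only so the two phases can be tried out with a stand-in function before pointing them at a real cluster. Building strictly one directory at a time is the most conservative choice; if that turns out to be too slow for 1500 jobs, a small fixed number of concurrent builds may be a reasonable compromise.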

#3 2013-05-30 09:22:43

Mara
Member
Registered: 2012-05-30
Posts: 24

Re: Pan compilation error when running several swarm jobs on a cluster

Thank you very much for the answer! I will try separating the compilation part from the execution part (but I still think that a parallel-safe version of Spin/Swarm would be of great interest).
