A forum for Spin users
Hi,
I am trying to run Spin with the multicore options. It runs perfectly fine on Intel processors, but I want to run it on a POWER7 processor. The generated pan.c doesn't seem to have code for POWER7, so we added it ourselves. Specifically, we added the test-and-set (tas) code for POWER7. There is a place in pan.c where the tas operation is defined for different processors. There we defined tas for POWER7 as follows, using the atomic API of the IBM C compiler (xlc):
#else
/*IBM */
int
tas(volatile int *s)
{
int r;
r = __fetch_and_or(s, 1);
return r;
}
//#error missing definition of test and set operation for this platform
#endif
The API docs for __fetch_and_or say [1]:
unsigned int __fetch_and_or (volatile unsigned int* addr, unsigned int val) - Sets bits in the word or doubleword specified by addr by OR-ing that value with the value specified val, in a single atomic operation, and returns the original value of addr.
It runs perfectly in some executions (around 20000 states/sec on 12 cores). But in other executions it runs very slowly, at around 100-500 states/sec, and we have to stop it forcibly. When we press Ctrl+C we take the stack trace, and in all the traces thread 1 is in the function Get_Full_Frame:
#0 0x00000000100646a8 in Get_Full_Frame (n=0) at pan.c:5636
#1 0x0000000010065818 in Read_Queue (q=0) at pan.c:5982
#2 0x0000000010066714 in mem_get () at pan.c:5502
#3 0x00000000100669e0 in do_the_search () at pan.c:7186
#4 0x0000000010067d9c in run () at pan.c:2574
#5 0x0000000010069850 in main (argc=1, argv=0xfffffffee38) at pan.c:10487
We tried the equivalent with gcc, using the builtin "__sync_fetch_and_or". It gives the same result (getting stuck in some executions).
Is there something more we need to do while trying to include a new architecture?
[1]:http://publib.boulder.ibm.com/infocenter/cellcomp/v101v121/index.jsp?topic=/com.ibm.xlcpp101.cell.doc/compiler_ref/bif_fetch_and_or_fetch_and_orlp.html
Thank you
Sriraj
Hello,
We solved the problem for multicore execution on POWER7. The problem is that POWER7 has a relaxed memory model.
We solved it as follows. There was no test-and-set defined for POWER7 in the pan.c file.
We included it as:
"
#elif defined(__powerpc64__)
int
tas(volatile int *s)
{
int r;
r = __fetch_and_or(s, 1);
__isync();
return r;
}
"
The __isync() prevents any instructions inside the critical section from moving above the __fetch_and_or.
Similarly, we need a barrier at the end of the critical section, that is, where sh_lock[which] is set to zero ( sh_lock[which] = 0; /* unlock */ ). We must prevent any memory operations inside the critical section from moving past this unlock. For this we can use __lwsync();
We included it as :
"
__lwsync(); sh_lock[which] = 0; /* unlock */
"
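The pairing described above (an acquire barrier right after taking the lock, a release barrier right before dropping it) can be illustrated portably with C11 atomics. This is only a sketch and not what pan.c does: it assumes a C11 compiler with <stdatomic.h>, and demo_lock, demo_tas and demo_unlock are hypothetical names standing in for sh_lock[which], tas() and the unlock store.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for one entry of pan.c's sh_lock[] array. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

/* Returns the previous state: 0 means the caller now holds the lock. */
static int demo_tas(void)
{
    /* Acquire: later loads/stores may not move above this operation. */
    return atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire);
}

/* Drops the lock. */
static void demo_unlock(void)
{
    /* Release: earlier loads/stores may not move below this store. */
    atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}
```

A caller would spin with while (demo_tas()) { } before entering the critical section and call demo_unlock() on the way out.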
__isync and __lwsync are lighter than a full sync. __fetch_and_or() gives a warning, since its declaration takes an unsigned int pointer whereas we pass an int pointer. These atomic functions require the xlc compiler [1] (not gcc).
Possible gcc replacements would be [2]:
__lwsync() -> __sync_synchronize()
__fetch_and_or(s, 1); __isync(); -> __sync_lock_test_and_set()
[1]: http://www-01.ibm.com/support/docview.wss?uid=swg27024742&aid=1
[2]: https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
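A minimal sketch of the gcc replacements listed above, assuming a gcc version that provides the __sync builtins from [2]. We have not run this inside pan.c; tas_gcc, unlock_gcc and demo_lock are hypothetical names used only for illustration.

```c
/* Hypothetical stand-in for sh_lock[which] in pan.c. */
static volatile int demo_lock = 0;

/* Test-and-set: atomically stores 1 and returns the old value.
 * The builtin acts as an acquire barrier, which is the role
 * __isync() plays in the xlc version. */
static int tas_gcc(volatile int *s)
{
    return __sync_lock_test_and_set(s, 1);
}

/* Unlock: full barrier first, then the plain store. */
static void unlock_gcc(volatile int *s)
{
    __sync_synchronize();   /* stands in for __lwsync() */
    *s = 0;                 /* unlock */
}
```

Note that __sync_synchronize() issues a full barrier, which is heavier than __lwsync(). The same gcc page also documents __sync_lock_release(), which writes 0 with release semantics and may be a lighter way to unlock.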
That would be great.
For PowerPC, include the barrier when unlocking as well.