[BIOSAL] Results with Xeon and Xeon Phi
Boisvert, Sebastien
boisvert at anl.gov
Mon Nov 3 16:05:21 CST 2014
> From: Fangfang Xia [fangfang.xia at gmail.com]
> Sent: Monday, November 03, 2014 3:47 PM
> To: Boisvert, Sebastien
> Cc: biosal at lists.cels.anl.gov
> Subject: Re: [BIOSAL] Results with Xeon and Xeon Phi
>
>
> This is interesting. I’m curious what the call stacks for these spin locks are?
>
> On Nov 3, 2014, at 3:35 PM, Boisvert, Sebastien <boisvert at anl.gov> wrote:
> 42.42%
> [kernel] [k] _spin_lock
>
I traced the job with perf.
[boisvert at bigmem biosal]$ ./performance/latency_probe/latency_probe -threads-per-node 30 | tee log
[boisvert at bigmem biosal]$ perf record -g -e cpu-cycles -o spinlock.data -p 4744
^C[ perf record: Woken up 842 times to write data ]
[ perf record: Captured and wrote 215.969 MB spinlock.data (~9435828 samples) ]
[boisvert at bigmem biosal]$ perf report -i spinlock.data
Samples: 2M of event 'cpu-cycles', Event count (approx.): 1263624159273
- 61.46% latency_probe [kernel.kallsyms] [k] _spin_lock
   - _spin_lock
      - 57.54% futex_wake
           do_futex
           sys_futex
           system_call_fastpath
         - __lll_unlock_wake_private
              100.00% 0xb54
      - 42.20% futex_wait_setup
           futex_wait
           do_futex
           sys_futex
           system_call_fastpath
         - __lll_lock_wait_private
              100.00% 0xb54
From this trace alone, the culprit is not obvious: the biosal code does not use fast userspace mutexes (futexes) directly, so the futex traffic has to come from a library underneath it.
To find out which one, I attached gdb:
(gdb) info threads
(gdb) thread 5
(gdb) bt
#0 0x0000003f196f7fce in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x0000003f1963651d in _L_lock_10 () from /lib64/libc.so.6
#2 0x0000003f19636361 in random () from /lib64/libc.so.6
#3 0x0000003f196369e9 in rand () from /lib64/libc.so.6
#4 0x0000000000402087 in process_send_ping (self=0x7f3df6e88360) at performance/latency_probe/process.c:278
#5 0x000000000040228c in process_receive (self=0x7f3df6e88360, message=0x7f3de256ada0) at performance/latency_probe/process.c:252
#6 0x0000000000406aa4 in thorium_actor_receive_private (self=0x7f3df6e88360) at engine/thorium/actor.c:1195
#7 thorium_actor_receive (self=0x7f3df6e88360) at engine/thorium/actor.c:1091
#8 thorium_actor_work (self=0x7f3df6e88360) at engine/thorium/actor.c:2077
#9 0x0000000000408b65 in thorium_worker_work (worker=0xf82028) at engine/thorium/worker.c:1858
#10 thorium_worker_run (worker=0xf82028) at engine/thorium/worker.c:1646
#11 0x0000000000408dc3 in thorium_worker_main (worker1=0xf82028) at engine/thorium/worker.c:675
#12 0x0000003f1a2079d1 in start_thread () from /lib64/libpthread.so.0
#13 0x0000003f196e886d in clone () from /lib64/libc.so.6
The problem is that all threads share the same seed and generator state, and inside glibc a futex-backed lock protects that state, so every rand() call from every worker contends for the same lock.
I am fixing this right away.