[BIOSAL] latency_probe results on POWER7 (on dowd at JLSE)

Boisvert, Sebastien boisvert at anl.gov
Tue Nov 4 20:30:55 CST 2014


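I just read that POWER7 has a relaxed memory model. I fixed the problem with multiple worker threads described in my earlier message (quoted at the end of this one).

I am now getting about 3.77 M messages / s with 30 threads (this is a 4.2 GHz CPU).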

biosal version: 07353e8af5d5db5be41faa5578cf25c917df6cad

Platform: IBM Power 740 Express server 8205-E6D
Architecture: POWER7 (relaxed memory model, very different from x86-64)

[boisvert at dowd biosal]$ head /proc/cpuinfo  -n4
processor       : 0
cpu             : POWER7 (architected), altivec supported
clock           : 4284.000000MHz
revision        : 2.1 (pvr 004a 0201)

[boisvert at dowd biosal]$ tail -n 4 /proc/cpuinfo 
timebase        : 512000000
platform        : pSeries
model           : IBM,8205-E6D
machine         : CHRP IBM,8205-E6D

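The machine exposes 48 hardware threads. POWER7 supports at most 4-way SMT, so this is presumably 12 cores with 4 threads per core rather than 6 cores with 8 threads per core.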

[boisvert at dowd biosal]$ grep processor /proc/cpuinfo |wc -l
48

[boisvert at dowd biosal]$ uname -a
Linux dowd 2.6.32-431.3.1.el6.ppc64 #1 SMP Fri Dec 13 06:57:48 EST 2013 ppc64 ppc64 ppc64 GNU/Linux

[boisvert at dowd biosal]$ gcc --version|head -n1
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)

[boisvert at dowd biosal]$ make CONFIG_MPI=n CC=cc -j

[boisvert at dowd biosal]$ ./performance/latency_probe/latency_probe -threads-per-node 12 | tee log-12
[boisvert at dowd biosal]$ grep COUNTER log-12
PERFORMANCE_COUNTER type = ping-pong
PERFORMANCE_COUNTER ping-action = ACTION_PING
PERFORMANCE_COUNTER pong-action = ACTION_PING_REPLY
PERFORMANCE_COUNTER node-count = 1
PERFORMANCE_COUNTER worker-count-per-node = 11
PERFORMANCE_COUNTER actor-count-per-worker = 100
PERFORMANCE_COUNTER worker-count = 11
PERFORMANCE_COUNTER actor-count = 1100
PERFORMANCE_COUNTER ping-message-count-per-actor = 40000
PERFORMANCE_COUNTER ping-message-count = 44000000
PERFORMANCE_COUNTER pong-message-count = 44000000
PERFORMANCE_COUNTER message-count = 88000000
PERFORMANCE_COUNTER elapsed-time = 43.722831 s
PERFORMANCE_COUNTER computation-throughput = 2012678.467207 messages / s
PERFORMANCE_COUNTER node-throughput = 2012678.467207 messages / s
PERFORMANCE_COUNTER worker-throughput = 182970.769746 messages / s
PERFORMANCE_COUNTER worker-latency = 5465 ns
PERFORMANCE_COUNTER actor-throughput = 1829.707697 messages / s
PERFORMANCE_COUNTER actor-latency = 546535 ns
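
The derived counters above can be reproduced from the raw ones. The sketch below is inferred from the numbers in this log, not taken from the biosal source: it assumes throughput is message-count divided by elapsed-time, split evenly across workers and actors, and that latency is the truncated reciprocal of throughput in nanoseconds. The function name `derive_counters` is made up for illustration.

```python
def derive_counters(message_count, elapsed_s, workers, actors_per_worker):
    """Recompute latency_probe's derived counters from its raw counters.

    The formulas are inferred from the logged values (an assumption),
    not copied from the biosal source code.
    """
    node_throughput = message_count / elapsed_s
    worker_throughput = node_throughput / workers
    actor_throughput = worker_throughput / actors_per_worker
    return {
        "node-throughput": node_throughput,
        "worker-throughput": worker_throughput,
        # Latency is the time per message in ns, truncated to an integer.
        "worker-latency": int(1e9 / worker_throughput),
        "actor-throughput": actor_throughput,
        "actor-latency": int(1e9 / actor_throughput),
    }


# Raw counters from the 12-thread run (11 workers, 100 actors per worker).
run_12 = derive_counters(88000000, 43.722831, 11, 100)
for name, value in run_12.items():
    print(name, "=", value)
```

The recomputed throughputs agree with the log up to the rounding of elapsed-time, and the latencies come out as 5465 ns per worker and 546535 ns per actor, as logged.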

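[boisvert at dowd biosal]$ ./performance/latency_probe/latency_probe -threads-per-node 24 | tee log-24
[boisvert at dowd biosal]$ grep COUNTER log-24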
PERFORMANCE_COUNTER type = ping-pong
PERFORMANCE_COUNTER ping-action = ACTION_PING
PERFORMANCE_COUNTER pong-action = ACTION_PING_REPLY
PERFORMANCE_COUNTER node-count = 1
PERFORMANCE_COUNTER worker-count-per-node = 23
PERFORMANCE_COUNTER actor-count-per-worker = 100
PERFORMANCE_COUNTER worker-count = 23
PERFORMANCE_COUNTER actor-count = 2300
PERFORMANCE_COUNTER ping-message-count-per-actor = 40000
PERFORMANCE_COUNTER ping-message-count = 92000000
PERFORMANCE_COUNTER pong-message-count = 92000000
PERFORMANCE_COUNTER message-count = 184000000
PERFORMANCE_COUNTER elapsed-time = 73.469081 s
PERFORMANCE_COUNTER computation-throughput = 2504454.892274 messages / s
PERFORMANCE_COUNTER node-throughput = 2504454.892274 messages / s
PERFORMANCE_COUNTER worker-throughput = 108889.343142 messages / s
PERFORMANCE_COUNTER worker-latency = 9183 ns
PERFORMANCE_COUNTER actor-throughput = 1088.893431 messages / s
PERFORMANCE_COUNTER actor-latency = 918363 ns

[boisvert at dowd biosal]$ ./performance/latency_probe/latency_probe -threads-per-node 30 | tee log-30
[boisvert at dowd biosal]$ grep COUNTER log-30
PERFORMANCE_COUNTER type = ping-pong
PERFORMANCE_COUNTER ping-action = ACTION_PING
PERFORMANCE_COUNTER pong-action = ACTION_PING_REPLY
PERFORMANCE_COUNTER node-count = 1
PERFORMANCE_COUNTER worker-count-per-node = 29
PERFORMANCE_COUNTER actor-count-per-worker = 100
PERFORMANCE_COUNTER worker-count = 29
PERFORMANCE_COUNTER actor-count = 2900
PERFORMANCE_COUNTER ping-message-count-per-actor = 40000
PERFORMANCE_COUNTER ping-message-count = 116000000
PERFORMANCE_COUNTER pong-message-count = 116000000
PERFORMANCE_COUNTER message-count = 232000000
PERFORMANCE_COUNTER elapsed-time = 61.575498 s
PERFORMANCE_COUNTER computation-throughput = 3767732.395311 messages / s
PERFORMANCE_COUNTER node-throughput = 3767732.395311 messages / s
PERFORMANCE_COUNTER worker-throughput = 129921.806735 messages / s
PERFORMANCE_COUNTER worker-latency = 7696 ns
PERFORMANCE_COUNTER actor-throughput = 1299.218067 messages / s
PERFORMANCE_COUNTER actor-latency = 769693 ns
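
The PERFORMANCE_COUNTER lines all have the shape `name = value [unit]`, so they are easy to load for comparison between runs. The following is a minimal sketch, not part of biosal; `parse_counters` and `COUNTER_RE` are hypothetical names, and the sample lines are copied from the 30-thread run above.

```python
import re

# Matches "PERFORMANCE_COUNTER <name> = <value and optional unit>".
COUNTER_RE = re.compile(r"^PERFORMANCE_COUNTER\s+(\S+)\s+=\s+(.+)$")


def parse_counters(lines):
    """Return a dict mapping counter names to int, float, or str values."""
    counters = {}
    for line in lines:
        match = COUNTER_RE.match(line.strip())
        if not match:
            continue  # skip progress lines and other output
        name, raw = match.groups()
        token = raw.split()[0]  # drop a trailing unit such as "s" or "ns"
        try:
            counters[name] = int(token)
        except ValueError:
            try:
                counters[name] = float(token)
            except ValueError:
                counters[name] = raw  # non-numeric, e.g. "ping-pong"
    return counters


sample = """\
PERFORMANCE_COUNTER type = ping-pong
PERFORMANCE_COUNTER worker-count = 29
PERFORMANCE_COUNTER elapsed-time = 61.575498 s
PERFORMANCE_COUNTER node-throughput = 3767732.395311 messages / s
PERFORMANCE_COUNTER worker-latency = 7696 ns
""".splitlines()

counters = parse_counters(sample)
print(counters)
```

The same function accepts an open file, for example `parse_counters(open("log-30"))`, because it only iterates over lines.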


[boisvert at dowd biosal]$ ./performance/latency_probe/latency_probe -threads-per-node 48 | tee log-48
[boisvert at dowd biosal]$ grep COUNTER log-48
PERFORMANCE_COUNTER type = ping-pong
PERFORMANCE_COUNTER ping-action = ACTION_PING
PERFORMANCE_COUNTER pong-action = ACTION_PING_REPLY
PERFORMANCE_COUNTER node-count = 1
PERFORMANCE_COUNTER worker-count-per-node = 47
PERFORMANCE_COUNTER actor-count-per-worker = 100
PERFORMANCE_COUNTER worker-count = 47
PERFORMANCE_COUNTER actor-count = 4700
PERFORMANCE_COUNTER ping-message-count-per-actor = 40000
PERFORMANCE_COUNTER ping-message-count = 188000000
PERFORMANCE_COUNTER pong-message-count = 188000000
PERFORMANCE_COUNTER message-count = 376000000
PERFORMANCE_COUNTER elapsed-time = 184.703716 s
PERFORMANCE_COUNTER computation-throughput = 2035692.666180 messages / s
PERFORMANCE_COUNTER node-throughput = 2035692.666180 messages / s
PERFORMANCE_COUNTER worker-throughput = 43312.609919 messages / s
PERFORMANCE_COUNTER worker-latency = 23087 ns
PERFORMANCE_COUNTER actor-throughput = 433.126099 messages / s
PERFORMANCE_COUNTER actor-latency = 2308796 ns
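
To compare the four runs at a glance, the sketch below tabulates node throughput and per-worker efficiency relative to the 12-thread run. The numbers are copied from the logs above; the efficiency metric and the names `RUNS` and `summarize` are my own choices for illustration, not something latency_probe reports.

```python
# (threads-per-node, worker-count, node-throughput in messages / s),
# copied from the four runs in this message.
RUNS = [
    (12, 11, 2012678.467207),
    (24, 23, 2504454.892274),
    (30, 29, 3767732.395311),
    (48, 47, 2035692.666180),
]


def summarize(runs):
    """Compute per-worker throughput and efficiency relative to the first run."""
    base_per_worker = runs[0][2] / runs[0][1]
    rows = []
    for threads, workers, node_throughput in runs:
        per_worker = node_throughput / workers
        rows.append({
            "threads": threads,
            "node_throughput": node_throughput,
            "per_worker": per_worker,
            "efficiency": per_worker / base_per_worker,
        })
    return rows


rows = summarize(RUNS)
best = max(rows, key=lambda row: row["node_throughput"])
for row in rows:
    print("%2d threads: %.2f M messages / s, per-worker efficiency %.2f"
          % (row["threads"], row["node_throughput"] / 1e6, row["efficiency"]))
print("best:", best["threads"], "threads")
```

Node throughput peaks at 30 threads. At 48 threads each worker delivers roughly a quarter (about 0.24) of what a worker delivers in the 12-thread run, so total throughput falls back to the 12-thread level.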


> ________________________________________
> From: biosal-bounces at lists.cels.anl.gov [biosal-bounces at lists.cels.anl.gov] on behalf of Boisvert, Sebastien [boisvert at anl.gov]
> Sent: Tuesday, November 04, 2014 1:03 PM
> To: biosal at lists.cels.anl.gov
> Subject: [BIOSAL] latency_probe results on POWER7 (on dowd at JLSE)
> Version: c541b41c0e
> $ make CONFIG_MPI=n CC=cc -j
> Hardware: IBM Power 740 Express (8205-E6D)
> http://www-03.ibm.com/systems/power/hardware/740/specs.html
> I think this is a POWER7+ with 6 cores and 8 threads per core (48 threads).
> [boisvert at dowd biosal]$ ./performance/latency_probe/latency_probe > log-1
> [boisvert at dowd biosal]$ grep COUNTER log-1
> PERFORMANCE_COUNTER node-count = 1
> PERFORMANCE_COUNTER worker-count-per-node = 1
> PERFORMANCE_COUNTER actor-count-per-worker = 100
> PERFORMANCE_COUNTER worker-count = 1
> PERFORMANCE_COUNTER actor-count = 100
> PERFORMANCE_COUNTER message-count-per-actor = 40000
> PERFORMANCE_COUNTER message-count = 4000000
> PERFORMANCE_COUNTER elapsed-time = 21.154284 s
> PERFORMANCE_COUNTER computation-throughput = 189086.994342 messages / s
> PERFORMANCE_COUNTER node-throughput = 189086.994342 messages / s
> PERFORMANCE_COUNTER worker-throughput = 189086.994342 messages / s
> PERFORMANCE_COUNTER worker-latency = 5288 ns
> PERFORMANCE_COUNTER actor-throughput = 1890.869943 messages / s
> PERFORMANCE_COUNTER actor-latency = 528857 ns
> 
> I am using gcc 4.4 (default on the system):
> $ cc --version|head -n1
> cc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
> With 2 worker threads, there is some sort of problem (maybe memory visibility?):
> [boisvert at dowd biosal]$ ./performance/latency_probe/latency_probe -threads-per-node 3 > log-3
> ^C
> [boisvert at dowd biosal]$ tail log-3
> progress 1000097 276000/40000
> progress 1000086 366000/40000
> progress 1000078 280000/40000
> progress 1000086 368000/40000
> progress 1000156 280000/40000
> progress 1000086 370000/40000
> progress 1000039 280000/40000
> progress 1000086 372000/40000
> progress 1000184 282000/40000