[BIOSAL] First job on Edison

Boisvert, Sebastien boisvert at anl.gov
Mon Nov 24 11:43:28 CST 2014


Hello everyone,

As expected, Edison (Cray XC30) is twice as fast as Beagle (Cray XE6).
This matches the fact that NERSC uses a charge factor of 2.0 for Edison and a factor of
1.0 for Hopper.

And Beagle is around 10x faster than BGQ for the same number of nodes.



Some unordered timers
==================

boisvert at edison12:/project/projectdirs/m1523/Jobs> grep TIMER spate-iowa-continuous-corn-soil-2.*
spate-iowa-continuous-corn-soil-2.00253.txt:TIMER [Build assembly graph / Distribute vertices] 2 minutes, 0.706993 seconds
core_manager/1021181 dies
spate-iowa-continuous-corn-soil-2.00253.txt:TIMER [Build assembly graph / Distribute arcs] 4 minutes, 30.089874 seconds
spate-iowa-continuous-corn-soil-2.00253.txt:TIMER [Build assembly graph] 6 minutes, 30.796875 seconds
spate-iowa-continuous-corn-soil-2.00255.txt:TIMER [Load input / Count input data] 21.429867 seconds
spate-iowa-continuous-corn-soil-2.00255.txt:TIMER [Load input / Distribute input data] 25.910631 seconds
spate-iowa-continuous-corn-soil-2.00255.txt:TIMER [Load input] 47.340496 seconds
Binary file spate-iowa-continuous-corn-soil-2.spate matches

The file system is also much faster than that of Beagle (around 8x).
The graph is generated in just 6 minutes 30 seconds, which is very fast.
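The two sub-timers indeed account for essentially the whole [Build assembly graph] timer; a quick consistency check in C:

#include <stdio.h>

int main(void)
{
    /* Sub-timers from the log above, in seconds. */
    double distribute_vertices = 2 * 60 + 0.706993;
    double distribute_arcs = 4 * 60 + 30.089874;
    double total = distribute_vertices + distribute_arcs;

    /* Prints 6 minutes, 30.796867 seconds; the log reports
     * 6 minutes, 30.796875 seconds for [Build assembly graph]. */
    printf("%d minutes, %f seconds\n", (int)(total / 60),
           total - 60 * (int)(total / 60));
    return 0;
}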

The load during "Distribute vertices" looks like this:
thorium_worker_pool: node/253 EPOCH LOAD 150 s 15.71/22 (0.71) 0.72 0.71 0.70 0.73 0.71 0.64 0.74 0.71 0.73 0.70 0.71 0.71 0.71 0.71 0.66 0.73 0.75 0.72 0.73 0.72 0.73 0.73

The load during "Distribute arcs" looks like this:
thorium_worker_pool: node/253 EPOCH LOAD 410 s 13.28/22 (0.60) 0.58 0.62 0.61 0.61 0.58 0.61 0.61 0.61 0.59 0.60 0.59 0.60 0.61 0.59 0.65 0.61 0.60 0.60 0.60 0.61 0.58 0.61
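If I read the EPOCH LOAD format correctly, a pair like 15.71/22 (0.71) is the sum of the 22 per-worker loads followed by their mean. A quick C check of that reading (my interpretation of the log format, not a documented one), using the per-worker values from the "Distribute vertices" epoch:

#include <stdio.h>

int main(void)
{
    /* Per-worker loads from the "Distribute vertices" EPOCH LOAD line. */
    double loads[22] = {
        0.72, 0.71, 0.70, 0.73, 0.71, 0.64, 0.74, 0.71, 0.73, 0.70, 0.71,
        0.71, 0.71, 0.71, 0.66, 0.73, 0.75, 0.72, 0.73, 0.72, 0.73, 0.73
    };
    double sum = 0.0;
    int i;

    for (i = 0; i < 22; i++)
        sum += loads[i];

    /* Prints 15.70/22 (0.71); the log shows 15.71 because the printed
     * per-worker values are themselves rounded. */
    printf("%.2f/22 (%.2f)\n", sum, sum / 22);
    return 0;
}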


Memory usage
============


thorium_node: node/250 METRICS AliveActorCount: 2245 ByteCount: 18765324288 / 67657900032

Heap usage per node is very low too: 18 GiB / 64 GiB. This is because Linux uses a sparse memory model, copy-on-write zero pages, 4K pages,
and a sane model for memory pressure. CNK seems to have none of these features in comparison.
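For reference, here is the arithmetic behind those figures, converting the raw ByteCount values to GiB:

#include <stdio.h>

int main(void)
{
    long long used = 18765324288LL;   /* ByteCount numerator from the log */
    long long total = 67657900032LL;  /* ByteCount denominator */
    double gib = 1024.0 * 1024.0 * 1024.0;

    /* Prints 17.5 GiB / 63.0 GiB, i.e. roughly the 18 GiB / 64 GiB
     * quoted above. */
    printf("%.1f GiB / %.1f GiB\n", used / gib, total / gib);
    return 0;
}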


Messaging system health check
=========================

The messaging system looks very healthy too:

1280 s
thorium_node: node/250 MESSAGES Tick: 1567752844  ReceivedMessageCount: 279612987 SentMessageCount: 277693878 BufferedInboundMessageCount: 0 BufferedOutboundMessageCount: 790 ActiveRequestCo

1290 s
thorium_node: node/250 MESSAGES Tick: 1567909882  ReceivedMessageCount: 282757208 SentMessageCount: 280834514 BufferedInboundMessageCount: 0 BufferedOutboundMessageCount: 14 ActiveRequestCount: 22

That is (282757208 - 279612987) / 10 s = 314422.1 messages / s for each node, or around 80 M messages / s for the whole job (256 nodes).
The multiplexer is clearly working hard!
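The same arithmetic as a small C program, using the two ReceivedMessageCount samples taken 10 s apart:

#include <stdio.h>

int main(void)
{
    long long count_1280 = 279612987LL; /* ReceivedMessageCount at 1280 s */
    long long count_1290 = 282757208LL; /* ReceivedMessageCount at 1290 s */
    int nodes = 256;
    double interval = 10.0; /* seconds between the two samples */

    double per_node = (count_1290 - count_1280) / interval;

    /* Prints 314422.1 messages / s per node and ~80.5 M messages / s
     * for the whole job. */
    printf("per node: %.1f messages / s\n", per_node);
    printf("whole job: %.1f M messages / s\n", per_node * nodes / 1e6);
    return 0;
}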



Graph traversal velocity (small messages, the peer-to-peer multiplexer is utilized a lot!)
==================================================================


In the graph traversal, the load is at 9% (it is at 12% on BGQ at this step):
thorium_worker_pool: node/250 EPOCH LOAD 1290 s 1.90/22 (0.09) 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.08 0.08 0.09 0.09 0.09


Timelines are basically empty since everything is waiting for data.

thorium_worker_pool: node/250 EPOCH FUTURE_TIMELINE 1290 s  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
thorium_worker_pool: node/250 EPOCH WAKE_UP_COUNT 1290 s  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0



I am getting a sustained velocity of around 10 vertices / s per actor.

biosal_unitig_visitor/1118714 visited 8500 vertices so far (velocity: 9.953161 vertices / s)
biosal_unitig_visitor/1256698 visited 8500 vertices so far (velocity: 9.964830 vertices / s)
biosal_unitig_visitor/1111034 visited 8500 vertices so far (velocity: 9.953161 vertices / s)
biosal_unitig_visitor/1124090 visited 8500 vertices so far (velocity: 9.953161 vertices / s)

spate-iowa-continuous-corn-soil-2.00252.txt:DEBUG the system has 563200 visitors

Across the board (563200 visitors at ~9.95 vertices / s each), the throughput is 5603840.0 vertices / s.
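As a sanity check, the 5603840.0 figure is exactly 563200 visitors times 9.95 vertices / s:

#include <stdio.h>

int main(void)
{
    int visitors = 563200;  /* from the DEBUG line above */
    double velocity = 9.95; /* sustained vertices / s per visitor */

    /* Prints 5603840.0 vertices / s for the whole job. */
    printf("%.1f vertices / s\n", visitors * velocity);
    return 0;
}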


I can compute an expected running time:

boisvert at edison12:/project/projectdirs/m1523/Jobs> grep GRAPH spate-iowa-continuous-corn-soil-2.*.txt
spate-iowa-continuous-corn-soil-2.00253.txt:GRAPH ->  148375705714 vertices, 298256036296 vertex observations, and 146235667225 arcs.

So the number of canonical DNA k-mers with a sequencing coverage depth of at least 2 is the canonical vertex count (148375705714 / 2) minus the 56665010890 k-mers seen only once (first line of the coverage distribution shown below):

$ irb
irb(main):001:0> 148375705714 / 2 - 56665010890
=> 17522841967


boisvert at edison12:/project/projectdirs/m1523/Jobs> head spate-iowa-continuous-corn-soil-2/coverage_distribution.txt-canonical  -n 4
1 56665010890
2 7970399985
3 3453812029
4 1787385290

The graph traversal should therefore run in about 52 minutes:

irb(main):006:0> 17522841967 / 5603840 / 60
=> 52
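The same estimate in C, keeping the fraction that irb's integer division drops:

#include <stdio.h>

int main(void)
{
    long long vertices = 148375705714LL;   /* total graph vertices */
    long long singletons = 56665010890LL;  /* k-mers with coverage 1 */
    double throughput = 5603840.0;         /* vertices / s, whole job */

    long long todo = vertices / 2 - singletons; /* 17522841967 */

    /* Prints 17522841967 vertices to visit, ~52.1 minutes. */
    printf("%lld vertices to visit\n", todo);
    printf("%.1f minutes\n", todo / throughput / 60.0);
    return 0;
}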



Comparison with others
==================

With MEGAHIT ( http://arxiv.org/pdf/1409.7208v1.pdf ):

MEGAHIT, GPU ....................... 44.1 hours
MEGAHIT, CPU only .................. 99.6 hours
SPATE, 256 Cray XC30 nodes ......... probably < 2 hours

(SPATE run: spate-iowa-continuous-corn-soil-2, see
https://github.com/GeneAssembly/biosal/issues/822 )


Actors, actors, actors
================

The complex code in BioSAL is definitely Thorium, not the actor code. Actor scripts are easy
to write and understand, and the scope of an actor is very small.

On the other hand, Thorium has more complex code paths -- it is a runtime for actors.
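To give an idea of how small an actor's scope is, here is a toy sketch in plain C. The ACTION_* tags and the shape of the receive handler only mimic the Thorium style from memory; this is not the real Thorium API:

#include <stdio.h>

/* Hypothetical action tags, for illustration only. */
#define ACTION_PING       0x1000
#define ACTION_PING_REPLY 0x1001

/* An actor owns only its private state and reacts to one message at a
 * time; it never shares memory with other actors. */
struct ping_actor {
    int received;
};

static void ping_actor_receive(struct ping_actor *self, int action,
                               int source_actor)
{
    if (action == ACTION_PING) {
        self->received++;
        /* A real script would answer with a reply message carrying
         * something like ACTION_PING_REPLY. */
        printf("ping_actor: ACTION_PING from actor/%d (%d so far)\n",
               source_actor, self->received);
    }
}

int main(void)
{
    struct ping_actor actor = { 0 };

    /* Simulate the runtime delivering two messages. */
    ping_actor_receive(&actor, ACTION_PING, 1118714);
    ping_actor_receive(&actor, ACTION_PING, 1256698);

    return 0;
}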

