[BIOSAL] First job on Edison

George K. Thiruvathukal gkt at cs.luc.edu
Tue Nov 25 17:14:59 CST 2014


Thanks for sharing these results with us! They look encouraging at first
glance.

By the way, is this a system that I'll be able to try out at some point?

Best,
George


George K. Thiruvathukal, PhD
*Professor of Computer Science*, Loyola University Chicago
*Director*, Center for Textual Studies and Digital Humanities
*Guest Faculty*, Argonne National Laboratory, Math and Computer Science
Division
Editor in Chief, Computing in Science and Engineering
<http://www.computer.org/portal/web/computingnow/cise> (IEEE CS/AIP)
(w) thiruvathukal.com (v) 773.829.4872


On Mon, Nov 24, 2014 at 11:43 AM, Boisvert, Sebastien <boisvert at anl.gov>
wrote:

> Hello everyone,
>
> As expected, Edison (Cray XC30) is twice as fast as Beagle (Cray XE6):
> NERSC uses a factor of 2.0 for Edison and 1.0 for Hopper, which is a
> Cray XE6 like Beagle.
>
> And Beagle is around 10x faster than BGQ for the same number of nodes.
>
>
>
> Some unordered timers
> ==================
>
> boisvert at edison12:/project/projectdirs/m1523/Jobs> grep TIMER
> spate-iowa-continuous-corn-soil-2.*
> spate-iowa-continuous-corn-soil-2.00253.txt:TIMER [Build assembly graph /
> Distribute vertices] 2 minutes, 0.706993 seconds
> core_manager/1021181 dies
> spate-iowa-continuous-corn-soil-2.00253.txt:TIMER [Build assembly graph /
> Distribute arcs] 4 minutes, 30.089874 seconds
> spate-iowa-continuous-corn-soil-2.00253.txt:TIMER [Build assembly graph] 6
> minutes, 30.796875 seconds
> spate-iowa-continuous-corn-soil-2.00255.txt:TIMER [Load input / Count
> input data] 21.429867 seconds
> spate-iowa-continuous-corn-soil-2.00255.txt:TIMER [Load input / Distribute
> input data] 25.910631 seconds
> spate-iowa-continuous-corn-soil-2.00255.txt:TIMER [Load input] 47.340496
> seconds
> Binary file spate-iowa-continuous-corn-soil-2.spate matches
>
> The file system is much faster than Beagle's (~8x faster).
> The graph is generated in just 6 min 30 s, which is very fast.
>
> The load during "Distribute vertices" looks like this:
> thorium_worker_pool: node/253 EPOCH LOAD 150 s 15.71/22 (0.71) 0.72 0.71
> 0.70 0.73 0.71 0.64 0.74 0.71 0.73 0.70 0.71 0.71 0.71 0.71 0.66 0.73 0.75
> 0.72 0.73 0.72 0.73 0.73
>
> The load during "Distribute arcs" looks like this:
> thorium_worker_pool: node/253 EPOCH LOAD 410 s 13.28/22 (0.60) 0.58 0.62
> 0.61 0.61 0.58 0.61 0.61 0.61 0.59 0.60 0.59 0.60 0.61 0.59 0.65 0.61 0.60
> 0.60 0.60 0.61 0.58 0.61
>
>
> Memory usage
> ============
>
>
> thorium_node: node/250 METRICS AliveActorCount: 2245 ByteCount:
> 18765324288 / 67657900032
>
> Heap usage per node is very low too: 18 GiB / 64 GiB. This is because
> Linux uses a sparse memory model, copy-on-write zero pages, 4K pages,
> and a sane model for memory pressure. CNK seems to have none of these
> features in comparison.
>
>
> Messaging system health check
> =========================
>
> The messaging system looks very healthy too:
>
> 1280 s
> thorium_node: node/250 MESSAGES Tick: 1567752844  ReceivedMessageCount:
> 279612987 SentMessageCount: 277693878 BufferedInboundMessageCount: 0
> BufferedOutboundMessageCount: 790 ActiveRequestCo
>
> 1290 s
> thorium_node: node/250 MESSAGES Tick: 1567909882  ReceivedMessageCount:
> 282757208 SentMessageCount: 280834514 BufferedInboundMessageCount: 0
> BufferedOutboundMessageCount: 14 ActiveRequestCount: 22
>
> That's 314422.1 messages / s for each node, or around 80 M messages / s
> for the whole job (256 nodes).
> The multiplexer is clearly working hard!
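>
> As a quick consistency check, using the ReceivedMessageCount deltas and
> the 10-second gap between the 1280 s and 1290 s epochs above:
>
> irb(main):001:0> (282757208 - 279612987) / 10.0
> => 314422.1
> irb(main):002:0> 314422.1 * 256
> => 80492057.6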
>
>
>
> Graph traversal velocity (small messages, the peer-to-peer multiplexer is
> utilized a lot!)
> ==================================================================
>
>
> In the graph traversal, the load is at 9% (it is at 12% on BGQ at this
> step):
> thorium_worker_pool: node/250 EPOCH LOAD 1290 s 1.90/22 (0.09) 0.09 0.09
> 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09
> 0.08 0.08 0.09 0.09 0.09
>
>
> Timelines are basically empty since everything is waiting for data.
>
> thorium_worker_pool: node/250 EPOCH FUTURE_TIMELINE 1290 s  0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> thorium_worker_pool: node/250 EPOCH WAKE_UP_COUNT 1290 s  0 0 0 0 0 0 0 0
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
>
>
> I am getting a sustained velocity of about 10 vertices / s per actor.
>
> biosal_unitig_visitor/1118714 visited 8500 vertices so far (velocity:
> 9.953161 vertices / s)
> biosal_unitig_visitor/1256698 visited 8500 vertices so far (velocity:
> 9.964830 vertices / s)
> biosal_unitig_visitor/1111034 visited 8500 vertices so far (velocity:
> 9.953161 vertices / s)
> biosal_unitig_visitor/1124090 visited 8500 vertices so far (velocity:
> 9.953161 vertices / s)
>
> spate-iowa-continuous-corn-soil-2.00252.txt:DEBUG the system has 563200
> visitors
>
> Across the board, the throughput is 5603840.0 vertices / s.
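>
> That is simply the 563200 visitors times the rounded per-actor velocity
> of roughly 9.95 vertices / s:
>
> irb(main):001:0> 563200 * 9.95
> => 5603840.0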
>
>
> I can compute an expected running time:
>
> boisvert at edison12:/project/projectdirs/m1523/Jobs> grep GRAPH
> spate-iowa-continuous-corn-soil-2.*.txt
> spate-iowa-continuous-corn-soil-2.00253.txt:GRAPH ->  148375705714
> vertices, 298256036296 vertex observations, and 146235667225 arcs.
>
> So the number of canonical vertices (k-mers) with a sequencing coverage
> depth of at least 2 is the canonical count (148375705714 / 2) minus the
> 56665010890 canonical vertices seen only once (first line of the
> coverage distribution below):
>
> $ irb
> irb(main):001:0> 148375705714 / 2 - 56665010890
> => 17522841967
>
>
> boisvert at edison12:/project/projectdirs/m1523/Jobs> head
> spate-iowa-continuous-corn-soil-2/coverage_distribution.txt-canonical  -n 4
> 1 56665010890
> 2 7970399985
> 3 3453812029
> 4 1787385290
>
> The graph traversal should therefore take about 52 minutes:
>
> irb(main):006:0> 17522841967 / 5603840 / 60
> => 52
>
>
>
> Comparison with others
> ==================
>
> Compared with MEGAHIT ( http://arxiv.org/pdf/1409.7208v1.pdf ):
>
> MEGAHIT, GPU .......................  44.1 hours
> MEGAHIT, CPU only ..................  99.6 hours
> SPATE, 256 Cray XC30 nodes .........  probably < 2 hours
> (spate-iowa-continuous-corn-soil-2, see
> https://github.com/GeneAssembly/biosal/issues/822 )
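>
> If the < 2 hours estimate holds, the rough wall-clock ratios would be
> (different hardware and resource budgets, so this is not an
> apples-to-apples comparison):
>
> irb(main):001:0> 99.6 / 2
> => 49.8
> irb(main):002:0> 44.1 / 2
> => 22.05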
>
>
> Actors, actors, actors
> ================
>
> The complex stuff in BioSAL is definitely Thorium, not the actor code.
> Actor scripts are easy to write and understand, and the scope of an
> individual actor is very small.
>
> Thorium, on the other hand, has more complex code paths -- it is the
> runtime that schedules actors and delivers their messages.