[BIOSAL] First job on Edison

Boisvert, Sebastien boisvert at anl.gov
Tue Nov 25 17:19:51 CST 2014


> From: George K. Thiruvathukal [gkt at cs.luc.edu]
> Sent: Tuesday, November 25, 2014 5:14 PM
> To: Boisvert, Sebastien
> Cc: Xia, Fangfang; biosal at lists.cels.anl.gov
> Subject: Re: [BIOSAL] First job on Edison
> 
> 
> 
> 
> Thanks for sharing these results with us! It looks encouraging at first glance.
> 
> By the way, is this a system that I'll be able to try out at some point?
> 

Yes.

In fact, you could use Cetus or Mira right now (project CompBIO).


We will submit an ALCC proposal to DOE for time on Edison and on Mira.

Right now, Beagle is in maintenance.

Mira.... IBM Blue Gene/Q (ALCF)
Cetus... IBM Blue Gene/Q (ALCF)
Beagle.. Cray XE6 (UChicago)
Edison.. Cray XC30 (NERSC)
Hopper.. Cray XE6 (NERSC)

> 
> Best,
> George
> 
> 
> George K. Thiruvathukal, PhD
> Professor of Computer Science, Loyola University Chicago
> Director, Center for Textual Studies and Digital Humanities
> Guest Faculty, Argonne National Laboratory, Math and Computer Science Division
> Editor in Chief, Computing in Science and Engineering (IEEE CS/AIP)
> (w) thiruvathukal.com (v) 773.829.4872
> 
> On Mon, Nov 24, 2014 at 11:43 AM, Boisvert, Sebastien 
> <boisvert at anl.gov> wrote:
> 
> Hello everyone,
> 
> As expected, Edison (Cray XC30) is twice as fast as Beagle (Cray XE6).
> This was expected because NERSC uses a factor of 2.0 for Edison and a factor of
> 1.0 for Hopper, which is a Cray XE6 like Beagle.
> 
> Beagle, in turn, is roughly 10x faster than BGQ for the same number of nodes.
> 
> 
> 
> Some unordered timers
> ==================
> 
> boisvert at edison12:/project/projectdirs/m1523/Jobs> grep TIMER spate-iowa-continuous-corn-soil-2.*
> spate-iowa-continuous-corn-soil-2.00253.txt:TIMER [Build assembly graph / Distribute vertices] 2 minutes, 0.706993 secondscore_manager/1021181 dies
> spate-iowa-continuous-corn-soil-2.00253.txt:TIMER [Build assembly graph / Distribute arcs] 4 minutes, 30.089874 seconds
> spate-iowa-continuous-corn-soil-2.00253.txt:TIMER [Build assembly graph] 6 minutes, 30.796875 seconds
> spate-iowa-continuous-corn-soil-2.00255.txt:TIMER [Load input / Count input data] 21.429867 seconds
> spate-iowa-continuous-corn-soil-2.00255.txt:TIMER [Load input / Distribute input data] 25.910631 seconds
> spate-iowa-continuous-corn-soil-2.00255.txt:TIMER [Load input] 47.340496 seconds
> Binary file spate-iowa-continuous-corn-soil-2.spate matches
> 
> The file system is much faster than Beagle's (about 8x faster).
> The graph is built in just 6 min 30 s, which is very fast.
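
As a quick sanity check on the timers quoted above, the two sub-steps should add up to the reported total. Here is a minimal sketch in Ruby. The helper names (timer_seconds, build_graph_check) are hypothetical, and the duration format is only inferred from the TIMER lines in this message.

```ruby
# Convert a TIMER duration such as "4 minutes, 30.089874 seconds" to seconds.
# The text format is inferred from the log lines quoted in this message.
def timer_seconds(text)
  minutes = text[/(\d+) minutes?/, 1].to_i      # nil.to_i is 0 when absent
  seconds = text[/([\d.]+) seconds?/, 1].to_f
  minutes * 60 + seconds
end

# Returns [sum of the two sub-steps, reported total] for "Build assembly graph".
def build_graph_check
  vertices = timer_seconds("2 minutes, 0.706993 seconds")
  arcs     = timer_seconds("4 minutes, 30.089874 seconds")
  total    = timer_seconds("6 minutes, 30.796875 seconds")
  [vertices + arcs, total]
end

sum, total = build_graph_check
puts format("sub-steps: %.3f s, reported total: %.3f s", sum, total)
```

The same helper handles the "Load input" timers, which have no minutes part.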
> 
> The load during "Distribute vertices" looks like this:
> thorium_worker_pool: node/253 EPOCH LOAD 150 s 15.71/22 (0.71) 0.72 0.71 0.70 0.73 0.71 0.64 0.74 0.71 0.73 0.70 0.71 0.71 0.71 0.71 0.66 0.73 0.75 0.72 0.73 0.72 0.73 0.73
> 
> The load during "Distribute arcs" looks like this:
> thorium_worker_pool: node/253 EPOCH LOAD 410 s 13.28/22 (0.60) 0.58 0.62 0.61 0.61 0.58 0.61 0.61 0.61 0.59 0.60 0.59 0.60 0.61 0.59 0.65 0.61 0.60 0.60 0.60 0.61 0.58 0.61
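
To compare load lines like the two above, it can help to pull the per-worker values out of the log. This is only a sketch: the field layout (elapsed seconds, total/worker count, mean, then one value per worker) is inferred from the sample lines, not taken from the Thorium source, and parse_epoch_load is a hypothetical name.

```ruby
# Sample copied from the "Distribute vertices" line in this message.
SAMPLE_LOAD_LINE = "thorium_worker_pool: node/253 EPOCH LOAD 150 s 15.71/22 (0.71) " \
  "0.72 0.71 0.70 0.73 0.71 0.64 0.74 0.71 0.73 0.70 0.71 " \
  "0.71 0.71 0.71 0.66 0.73 0.75 0.72 0.73 0.72 0.73 0.73"

# Parse an EPOCH LOAD line into a hash; returns nil if the line does not match.
# The layout is an assumption based on the sample above.
def parse_epoch_load(line)
  m = line.match(%r{EPOCH LOAD (\d+) s ([\d.]+)/(\d+) \(([\d.]+)\)\s+(.*)})
  return nil unless m
  { elapsed_s: m[1].to_i, total: m[2].to_f, workers: m[3].to_i,
    mean: m[4].to_f, per_worker: m[5].split.map(&:to_f) }
end

sample = parse_epoch_load(SAMPLE_LOAD_LINE)
spread = sample[:per_worker].max - sample[:per_worker].min
puts format("%d workers, mean %.2f, spread %.2f",
            sample[:workers], sample[:mean], spread)
```

A small spread between the busiest and the idlest worker suggests the work is evenly balanced inside the node.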
> 
> 
> Memory usage
> ============
> 
> 
> thorium_node: node/250 METRICS AliveActorCount: 2245 ByteCount: 18765324288 / 67657900032
> 
> Heap usage per node is also very low: about 17.5 GiB of 63 GiB. This is because Linux uses a sparse memory model, copy-on-write zero pages, 4K pages,
> and a sane model for memory pressure. CNK seems to have none of these features.
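
The byte counts in the METRICS line can be converted to GiB as follows. This is a sketch that assumes, as the message does, that the two ByteCount numbers are used bytes and total bytes for the node. The helper names are hypothetical.

```ruby
GIB = 1024.0**3

# Convert a byte count to GiB.
def to_gib(bytes)
  bytes / GIB
end

# Fraction of the node's memory that is in use.
def heap_fraction(used_bytes, total_bytes)
  used_bytes.to_f / total_bytes
end

used_bytes  = 18_765_324_288   # first ByteCount value from the METRICS line
total_bytes = 67_657_900_032   # second ByteCount value from the METRICS line

puts format("%.2f GiB of %.2f GiB (%.0f%%)",
            to_gib(used_bytes), to_gib(total_bytes),
            100 * heap_fraction(used_bytes, total_bytes))
```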
> 
> 
> Messaging system health check
> =========================
> 
> The messaging system looks very healthy too:
> 
> 1280 s
> thorium_node: node/250 MESSAGES Tick: 1567752844  ReceivedMessageCount: 279612987 SentMessageCount: 277693878 BufferedInboundMessageCount: 0 BufferedOutboundMessageCount: 790 ActiveRequestCo
> 
> 1290 s
> thorium_node: node/250 MESSAGES Tick: 1567909882  ReceivedMessageCount: 282757208 SentMessageCount: 280834514 BufferedInboundMessageCount: 0 BufferedOutboundMessageCount: 14 ActiveRequestCount: 22
> 
> From the ReceivedMessageCount difference over these 10 s, that's 314422.1 messages / s for each node, or around 80 M messages / s for the whole job (256 nodes).
> The multiplexer is clearly working hard!
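
The rate above comes from the two ReceivedMessageCount samples taken 10 s apart. A minimal sketch of that arithmetic (message_rate is a hypothetical helper, and scaling by 256 assumes every node behaves like node/250):

```ruby
# Messages per second between two counter samples.
def message_rate(count_before, count_after, interval_s)
  (count_after - count_before) / interval_s.to_f
end

# ReceivedMessageCount at 1280 s and at 1290 s, from the MESSAGES lines.
per_node  = message_rate(279_612_987, 282_757_208, 1290 - 1280)
whole_job = per_node * 256   # assumes all 256 nodes receive at the same rate

puts format("%.1f messages / s per node, %.1f M messages / s for the job",
            per_node, whole_job / 1e6)
```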
> 
> 
> 
> Graph traversal velocity (small messages, the peer-to-peer multiplexer is used a lot!)
> ==================================================================
> 
> 
> In the graph traversal, the load is at 9% (it is at 12% on BGQ at this step):
> thorium_worker_pool: node/250 EPOCH LOAD 1290 s 1.90/22 (0.09) 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.09 0.08 0.08 0.09 0.09 0.09
> 
> 
> Timelines are basically empty since everything is waiting for data.
> 
> thorium_worker_pool: node/250 EPOCH FUTURE_TIMELINE 1290 s  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> thorium_worker_pool: node/250 EPOCH WAKE_UP_COUNT 1290 s  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 
> 
> 
> I am getting a sustained velocity of about 10 vertices / s per actor.
> 
> biosal_unitig_visitor/1118714 visited 8500 vertices so far (velocity: 9.953161 vertices / s)
> biosal_unitig_visitor/1256698 visited 8500 vertices so far (velocity: 9.964830 vertices / s)
> biosal_unitig_visitor/1111034 visited 8500 vertices so far (velocity: 9.953161 vertices / s)
> biosal_unitig_visitor/1124090 visited 8500 vertices so far (velocity: 9.953161 vertices / s)
> 
> spate-iowa-continuous-corn-soil-2.00252.txt:DEBUG the system has 563200 visitors
> 
> Across the board, the throughput is 5603840.0 vertices / s (563200 visitors x 9.95 vertices / s each).
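
The aggregate figure is simply the visitor count times the per-visitor velocity. In the sketch below, the 9.95 vertices / s value is an assumption: it is the rounded velocity that reproduces the quoted total, while the log lines show 9.953161 and 9.964830. The helper name is hypothetical.

```ruby
# Total traversal throughput, assuming every visitor sustains the same velocity.
def aggregate_throughput(visitor_count, velocity_per_visitor)
  visitor_count * velocity_per_visitor
end

visitors = 563_200   # from the DEBUG line
velocity = 9.95      # vertices / s per visitor (assumed, rounded from the log)

puts format("%.1f vertices / s", aggregate_throughput(visitors, velocity))
```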
> 
> 
> I can compute an expected running time:
> 
> boisvert at edison12:/project/projectdirs/m1523/Jobs> grep GRAPH spate-iowa-continuous-corn-soil-2.*.txt
> spate-iowa-continuous-corn-soil-2.00253.txt:GRAPH ->  148375705714 vertices, 298256036296 vertex observations, and 146235667225 arcs.
> 
> So the number of canonical DNA k-mers with a sequencing coverage depth of at least 2 (half of the vertices, minus the canonical ones seen only once) is:
> 
> $ irb
> irb(main):001:0> 148375705714 / 2 - 56665010890
> => 17522841967
> 
> 
> boisvert at edison12:/project/projectdirs/m1523/Jobs> head spate-iowa-continuous-corn-soil-2/coverage_distribution.txt-canonical  -n 4
> 1 56665010890
> 2 7970399985
> 3 3453812029
> 4 1787385290
> 
> The graph traversal should run in about 52 minutes.
> 
> irb(main):006:0> 17522841967 / 5603840 / 60
> => 52
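
The irb session above uses integer division, which truncates the result. A floating-point sketch of the same estimate, using only numbers quoted in this message (the helper names are hypothetical):

```ruby
# Canonical entries with coverage depth >= 2: half of the vertices,
# minus the canonical entries observed only once.
def canonical_count_depth_ge_2(vertices, canonical_depth_1)
  vertices / 2 - canonical_depth_1
end

# Expected traversal time in minutes at a constant aggregate throughput.
def traversal_minutes(count, vertices_per_second)
  count / vertices_per_second.to_f / 60
end

count = canonical_count_depth_ge_2(148_375_705_714, 56_665_010_890)
puts format("%d entries, %.1f minutes",
            count, traversal_minutes(count, 5_603_840))
```

This assumes the throughput stays constant for the whole traversal, so it is a rough estimate rather than a prediction.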
> 
> 
> 
> Comparison with others
> ==================
> 
> Compared with MEGAHIT ( http://arxiv.org/pdf/1409.7208v1.pdf ):
> 
> MEGAHIT, GPU........................ 44.1 hours
> MEGAHIT, CPU only................... 99.6 hours
> SPATE with 256 Cray XC30 nodes...... probably < 2 hours
> 
> (SPATE job: spate-iowa-continuous-corn-soil-2, see https://github.com/GeneAssembly/biosal/issues/822 )
> 
> 
> Actors, actors, actors
> ================
> 
> The complex stuff in BioSAL is definitely Thorium, not the actor code. Actor scripts are easy to write and understand.
> And the scope of an actor is very small too.
> 
> On the other hand, Thorium has more complex code paths -- it is a runtime for actors.

