[BIOSAL] Memory fragmentation on IBM Blue Gene/Q (Cetus or Mira)

Boisvert, Sebastien boisvert at anl.gov
Sat Nov 15 20:58:24 CST 2014


I am seeing a lot of memory fragmentation on Blue Gene/Q.

A quote from the IDRIS website:

"Unfortunately, the CNK memory administrator is very basic and is not capable of handling the phenomenon of memory fragmentation."
    source: http://www.idris.fr/eng/turing/turing-fragmentation_memoire-eng.html

I am modifying the biosal code across the board to use memory pools, and free lists
inside each memory pool. The biosal code already makes heavy use of memory pools.
In the code, each memory pool instance has an API key associated with it.
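The post does not show biosal's pool API, so the following is only a rough sketch of the general idea. All names, the 64 KiB block size and the 16-byte rounding are hypothetical. A pool tagged with a key serves many small requests from a few large malloc() blocks, so most allocations never reach the system allocator:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: names and sizes are illustrative, not biosal's API. */
#define POOL_BLOCK_SIZE 65536u
#define POOL_ALIGN 16u

struct pool_block {
    struct pool_block *next; /* older, already-filled blocks */
    unsigned char *data;     /* one large malloc() per block */
    size_t used;             /* bytes already handed out */
    size_t capacity;         /* size of data */
};

struct memory_pool {
    uint32_t key;            /* tag identifying the pool's owner in logs */
    struct pool_block *head; /* block currently being carved */
};

void pool_init(struct memory_pool *pool, uint32_t key)
{
    pool->key = key;
    pool->head = NULL;
}

/* Carve 'size' bytes out of the current block; call malloc() only when
 * the current block cannot hold the request. */
void *pool_allocate(struct memory_pool *pool, size_t size)
{
    assert(size > 0);
    /* Round up so every returned pointer keeps malloc()'s alignment. */
    size = (size + POOL_ALIGN - 1) & ~(size_t)(POOL_ALIGN - 1);

    struct pool_block *block = pool->head;
    if (block == NULL || block->capacity - block->used < size) {
        size_t capacity = size > POOL_BLOCK_SIZE ? size : POOL_BLOCK_SIZE;
        block = malloc(sizeof(*block));
        if (block == NULL)
            return NULL;
        block->data = malloc(capacity);
        if (block->data == NULL) {
            free(block);
            return NULL;
        }
        block->used = 0;
        block->capacity = capacity;
        block->next = pool->head;
        pool->head = block;
    }

    void *pointer = block->data + block->used;
    block->used += size;
    return pointer;
}

/* Release every block at once; individual pointers are never freed. */
void pool_destroy(struct memory_pool *pool)
{
    struct pool_block *block = pool->head;
    while (block != NULL) {
        struct pool_block *next = block->next;
        free(block->data);
        free(block);
        block = next;
    }
    pool->head = NULL;
}
```

With this design, a thousand 8-byte requests cost a single malloc() of 64 KiB instead of a thousand small ones. The trade-off is that memory is only returned when the whole pool is destroyed, and the unused tail of a block is abandoned when a new block is started.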

So far, I have reduced calls to malloc by about 70% (from 129995 to 39965). The two main new tricks are:

- use free lists to keep freed pointers for reuse, without any extra allocations;
- store the size of each tracked pointer just before the pointer itself (glibc uses this trick too).
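Both tricks can be sketched together in a few lines of C. This is a hypothetical illustration, not biosal's code. It assumes power-of-two size classes (a choice made here so that freed blocks can serve later requests) and, for brevity, takes fresh blocks from malloc() where a real pool would carve them out of its own blocks. The size lives in a 16-byte header just before the returned pointer, much like glibc keeps a chunk's size just before the user pointer, and a freed block stores the free-list link in its own bytes, so recording it costs no allocation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of the two tricks; names and constants are assumptions. */
#define MIN_SHIFT 4      /* smallest class: 16 bytes, room for a next pointer */
#define CLASS_COUNT 28   /* classes from 16 B to 2 GiB */
#define HEADER_SIZE 16   /* keeps the user pointer 16-byte aligned */

struct free_node {
    struct free_node *next; /* stored inside the freed block itself */
};

static struct free_node *free_lists[CLASS_COUNT];

/* Index of the smallest power-of-two class that holds 'size' bytes;
 * returns CLASS_COUNT when the request is too large. */
static size_t class_of(size_t size)
{
    size_t index = 0;
    while (index < CLASS_COUNT && ((size_t)1 << (index + MIN_SHIFT)) < size)
        index++;
    return index;
}

void *tracked_allocate(size_t size)
{
    size_t index = class_of(size);
    assert(index < CLASS_COUNT);
    size_t rounded = (size_t)1 << (index + MIN_SHIFT);

    struct free_node *node = free_lists[index];
    if (node != NULL) {
        /* Reuse a freed block: no call to malloc(). The header still
         * holds the rounded size written when the block was created. */
        free_lists[index] = node->next;
        return node;
    }

    unsigned char *raw = malloc(HEADER_SIZE + rounded);
    if (raw == NULL)
        return NULL;
    *(size_t *)raw = rounded;   /* size stored just before the pointer */
    return raw + HEADER_SIZE;
}

void tracked_free(void *pointer)
{
    if (pointer == NULL)
        return;
    /* Read the size back from the header; the caller never passes it. */
    size_t rounded = *(size_t *)((unsigned char *)pointer - HEADER_SIZE);
    size_t index = class_of(rounded);

    /* Push onto the free list: the block's own bytes hold the link. */
    struct free_node *node = pointer;
    node->next = free_lists[index];
    free_lists[index] = node;
}
```

Freeing an 8-byte pointer and then asking for 12 bytes returns the same block, because both round up to the 16-byte class. Memory parked in the free lists is never handed back to malloc(), which is the point on a system whose allocator fragments easily, at the cost of up to 2x overhead from rounding.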


irb(main):002:0> (39965.0 - 129995) / 129995 * 100
=> -69.25650986576407

Yesterday:
[boisvert at bigmem biosal]$ grep memory_allocate log|awk '{print $10}'|sort|wc -l
129995
[boisvert at bigmem biosal]$ grep memory_allocate log|awk '{print $10}'|sort|uniq -c|sort -r -n|head
  98142 0xc170626e
  15292 0x9739a8fa
   3824 0x146f7d15
   3317 0x2d9d481
   3225 0x37ddf367
   2800 0x185945f7
   1008 0x89e9235d
    803 0x46d316e4
    280 0x1d4f2792
    224 0x78e238cd

Now:
[boisvert at bigmem biosal]$ grep memory_allocate log|awk '{print $10}'|sort|wc -l
39965
[boisvert at bigmem biosal]$ grep memory_allocate log|awk '{print $10}'|sort|uniq -c|sort -r -n|head
  22062 0xc170626e
   4622 0x146f7d15
   3315 0x2d9d481
   3223 0x37ddf367
   2799 0x185945f7
   1008 0x89e9235d
    766 0x46d316e4
    588 0x1d4f2792
    419 0x9739a8fa
    395 0x2ee1c5a6
[boisvert at bigmem biosal]$ grep 0xc170626e log|grep allocate | awk '{print $3}'|sort -n|uniq -c|sort -r -n|head
   9192 8
   4340 32
   3628 2048
   2294 64
   1152 4
    389 512
    368 128
    278 32768
    193 16
     72 256

