Classic running out of memory... huh? what? <long>

bruce.labitt at autoliv.com
Thu Jun 11 11:54:43 EDT 2009


I have a simulation program I have written in C (a little C++ is in there 
too) that computes ambiguity functions.  When I run large sims, the system 
runs out of memory.  The platform is an IBM QS22 blade running 
YellowDogLinux6.1-64 (RH-like, for PPCs) with 32GB RAM.  The QS22 has two 
enhanced Cell processors, similar to what is in a PS3.

To attempt to avoid running out of memory, after the stimulus file is read 
(which contains the waveform(s) in question), the file length is checked, 
as well as other key parameters.  This sequence length is compared with 
the maximum allowed memory chunk that I have allocated.  (Actually I read 
the very beginning of the file to check these parameters.)  If the 
sequence is longer than the chunk, the program breaks up the problem into 
chunks and processes each chunk and saves the result to file(s).  Later, 
the program attempts to stitch things back together.

Believe it or not - the stitching works.  I struggled with long longs to 
get it to work.

Anyway, the program seems to run out of memory after processing many 
blocks, so either there is a memory leak or something else is going on. 

Any suggestions?

Curiously, if I allocate more memory but I constrain the problem to fit in 
RAM (i.e. run a smaller problem), the program always runs to completion. 

Example:

Prob.     Mem alloc.   Runs to completion?   Entirely fits?   Dies in x hrs
8Mx200    18GB         y                     y                -
8Mx200    12GB         y                     n                -
80Mx200   18GB         n                     n                7
80Mx200   14GB         n                     n                ~5

In both cases, if I use "free" I see that free memory is all the way down 
to 160MB during the file write.  This seems absurdly low somehow ;)

# free -m -t
                     total    used    free  shared  buffers  cached
Mem:                 31750   31573     172       0        7   10528
-/+ buffers/cache:           21041   10708
Swap:                    0       0       0
Total:               31750   31577     172

The problem with testing this is that the larger run is estimated to take 
10-12 hours! 

Any good memory tracking tools?  I have used valgrind but not gained much 
insight.  Must be operator error...

I notice that there is no swap listed.  Umm, how does one add swap to an 
NFS-based system?
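On that last point: the usual file-backed swap recipe is below, but note that mainline kernels of this era could not reliably swap directly over NFS -- the swap file needs to sit on local storage or a network block device (NBD), not on the NFS mount itself. Paths and sizes here are placeholders:

```shell
# Create and enable a 1 GB file-backed swap area (run as root).
# /local/swapfile is a placeholder path -- it must NOT be on the NFS
# mount, since swapping over NFS is not supported by the stock kernel.
dd if=/dev/zero of=/local/swapfile bs=1M count=1024
chmod 600 /local/swapfile
mkswap /local/swapfile
swapon /local/swapfile
free -m -t        # the Swap: line should now be non-zero
```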

Any suggestions appreciated.  (Besides "Sir, step away from the console. 
Sir?")

-Bruce





More information about the gnhlug-discuss mailing list