Backing up a little - Trying to get LAPACK to work...

Jerry Feldman gaf at blu.org
Tue May 25 21:14:55 EDT 2010


On 05/25/2010 09:03 PM, Bruce Labitt wrote:
> Umm, my CLAPACK experiment is not doing so well.  (Reference Shot in the 
> Dark Thread)  So I thought I'd try to interface to the "industry 
> standard" LAPACK.  In the end, I expect to use CLAPACK, but I thought 
> since MATLAB, GSL, SciPy, et al. use LAPACK, perhaps I could at least 
> get some real work(TM) (coding) done.
> 
> Fundamentally, the LAPACK results are the same as with CLAPACK; I suppose 
> that is good in a way.  I rewrote everything in C using the accumulated 
> knowledge I've gained.  Nearly everything is on the heap.  mallocs and 
> frees where they belong.  When the 2x2 example is run, it works.  
> Valgrind declared no leaks, no problems. 
> 
> When the 9x9 example is run, it segfaults.  The program architecture is 
> test1.c, svd.c.  test1 is "main".  svd.c is a wrapper function that 
> actually calls the FORTRAN subroutine zgesvd_.  The segfault occurs when 
> returning from svd.c, not returning from the FORTRAN subroutine.
> 
> Valgrind reports that the routine does not know where to return to, 
> i.e., the return address is 0x?  From what I've been told this is 
> indicative of a stack error (overrun).
> 
> If, instead of calling zgesvd_, I put in a dummy set of operations that 
> actually write to all of the output matrices and then return from 
> svd.c, the program runs with no "error" for the 9x9 case.  I did this 
> experiment to see if I was doing something wrong.
> 
> Next I tried compiling with the -fstack-protector-all switch.  If I 
> removed the dummy operations (put back to "normal") and ran the 9x9 
> case, zgesvd_ gave results (reported INFO=0) which indicated success.  
> The svd.c routine returned to main (test1.c) and printed out an entirely 
> optimistic success message ;).  However, on the next instruction, which 
> accesses the output arrays, the system segfaulted with a similar 0x? 
> error.  In other words the main program can no longer access the arrays 
> which it had malloc'ed (and had not yet freed).
> 
> If I am interpreting this correctly then it seems there is a stack error 
> of some sort in my compiled version of LAPACK.  Or? <smart people fill 
> in the blank here, please!>
> 
> Does anyone have an idea?
> One thing that I can try is to use the "reference" LAPACK in my system 
> and link to it.  That way I can hopefully take out the effect of my build.
> Any other suggestions?
> 
> Jeesh, this was supposed to be 'just' a port...  :-[

What you describe is indicative of stack corruption. If the pointers
that used to hold the malloc'd addresses are now null, then either you
failed to check the results of malloc, or something wrote into your
local stack. This can happen when code writes beyond the boundaries of
an array. Catching that is another thing that Purify does for you.


-- 
Jerry Feldman <gaf at blu.org>
Boston Linux and Unix
PGP key id: 537C5846
PGP Key fingerprint: 3D1B 8377 A3C0 A5F2 ECBB  CA3B 4607 4319 537C 5846


More information about the gnhlug-discuss mailing list