Backing up a little - Trying to get LAPACK to work...
Bruce Labitt
bruce.labitt at myfairpoint.net
Tue May 25 21:03:13 EDT 2010
Umm, my CLAPACK experiment is not doing so well (see the "Shot in the
Dark" thread), so I thought I'd try to interface to the "industry
standard" LAPACK. In the end, I expect to use CLAPACK, but since
MATLAB, GSL, SciPy, et al. use LAPACK, I thought perhaps I could at
least get some real work(TM) (coding) done.
Fundamentally, the LAPACK results are the same as in CLAPACK, which I
suppose is good in a way. I rewrote everything in C using the
knowledge I've accumulated. Nearly everything is on the heap, with
mallocs and frees where they belong. When the 2x2 example is run, it
works. Valgrind reported no leaks and no problems.
When the 9x9 example is run, it segfaults. The program consists of
test1.c and svd.c. test1.c holds "main"; svd.c is a wrapper function
that actually calls the FORTRAN subroutine zgesvd_. The segfault occurs
when returning from svd.c, not when returning from the FORTRAN
subroutine. Valgrind reports that the routine does not know where to
return to, i.e., the return address is 0x? From what I've been told,
this is indicative of a stack error (overrun).
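One thing worth ruling out with a symptom like this (fine at 2x2, broken
at 9x9) is an undersized caller-supplied buffer, since zgesvd_ writes
into WORK, RWORK, S, U and VT without knowing how much was actually
allocated. The sketch below computes the element counts that the
reference LAPACK documentation gives for ZGESVD with JOBU = JOBVT = 'A'.
The names svd_sizes and zgesvd_sizes are hypothetical helpers, not part
of LAPACK, and nothing here is taken from the actual svd.c.

```c
#include <assert.h>
#include <stddef.h>

/* Element counts for the buffers passed to zgesvd_ when
 * JOBU = JOBVT = 'A' and the leading dimensions are the minimum
 * allowed (lda = ldu = m, ldvt = n).
 * Hypothetical helper type; not part of LAPACK. */
typedef struct {
    size_t a;      /* complex elements in A:   m * n            */
    size_t s;      /* doubles in S:            min(m, n)        */
    size_t u;      /* complex elements in U:   m * m            */
    size_t vt;     /* complex elements in VT:  n * n            */
    size_t work;   /* minimum LWORK, in complex elements        */
    size_t rwork;  /* doubles in RWORK:        5 * min(m, n)    */
} svd_sizes;

/* Hypothetical helper: returns the minimum sizes for an m-by-n input. */
svd_sizes zgesvd_sizes(int m, int n)
{
    assert(m > 0 && n > 0);

    size_t rows = (size_t)m;
    size_t cols = (size_t)n;
    size_t mn = rows < cols ? rows : cols;   /* min(m, n) */
    size_t mx = rows > cols ? rows : cols;   /* max(m, n) */

    svd_sizes z;
    z.a     = rows * cols;
    z.s     = mn;
    z.u     = rows * rows;
    z.vt    = cols * cols;
    z.work  = 2 * mn + mx;   /* LWORK >= max(1, 2*min(m,n) + max(m,n)) */
    z.rwork = 5 * mn;
    return z;
}
```

For the 9x9 case this gives LWORK >= 27 and 45 doubles for RWORK,
against 6 and 10 for 2x2, so a fixed-size buffer that happens to be big
enough for the small case can be too small for the larger one. A
mismatch in integer width between the C caller and the Fortran build is
a separate thing to check and is not covered by this sketch.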
If, instead of calling zgesvd_, I put in a dummy set of operations that
actually writes to all of the output matrices and then returns from
svd.c, the program runs with no "error" for the 9x9 case. I did this
experiment to see if I was doing something wrong.
Next I tried compiling with the -fstack-protector-all switch. When I
removed the dummy operations (put back to "normal") and ran the 9x9
case, zgesvd_ gave results and reported INFO=0, which indicates success.
The svd.c routine returned to main (test1.c) and printed out an entirely
optimistic success message ;). However, on the next instruction, which
accesses the output arrays, the program segfaulted with a similar 0x?
error. In other words, the main program can no longer access the arrays
which it had malloc'ed (and had not yet freed).
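A cheap way to test whether the callee is writing outside the heap
arrays is to pad each allocation with a known byte pattern and check the
padding after the call returns. This is a minimal sketch of that idea,
not the code from test1.c or svd.c; guarded_alloc, guarded_intact,
guarded_check and guarded_free are hypothetical names, and the pad size
and fill byte are arbitrary choices. It only detects writes that land in
the padding around a heap buffer, so it says nothing about corruption of
the stack itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 64     /* multiple of 16, so malloc's alignment is kept */
#define GUARD_FILL  0xA5   /* arbitrary pattern written into the padding    */

/* Allocate nbytes of usable memory with GUARD_BYTES of padding on each
 * side. Returns a pointer to the usable region, or NULL on failure. */
void *guarded_alloc(size_t nbytes)
{
    unsigned char *raw = malloc(nbytes + 2 * GUARD_BYTES);
    if (raw == NULL)
        return NULL;
    memset(raw, GUARD_FILL, GUARD_BYTES);
    memset(raw + GUARD_BYTES + nbytes, GUARD_FILL, GUARD_BYTES);
    return raw + GUARD_BYTES;
}

/* Return 1 if both pads still hold the fill pattern, 0 otherwise.
 * nbytes must be the same size that was passed to guarded_alloc. */
int guarded_intact(const void *p, size_t nbytes)
{
    const unsigned char *front = (const unsigned char *)p - GUARD_BYTES;
    const unsigned char *back  = (const unsigned char *)p + nbytes;

    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (front[i] != GUARD_FILL || back[i] != GUARD_FILL)
            return 0;
    }
    return 1;
}

/* Abort right away (via assert) if either pad was overwritten. */
void guarded_check(const void *p, size_t nbytes)
{
    assert(guarded_intact(p, nbytes));
}

/* Release a buffer obtained from guarded_alloc. */
void guarded_free(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - GUARD_BYTES);
}
```

The intended use is to allocate every array handed to the wrapper with
guarded_alloc, call the wrapper, and then run guarded_check on each
array before touching its contents. If a check fails for the 9x9 case
but not for 2x2, the array that fails is the one being overrun.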
If I am interpreting this correctly then it seems there is a stack error
of some sort in my compiled version of LAPACK. Or? <smart people fill
in the blank here, please!>
Does anyone have an idea?
One thing that I can try is to use the "reference" LAPACK in my system
and link to it. That way I can hopefully take out the effect of my build.
Any other suggestions?
Jeesh, this was supposed to be 'just' a port... :-[
-Bruce