Backing up a little - Trying to get LAPACK to work...
Bruce Labitt
bruce.labitt at myfairpoint.net
Tue May 25 22:07:24 EDT 2010
Jerry Feldman wrote:
> On 05/25/2010 09:03 PM, Bruce Labitt wrote:
>
>> Umm, my CLAPACK experiment is not doing so well. (Reference Shot in the
>> Dark Thread) So I thought I'd try to interface to the "industry
>> standard" LAPACK. In the end, I expect to use CLAPACK, but I thought
>> since MATLAB, GSL, SciPy, et al. use LAPACK, perhaps I could at least
>> get some real work(TM) (coding) done.
>>
>> Fundamentally, the LAPACK results are the same as in CLAPACK, I suppose
>> that is good in a way. I rewrote everything in C using the accumulated
>> knowledge I've gained. Nearly everything is on the heap. mallocs and
>> frees where they belong. When the 2x2 example is run, it works.
>> Valgrind declared no leaks, no problems.
>>
>> When the 9x9 example is run, it segfaults. The program architecture is
>> test1.c, svd.c. test1 is "main". svd.c is a wrapper function that
>> actually calls the FORTRAN subroutine zgesvd_. The segfault occurs when
>> returning from svd.c, not returning from the FORTRAN subroutine.
>>
>> Valgrind reports that the routine does not know where to return to,
>> i.e., the return address is 0x? From what I've been told this is
>> indicative of a stack error (overrun).
>>
>> If instead of using zgesvd_, I put in a dummy set of operations which
>> actually write to all of the output matrices and then return from
>> svd.c, the program runs with no "error" for the 9x9 case. I did this
>> experiment to see if I was doing something wrong.
>>
>> Next I tried compiling with the -fstack-protector-all switch. If I
>> removed the dummy operations (put back to "normal") and ran the 9x9
>> case, zgesvd_ gave results (reported INFO=0) which indicated success.
>> The svd.c routine returned to main (test1.c) and printed out an entirely
>> optimistic success message ;). However, on the next instruction, which
>> accesses the output arrays, the system segfaulted with a similar 0x?
>> error. In other words the main program can no longer access the arrays
>> which it had malloc'ed (and had not yet freed).
>>
>> If I am interpreting this correctly then it seems there is a stack error
>> of some sort in my compiled version of LAPACK. Or? <smart people fill
>> in the blank here, please!>
>>
>> Does anyone have an idea?
>> One thing that I can try is to use the "reference" LAPACK in my system
>> and link to it. That way I can hopefully take out the effect of my build.
>> Any other suggestions?
>>
>> Jeesh, this was supposed to be 'just' a port... :-[
>>
>
> What you describe is indicative of stack corruption. If the pointers
> that used to have the malloc'd addresses are null then either you failed
> to check the results of malloc, or something wrote into your local
> stack. This can happen when something goes beyond the boundaries of
> arrays. This is another thing that Purify does for you.
>
>
>
OK, I thought it seemed like stack corruption, too.
I know I didn't check the results of malloc directly - however, I made
it a point to write to the arrays to initialize them to a known value.
If I had a bad malloc, wouldn't the program have died during that
initialization? It was one of those arrays whose pointer got hosed.
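
For what it's worth, checking malloc explicitly is cheap. On Linux a write
through a NULL pointer normally does segfault right away, so a failed
allocation would most likely have shown up during initialization, but an
explicit check removes the guesswork. A minimal sketch, assuming a
hypothetical helper named checked_alloc (not part of the original test1.c
or svd.c):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (an assumption, not from the original program):
 * allocate `count` zero-initialized elements of `elem_size` bytes and
 * abort with a message if the allocation fails. */
void *checked_alloc(size_t count, size_t elem_size, const char *what)
{
    assert(what != NULL);
    assert(count > 0 && elem_size > 0);

    /* calloc returns NULL on failure, including size overflow. */
    void *p = calloc(count, elem_size);
    if (p == NULL) {
        fprintf(stderr, "%s: allocation of %zu x %zu bytes failed\n",
                what, count, elem_size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

Each array would then be obtained with something like
checked_alloc(81, sizeof *a, "a") in place of a bare malloc, so a NULL
pointer seen later can only have come from something overwriting it.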
Purify & Insure++ are looking pretty good right now...
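
Since the suspicion is something writing past the end of an array, one
thing worth ruling out is an undersized buffer handed to zgesvd_. The
sketch below only computes the minimum lengths given in the LAPACK
documentation for ZGESVD with JOBU='A' and JOBVT='A' (LDU = M, LDVT = N).
The struct and function names are hypothetical, and nothing here is known
about what the original svd.c actually allocates:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (an assumption, not from the original svd.c):
 * minimum buffer lengths for ZGESVD with JOBU='A', JOBVT='A'. */
struct zgesvd_sizes {
    size_t s_len;     /* S:     min(m,n) doubles (real, not complex)   */
    size_t u_len;     /* U:     ldu*m complex elements, with ldu = m   */
    size_t vt_len;    /* VT:    ldvt*n complex elements, with ldvt = n */
    size_t lwork_min; /* WORK:  complex elements, documented minimum   */
    size_t rwork_len; /* RWORK: doubles                                */
};

struct zgesvd_sizes zgesvd_min_sizes(size_t m, size_t n)
{
    assert(m > 0 && n > 0);

    size_t mn = (m < n) ? m : n; /* min(m, n) */
    size_t mx = (m < n) ? n : m; /* max(m, n) */

    struct zgesvd_sizes z;
    z.s_len     = mn;
    z.u_len     = m * m;
    z.vt_len    = n * n;
    /* Documented as max(1, 2*min(m,n) + max(m,n)); with m, n > 0 the
     * second term is always the larger one. */
    z.lwork_min = 2 * mn + mx;
    z.rwork_len = 5 * mn;
    return z;
}
```

For the 9x9 case this gives 9 doubles for S, 81 complex elements each for
U and VT, at least 27 complex elements for WORK, and 45 doubles for RWORK.
These are minimums only; calling zgesvd_ once with LWORK = -1 performs a
workspace query and returns the optimal WORK size in WORK(1).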