shell, perl, performance, parallelism, profiling, etc.

Ben Scott dragonhawk at gmail.com
Wed Oct 22 11:34:58 EDT 2008


On Wed, Oct 22, 2008 at 10:12 AM, Jerry Feldman <gaf at blu.org> wrote:
> The original reason for the sticky bit is because some Unix commands are so
> frequently used and small that keeping them in memory significantly improves
> performance, especially in a multi-user system.

  Plus (from what I've been told), older Unix systems didn't always
have very sophisticated caching subsystems, and/or enough RAM to make
use of such.  So the sticky bit was a way to manually tell a system to
never unload an executable image.  Linux effectively makes that
decision automatically.
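
  For reference, the sticky bit itself is still settable with chmod,
though modern Linux ignores it on regular files and instead honors it
on directories (restricted deletion, as on /tmp).  A quick sketch:

```shell
# Historically, the sticky bit on an executable asked the kernel to
# keep its image around after exit.  Modern Linux ignores it on files;
# on directories it restricts deletion instead (e.g. /tmp is mode 1777).
tmpdir=$(mktemp -d)
touch "$tmpdir/demo"
chmod +t "$tmpdir/demo"            # set the sticky bit
ls -l "$tmpdir/demo" | cut -c1-10  # mode string ends in 'T' ('t' if executable)
rm -rf "$tmpdir"
```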

> I believe from my previous research on Linux virtual memory that commands
> will remain in virtual memory long enough to where the sticky bit is not
> needed.

  That correlates with what I read (long ago): Linux is entirely
demand-paged, and does not implicitly commit swap space.  The
executable file is mapped into virtual memory first.  Then the process
starts, and immediately triggers a page fault to load the code.  Pages
are not allocated in the swap file until the system needs to free up
RAM for more stuff.  That's one of the reasons the kernel
out-of-memory algorithms are of such interest.  (Again, I'm just
repeating received wisdom; this could be wrong.)
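
  One quick (Linux-specific) way to see demand paging at work is to
compare a process's virtual size against its resident set in /proc:

```shell
# On Linux, VmSize (total virtual mappings, including the executable
# and libraries mapped from disk) is usually much larger than VmRSS
# (pages actually resident in RAM) -- pages only come in when first
# touched, via a page fault.
grep -E '^Vm(Size|RSS):' /proc/self/status
```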

> Perl, on the other hand, may need to be loaded, causing a Perl script
> to appear to be slower than a shell script.

  I did several test runs in succession.  The goal was to get
everything cached in RAM for these trials.  (Artificial, to be sure,
but otherwise it's hard to normalize the effects of disk I/O.)  The
first run in a series was typically different, so I threw those
results out.  The results I have been reporting are the typical case,
after everything is cached.  Timing seems stable after that first run
(within +/- 0.1 seconds "real").
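
  As a script, that procedure might look like this (a sketch; the
"perl -e 1" here just stands in for whichever command is under test):

```shell
#!/bin/bash
# Warm-cache timing: one throwaway run to prime the page cache,
# then time the subsequent (stable) runs.
cmd='perl -e 1'            # stand-in for the command under test
$cmd >/dev/null 2>&1       # first run loads everything from disk; discard it
for run in 1 2 3; do
    time $cmd >/dev/null 2>&1   # these runs should agree within ~0.1s "real"
done
```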

> Additionally, Perl compiles the script first.

  Hmmm, that's an interesting point.  Still, once compiled, Perl
scripts are supposed to run faster.  bash has to parse and interpret
everything as it goes.  In this case, the shell variant doesn't have
any looping, so I would expect it to be a wash.

  If there was any kind of loop in the shell script, I would expect
the Perl variant to be much faster at the task.
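
  For instance (a hypothetical micro-benchmark, not the actual scripts
from this thread), the same counting loop in each language:

```shell
# bash re-parses the loop body on every iteration; Perl compiles the
# loop once and then just executes it.  Both print 100000.
time bash -c 'i=0; while [ $i -lt 100000 ]; do i=$((i+1)); done; echo $i'
time perl -e '$i = 0; $i++ while $i < 100000; print "$i\n";'
```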

  Hmmm, I suppose another experiment would be to turn the shell
variant into a Perl script, without using any Perl constructs beyond
what the shell variant does.  I'll try that next.
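
  The actual scripts aren't shown in this thread, so this is only a
hypothetical shape of that experiment (the name shellish.pl and the
pipeline are made up): a Perl script that does all its work by shelling
out, using no Perl constructs a shell script wouldn't have.

```shell
# shellish.pl: chain external commands exactly as the shell variant
# would, with no Perl-native loops or data handling.
cat > shellish.pl <<'EOF'
#!/usr/bin/perl
# Each step shells out, just like the shell script it mirrors.
system("echo hello | tr a-z A-Z") == 0
    or die "pipeline failed: $?";
EOF
perl shellish.pl       # prints HELLO
rm -f shellish.pl
```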

> Another way that you can look at performance is to use a C shell
> script on a system where the C Shell is not used as the login shell,
> so it is generally not resident.

  csh/tcsh would be resident after the first run.  :-)

-- Ben

