Linux, gobs of RAM, RAID and performance suckage...

Dave Johnson dave-gnhlug at davej.org
Thu Nov 30 21:38:17 EST 2006


Paul Lussier writes:
> 
> This is bizarre.
> 
> We've got an NFS server with dual 3GHz Xeon CPUs, connected to a
> Winchester OpenSAN FC-based RAID array.  The array is a single 1TB
> partition (unfortunately).
> 
> Before yesterday we were noticing lots of NFS drop-outs on the clients
> (300+ of them) and we correlated this pretty much to the backups
> (amanda).  The theory was that local disk I/O was beating out
> nfs-client requests.
> 
> We also noticed that our memory utilization was through the roof.  We
> had 2GB of PC2300, ECC, DDR, Registered RAM.  That RAM was averaging
> the following usage patterns:
> 
>  active     - 532M
>  inactive   - 1.2G
>  unused     -  39M
>  cache      - 1.3G
>  slab cache - 255M
>  swap cache -   6M
>  apps       -  78M
>  buffers    - 350M
>  swap       -  11M
> 
> We were topping out our memory usage and occasionally dipping into
> swap space on disk.
> 
> Yesterday we added 2GB of RAM and our memory utilization now looks like this:
> 
>  active     - 793M
>  inactive   - 2.3G
>  unused     - 213M
>  cache      - 2.9G
>  slab cache - 194M
>  swap cache -   2M
>  apps       -  71M
>  buffers    - 313M
>  swap       - 4.5M
> 
> So, it appears we really only succeeded in doubling the cache
> available to the system, and a little more than halving the amount of
> swap that was getting touched.
> 
> However, now when backups are run, the system becomes completely
> unresponsive from an NFS client perspective, and the load average
> skyrockets (e.g. into the 40s!).
> 
> Does anyone have any ideas ?  I'm at a complete loss on this one.


General NFS server comments:

1)
Make sure you are exporting the NFS shares async, otherwise most
operations will seem slow from the clients' point of view.  Check
/proc/fs/nfs/exports for 'async'.  If it's not there, set it in your
/etc/exports.
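As a minimal sketch, an async export line in /etc/exports might look
like this (the path and client subnet here are made up, substitute
your own):

    /export/home  192.168.1.0/24(rw,async,no_subtree_check)

After editing, run 'exportfs -ra' to push the change to a running
server.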

2)
Make sure your server has enough nfsd threads to handle the client
load.  With 300 clients, you should have at least 100 nfsd threads in
my opinion.  Check your init.d scripts for how to set this.
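On most distros the thread count lives in the init script's config
file; the variable name below is what Debian and Red Hat use, but
check your own scripts:

    # /etc/default/nfs-kernel-server (Debian) or /etc/sysconfig/nfs (Red Hat)
    RPCNFSDCOUNT=128

You can also bump it on a running server with 'rpc.nfsd 128'.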


Other stuff:

11MB of swap doesn't mean anything is wrong.  It's actually a good
thing: the kernel found 11MB of stuff that wasn't needed and booted it
out to swap in order to make more room available for cache.

You should check the disk rates and CPU usage with 'vmstat 5' for a
few minutes.  This will also show how much time the CPUs are spending
in I/O wait.  The full output of /proc/meminfo may also be useful.
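Run it while a backup is going:

    $ vmstat 5

The columns to watch are 'b' (processes blocked on I/O), 'bi'/'bo'
(blocks read/written per second), and 'wa' (percent of CPU time spent
waiting on I/O).  If 'wa' is pegged during backups, the array is the
bottleneck, not the CPUs.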

I assume you are using an x86_64 kernel?  Using a 32-bit kernel is OK
as long as you don't run out of low memory.  The kernel's heap/slab as
well as filesystem metadata (buffers) must come from low memory, while
filesystem data (cache) and userspace processes can live in high memory.
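If you are on 32-bit, you can watch low memory directly (the numbers
below are made up, just to show the fields):

    $ grep -i '^low' /proc/meminfo
    LowTotal:       869096 kB
    LowFree:         12160 kB

If LowFree gets near zero while the slab cache is large, that's your
problem.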

If your SAN is on a dedicated LAN connected only to the server, you
should investigate converting that subnet to support jumbo frames.
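Assuming the switch and NICs support it (and that eth1 is the
dedicated interface, substitute your own), raising the MTU is a
one-liner:

    # ifconfig eth1 mtu 9000

Every host on that subnet has to agree on the MTU, or you'll see
strange stalls.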

Since you're using Xeons, not Opterons, you should make sure
irqbalance is installed and running to spread irq load across all CPU
cores (this may not be a good idea on a multi-node Opteron system,
though).  You can run top in multi-CPU mode (press '1') to see if any
CPUs are overloaded with irq or wait load while others are idle.
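You can also see how interrupts are being spread right now:

    $ cat /proc/interrupts

If one CPU column is soaking up all the FC HBA and NIC interrupts
while the others sit near zero, irqbalance isn't doing its job.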


-- 
Dave


