server uptime

Ben Scott dragonhawk at gmail.com
Wed Mar 19 20:23:14 EDT 2008


On Wed, Mar 19, 2008 at 5:01 PM, Warren Luebkeman <warren at resara.com> wrote:
> I'm impressed that a system could run for two years straight without failing ...

  Ah.  Well... that gets old after a while.  :)

  At the extreme end of the scale, old school IBM mainframe systems
can measure service availability in decades.   When everything is done
via batch transaction processing in virtual machines, it's easy to run
redundant systems in geographically separate areas.  If a nuclear bomb
vaporizes the East coast data center, the pending transactions get
committed at the West coast data center instead.

  Back in the days of NetWare 3.12, the only time I had servers go
down was on hardware or power failure.  With the right equipment (UPS,
generator, and good server hardware), uptimes measured in the hundreds
or thousands of days were the norm.  NetWare 3.x didn't do very much
-- LAN file and print were really about it -- so it wasn't hard to
keep it stable.  And there wasn't anything like a public IPX network
(in contrast to the public IP network we call "the Internet"), so
exploitation of software flaws was rarer (inside jobs usually find
easier attacks).

  Even an unpatched Windows NT 4.0 box can stay up for years.  Don't
connect it to a network, or install any software, or log in, or use
it.  Strangely enough, nobody is very impressed by that scenario.  ;-)

  At work, we've only got batteries for about 10-15 minutes of
runtime, and we average about one prolonged power outage a year, so
that limits our uptime.  (During a prolonged outage, they send
everybody home, so buying more batteries wouldn't pay off.)

  At home, my Linux desktop PC rarely gets more than a few weeks of
uptime.  I don't have it on a UPS, I like to experiment with different
distros (lots of reboots for that), and occasionally I play Wintendo.
In fact, my Windows XP PC at work probably has better uptime numbers
than my Linux PC at home (I've got a UPS at work).

  At work, our Linux servers could probably go for years, if it
weren't for power failures and kernel security holes.  But even the
Windows servers usually get at least a few months of uptime before
some update we need to install requires a reboot.
Windows has a lot of stuff that can't be updated without rebooting the
whole system.  On Linux, similar updates just mean doing things like
restarting Samba.  But in both cases, the service is unavailable
during the update, so the distinction is largely academic.  It still
means the users can't use the server, and so it still means I'm there
after hours to do the update.  I don't really care if the uptime
counter gets reset or not.  Linux is easier, and cheaper, but it's
convenience more than life-changing.

  And let's not forget that Linux isn't immune to restart-the-world
issues, either.  For example, on a Linux server, if you update glibc
to patch a security bug, you pretty much need to restart *everything*.
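  You can see which processes are still holding the old library by
scanning /proc yourself.  This is just a rough sketch of the idea
(the "libc" pattern and output format are my own; tools like Debian's
checkrestart do this properly): after the update, the old file is
unlinked but stays mapped into every running process until each one
restarts.

```shell
#!/bin/sh
# Sketch: list processes still mapping a libc file that has been
# deleted/replaced on disk (e.g. after a glibc security update).
# The "(deleted)" annotation in /proc/<pid>/maps marks a stale mapping,
# which means that process needs a restart to pick up the fix.
for maps in /proc/[0-9]*/maps; do
    pid=${maps%/maps}
    pid=${pid#/proc/}
    if grep -q 'libc.*(deleted)' "$maps" 2>/dev/null; then
        echo "restart needed: pid $pid"
    fi
done
```

On a freshly booted (or freshly patched-and-rebooted) box this prints
nothing; right after a glibc update it would list pretty much every
process on the system, which is the point.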

  Sorry to be a stick-in-the-mud, but I've been doing IT for 15+
years.  There are people on this list who have been doing IT for
longer than I've been alive.  After a while, you start seeing the big
picture.

-- Ben
