server uptime
Warren Luebkeman
warren at resara.com
Wed Mar 19 17:01:35 EDT 2008
There is no question that continuity of service is more important than "uptime" alone. I guess I'm just being a rube, admittedly so, because I'm impressed that a system could run for two years straight without failing, notwithstanding the "big picture" of service availability.
I guess my only point is, I just think it's cool...
----- Original Message -----
From: "David J Berube" <djberube at berubeconsulting.com>
To: "GNHLUG mailing list" <gnhlug-discuss at mail.gnhlug.org>
Sent: Wednesday, March 19, 2008 4:46:24 PM (GMT-0500) America/New_York
Subject: Re: server uptime
Got to agree with Ben here. While it's bad if a server can't go 24 hours
due to an OS-level problem, it's also inaccurate to say that a long
uptime implies high service availability. This is doubly so if you are
hosting software: not only does your service need to be available, but
it needs to respond to changing business demands and other technical
issues - including OS- and application-level security threats - and you
need to be able to change it to respond quickly. If you cannot do that,
then you have a technical failure resulting in what is effectively
downtime for your service: if your users can't use your service in a way
that works for them, then you have an outage.
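To put a number on that (just a sketch, with a hypothetical URL and
timing): availability is something you measure from the outside, by
making the same kind of request a user would make and counting how often
it succeeds in a tolerable amount of time, rather than by reading
uptime(1) on the box. Something along these lines, run from somewhere
that resembles where your users sit, says far more than any uptime
figure:

    import time
    import urllib.request

    # Hypothetical health-check URL: the service as users actually reach it.
    URL = "http://www.example.com/"
    CHECKS = 10       # probes in this sample
    INTERVAL = 30     # seconds between probes
    TIMEOUT = 5       # anything slower than this counts as "down"

    def probe(url=URL):
        """One availability check: True if the service answered in time."""
        try:
            with urllib.request.urlopen(url, timeout=TIMEOUT) as resp:
                return 200 <= resp.status < 300
        except OSError:
            return False

    if __name__ == "__main__":
        up = 0
        for i in range(CHECKS):
            if probe():
                up += 1
            if i < CHECKS - 1:
                time.sleep(INTERVAL)
        print(f"observed availability: {100.0 * up / CHECKS:.1f}% over {CHECKS} checks")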
There are other issues as well, of course: one of my clients recently
had severe trouble with upstream providers of bandwidth; on a 100 Mbps
connection, we were getting under 1 Mbps of throughput. While this wasn't a
hardware problem, and it wasn't a software problem, and it wasn't even a
network problem at the host level, it nonetheless resulted in a
substandard level of service, which was, in effect, an outage for
affected users.
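For what it's worth, that kind of degradation is also easy to quantify
from the client side; a rough sketch (the test URL is hypothetical, and a
single transfer is only a spot check, not a proper measurement):

    import time
    import urllib.request

    # Hypothetical test object; any reasonably large file served by the host works.
    TEST_URL = "http://www.example.com/testfile.bin"

    def measure_throughput_mbps(url=TEST_URL, max_bytes=10 * 1024 * 1024):
        """Download up to max_bytes and return the observed throughput in Mbit/s."""
        start = time.time()
        received = 0
        with urllib.request.urlopen(url, timeout=30) as resp:
            while received < max_bytes:
                chunk = resp.read(64 * 1024)
                if not chunk:
                    break
                received += len(chunk)
        elapsed = time.time() - start
        return (received * 8) / (elapsed * 1_000_000)

    if __name__ == "__main__":
        print(f"observed throughput: {measure_throughput_mbps():.2f} Mbit/s")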
In short, uptimes of individual components are not especially relevant;
if a machine can be occasionally brought down for repair or maintenance
without resulting in an effective lack of availability for end users,
then an extremely long uptime figure is meaningless - an extremely short
uptime figure, of course, still has relevance.
If an individual component cannot afford any downtime at all, then the
problem is not with the component: the problem is with your
architecture. All components fail occasionally, so if it is truly
important that that component's function never be interrupted, you need
enough redundancy to allow, once again, a brief period of maintenance
for any given component.
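As a concrete illustration of that last point (a sketch only; the
replica hostnames are hypothetical): if the same service is reachable on
more than one independent host, even a dumb client-side fallback means
that taking any single box down for maintenance does not turn into an
outage for users.

    import urllib.request

    # Hypothetical replicas of the same service, ideally in separate failure domains.
    REPLICAS = [
        "http://www1.example.com/",
        "http://www2.example.com/",
    ]

    def first_available(urls=REPLICAS, timeout=5):
        """Return the first replica that answers, or None if all are down."""
        for url in urls:
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    if 200 <= resp.status < 300:
                        return url
            except OSError:
                continue  # this replica is down or unreachable; try the next one
        return None

Real setups would of course push this below the client (DNS failover, a
load balancer, heartbeat-style IP takeover, and so on), but the
principle is the same.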
Take it easy,
David Berube
Berube Consulting
djberube at berubeconsulting.com
(603)-485-9622
http://www.berubeconsulting.com/
Ben Scott wrote:
> On Wed, Mar 19, 2008 at 1:50 PM, Warren Luebkeman <warren at resara.com> wrote:
>> Our server, running Debian Sarge, which serves our email/web/backups/dns/etc.,
>> has been running for 733 days (two years) without a reboot.
>
> You're obviously not installing all your security updates, then.
> Both the 2.4 and 2.6 Debian kernels have had security advisories
> posted within the past two years.
>
> In my experience, discussions about uptime typically involve
> approximately the same mentality as a penis-length competition.
> Especially since nobody really cares about what uptime(1) shows --
> it's service-level availability that counts. Who cares if your kernel
> hasn't been restarted but the email service was down for a month, or
> slow, or if your company's data is being harvested by a cracker who
> used some unpatched security holes to break in?
>
> -- Ben
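On a Debian box, one rough way to spot the situation Ben describes
(purely a heuristic sketch, and no substitute for actually reading the
security advisories) is to compare how long the machine has been up with
the timestamp of the newest kernel image under /boot; if a newer image
has been installed since boot, whatever fixes it carries are not the
ones actually running:

    import glob
    import os
    import time

    # Host uptime in seconds, from /proc/uptime (Linux-specific).
    with open("/proc/uptime") as f:
        uptime_seconds = float(f.read().split()[0])
    boot_time = time.time() - uptime_seconds

    # Debian installs kernel images under /boot as vmlinuz-<version>.
    kernel_images = glob.glob("/boot/vmlinuz-*")
    if not kernel_images:
        print("no kernel images found under /boot")
    elif max(os.path.getmtime(path) for path in kernel_images) > boot_time:
        print("a kernel image newer than the running one is installed;")
        print("its fixes will not take effect until the machine is rebooted")
    else:
        print("no kernel image newer than the current boot found under /boot")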
--
Warren Luebkeman
Founder, COO
Resara LLC
888.357.9195
www.resara.com