External Monitoring and Alerting

Thu Mar 4 13:56:36 EST 2010

On Thu, Mar 4, 2010 at 9:40 AM, Alan Johnson <alan at datdec.com> wrote:
> On Thu, Mar 4, 2010 at 8:52 AM, Kenny Lussier <klussier at gmail.com> wrote:
>>
>> I have looked at Keynote/RedAlert, Gomez, and a few other third parties.
>> However, I can't help but think that I am better off doing it myself. My
>> thought was to get virtual servers from various hosting companies (Linode,
>> Vereo, GoDaddy, etc.) so that they are geographically and network
>> diversified, and deploy something like ZenOSS, Zabbix, or GroundworkIT on
>> each to do the testing and centralize the reporting. Does anyone have any
>> thoughts on this? Has anyone done it before (I'm sure someone has)?
>> Discussion anybody?
>
> I expect 3 VMs to cost you more than a third party service for basic HTTP
> uptime based on pattern matches.  We use alertsite.com for such things, but
> the cost depends on the number of servers you want to monitor and the
> complexity of the monitors.  We just have alertsite check a single URL for
> a single pattern, and that is pretty cheap (<$20/m? maybe?).  It quickly
> gets more expensive as you add URLs, but it does 5 minute frequency from 3
> locations with configurable escalation and blackouts for expected down
> time.  I get emailed for any hiccup from any location and paged if all 3
> fail a couple of times in a row.  They also do automatic traceroutes on a
> failure and email daily uptime and response time stats.  You can run ad hoc
> reports on their site with a good amount of flexibility.
>
> Probably they do a lot more, but this is just how we have it set up.
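
For reference, the escalation policy described above (email on any failure from any location, page when every location fails a couple of times in a row) can be sketched roughly as follows. The function name, the data layout, and the default of 2 consecutive rounds are my own assumptions for illustration, not how alertsite actually implements it.

```python
def decide_action(history, page_after=2):
    """Pick an alert level from recent probe rounds.

    history: list of rounds, oldest first. Each round is a dict mapping a
    probe location name to True (check passed) or False (check failed).
    page_after: hypothetical threshold of consecutive all-location failures.
    """
    if not history:
        return "none"

    # Page only when every location failed in each of the last N rounds.
    recent = history[-page_after:]
    if len(recent) == page_after and all(
        not any(round_.values()) for round_ in recent
    ):
        return "page"

    # Otherwise, email if any location failed in the latest round.
    if not all(history[-1].values()):
        return "email"

    return "none"
```

With three locations, one failing location in the latest round yields "email", and two consecutive rounds where all three fail yield "page".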

We need to test our systems' availability with HTTPS POST requests.
However, what we are testing is more than just web site availability
or performance. The tests would actually reach into an application,
gauging response times and checking response content. We also need to
be able to identify the IP addresses that the tests come from, for
security reasons.
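
To make that concrete, here is a minimal sketch of the kind of check I have in mind, using only the Python standard library. The URL, form fields, pattern, and the 2-second threshold are all placeholders I made up; a real check would use whatever the application actually expects.

```python
import re
import time
import urllib.error
import urllib.parse
import urllib.request
from dataclasses import dataclass


@dataclass
class CheckResult:
    ok: bool
    status: int      # HTTP status code, or 0 if no response was received
    elapsed: float   # seconds from sending the request to reading the body
    reason: str


def evaluate(status, body, elapsed, pattern, max_seconds):
    """Decide pass/fail from the status code, body content, and timing."""
    if status != 200:
        return CheckResult(False, status, elapsed, "unexpected status")
    if not re.search(pattern, body):
        return CheckResult(False, status, elapsed, "pattern not found")
    if elapsed > max_seconds:
        return CheckResult(False, status, elapsed, "too slow")
    return CheckResult(True, status, elapsed, "ok")


def https_post_check(url, fields, pattern, max_seconds=2.0, timeout=10.0):
    """POST form fields to url, then evaluate the response."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    request = urllib.request.Request(url, data=data, method="POST")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            status = response.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        elapsed = time.monotonic() - start
        return CheckResult(False, exc.code, elapsed, "unexpected status")
    except OSError as exc:
        # Covers DNS failures, refused connections, TLS errors, timeouts.
        elapsed = time.monotonic() - start
        return CheckResult(False, 0, elapsed, f"request failed: {exc}")
    return evaluate(status, body, time.monotonic() - start, pattern, max_seconds)
```

A call might look like `https_post_check("https://app.example.com/api", {"action": "ping"}, r"status=OK")`, where the host and fields are hypothetical. Since each probe would run from a VM we control, the source IP addresses would be known in advance and could be whitelisted.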

Also, I have noticed that everyone seems to offer either a 15-minute
or a 5-minute test interval. Is that really the highest frequency that
is needed? I would think that a higher frequency would be better,
seeing as how a single 5-minute gap nearly uses up the yearly downtime
allowed by the "five 9's" uptime that everyone strives for (about
5.26 minutes per year). With a home-grown system on VMs, you could
test every 30 seconds or so.
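
The arithmetic behind that, as a quick sketch. It assumes a 365-day year and that an alert fires after a given number of consecutive failed probes; the function names are just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(availability):
    """Yearly downtime allowed at a given availability (e.g. 0.99999)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def worst_case_detection_seconds(interval_seconds, consecutive_failures=1):
    """Longest delay before an alert, ignoring request timeouts.

    Worst case: the outage begins right after a probe succeeds, so the
    first failed probe is a full interval later, and the Nth failed probe
    is N intervals later.
    """
    return interval_seconds * consecutive_failures


five_nines = downtime_budget_minutes(0.99999)       # about 5.26 minutes
slow_probe = worst_case_detection_seconds(300, 2)   # 5-minute interval, 2 failures
fast_probe = worst_case_detection_seconds(30, 2)    # 30-second interval, 2 failures
print(f"five 9's budget: {five_nines:.2f} min/year")
print(f"5-minute probes: up to {slow_probe / 60:.0f} min to alert")
print(f"30-second probes: up to {fast_probe / 60:.0f} min to alert")
```

So with 5-minute probes and two failures required before paging, an outage could run for up to 10 minutes before anyone is alerted, which is already about twice the five 9's budget for the whole year.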

Thanks,
Kenny