centralizing network configuration

Paul Lussier p.lussier at comcast.net
Wed May 30 21:06:54 EDT 2007


Bill McGonigle <bill at bfccomputing.com> writes:

> Wow, what a great, thoughtful response.

Thanks.  As you can see, I've been down this road a few times too :)

> oh, neat.  I didn't realize you could use a host name in fixed- 
> address - that helps remove one duplication I have.

Yep, actually, as long as DNS is set up and working correctly,
anyplace in the dhcpd.conf file that takes an IP address, you can
substitute a hostname.  However, as I just discovered the other day,
if there's a hostname in there that doesn't resolve, you can be
screwed.  We had three DNS servers in our 'domain-name-servers'
directive being handed out: ns1, ns2, ns3.  All were CNAME pointers
to different hosts.  The host ns3 pointed to died last week and
we've run out of 'round tuits' and therefore have not yet re-built
the tertiary DNS server.  I inadvertently removed the CNAME pointer
from the zone files, which resulted in the dhclient script barfing
on every host when it re-dhcp'ed.  All of these hosts ended up with
an empty /etc/resolv.conf file.  This made it *really* difficult to
do anything, especially logging into any host to fix them, since we
use Hesiod, which is DNS-based :)
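
For illustration (all the names here are made up), the relevant bits
of dhcpd.conf look something like this -- and every hostname in the
file has to resolve when dhcpd parses it:

    # hostnames are accepted anywhere an IP address would be
    option domain-name-servers ns1.example.com, ns2.example.com,
                               ns3.example.com;

    host wkstn01 {
        hardware ethernet 00:11:22:33:44:55;
        fixed-address wkstn01.example.com;
    }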

>> Currently, all the hostnames and MAC addresses are tracked in a
>> database and the DHCP config files are dynamically generated from a
>> script which deals with the revision control aspect of the config
>> files, and starts/stops the DHCP server.
>
> That's a good idea.  I *love* 'live documentation' because if you
> break your documentation your stuff stops working.  This encourages
> good documentation!
>
>>  - there's always (My,Postgre)SQL and a bunch of p(erl,ython) or
>> bash glue.
>
> I love those kind of apps - when the system is done booting.

Well, the generated files are also kept under revision control, so if
there were ever a bootstrapping problem, we could fall back to those.
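
The regeneration script itself doesn't need to be anything fancy.  A
minimal sketch of the idea in Python (the paths, database query, and
commands here are all made up for illustration):

    #!/usr/bin/env python
    # Rebuild the dhcpd host stanzas from the database, check the
    # result into revision control (RCS here), and restart dhcpd.
    import subprocess

    def fetch_hosts():
        # stand-in for the real database query (psycopg2, DBI, whatever)
        return [("wkstn01", "00:11:22:33:44:55"),
                ("wkstn02", "00:11:22:33:44:66")]

    stanzas = []
    for name, mac in fetch_hosts():
        stanzas.append("host %s {\n"
                       "    hardware ethernet %s;\n"
                       "    fixed-address %s.example.com;\n"
                       "}\n" % (name, mac, name))

    # a separate file pulled in by an 'include' in the main dhcpd.conf
    conf = open("/etc/dhcpd-hosts.conf", "w")
    conf.write("".join(stanzas))
    conf.close()

    subprocess.check_call(["ci", "-l", "-mregenerated from database",
                           "/etc/dhcpd-hosts.conf"])
    subprocess.check_call(["/etc/init.d/dhcpd", "restart"])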

> Hmm, I have to think about this some more, but the idea intrigues  me.
> It might be too confusing for the human users, though - not sure.

> I basically see the same problem over and over again - a small
> business need a 'network manager'.  It needs to handle DHCP, DNS, and
> various network services (file sharing, mail, web site, webmail,
> network monitoring, etc.).  The management of it is usually done by
> somebody onsite who can understand what DHCP and DNS do, but isn't
> really going to edit the zone files by hand.  Webmin is sometimes a
> little too much, and tools like that often don't understand DNS views
> (where www.example.com gets one answer from the LAN or DMZ and a
> different answer from everywhere else).

It's easy enough to build an interface on top of it, though.  It's
not uncommon to have the primary method of interaction between human
and data be a script, or some other automated means, with manually
editing the data as the fallback approach for when things go wrong.

If those who are "on site" are not overly well versed in things like
DNS or DHCP, then this makes perfect sense.  Tell them that when they
add a new host, user, or whatever, "do this, this, that, and this
other thing, and you're done.  Call me if something doesn't work as
expected."

> Your idea of coming out of a database I think is touching on the
> right concept - that database describes 'our network' and things like
> dhcpd.conf and zone files are really just expressions of that
> information in the format a particular program likes for input.  It
> brings up all kinds of thoughts on how to store the data, how to
> interact with that data, multiple levels of users, how much can be
> auto-discovered, etc.

The auto-discovery aspect is a very interesting angle.  Each host is
capable of telling you a tremendous amount about itself, if only you
have a means to access that introspection.  We make extensive use of
'athinfo', which happens to be a very extensible means of tapping a
host's introspective abilities.  athinfo is a client/server means of
poking at a system.  The server is triggered via (x)inetd and can run
anything you want on a host as root, with the caveat that the
script/command it runs may not be passed any arguments.  It was
designed to be a secure means of accessing system information
quickly, and comes out of MIT's Athena project (which also produced
Hesiod and Kerberos).
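
The query model is about as simple as it gets: the client connects,
sends a bare query name, and reads back whatever the server-side
command printed.  A rough sketch of that pattern (the port number
and query name here are assumptions -- check your athinfo install
for the real ones):

    #!/usr/bin/env python
    # Minimal client following the pattern described above: send one
    # query name (no arguments allowed), read the command's output.
    import socket

    def athinfo_query(host, query, port=49155):
        s = socket.create_connection((host, port))
        s.sendall((query + "\n").encode())
        data = b""
        while True:
            chunk = s.recv(4096)
            if not chunk:
                break
            data += chunk
        s.close()
        return data.decode()

    if __name__ == "__main__":
        print(athinfo_query("wkstn01.example.com", "version"))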

One of the things we do is periodically scan all the hosts we know
about and check that the MAC addresses or hard drives they report
(which are canonical) are the ones we have in the database.  If we
note a difference, we update the database and make a note of it.  We
can then (in theory) do things like track hard drives as they migrate
from one system to another, and ask questions like: have we seen this
hard drive report failures in the past?  If so, how many other
systems has it been in, and what types of failures did it report?

Obviously this isn't exactly what you need; rather, it's just
provided as an example of the kind of thing that's possible.
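
In outline, the scan is just a reconciliation loop, something like
this (the query, table contents, and host names are all invented for
illustration):

    #!/usr/bin/env python
    # Compare what each host reports against what the database
    # records, and flag/record any drift.

    def reported_macs(host):
        # stand-in for an athinfo-style query against the live host
        return set(["00:11:22:33:44:55"])

    # stand-in for the real database
    db = {"wkstn01.example.com": set(["00:11:22:33:44:55"]),
          "wkstn02.example.com": set(["00:11:22:33:44:66"])}

    for host, known in db.items():
        seen = reported_macs(host)
        if seen != known:
            print("%s: database says %s, host reports %s -- updating"
                  % (host, sorted(known), sorted(seen)))
            db[host] = seen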

> One thing I do know is that I never want to use a binary database
> store as my primary source of data - I've seen far too much
> corruption when people think they can pull this off (Mac, Windows).
> Oh, sure, a corruption proof-database would be fine (har, har, har).

We're using PostgreSQL.  Mostly because we track a lot more dynamic
data than just basic network configurations (e.g. drive failure
statistics across 400+ systems each with between 1 and 4 drives).
Though that is, in a sense, our "primary, authoritative" source for
network stuff, we very frequently (i.e. nightly) dump the database and
back it up.  We also have all the derived config files under revision
control.
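
The nightly dump and checkpoint is the boring part; it amounts to
something along these lines (the database name, paths, and file list
are made up):

    #!/usr/bin/env python
    # Nightly: dump the database to a dated file, then checkpoint the
    # derived config files in revision control (RCS here).
    import subprocess, time

    stamp = time.strftime("%Y%m%d")
    subprocess.check_call(["pg_dump", "-f",
                           "/var/backups/netdb-%s.sql" % stamp,
                           "netdb"])

    for path in ("/etc/dhcpd.conf", "/var/named/example.com.zone"):
        subprocess.check_call(["ci", "-l", "-mnightly checkpoint",
                               path])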

Our system is also very far from perfect, and we've had some failures
in various ways, but they're known weaknesses, and we'll eventually
fix them.  All that being said, I can't say at this point that I'm
averse to using something like MySQL or PostgreSQL for this, but I
also recognize that it might be overkill for many smaller sites.  In
that case, a text-based relational db like I proposed earlier might
work just fine.

> I'm coming at this from a Lazy perspective, but I think the capital L
> is justified.

The Lazy is justified if it is being applied to tedious, repetitive,
error-prone tasks which the computer would be more competent to
mindlessly carry out without complaint :)
-- 
Seeya,
Paul

