GFS and SANs
Jared Watkins
jared at watkins.net
Wed Aug 4 22:52:01 EDT 2004
Jeff Macdonald wrote:
>Hi,
>And now for something Linux related. Earlier this year Redhat released
>GFS as GPL'd stuff. I understand that GFS is a distributed file system
>with redundancy and all that. What I don't understand is what is meant
>by SAN. I believe it stands for Storage Area Network. In some
>documentation I've read it seems that a SAN is a box with disks and
>high speed connectors to those disks. In some cases it seems to be a
>collection of machines on a common high speed network that have disks
>that look like a single entity. Can someone help explain what GFS is
>and what is meant by SAN?
Depending on who you ask, you are bound to get different answers to
this question. Before I tell you what I think a SAN is, I'll tell you
a bit about where I'm coming from. At my current company I was tasked
with designing, testing, and implementing a mid-sized SAN. As this
was a high-budget item we needed to get it right the first time. So,
with all the normal delays of dealing with upper-management types and
multiple vendors, I spent the better part of two years evaluating
(mostly bake-off style) about a dozen storage vendors and three software
SAN virtualization systems. This included two complete test setups:
all hardware in the same room, set it up, try to break it, and see what
happens. I'm now about six months post-install and managing the daily
maintenance and growth of the SAN.
The simplest definition of a SAN is that you have disk arrays
connected in some sort of network: loops in the 'old' days and
point-to-point fabrics more recently. This network can use copper (old)
or optical cables. Optical is less error-prone, and right now runs
at either 1 Gb/s or 2 Gb/s, with 10 Gb/s coming. There are usually
switches (or hubs) where you plug in your storage and any servers that
need access to that storage. The idea is that you make RAID sets and
divide them into SCSI LUNs, which are presented out to the fabric for
systems to use. You have issues of LUN masking to deal with, so servers
only have access to the LUNs they 'own' and have permission to access.
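To make the LUN masking idea concrete, here is a toy sketch. Each server's HBA has a unique World Wide Port Name (WWPN), and the array only exposes a LUN to initiators on that LUN's access list. The WWPNs and table layout here are purely illustrative, not any vendor's real API.

```python
# Toy model of LUN masking: the array keeps an access list per LUN,
# keyed by initiator WWPN. All names/values below are made up.

masking_table = {
    # LUN id -> set of WWPNs allowed to see it
    0: {"10:00:00:00:c9:2b:11:01"},                              # one web server
    1: {"10:00:00:00:c9:2b:11:02", "10:00:00:00:c9:2b:11:03"},   # a DB cluster pair
}

def visible_luns(wwpn):
    """Return the LUN ids this initiator is allowed to see."""
    return sorted(lun for lun, allowed in masking_table.items() if wwpn in allowed)

print(visible_luns("10:00:00:00:c9:2b:11:02"))  # -> [1]
```

Every server on the fabric can physically reach the array's ports; the masking table is what keeps a misconfigured host from mounting (and destroying) someone else's LUN.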
If GFS were used in a SAN environment, you would assign the same LUN
to multiple machines, and GFS would keep them from stepping on each
other as they do I/O to the same shared disk.
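The "stepping on each other" problem is just a distributed race condition: two nodes writing the same block of a shared LUN without coordination will interleave and corrupt it. GFS handles this with a lock manager that nodes consult over the network; the sketch below crudely stands in a `threading.Lock` for that, as an illustration only, with two threads playing the two nodes.

```python
import threading

# Toy model of shared-disk locking: two "nodes" writing the same block
# of a shared LUN must hold the block's cluster lock first. In real GFS
# the lock lives in a network lock manager; a threading.Lock stands in here.

shared_disk = {42: b"old data"}       # block number -> contents
block_locks = {42: threading.Lock()} # one cluster lock per block (simplified)

def node_write(block, data):
    with block_locks[block]:          # acquire the cluster lock for this block
        shared_disk[block] = data     # safe: no other node holds the lock now

t1 = threading.Thread(target=node_write, args=(42, b"from node1"))
t2 = threading.Thread(target=node_write, args=(42, b"from node2"))
t1.start(); t2.start()
t1.join(); t2.join()
print(shared_disk[42] in (b"from node1", b"from node2"))  # -> True
```

Without the lock, a local filesystem like ext3 mounted on two machines at once would cache and write metadata independently and trash the disk; that coordination layer is exactly what a cluster filesystem adds.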
It gets more complicated than that, of course, but that's the basic
idea. Most of the mid-level-and-up storage arrays have advanced features
like snapshots, cross-cabinet mirroring, and long-distance replication
over either FC or IP. One key difference between FC and IP networks is
that FC is not routable; it is simply a collection of point-to-point
connections.
One common problem you run into when dealing with this stuff is that
each vendor tries to lock you into using only their storage. They all
have features and ways of accomplishing the same functional tasks that
will not interoperate with other hardware. So my company pursued a way
around this with software. The system I just deployed uses a
load-balanced pair of large Linux boxes that sit in-band, in the
middle between the storage and the servers. From this vantage point
they are able to see and control access to all the storage, and to
abstract access to it. With this setup, storage is storage: the
servers only know what these datamanagers show them. The backend storage
does not need any special (read: expensive) software features; it only
needs to present its disks/arrays out to the management boxes. That
buys you vendor independence and a richer feature set than any single
storage box can offer. Downtime due to disk upgrades and growth is
eliminated completely, and you can build a fully redundant system with
no single point of failure (SPOF) that even includes long-distance,
block-level replication over a network.
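The core of that in-band approach is just an address-translation layer: servers see virtual LUNs, and the middle box maps each virtual block range onto whatever backend array actually holds it. Here is a minimal sketch of that mapping, assuming made-up array and LUN names; any real virtualization product's data structures would of course be far richer.

```python
# Sketch of the in-band "datamanager" idea: a virtual LUN is a list of
# extents carved from backend arrays. Servers address the virtual LUN;
# the middle box translates each block to (array, backend LUN, offset).
# All names and sizes below are illustrative.

virtual_map = {
    # virtual LUN -> ordered extents of (backend array, backend LUN, size in blocks)
    "vlun0": [("arrayA", 3, 1000), ("arrayB", 7, 1000)],
}

def resolve(vlun, block):
    """Translate a virtual block address to (backend array, backend LUN, offset)."""
    offset = block
    for array, lun, size in virtual_map[vlun]:
        if offset < size:
            return (array, lun, offset)
        offset -= size
    raise ValueError("block past end of virtual LUN")

print(resolve("vlun0", 1500))  # -> ('arrayB', 7, 500)
```

Because the translation is all in the middle box, you can migrate an extent from one vendor's array to another by copying blocks and updating the map, with the servers none the wiser; that is where the "no downtime for disk upgrades" claim comes from.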
That's just my take on what a SAN is, and should be, but there are
lots of simpler setups that can also be called a SAN.
Jared
More information about the gnhlug-discuss
mailing list