The Quest for the Perfect Cloud Storage
Dana Nowell
DanaNowell at cornerstonesoftware.com
Fri Dec 18 16:45:01 EST 2009
I'm currently using XenServer and iSCSI. The iSCSI is set up on a Debian
Linux box running iscsitarget (the box is called SAN1) with several NICs.
All drives on SAN1 are configured via RAID and added to an LVM2 pool.
The pool is carved up and exported via iSCSI to form various SRs for Xen
as necessary. If the Xen servers are in a resource pool and the SR is
attached to the pool (not to a single host), then live migration and
similar seem to work fine.
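For the curious, one SR looks roughly like the following. Names, IQNs,
and addresses here are made up, and your iscsitarget config path may
differ by package version, so treat this as a sketch rather than a recipe:

    # On SAN1: carve an LV out of the RAID-backed volume group and export
    # it with iscsitarget (IET).
    lvcreate -L 200G -n sr_pool1 vg_san1

    cat >> /etc/ietd.conf <<'EOF'
    Target iqn.2009-12.com.example:san1.sr-pool1
        Lun 0 Path=/dev/vg_san1/sr_pool1,Type=blockio
    EOF
    /etc/init.d/iscsitarget restart

    # On the XenServer pool master: probe for the SCSI id, then create the
    # SR as *shared* so it attaches to the pool rather than to one host.
    xe sr-probe type=lvmoiscsi device-config:target=192.168.1.10 \
       device-config:targetIQN=iqn.2009-12.com.example:san1.sr-pool1
    xe sr-create name-label="pool1-sr" shared=true type=lvmoiscsi \
       device-config:target=192.168.1.10 \
       device-config:targetIQN=iqn.2009-12.com.example:san1.sr-pool1 \
       device-config:SCSIid=<id reported by sr-probe>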
I'm currently running three Xen pools and several SRs, plus some storage
on SAN1 exported over NFS for common utilities and scripts (mounted
directly on the Xen servers). I also have several CIFS mounts to another
file server that hosts various ISO images. It all seems to work fine,
including migration. I have about 45 VMs configured right now, used for
test beds, low-volume internal web servers, print/file servers, network
monitoring, development build machines, remote desktop boxes, and various
other tasks. About 35 of those are currently running (the other test beds
and monitors are down at the moment).
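If it helps, the utility share and the ISO library look something like
this on my end (server names and paths are placeholders, not my real ones):

    # On each Xen server: mount the common utilities/scripts share from SAN1.
    mount -t nfs san1:/export/tools /mnt/tools

    # ISO library on the CIFS file server, attached once as a pool-wide ISO SR:
    xe sr-create name-label="ISO library" type=iso content-type=iso shared=true \
       device-config:location=//fileserver/isos device-config:type=cifs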
The upside is, it works. I make sure I have N+1 Xen servers for N load so
I can migrate a server clear of VMs to reboot it, install hardware, etc.
Live migration works great: the update/upgrade happens, the server goes
back online, and we live migrate the VMs back.
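In xe terms the maintenance dance is roughly the following (the host and
VM names are invented, and the shared SR is what makes it possible):

    # Empty xen2 so it can be patched/rebooted.
    xe host-disable name-label=xen2
    xe host-evacuate name-label=xen2     # live migrates its VMs elsewhere in the pool
    # ... update, reboot, add hardware ...
    xe host-enable name-label=xen2
    xe vm-migrate vm=build01 host=xen2 live=true   # then move VMs back as desired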
The downside is that the iSCSI box is a potential single point of failure.
The current plan is that RAID storage in an external cabinet + cold spare
+ UPS + backups will save my butt, with some downtime, on power
supply/motherboard/etc. failures. The medium-term plan is to investigate
DRBD and a cluster for the SAN box. If that pans out, I'd be promoting the
cold spare to a hot spare at the cost of additional disk space, in
exchange for a potential reduction in downtime. I could also run the two
boxes off different power panels/UPSes/breakers (but alas a single feed
from the pole) if I want. (BTW, anyone using DRBD ???)
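For reference, what I'd be testing is a two-node DRBD resource mirroring
the LV that backs an SR, something along these lines (hostnames, devices,
and addresses are invented, and this is untried on my end):

    # /etc/drbd.conf (or a file under /etc/drbd.d/, depending on DRBD version)
    resource sanstore {
        protocol C;                        # synchronous replication between nodes
        on san1 {
            device    /dev/drbd0;
            disk      /dev/vg_san1/sr_pool1;
            address   192.168.2.10:7788;
            meta-disk internal;
        }
        on san2 {
            device    /dev/drbd0;
            disk      /dev/vg_san2/sr_pool1;
            address   192.168.2.11:7788;
            meta-disk internal;
        }
    }

The iSCSI target would then export /dev/drbd0 from whichever node is
primary, instead of exporting the raw LV.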
Additional issues to watch out for include disk I/O volume (those "write
heavy VMs" you mention). I'm using 1-gig NICs for the SRs, but that
obviously limits me to 1 Gbps of throughput per SR. As a result I tend to
avoid I/O-intensive VMs in this configuration (no high-volume DB
servers). I guess you could go to fiber or 10GbE to help, but I'm not
there yet (small company, small budget). Unfortunately, the DRBD plan
doesn't make the I/O bandwidth look any better; it's a trade-off between
I/O bandwidth and hardware failure modes. In my case, I THINK I may be
OK, which is why I said 'investigate' DRBD. I may stick with the cold
spare since my VMs are internal-use only and mostly test beds. If you have
"write heavy VMs" you may want/need better than 1-gig NIC connectivity.
Several 2-3 Gbps SCSI/SAS/SATA drives sucking data through a couple of
1 Gbps NIC straws can get old fast under heavy load.
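A quick back-of-the-envelope, using round numbers rather than anything
I've measured, shows why:

    #!/bin/sh
    # Rough throughput sanity check -- assumed round numbers, not benchmarks.
    NIC_MB=$((1000 / 8))   # 1 Gbps link: ~125 MB/s theoretical, less after
                           # TCP/iSCSI overhead
    DISK_MB=90             # one 7200 RPM SATA drive, sequential, give or take
    echo "Sequential drives to fill one 1 Gbps NIC: roughly $((NIC_MB / DISK_MB + 1))"

In other words, a couple of busy drives can saturate a single gigabit path.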
Another choice is to do the same thing but use NFS instead of iSCSI
(allowing ZFS underneath). I did some tests, and iSCSI outperformed the
current NFS solutions (on Debian at least). The NFS network
chatter/latency costs on 1 Gbps NICs outweighed any potential ZFS disk
advantages for me. Consequently I opted for iSCSI in my environment;
your mileage or your NICs may vary.
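If you do go the NFS route, the XenServer side is simple enough; it looks
something like this (server address and export path are placeholders):

    # Export a directory from the storage box, then attach it as a shared SR:
    xe sr-create name-label="nfs-sr" shared=true type=nfs content-type=user \
       device-config:server=192.168.1.10 \
       device-config:serverpath=/export/xen-sr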
Supposedly, a lot of this gets simpler with those "ridiculous $/GB"
solutions, especially if they have lots of built-in high-speed NICs,
slots for several hundred drives, built-in management and monitoring
software, and redundant hardware. Alas, they DO exceed my capital
budget. :)
On 12/18/2009 13:53, Alan Johnson wrote:
> So, I'm trying to build clouds these days, and I'm sold on Citrix XenServer
> for all the VM management, but it doesn't provide much in the way of
> storage. It will let you use many kinds of nice third party options out
> there for your storage, but it can only provide local storage to VMs itself,
> and as such, will not do live migrations without your providing some kind of
> network storage for it to run VMs on. Some other features are impeded
> without network storage as well, but no need to digress to such specifics
> here.
>
> Anyway, the typical solution is to pay ridiculous $/GB for some proprietary
> hardware SAN solution to provide a node-redundant network storage. You pay
> even more to get multilevel storage. Money aside, I don't like this because
> it introduces new potential bottlenecks that are not present in the
> alternative I am about to describe, and because it is a big fat waste of
> hardware resources.
>
> My desired solution revolves around the question of why storage can't be
> treated like the rest of the resources in a cloud. You've already got all
> this redundant hardware providing processor and memory resources to the
> cloud. Why can't you pool the storage resources of the same physical
> hardware the same way? So, my idea of the Perfect Cloud Storage would meet
> the following requirements:
>
> 1. multi-node network storage (SAN?)
> 2. Ideally, n+2 redundant (like RAID6), but n+1 and mirroring are worth
> considering. In fact mirroring would be a nice option to have for some write
> heavy VMs. Even striping would be useful in some instances.
> 3. The node software would run on the dom0 of XenServer physical nodes
> giving it direct access to the block devices within. From what I can tell,
> XenServer is a custom distro of Linux with RPM/YUM package management.
> 4. Multilevel storage support within the nodes, so for example, a set of
> 256GB SSD, 300GB 10K, 500GB 7.2K, and 750GB 5.4K drives will all be used
> intelligently without need for human interaction after setup.
> 5. Multilevel storage across nodes would be a neat concept, but some
> intelligence about load balancing across identical nodes is certainly
> desired.
>
>
> The closest I can come up with so far is to run one FreeBSD VM on each
> physical node, expose the block devices directly to that VM so it can put
> them in a ZFS pool for multilevel functionality (and maybe some local
> redundancy). This gives me a file server for each node that can provide
> iSCSI targets to the VMs, which can then be mounted in any software RAID
> configuration that makes sense for the needs of the VM, mostly RAID6 so that
> if I take a physical node down for maintenance, the network storage persists
> with n+1 redundancy.
>
> This is not terribly elegant, not as easy to manage as I would like, and
> does not meet all the requirements above, but it does get the major ones.
> The biggest problem is that I don't have any way of testing this wacky idea
> until I order and receive a hardware configuration that depends on it
> working!
>
> I'm happy to take ideas from the crowd, but I'd be happier to find some
> vendor or consultant with some experience and/or access to testing resources
> who can vet this or some other solution and then stand by it. Feel free to
> contact me off list if you think you might fill this role. Dave Clifton,
> I'm thinking of you because Mike Diehn suggested you have a good amount of
> SAN experience. Bill McGonigle, I'm thinking of you because of your ZFS
> experience. Ben Scott, I'm thinking of you because you're the man. =)
>
> Thanks in advance for all the insight I've come to expect from this wonderful
> community!
>
> _______________
> Alan Johnson
> p: alan at datdec.com
> c: 603-252-8451
> w: alan.johnson at rightthinginc.com
> w: 603-442-5330 x1524